Artificial Intelligence Notes
“Artificial” defines “man-made,” and “intelligence” defines “thinking power”; hence AI means “a man-made thinking power.”
❖ Definition of AI
It is a branch of computer science by which we can create intelligent machines that can behave like humans, think like humans, and make decisions.
Artificial Intelligence exists when a machine has human-based skills such as learning, reasoning, and problem solving.
With Artificial Intelligence, you do not need to pre-program a machine for every task; instead, you can create a machine with programmed algorithms that can work with its own intelligence, and that is the awesomeness of AI.
AI is not a new idea: according to Greek myth, there were mechanical men in ancient times that could work and behave like humans.
❖ Applications of AI
Artificial Intelligence has various applications in today's society. It is becoming essential for
today's time because it can solve complex problems in an efficient way in multiple industries,
such as Healthcare, entertainment, finance, education, etc. AI is making our daily life more
comfortable and faster.
Following are some sectors which have the application of Artificial Intelligence:
1. AI in Astronomy
Artificial Intelligence can be very useful for solving complex problems about the universe. AI technology can help us understand the universe: how it works, its origin, and so on.
2. AI in Healthcare
In the last five to ten years, AI has become more advantageous for the healthcare industry and is going to have a significant impact on it.
Healthcare industries are applying AI to make better and faster diagnoses than humans. AI can help doctors with diagnoses and can warn when a patient's condition is worsening, so that medical help can reach the patient before hospitalization.
3. AI in Gaming
AI can be used for gaming purposes. AI machines can play strategic games like chess, where the machine needs to think about a large number of possible positions.
4. AI in Finance
AI and the finance industry are the best match for each other. The finance industry is implementing automation, chatbots, adaptive intelligence, algorithmic trading, and machine learning in financial processes.
5. AI in Data Security
The security of data is crucial for every company and cyber-attacks are growing very rapidly
in the digital world. AI can be used to make your data safer and more secure. Tools such as the AEG bot and the AI2 Platform are used to detect software bugs and cyber-attacks more effectively.
6. AI in Social Media
Social Media sites such as Facebook, Twitter, and Snapchat contain billions of user profiles,
which need to be stored and managed in a very efficient way. AI can organize and manage
massive amounts of data, and it can analyze that data to identify the latest trends, hashtags, and the requirements of different users.
7. AI in Travel & Transport
AI is in high demand in the travel industry. AI can handle various travel-related tasks, from making travel arrangements to suggesting hotels, flights, and the best routes to customers. The travel industry uses AI-powered chatbots that interact with customers in a human-like way for better and faster responses.
8. AI in Automotive Industry
Some automotive companies use AI to provide a virtual assistant to their users for better performance; for example, Tesla has introduced TeslaBot, an intelligent virtual assistant.
Various companies are currently working on self-driving cars, which can make your journey safer and more secure.
9. AI in Robotics:
Artificial Intelligence has a remarkable role in robotics. Usually, general robots are programmed to perform repetitive tasks, but with the help of AI we can create intelligent robots that can perform tasks from their own experience without being pre-programmed.
Humanoid robots are the best examples of AI in robotics; recently, the intelligent humanoid robots named Erica and Sophia were developed, and they can talk and behave like humans.
10. AI in Entertainment
We already use some AI-based applications in daily entertainment services such as Netflix and Amazon. With the help of ML/AI algorithms, these services recommend programs and shows.
11. AI in Agriculture
Agriculture is an area that requires various resources such as labor, money, and time for the best result. Nowadays agriculture is becoming digital, and AI is emerging in this field in the form of agricultural robotics, soil and crop monitoring, and predictive analysis. AI in agriculture can be very helpful for farmers.
12. AI in E-commerce
AI is providing a competitive edge to the e-commerce industry and is increasingly in demand in the e-commerce business. AI helps shoppers discover associated products in their recommended size, color, or even brand.
13. AI in education:
AI can automate grading so that tutors have more time to teach. An AI chatbot can communicate with students as a teaching assistant.
In the future, AI could work as a personal virtual tutor for students, easily accessible at any time and any place.
❖ A boom of AI (1980-1987)
• Year 1980: After the AI winter, AI came back with "expert systems". Expert systems were programs that emulate the decision-making ability of a human expert.
• Also in 1980, the first national conference of the American Association for Artificial Intelligence (AAAI) was held at Stanford University.
❖ Structure of an AI Agent
To understand the structure of Intelligent Agents, we should be familiar with Architecture and
Agent programs. Architecture is the machinery that the agent executes on. It is a device with
sensors and actuators, for example, a robotic car, a camera, and a PC. An agent program is an
implementation of an agent function. An agent function is a map from the percept sequence
(history of all that an agent has perceived to date) to an action.
Agent = Architecture + Agent Program
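The Agent = Architecture + Agent Program split can be made concrete with a small sketch. The vacuum-world agent below is purely illustrative (the percept format and action names are assumptions, not from any library): the agent program maps the percept sequence to an action, while the surrounding "architecture" would supply percepts and execute the returned actions.

```python
# A minimal sketch of the Agent = Architecture + Agent Program idea.
# The names (percept format, action strings) are illustrative assumptions.

def vacuum_agent_program(percept_history):
    """Agent function: maps the percept sequence to an action.
    Each percept is a (location, status) pair."""
    location, status = percept_history[-1]   # act on the latest percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# The 'architecture' feeds percepts in and executes the returned actions:
percepts = [("A", "Dirty")]
print(vacuum_agent_program(percepts))   # -> Suck
percepts.append(("A", "Clean"))
print(vacuum_agent_program(percepts))   # -> Right
```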
There are many examples of agents in artificial intelligence. Here are a few:
• Intelligent personal assistants: These are agents that are designed to help users with
various tasks, such as scheduling appointments, sending messages, and setting
reminders. Examples of intelligent personal assistants include Siri, Alexa, and Google
Assistant.
• Autonomous robots: These are agents that are designed to operate autonomously in the
physical world. They can perform tasks such as cleaning, sorting, and delivering goods.
Examples of autonomous robots include the Roomba vacuum cleaner and the Amazon
delivery robot.
• Gaming agents: These are agents that are designed to play games, either against human
opponents or other agents. Examples of gaming agents include chess-playing agents
and poker-playing agents.
• Fraud detection agents: These are agents that are designed to detect fraudulent
behaviour in financial transactions. They can analyse patterns of behaviour to identify
suspicious activity and alert authorities. Examples of fraud detection agents include
those used by banks and credit card companies.
• Traffic management agents: These are agents that are designed to manage traffic flow
in cities. They can monitor traffic patterns, adjust traffic lights, and reroute vehicles to
minimize congestion. Examples of traffic management agents include those used in
smart cities around the world.
• A software agent has keystrokes, file contents, and received network packets acting as sensors, and displays on the screen, files, and sent network packets acting as actuators.
• A Human-agent has eyes, ears, and other organs which act as sensors, and hands, legs,
mouth, and other body parts act as actuators.
• A Robotic agent has Cameras and infrared range finders which act as sensors and
various motors act as actuators.
Utility-Based Agents
Agents that are developed with their end use (utility) as the building block are called utility-based agents. When there are multiple possible alternatives, utility-based agents are used to decide which one is best: they choose actions based on a preference (utility) for each state.
Sometimes achieving the desired goal is not enough. We may look for a quicker, safer, cheaper
trip to reach a destination. Agent happiness should be taken into consideration. Utility describes
how “happy” the agent is. Because of the uncertainty in the world, a utility agent chooses the
action that maximizes the expected utility. A utility function maps a state onto a real number
which describes the associated degree of happiness.
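As a minimal sketch of this idea, the following code picks the action with the highest expected utility; the actions and outcome distributions are invented for illustration:

```python
# Illustrative sketch: a utility-based agent selects the action that
# maximizes expected utility. The outcome distributions below are invented.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def choose_action(actions):
    """actions: dict mapping action name -> list of (prob, utility)."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

routes = {
    "highway":  [(0.7, 10), (0.3, -5)],   # fast, but risk of a jam
    "backroad": [(1.0, 6)],               # slower but certain
}
print(choose_action(routes))  # -> backroad (EU 6.0 beats 5.5)
```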
Learning Agent
A learning agent in AI is a type of agent that can learn from its past experiences: it starts acting with basic knowledge and then adapts automatically through learning. A learning agent has four main conceptual components:
1. Learning element: It is responsible for making improvements by learning from the
environment.
2. Critic: The learning element takes feedback from critics which describes how well the
agent is doing with respect to a fixed performance standard.
3. Performance element: It is responsible for selecting external action.
4. Problem Generator: This component is responsible for suggesting actions that will lead
to new and informative experiences.
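The four components can be wired together in a toy sketch; all names, the toy action values, and the update rule are illustrative assumptions, not a standard API:

```python
import random

# Minimal illustrative wiring of the four components of a learning agent.
# The action set, learning rate, and "reward" interface are all invented.

class LearningAgent:
    def __init__(self):
        self.action_values = {"left": 0.0, "right": 0.0}  # learned knowledge

    def performance_element(self):
        """Selects the external action (here: greedy over learned values)."""
        return max(self.action_values, key=self.action_values.get)

    def critic(self, reward):
        """Compares the outcome against a fixed performance standard."""
        return reward  # feedback passed on to the learning element

    def learning_element(self, action, feedback):
        """Improves the agent using the critic's feedback."""
        self.action_values[action] += 0.5 * (feedback - self.action_values[action])

    def problem_generator(self):
        """Suggests exploratory actions that yield new experiences."""
        return random.choice(list(self.action_values))

agent = LearningAgent()
agent.learning_element("right", agent.critic(reward=1.0))
print(agent.performance_element())  # -> right
```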
Multi-Agent Systems
These agents interact with other agents to achieve a common goal. They may have to coordinate
their actions and communicate with each other to achieve their objective.
A multi-agent system (MAS) is a system composed of multiple interacting agents that are
designed to work together to achieve a common goal. These agents may be autonomous or
semi-autonomous and are capable of perceiving their environment, making decisions, and
taking action to achieve the common objective.
MAS can be used in a variety of applications, including transportation systems, robotics, and
social networks. They can help improve efficiency, reduce costs, and increase flexibility in
complex systems. MAS can be classified into different types based on their characteristics,
such as whether the agents have the same or different goals, whether the agents are cooperative
or competitive, and whether the agents are homogeneous or heterogeneous.
In a homogeneous MAS, all the agents have the same capabilities, goals, and behaviours.
In contrast, in a heterogeneous MAS, the agents have different capabilities, goals, and
behaviours.
This can make coordination more challenging but can also lead to more flexible and robust
systems.
Cooperative MAS involves agents working together to achieve a common goal, while
competitive MAS involves agents working against each other to achieve their own goals. In
some cases, MAS can also involve both cooperative and competitive behaviour, where agents
must balance their own interests with the interests of the group.
MAS can be implemented using different techniques, such as game theory, machine learning,
and agent-based modelling. Game theory is used to analyse strategic interactions between
agents and predict their behaviour. Machine learning is used to train agents to improve their
decision-making capabilities over time. Agent-based modelling is used to simulate complex
systems and study the interactions between agents.
Overall, multi-agent systems are a powerful tool in artificial intelligence that can help solve
complex problems and improve efficiency in a variety of applications.
❖ Well-defined Problems
A problem can be defined formally by four components:
▪ The initial state that the agent starts in.
▪ A description of the possible actions available to the agent.
– Commonly done using a successor function. Given a particular state x,
SUCCESSOR-FN(x) returns a set of (action, successor) ordered pairs.
– The initial state and successor function implicitly define the state space of the
problem: the set of all states reachable from the initial state.
▪ The goal test determines whether a given state is a goal state.
– Explicit set of possible goal states, or specified by an abstract property.
▪ A path cost function that assigns a numeric cost to each path.
– The agent chooses a cost function that reflects its own performance measure.
– The agent attempts to minimize the cost function.
❖ Problem Formulation
A solution to a problem is a path from the initial state to a goal state.
• Solution quality is measured by the path cost.
• An optimal solution has the lowest path cost among all solutions.
• Abstraction: the process of removing detail from a representation.
Initial State
– Description of all pertinent aspects of the state in which the agent starts the search.
Goal Test
– Conditions the agent is trying to meet.
Goal State
– Any state which meets the goal condition.
Problem Formulation
– Describe a general problem as a search problem.
Solution
– Sequence of actions that transitions the world from the initial state to a goal state.
Search
– Process of looking for a solution.
– Search algorithm takes problem as input and returns solution
– We are searching through a space of possible states
Execution
– Process of executing sequence of actions (solution)
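The components above can be sketched as a small problem class; the route-finding graph, action names, and costs are invented for illustration:

```python
# A sketch of the formal problem components (initial state, successor
# function, goal test, path cost) for a toy route-finding problem.
# The graph and all names are illustrative assumptions.

class SearchProblem:
    def __init__(self, initial, goal, graph):
        self.initial, self.goal, self.graph = initial, goal, graph

    def successors(self, state):
        """SUCCESSOR-FN(x): set of (action, successor) pairs."""
        return [(f"go-{nbr}", nbr) for nbr in self.graph.get(state, {})]

    def goal_test(self, state):
        return state == self.goal

    def step_cost(self, state, successor):
        return self.graph[state][successor]

graph = {"S": {"A": 2, "B": 5}, "A": {"G": 4}, "B": {"G": 1}}
p = SearchProblem("S", "G", graph)
print(p.successors("S"))   # -> [('go-A', 'A'), ('go-B', 'B')]
print(p.goal_test("G"))    # -> True
```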
❖ Examples
❖ Searching for Solutions
The problem thus defined can be solved by searching the state space.
• Search for goal states in a search tree generated from the initial state using the successor
function.
• Search strategy: the choice of which action to take next in order to continue the
search for the goal state
– Or, which node to expand next in the collection of nodes that have been
generated (fringe)
• Information in search node: state; parent node; action; path-cost; depth.
• Branching factor: maximum number of successors of any node.
• Effectiveness of search can be determined by:
– Search cost: Time taken to reach the goal state
– Total cost
S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
Time Complexity: The time complexity of the BFS algorithm is given by the number of
nodes traversed until the shallowest goal node, where d = depth of the shallowest solution
and b = branching factor (the number of successors of every node):
T(b) = 1 + b + b² + … + b^d = O(b^d)
Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the
frontier, which is O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite
depth, then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
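A minimal BFS over an explicit graph can be sketched as follows; the graph is invented for illustration, and the queue of paths makes the FIFO, level-by-level expansion explicit:

```python
from collections import deque

# Minimal BFS sketch: expands nodes level by level using a FIFO queue and
# returns the path to the shallowest goal. The graph is illustrative.

def bfs(graph, start, goal):
    frontier = deque([[start]])          # queue of paths (FIFO)
    explored = set()
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for nbr in graph.get(node, []):
            frontier.append(path + [nbr])
    return None

graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": ["G"]}
print(bfs(graph, "S", "G"))   # -> ['S', 'B', 'D', 'G']
```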
More Examples:
Problem 1 Solution
Path Traversed = [A – B – C- D – E – F- G – H]
Problem 2 Solution
Path Traversed = [S - A – B – C- D – G – H - E – F-
I-K]
2. Depth-first Search
• Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
• It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
• DFS uses a stack data structure for its implementation.
• The process of the DFS algorithm is similar to the BFS algorithm.
Advantage:
• DFS requires very little memory, as it only needs to store the stack of nodes on the path
from the root node to the current node.
• It takes less time than BFS to reach the goal node (if it traverses the right path).
Disadvantage:
• There is the possibility that many states keep re-occurring, and there is no guarantee of
finding the solution.
• DFS goes deep down in its search, and it may sometimes enter an infinite loop.
Example:
In the below search tree, we have shown the flow of depth-first search, and it will follow the
order as:
Root node--->Left node ----> right node.
It will start searching from root node S and traverse A, then B, then D and E; after traversing
E, it will backtrack, as E has no other successor and the goal node has not yet been found. After
backtracking, it will traverse node C and then G, where it terminates, as it has found the goal node.
Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the
algorithm. It is given by:
T(b) = 1 + b + b² + … + b^m = O(b^m)
where m = the maximum depth of any node, which can be much larger than d (the depth of the
shallowest solution).
Space Complexity: DFS needs to store only a single path from the root node (together with the
unexpanded siblings along it), hence the space complexity of DFS is equivalent to the size of the
fringe set, which is O(bm).
Optimal: DFS is non-optimal, as it may take a large number of steps or reach the goal node at a
high cost.
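A minimal recursive DFS matching the traversal just described (root, then the left subtree to full depth, then the right) might look like this; the tree is illustrative:

```python
# Recursive DFS sketch: follows each path to its greatest depth before
# backtracking. The example tree is invented for illustration.

def dfs(graph, node, goal, visited=None):
    visited = visited if visited is not None else []
    visited.append(node)
    if node == goal:
        return visited
    for child in graph.get(node, []):
        if child not in visited:
            result = dfs(graph, child, goal, visited)
            if result:
                return result
    return None

tree = {"S": ["A", "C"], "A": ["B"], "B": ["D", "E"], "C": ["G"]}
print(dfs(tree, "S", "G"))   # -> ['S', 'A', 'B', 'D', 'E', 'C', 'G']
```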
Example:
Problem 1: Solution
Path Traversed = [A - B – D – G - E – C – F - H]
Problem 2: Solution
Path Traversed = [A - B – D – G - E – C – F - H]
3. Depth-Limited Search Algorithm:
A depth-limited search algorithm is a depth-first search with a predetermined depth limit ℓ.
Depth-limited search removes the drawback of infinite paths in depth-first search. In
this algorithm, a node at the depth limit is treated as if it has no further successor nodes.
Depth-limited search can be terminated with two Conditions of failure:
• Standard failure value: It indicates that problem does not have any solution.
• Cutoff failure value: It defines no solution for the problem within a given depth limit.
Advantages:
Depth-limited search is Memory efficient.
Disadvantages:
o Depth-limited search also has a disadvantage of incompleteness.
o It may not be optimal if the problem has more than one solution.
Example:
Completeness: DLS search algorithm is complete if the solution is above the depth-limit.
Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ).
Space Complexity: Space complexity of DLS algorithm is O(b×ℓ).
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not
optimal even if ℓ>d.
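A sketch of depth-limited search that distinguishes the cutoff failure value from the standard failure value; the tree and the string `"cutoff"` sentinel are assumptions for illustration:

```python
# Depth-limited search sketch: DFS that treats nodes at the depth limit as
# having no successors. Returns a path, None (standard failure: no solution
# at all), or "cutoff" (no solution within the given depth limit).

def dls(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return "cutoff"                    # cutoff failure value
    cutoff_occurred = False
    for child in graph.get(node, []):
        result = dls(graph, child, goal, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return "cutoff" if cutoff_occurred else None   # None = standard failure

tree = {"S": ["A", "B"], "A": ["C"], "B": ["G"]}
print(dls(tree, "S", "G", limit=2))   # -> ['S', 'B', 'G']
print(dls(tree, "S", "G", limit=1))   # -> cutoff
```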
Problem Solution
Time Complexity:
Let C* be the cost of the optimal solution and ε the smallest step cost toward the goal. Then
the number of steps is C*/ε + 1 (we add 1 because we start from state 0 and end at C*/ε).
Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
The same logic applies to space, so the worst-case space complexity of uniform-cost
search is also O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
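Uniform-cost search can be sketched with a priority queue ordered by the path cost g(n); the weighted graph below is invented for illustration:

```python
import heapq

# Uniform-cost search sketch: always expands the frontier node with the
# lowest path cost g(n). The weighted graph is invented.

def ucs(graph, start, goal):
    frontier = [(0, start, [start])]     # (path_cost, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for nbr, step in graph.get(node, {}).items():
            heapq.heappush(frontier, (cost + step, nbr, path + [nbr]))
    return None

graph = {"S": {"A": 1, "B": 5}, "A": {"B": 1, "G": 10}, "B": {"G": 2}}
print(ucs(graph, "S", "G"))   # -> (4, ['S', 'A', 'B', 'G'])
```

Note that the direct edge A→G (cost 10) is ignored in favor of the cheaper path through B, which is exactly the lowest-cost behavior claimed above.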
Problem 1 Solution
Advantages:
• It combines the benefits of the BFS and DFS algorithms in terms of fast search and
memory efficiency.
Disadvantages:
• The main drawback of IDDFS is that it repeats all the work of the previous phase.
Example:
The following tree structure shows the iterative deepening depth-first search. The IDDFS
algorithm performs successive iterations until it finds the goal node. The iterations performed
by the algorithm are:
1'st Iteration-----> A
2'nd Iteration----> A, B, C
3'rd Iteration------>A, B, D, E, C, F, G
4'th Iteration------>A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Suppose b is the branching factor and d is the depth of the shallowest solution; then the
worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS will be O(bd).
Optimal:
The IDDFS algorithm is optimal if path cost is a non-decreasing function of the depth of the node.
Problem Solution
2nd Iteration, d = 1, [S – A - C]
3rd Iteration, d = 2, [S – A – D – B – C – E - G]
Performs a depth-limited search by gradually increasing the depth_limit until the goal at the
shallowest depth is found.
1. Perform Depth limited search for depth_limit = 0
2. If solution is found exit, else go to 3
3. Increase depth_limit by 1
4. Go back to 1
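The four steps above can be sketched in code. The tree matching the iteration example is illustrative, and the depth-limited helper here is a simplified one that does not distinguish cutoff from standard failure:

```python
# Iterative deepening sketch: repeatedly run a depth-limited DFS with
# depth_limit = 0, 1, 2, ... until the shallowest goal is found.

def dls(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph.get(node, []):
        result = dls(graph, child, goal, limit - 1)
        if result:
            return [node] + result
    return None

def iddfs(graph, start, goal, max_depth=20):
    for depth_limit in range(max_depth + 1):   # steps 1-4 above
        result = dls(graph, start, goal, depth_limit)
        if result:
            return result
    return None

tree = {"A": ["B", "C"], "B": ["D", "E"], "D": ["H", "I"], "C": ["F", "G"]}
print(iddfs(tree, "A", "G"))   # -> ['A', 'C', 'G']
```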
Problem 1 Solution
Problem 2
Solution
Path Traversed = [A – B – D – C – E – F – H – J – L – I – Y – K - G]
Heuristics function: Heuristic is a function which is used in Informed Search, and it finds the
most promising path. It takes the current state of the agent as input and produces an estimate
of how close the agent is to the goal. The heuristic method might not always give the best
solution, but it is guaranteed to find a good solution in reasonable time.
Heuristic function estimates how close a state is to the goal. It is represented by h(n), and it
estimates the cost of an optimal path between the pair of states. The value of the heuristic
function is always positive.
Admissibility of the heuristic function is given as:
h(n) <= h*(n)
Here h(n) is the heuristic (estimated) cost, and h*(n) is the actual cost of an optimal path to the
goal. Hence an admissible heuristic never overestimates: the heuristic cost must be less than or
equal to the actual cost.
In this search example, we are using two lists which are OPEN and CLOSED Lists. Following
are the iteration for traversing the above example.
Problem 1 Solution
Problem 2 Solution
Path Traversed =
[S – B - F - G]
2. A* Search Algorithm:
A* search is the most widely known form of best-first search. It uses the heuristic function
h(n) together with g(n), the cost to reach node n from the start state. It combines features of UCS
and greedy best-first search, which lets it solve problems efficiently. The A* search algorithm
finds the shortest path through the search space using the heuristic function. It expands a
smaller search tree and provides an optimal result faster. A* is similar
to UCS except that it orders nodes by g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence,
we can combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function
(g + h). If node n is the goal node, then return success and stop; otherwise:
Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For
each successor n', check whether n' is already in the OPEN or CLOSED list; if not, then
compute the evaluation function for n' and place it into the OPEN list.
Step 5: Otherwise, if node n' is already in OPEN or CLOSED, attach it to the back
pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
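The steps above can be sketched as follows. The graph and heuristic table are invented; for simplicity this version skips Step 5's back-pointer update and instead relies on the heuristic being consistent, under which a plain closed set suffices:

```python
import heapq

# A* sketch: orders the frontier by f(n) = g(n) + h(n).
# Graph and heuristic table are invented; h here is consistent.

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for nbr, step in graph.get(node, {}).items():
            heapq.heappush(frontier,
                           (g + step + h[nbr], g + step, nbr, path + [nbr]))
    return None

graph = {"S": {"A": 1, "B": 5}, "A": {"B": 2, "G": 12}, "B": {"G": 3}}
h = {"S": 5, "A": 4, "B": 3, "G": 0}
print(a_star(graph, h, "S", "G"))   # -> (6, ['S', 'A', 'B', 'G'])
```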
Advantages:
• A* search generally performs better than other search algorithms.
• A* search algorithm is optimal and complete.
• This algorithm can solve very complex problems.
Disadvantages:
• It does not always produce the shortest path, as it is partly based on heuristics and
approximation.
• A* search algorithm has some complexity issues.
• The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value
of all states is given in the below table so we will calculate the f(n) of each state using the
formula f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Solution:
Problem 1 Solution
Problem Solution
Tree
Bound 1: 16 [S – B – C- G]
Bound 2: 13 [S – B – D - G]
Optimal path
❖ Beam Search
It is a heuristic search algorithm that explores a graph by expanding the most promising
node in a limited set. Beam search is an optimization of best-first search that reduces its
memory requirements. Best-first search is a graph search which orders all partial solutions
(states) according to some heuristic. But in beam search, only a predetermined number of
best partial solutions are kept as candidates. It is thus a greedy algorithm.
Beam search uses breadth-first search to build its search tree. At each level of the tree, it
generates all successors of the states at the current level, sorting them in increasing order
of heuristic cost. However, it only stores a predetermined number, 𝛽 , of best states at each
level (called the beam width). Only those states are expanded next. The greater the beam
width, the fewer states are pruned. With an infinite beam width, no states are pruned and
beam search is identical to breadth-first search. The beam width bounds the memory
required to perform the search. Since a goal state could potentially be pruned, beam search
sacrifices completeness (the guarantee that an algorithm will terminate with a solution, if
one exists). Beam search is not optimal (that is, there is no guarantee that it will find the
best solution).
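A minimal beam search sketch follows; the graph and heuristic values are invented for illustration:

```python
# Beam search sketch: proceeds level by level like BFS, but at each level
# keeps only the beam_width best states (by heuristic h). Graph and h are
# invented for illustration.

def beam_search(graph, h, start, goal, beam_width=2):
    level = [[start]]                         # paths ending at current level
    while level:
        candidates = []
        for path in level:
            for nbr in graph.get(path[-1], []):
                if nbr == goal:
                    return path + [nbr]
                candidates.append(path + [nbr])
        candidates.sort(key=lambda p: h[p[-1]])   # increasing heuristic cost
        level = candidates[:beam_width]           # prune to the beam width
    return None

graph = {"S": ["A", "B", "C"], "A": ["D"], "B": ["G"], "C": ["E"]}
h = {"A": 3, "B": 1, "C": 2, "D": 9, "E": 9, "G": 0}
print(beam_search(graph, h, "S", "G", beam_width=2))  # -> ['S', 'B', 'G']
```

With beam_width=2, node A (the worst of the three successors of S) is pruned; with an infinite beam width nothing would be pruned and the search would behave like breadth-first search, as noted above.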
Example:
2. Plateau: A plateau is a flat area of the search space in which all the neighbour states of the
current state contain the same value; because of this, the algorithm cannot find a best direction
to move. A hill-climbing search might get lost in the plateau area.
Solution: Take big steps or very small steps while searching. Alternatively, randomly select a
state far away from the current state, so that the algorithm may land in a non-plateau region.
3. Ridges: A ridge is a special form of local maximum. It is an area higher than its surrounding
areas, but it has a slope of its own, and it cannot be reached in a single move.
Solution: With the use of bidirectional search, or by moving in different directions, we can
improve this problem.
In the above figure, buying a car is broken down into smaller problems or tasks that can be
accomplished to achieve the main goal; this is an example of a simple AND-OR graph. One
option is to steal a car, which accomplishes the main goal; the other is to use your own money
to purchase a car, which also accomplishes it.
The AND symbol indicates the AND part of the graph: every subproblem joined by an AND
must be resolved before the preceding node or issue can be finished.
The start state and the target state are already known in the knowledge-
based search strategy known as the AO* algorithm, and the best path is identified by
heuristics. The informed search technique considerably reduces the algorithm’s time
complexity. The AO* algorithm is far more effective in searching AND-OR trees than the
A* algorithm.
Working of AO* algorithm:
The evaluation function in AO* looks like this:
f(n) = g(n) + h(n)
f(n) = Actual cost + Estimated cost
here,
f(n) = The estimated total cost of traversal.
g(n) = the cost from the initial node to the current node.
h(n) = estimated cost from the current node to the goal state.
❖ Simulated Annealing:
A hill-climbing algorithm that never makes a move towards a lower value is guaranteed to be
incomplete, because it can get stuck on a local maximum. And if the algorithm applies a random
walk, by moving to a random successor, it may be complete but not efficient. Simulated annealing is
an algorithm which yields both efficiency and completeness.
In mechanical terms, annealing is a process of heating a metal or glass to a high temperature and
then cooling it gradually, which allows the material to reach a low-energy crystalline state. The
same process is used in simulated annealing in which the algorithm picks a random move,
instead of picking the best move. Simulated annealing can be used to find solutions to
optimization problems by slowly changing the values of the variables in the problem until a
solution is found. If the random move improves the state, the algorithm follows that path;
otherwise, it accepts the downhill move only with a probability less than 1, and otherwise
chooses another move.
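The loop just described can be sketched on a toy one-dimensional minimization problem, f(x) = x²; the cooling schedule, step size, and iteration count are arbitrary assumptions:

```python
import math
import random

# Simulated annealing sketch on f(x) = x^2: a random move is always
# accepted if it improves the state, and otherwise accepted only with
# probability exp(-delta/T) < 1, which shrinks as the temperature cools.

def simulated_annealing(f, x, temp=10.0, cooling=0.95, steps=500):
    best = x
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1)      # pick a random move
        delta = f(candidate) - f(x)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate                          # accept the move
        if f(x) < f(best):
            best = x
        temp = max(temp * cooling, 1e-6)           # cool gradually
    return best

random.seed(0)
print(simulated_annealing(lambda v: v * v, x=8.0))  # a value close to 0
```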
The advantage of simulated annealing over other optimization methods is that it is less likely
to get stuck in a local minimum, where the solution is not the best possible but is good enough.
This is because simulated annealing allows for small changes to be made to the solution, which
means that it can escape from local minima and find the global optimum.
Simulated annealing is not a guaranteed method of finding the best solution to an optimization
problem, but it is a powerful tool that can be used to find good solutions in many cases.
Benefits of simulated annealing
Simulated annealing is a powerful tool for solving optimization problems. It is especially well-
suited for problems that are difficult to solve using traditional methods, such as those with
many local optima.
Simulated annealing works by starting with a random solution and then slowly improving it
over time. The key is to not get stuck in a local optimum, which can happen if the search moves
too slowly.
The benefits of using simulated annealing include:
1. The ability to find global optima.
2. The ability to escape from local optima.
3. The ability to handle constraints.
4. The ability to handle noisy data.
5. The ability to handle discontinuities.
6. The ability to find solutions in a fraction of the time required by other methods.
7. The ability to find solutions to problems that are difficult or impossible to solve using other
methods.
❖ Min-Max Algorithm
• The mini-max algorithm is a recursive (backtracking) algorithm used in decision-
making and game theory. It provides an optimal move for the player, assuming that the
opponent also plays optimally.
• Mini-Max algorithm uses recursion to search through the game-tree.
• The min-max algorithm is mostly used for game playing in AI, such as chess, checkers,
tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax
decision for the current state.
• In this algorithm two players play the game, one is called MAX and other is called
MIN.
• Each player tries to obtain the maximum benefit for themselves while leaving the
opponent with the minimum benefit.
• Both Players of the game are opponent of each other, where MAX will select the
maximized value and MIN will select the minimized value.
• The minimax algorithm performs a depth-first search algorithm for the exploration of
the complete game tree.
• The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then
backs the values up through the tree as the recursion unwinds.
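The recursion described above can be sketched on a tiny invented game tree, with MAX to move at the root and MIN at the next level:

```python
# Minimax sketch: depth-first recursion down to the terminal nodes, then
# values are backed up, with MAX and MIN alternating. Tree is invented.

def minimax(node, maximizing, tree, values):
    if node in values:                    # terminal node: return its utility
        return values[node]
    children = [minimax(c, not maximizing, tree, values) for c in tree[node]]
    return max(children) if maximizing else min(children)

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
values = {"D": 3, "E": 5, "F": 2, "G": 9}     # terminal utilities
print(minimax("A", True, tree, values))       # -> 3
```

MIN backs up min(3, 5) = 3 at B and min(2, 9) = 2 at C, and MAX picks max(3, 2) = 3 at the root.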
❖ Alpha-beta pruning
✓ Knowledge-Based Systems
• A knowledge-based system is a system that uses artificial intelligence techniques to store
and reason with knowledge. The knowledge is typically represented in the form of rules
or facts, which can be used to draw conclusions or make decisions.
• One of the key benefits of a knowledge-based system is that it can help to automate
decision-making processes. For example, a knowledge-based system could be used to
diagnose a medical condition, by reasoning over a set of rules that describe the symptoms
and possible causes of the condition.
• Another benefit of knowledge-based systems is that they can be used to explain their
decisions to humans. This can be useful, for example, in a customer service setting, where
a knowledge-based system can help a human agent understand why a particular decision
was made.
• Knowledge-based systems are a type of artificial intelligence and have been used in a
variety of applications including medical diagnosis, expert systems, and decision support
systems.
• A knowledge base and an inference mechanism are required to update knowledge so that an
agent can learn from experience and act according to its knowledge.
• Inference means deriving new sentences from old. The inference-based system allows us
to add a new sentence to the knowledge base. A sentence is a proposition about the world.
The inference system applies logical rules to the KB to deduce new information.
• The inference system generates new facts so that an agent can update the KB. An
inference system works mainly in two rules which are given:
• Forward chaining
• Backward chaining
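Forward chaining can be sketched as repeatedly firing rules whose premises are all in the KB, adding derived sentences until nothing changes; the facts and rules below are invented for illustration:

```python
# Tiny forward-chaining sketch: a rule fires whenever all of its premises
# are in the KB, adding its conclusion as a new fact. Facts are invented.

def forward_chain(kb, rules):
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= kb and conclusion not in kb:
                kb.add(conclusion)        # derive a new sentence
                changed = True
    return kb

kb = {"rains", "has_umbrella"}
rules = [
    ({"rains"}, "ground_wet"),
    ({"ground_wet", "has_umbrella"}, "stays_dry"),
]
print(forward_chain(kb, rules))
# the KB now also contains 'ground_wet' and 'stays_dry'
```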
2. Logical level
At this level, we understand how the knowledge is represented and stored: sentences are
encoded into different logics. At the logical level, an encoding of knowledge into logical
sentences occurs, and we can expect the automated taxi agent to reach destination B.
3. Implementation level
This is the physical representation of logic and knowledge. At the implementation level, the
agent performs actions as per the logical and knowledge levels. At this level, the automated
taxi agent actually implements its knowledge and logic so that it can reach the destination.
Knowledge-based agents have an explicit representation of knowledge that can be reasoned
over. They maintain an internal state of knowledge, reason over it, update it, and perform
actions accordingly. These agents act intelligently according to requirements.
Knowledge-based agents describe the current situation in the form of sentences. They have
complete knowledge of the current situation of their mini-world and its surroundings. These
agents manipulate knowledge to infer new things at the “knowledge level”.
✓ A knowledge-based system has the following features
• Knowledge base (KB): The key component of a knowledge-based agent. It stores facts
about the world as a set of sentences expressed in a knowledge representation
language.
• Inference Engine (IE): The engine of a knowledge-based system, used to infer new
knowledge from the sentences already in the KB.
Given a percept, the agent adds it to the KB, then asks the KB for the best action, and finally
tells the KB that it has in fact taken that action.
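The TELL/ASK/TELL cycle just described can be sketched as a loop; the KB class and the obstacle-avoidance rule below are hypothetical, introduced only to make the cycle concrete.

```python
# Sketch of the generic knowledge-based agent loop described above.
# The KB class and the "obstacle -> stop" rule are assumptions for illustration.

class KB:
    def __init__(self):
        self.sentences = set()

    def tell(self, sentence):   # add a sentence to the KB
        self.sentences.add(sentence)

    def ask(self, query):       # trivial "inference": membership test
        return query in self.sentences

kb = KB()

def kb_agent(percept, t):
    kb.tell(("percept", percept, t))   # 1. TELL the KB what was perceived
    # 2. ASK the KB for the best action given the current percepts
    action = "stop" if kb.ask(("percept", "obstacle", t)) else "go"
    kb.tell(("action", action, t))     # 3. TELL the KB the action was taken
    return action

print(kb_agent("obstacle", 0))  # stop
print(kb_agent("clear", 1))     # go
```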
❖ Propositional Logic
What is Logic?
Logic is the basis of all mathematical reasoning and of all automated reasoning. The rules
of logic specify the meaning of mathematical statements. These rules help us understand and
reason with statements such as

∃x ∀a ∀b (x ≠ a² + b²)

which in simple English means “There exists an integer that is not the sum of two squares”.
Importance of Mathematical Logic
The rules of logic give precise meaning to mathematical statements. These rules are used to
distinguish between valid and invalid mathematical arguments. Apart from its importance in
understanding mathematical reasoning, logic has numerous applications in computer science,
varying from the design of digital circuits to the construction of computer programs and the
verification of program correctness.
Consider, for example, the sentences “What time is it?”, “Go out and play!”, and “x + 1 = 2”.
These are not propositions: the first two do not have a truth value, and the third one may be
true or false depending on x. To represent propositions, propositional variables are used.
By convention, these variables are represented by lowercase letters such as 𝑝, 𝑞, 𝑟, 𝑠. The area
of logic which deals with propositions is called propositional calculus or propositional
logic. It also includes producing new propositions using existing ones. Propositions
constructed using one or more propositions are called compound propositions. The
propositions are combined together using Logical Connectives or Logical Operators.
Truth Table
To determine the truth value of a compound proposition in all possible scenarios, we consider
every combination of truth values of the propositions that are joined together by logical
connectives to form it. This compilation of all possible scenarios in a tabular format is
called a truth table. Most common logical connectives-
• Negation: If p is a proposition, then the negation of p is denoted by ¬𝒑, which when
translated to simple English means - “It is not the case that p” or simply “not p”. The
truth value of ¬𝒑 is the opposite of the truth value of p. The truth table of ¬𝒑 is-
Example, the negation of “It is raining today” is “It is not the case that it is raining today” or
simply “It is not raining today”.
• Conjunction: For propositions p and q, the conjunction p ∧ q (“p and q”) is true only
when both p and q are true.
Example, the conjunction of the propositions p = “Today is Friday” and q = “It is raining
today”, p ∧ q, is “Today is Friday and it is raining today”. This proposition is true only
on rainy Fridays and is false on any other rainy day or on Fridays when it does not rain.
Hence from the above truth table, we can prove that P → Q is equivalent to ¬ Q → ¬ P, and
Q→ P is equivalent to ¬ P → ¬ Q.
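The two contrapositive equivalences above can be checked mechanically by enumerating every row of the truth table:

```python
from itertools import product

def implies(p, q):
    # Material implication: P -> Q is false only when P is true and Q is false.
    return (not p) or q

# Enumerate all four truth-value assignments for (p, q).
rows = list(product([False, True], repeat=2))

# P -> Q is equivalent to ¬Q -> ¬P in every row.
assert all(implies(p, q) == implies(not q, not p) for p, q in rows)
# Q -> P is equivalent to ¬P -> ¬Q in every row.
assert all(implies(q, p) == implies(not p, not q) for p, q in rows)
print("both equivalences hold")
```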
1. Modus Ponens:
The Modus Ponens rule states that if P→Q is true and P is true, then Q will also be true. It
can be represented as:
Example:
Statement-1: "If I am sleepy then I go to bed" ==> P→ Q
Statement-2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q.
Hence, we can say that, if P→ Q is true and P is true then Q will be true.
Proof by Truth table:
2. Modus Tollens:
The Modus Tollens rule states that if P→Q is true and ¬Q is true, then ¬P will also be true. It
can be represented as:
Statement-1: "If I am sleepy then I go to bed" ==> P→ Q
Statement-2: "I do not go to the bed."==> ~Q
Statement-3: Which infers that "I am not sleepy" => ~P
Proof by Truth table:
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P→Q is true and Q→R is true, then P→R will
also be true. It can be represented as the following notation:
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P∨Q is true and ¬P is true, then Q will be true. It
can be represented as:
Example:
Statement-1: Today is Sunday or Monday. ==>P∨Q
Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q
Proof by truth-table:
5. Addition:
The Addition rule is one of the common inference rules, and it states that if P is true, then
P∨Q will be true for any proposition Q.
Example:
Statement: I have a vanilla ice-cream. ==> P
Let Q be: I have chocolate ice-cream. (Q need not be asserted.)
Conclusion: I have vanilla or chocolate ice-cream. ==> (P∨Q)
Proof by Truth-Table:
6. Simplification:
The Simplification rule states that if P∧Q is true, then P and Q will each also be true. It can
be represented as:
Proof by Truth-Table:
7. Resolution:
The Resolution rule states that if P∨Q is true and ¬P∨R is true, then Q∨R will also be true. It
can be represented as:
Proof by Truth-Table:
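All of these inference rules can be verified the same way their truth-table proofs work: enumerate every assignment and check that the conclusion holds in each row where all premises hold. A small sketch (the `valid` helper is an assumption introduced for illustration):

```python
from itertools import product

def implies(p, q):
    # Material implication: P -> Q is false only when P is true and Q is false.
    return (not p) or q

def valid(premises, conclusion, n):
    """A rule is valid if the conclusion is true in every row of the
    n-variable truth table where all premises are true."""
    return all(conclusion(*vals)
               for vals in product([False, True], repeat=n)
               if all(prem(*vals) for prem in premises))

# Modus Ponens: from P->Q and P, infer Q.
assert valid([lambda p, q: implies(p, q), lambda p, q: p],
             lambda p, q: q, 2)
# Modus Tollens: from P->Q and ¬Q, infer ¬P.
assert valid([lambda p, q: implies(p, q), lambda p, q: not q],
             lambda p, q: not p, 2)
# Resolution: from P∨Q and ¬P∨R, infer Q∨R.
assert valid([lambda p, q, r: p or q, lambda p, q, r: (not p) or r],
             lambda p, q, r: q or r, 3)
print("all rules valid")
```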
Propositional logic is not sufficient to represent statements involving quantities or relations,
such as “all” or “some”, so we require a more powerful logic, such as first-order logic.
First-Order logic:
Atomic sentences:
• Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.
• We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
• Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
• Subject: Subject is the main part of the statement.
• Predicate: A predicate can be defined as a relation, which binds two atoms together in
a statement.
Consider the statement “x is an integer.” It consists of two parts: the first part, x, is the
subject of the statement, and the second part, “is an integer,” is known as the predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.
The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.
If x is a variable, then ∀x is read as:
• For all x
• For each x
• For every x.
Example:
• Question: All men drink coffee.
• Answer: ∀x man(x) → drink(x, coffee).
• It will be read as: for all x, if x is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles a reversed E. When it is used with a
predicate variable, it is called an existential quantifier.
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
• There exists a 'x.'
• For some 'x.'
• For at least one 'x.'
Example:
• Question: Some boys are intelligent.
• Answer: ∃x: boys(x) ∧ intelligent(x).
• It will be read as: there exists an x such that x is a boy and x is intelligent.
Points to remember:
• The main connective for universal quantifier ∀ is implication →.
• The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
• For the universal quantifier, ∀x∀y is equivalent to ∀y∀x.
• For the existential quantifier, ∃x∃y is equivalent to ∃y∃x.
• ∃x∀y is not equivalent to ∀y∃x.
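The last property, that ∃x∀y and ∀y∃x are not equivalent, can be demonstrated on a small finite domain; the domain and the relation EQ below are assumptions chosen for illustration.

```python
# Quantifiers over a finite domain reduce to any()/all() loops.
domain = [0, 1, 2]
EQ = lambda x, y: x == y   # "x equals y"

# ∀y ∃x EQ(x, y): for every y there is some x equal to it — true here.
forall_exists = all(any(EQ(x, y) for x in domain) for y in domain)
# ∃x ∀y EQ(x, y): some single x equals every y — false on this domain.
exists_forall = any(all(EQ(x, y) for y in domain) for x in domain)
print(forall_exists, exists_forall)  # True False
```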
A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when
using an inference engine. Forward chaining is a form of reasoning which starts with the
atomic sentences in the knowledge base and applies inference rules (such as Modus Ponens)
in the forward direction to derive more facts until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are
satisfied, and adds their conclusions to the known facts. This process repeats until the problem
is solved.
Properties of Forward-Chaining:
• It is a bottom-up approach, as it moves from the facts up to the goal.
• It is a process of drawing a conclusion from known facts or data, starting from the
initial state and reaching the goal state.
• The forward-chaining approach is also called data-driven, as we reach the goal using
the available data.
• The forward-chaining approach is commonly used in expert systems (such as CLIPS)
and in business and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
“As per the law, it is a crime for an American to sell weapons to hostile nations. Country
A, an enemy of America, has some missiles, and all the missiles were sold to it by Robert,
who is an American citizen.”
Prove that “Robert is criminal.”
To solve the above problem, first, we will convert all the above facts into first-order definite
clauses, and then we will use a forward-chaining algorithm to reach the goal.
Solution:
• It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r
are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
• Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). This can be written as two
definite clauses by using Existential Instantiation, introducing the new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
• All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
• Missiles are weapons.
Missile(p) → Weapon(p) .......(5)
• Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) ........(6)
• Country A is an enemy of America.
Enemy (A, America) .........(7)
• Robert is American
American(Robert). ..........(8)
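With the definite clauses (1)-(8) in hand, the forward-chaining run can be sketched in propositionalized form (variables already instantiated with Robert, T1, and A; the rule encoding is a simplification of the first-order version):

```python
# Forward chaining on the propositionalized Robert example.
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),                                  # (5)
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),            # (4)
    ({"Enemy(A,America)"}, "Hostile(A)"),                             # (6)
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)",
      "Hostile(A)"}, "Criminal(Robert)"),                             # (1)
]

# Repeatedly fire every rule whose premises are satisfied until nothing new.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)  # True
```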
❖ Backward Chaining:
Backward chaining is also known as backward deduction or the backward reasoning method
when using an inference engine. A backward-chaining algorithm is a form of reasoning which
starts with the goal and works backward, chaining through rules to find known facts that
support the goal.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the rules.
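A matching backward-chaining sketch for the same example starts from the goal Criminal(Robert) and recurses through rule premises down to the known facts (same simplified propositional encoding as before):

```python
# Backward chaining: prove a goal by recursively proving rule premises.
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}
rules = {
    "Weapon(T1)": [{"Missile(T1)"}],
    "Sells(Robert,T1,A)": [{"Missile(T1)", "Owns(A,T1)"}],
    "Hostile(A)": [{"Enemy(A,America)"}],
    "Criminal(Robert)": [{"American(Robert)", "Weapon(T1)",
                          "Sells(Robert,T1,A)", "Hostile(A)"}],
}

def prove(goal):
    # A goal holds if it is a known fact, or if some rule concluding it
    # has all of its premises provable.
    if goal in facts:
        return True
    return any(all(prove(p) for p in premises)
               for premises in rules.get(goal, []))

print(prove("Criminal(Robert)"))  # True
```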
Reasoning in artificial intelligence has two important forms: inductive reasoning and
deductive reasoning. Both forms have premises and conclusions, but they work in opposite
directions. The following list compares inductive and deductive reasoning:
• Deductive reasoning uses available facts, information, or knowledge to deduce a valid
conclusion, whereas inductive reasoning involves making a generalization from
specific facts, and observations.
• Deductive reasoning uses a top-down approach, whereas inductive reasoning uses a
bottom-up approach.
• Deductive reasoning moves from generalized statement to a valid conclusion, whereas
Inductive reasoning moves from specific observation to a generalization.
• In deductive reasoning, the conclusions are certain, whereas, in Inductive reasoning,
the conclusions are probabilistic.
• Deductive arguments are valid or invalid; in a valid argument, if the premises are true,
the conclusion must be true. Inductive arguments are strong or weak; even in a strong
argument, the conclusion may be false when the premises are true.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine
probability theory with logic to handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the
uncertainty that results from laziness (too many factors to enumerate) and ignorance
(incomplete knowledge of the domain).
In the real world, there are lots of scenarios, where the certainty of something is not confirmed,
such as “It will rain today,” “behavior of someone for some situations,” “A match between two
teams or two players.” These are probable sentences for which we can assume that it will
happen but not sure about it, so here we use probabilistic reasoning.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has
already happened.
Suppose we want to calculate the probability of event A when event B has already occurred,
“the probability of A under the condition B”. It can be written as:

P(A|B) = P(A⋀B) / P(B)
It can be explained using a Venn diagram: once B has occurred, the sample space is reduced
to the set B, so we can calculate the probability of A given B by dividing P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of
the students like both English and mathematics. What
percentage of the students who like English also like
mathematics?
Solution:
Let A be the event that a student likes mathematics and B be the event that a student likes
English. Then

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57

Hence, 57% of the students who like English also like mathematics.
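The arithmetic of this example is a direct application of P(A|B) = P(A⋀B)/P(B):

```python
# Conditional probability for the class example above.
p_english = 0.70           # P(B): student likes English
p_english_and_math = 0.40  # P(A ⋀ B): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(round(p_math_given_english, 2))  # 0.57, i.e. about 57%
```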
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine
the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event
A given event B.
From the product rule we can write:
• P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B given event A:
• P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B) ......(a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of
most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of
hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the probability
of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the
evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σᵢ P(Aᵢ) P(B|Aᵢ); hence Bayes' rule can be
written as:

P(Aᵢ|B) = P(B|Aᵢ) P(Aᵢ) / Σⱼ P(B|Aⱼ) P(Aⱼ)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful when we have good estimates of these three terms and want to determine
the fourth. Suppose we observe the effect of some unknown cause and want to compute the
probability of that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: what is the probability that a patient has diseases meningitis with a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs
80% of the time. He is also aware of some more facts, which are given as follows:
o The Known probability that a patient has meningitis disease is 1/30,000.
o The Known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis. Then:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013 = 1/750

Hence, we can assume that 1 patient out of 750 patients with a stiff neck has meningitis.
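The same calculation, as a one-line application of Bayes' rule:

```python
# Meningitis example: P(b|a) = P(a|b) * P(b) / P(a)
p_stiff_given_men = 0.8    # P(a|b): stiff neck given meningitis
p_men = 1 / 30000          # P(b): prior probability of meningitis
p_stiff = 0.02             # P(a): marginal probability of a stiff neck

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)             # ≈ 0.00133, i.e. 1 in 750
print(round(1 / p_men_given_stiff))  # 750
```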
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability
that the card is king is 4/52, then calculate posterior probability P(King|Face), which
means the drawn face card is a king card.
Solution:
Every king is a face card, so P(Face|King) = 1. There are 3 face cards per suit (jack, queen,
king), so P(Face) = 12/52. Hence

P(King|Face) = P(Face|King) P(King) / P(Face) = 1 × (4/52) / (12/52) = 1/3
Problem:
Calculate the probability that the alarm has sounded but neither a burglary nor an earthquake
has occurred, and both David and Sophia have called Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure
shows that burglary and earthquake are the parent nodes of the alarm, directly
affecting the probability of the alarm going off, while David's and Sophia's calls
depend on the alarm probability.
• The network represents that David and Sophia do not directly perceive the burglary,
do not notice minor earthquakes, and do not confer before calling.
• The conditional distributions for each node are given as conditional probabilities table
or CPT.
• Each row in the CPT must sum to 1, because the entries in a row represent an
exhaustive set of cases for the variable.
• In a CPT, a boolean variable with k boolean parents requires 2^k rows of
probabilities. Hence, if there are two parents, the CPT contains 4 probability values.
We can write the events of the problem statement in the form of the probability
P[D, S, A, B, E], and rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D|S, A, B, E] · P[S, A, B, E]
= P[D|S, A, B, E] · P[S|A, B, E] · P[A, B, E]
= P[D|A] · P[S|A, B, E] · P[A, B, E]
= P[D|A] · P[S|A] · P[A|B, E] · P[B, E]
= P[D|A] · P[S|A] · P[A|B, E] · P[B|E] · P[E]
= P[D|A] · P[S|A] · P[A|B, E] · P[B] · P[E]   (since B and E are independent)
Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, which is the probability that an earthquake has not occurred.
We can provide the conditional probabilities as per the below tables:
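Once the CPTs are filled in, the factored joint probability derived above can be evaluated directly. P(B) and P(E) come from the notes; the remaining CPT entries below are illustrative assumed values, since the actual tables follow this passage.

```python
# P(D, S, A, ¬B, ¬E) = P(D|A) · P(S|A) · P(A|¬B,¬E) · P(¬B) · P(¬E)
p_b, p_e = 0.002, 0.001         # priors from the notes
p_a_given_not_b_not_e = 0.001   # assumed CPT entry: alarm with no burglary/quake
p_d_given_a = 0.91              # assumed CPT entry: David calls given alarm
p_s_given_a = 0.75              # assumed CPT entry: Sophia calls given alarm

p = (p_d_given_a * p_s_given_a * p_a_given_not_b_not_e
     * (1 - p_b) * (1 - p_e))
print(p)  # ≈ 0.00068 under these assumed CPT values
```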