Aids I Book Sem 6
TechKnowledge Publications
Mumbai University
Third Year of Information Technology (2019 Course)
Subject Code : ITC604
Subject Name : Artificial Intelligence and Data Science - I
Credits : 03
Course Objectives :
1. To introduce students to the different issues involved in trying to define and simulate intelligence.
2. To familiarize students with specific, well-known Artificial Intelligence methods, algorithms and knowledge representation schemes.
3. To introduce students to different techniques which will help them build simple intelligent systems based on AI/IA concepts.
4. To introduce students to data science and problem solving with data science and statistics.
5. To enable students to choose appropriately from a wider range of exploratory and inferential methods for analyzing data, and to interpret the results contextually.
Prerequisite : Nil
Introduction to AI
Introduction to DS : Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics, Lifecycle, Roles in Data Science Projects. (04 Hrs.) (Refer Chapter 5)
Introduction to ML (Refer Chapter 6)
Total : 39 Hrs.
AI and DS - I (MU) : Table of Contents
2.4.1 Concept
2.4.2 Implementation
2.4.3 Algorithm
2.4.4 Performance Evaluation
2.5 Breadth First Search (BFS)
2.5.1 Concept
2.5.2 Process
2.5.3 Implementation
2.5.4 Algorithm
2.5.5 Performance Evaluation
2.6 Uniform Cost Search (UCS)
2.6.1 Concept
2.6.2 Implementation
2.6.3 Algorithm
2.6.4 Performance Evaluation
2.7 Depth Limited Search (DLS)
2.7.1 Concept
2.7.2 Process
2.7.3 Implementation
2.7.4 Algorithm
2.8 Iterative Deepening DFS (IDDFS)
2.8.1 Concept
2.8.2 Process
2.8.3 Implementation
2.8.4 Algorithm
2.8.5 Pseudo Code
2.8.6 Performance Evaluation
2.9 Bidirectional Search
2.9.1 Concept
2.9.2 Process
2.9.3 Implementation
2.9.4 Performance Evaluation
2.9.5 Pros of Bidirectional Search
2.9.6 Cons of Bidirectional Search
2.10 Comparing Different Techniques
2.10.1 Difference between BFS and DFS
2.11 Informed Search Techniques
2.12 Heuristic Function
2.12.1 Example of 8-puzzle Problem
2.12.2 Example of Block World Problem
2.12.3 Properties of Good Heuristic Function
2.13 Best First Search
2.13.1 Concept
2.13.2 Implementation
2.13.3 Algorithm : Best First Search
2.13.4 Performance Measures for Best First Search
2.13.5 Greedy Best First Search
2.13.6 Properties of Greedy Best-first Search
2.14 A* Search
2.14.1 Concept
2.14.2 Implementation
2.14.3 Algorithm (A*)
2.14.4 Behaviour of A* Algorithm
2.14.5 Admissibility of A*
2.14.6 Monotonicity
2.14.7 Properties of A*
2.14.8 Example : 8 Puzzle Problem using A* Algorithm
2.14.9 Comparison among Best First Search, A* Search and Greedy Best First Search
2.15 Memory Bounded Heuristic Searches
2.15.1 Iterative Deepening A* (IDA*) {Self Study Topic}
2.15.2 Simplified Memory-Bounded A* (SMA*) {Self Study Topic}
2.15.3 Advantages of SMA* over A* and IDA*
2.15.4 Limitation of SMA*
2.16 Local Search Algorithms and Optimization Problems
2.16.1 Hill Climbing
2.16.1(A) Simple Hill Climbing
2.16.1(B) Steepest Ascent Hill Climbing
2.16.1(C) Limitations of Hill Climbing
2.16.1(D) Solutions to Problems in Hill Climbing
2.16.2 Simulated Annealing
2.16.2(A) Comparing Simulated Annealing with Hill Climbing
2.16.3 Local Beam Search
2.17 Crypto-Arithmetic Problem
2.18 Constraint Satisfaction Problem
2.18.1 Graph Coloring
2.18.2 Varieties of CSPs
2.18.3 Varieties of Constraints
2.18.4 Backtracking in CSPs
2.18.5 Improving Backtracking Efficiency
2.18.6 Water Jug
2.19 Adversarial Search
2.20 Environment Types
2.21 AI Game - Features
2.21.1 Zero Sum Game
2.21.2 Non-Zero Sum Game
2.21.2(A) Positive Sum Game
2.21.2(B) Negative Sum Game
2.22 Relevant Aspects of AI Game
2.23 Game Playing
2.23.1 Types of Games
2.23.1(A) Chess
2.23.1(B) Checkers
2.23.2 What is a Game Tree?
Syllabus : Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics, Lifecycle, Roles in Data Science Projects.
Self-Learning Topics : Applications and Case Studies of Data Science in various Industries.
Syllabus : Introduction to exploratory data analysis, Typical data formats, Types of EDA, Graphical/Non-graphical Methods, Univariate/Multivariate methods, Correlation and covariance, Degree of freedom, Statistical Methods for Evaluation including ANOVA.
Self-Learning Topics : Implementation of graphical EDA methods.
Syllabus : Introduction to Machine Learning, Types of Machine Learning : Supervised (Logistic Regression, Decision Tree, Support Vector Machine) and Unsupervised (K Means Clustering, Hierarchical Clustering, Association Rules), Issues in Machine Learning, Application of Machine Learning, Steps in developing a Machine Learning Application.
Self-Learning Topics : Real world case studies on machine learning
6.1 Introduction to Machine Learning
6.2 Types of Machine Learning
6.2.1 Supervised Learning
6.2.1(A) Logistic Regression
6.2.1(B) Decision Tree
6.2.1(C) Support Vector Machine
6.2.2 Unsupervised Learning
6.2.2(A) K Means Clustering
6.2.2(B) Hierarchical Clustering
6.2.2(C) Association Rules
6.3 Issues in Machine Learning
6.3.1 Application of Machine Learning
6.3.2 Steps in developing a Machine Learning Application
6.4 Self-Learning Topics : Real world case studies on machine learning
INTRODUCTION TO AI
1.2 Foundations of AI
University Question
Q. Explain the Turing test designed for a satisfactory operational definition of AI. (MU - May 16)
Definition 1 : “The art of creating machines that perform functions that require intelligence when performed by people.” (Kurzweil, 1990)
Definition 2 : “The study of how to make computers do things at which, at the moment, people are better.” (Rich and Knight, 1991)
• To judge whether a system can act like a human, Alan Turing designed a test known as the Turing test.
• As shown in Fig. 1.2.1, in the Turing test a computer interacts with a human interrogator by answering the interrogator's questions in written form. The computer passes the test if the human interrogator cannot tell whether the written responses come from a person or from a computer. The Turing test remains relevant even after 60 years of research.
(Fig. 1.2.1 : Turing test, with an AI system questioned by a human interrogator)
• For this test, the computer would need to possess the following capabilities :
1. Natural Language Processing (NLP) : This unit enables the computer to interpret a natural language such as English and communicate successfully.
2. Knowledge Representation : This unit stores the knowledge gathered by the system through its input devices.
3. Automated Reasoning : This unit analyzes the knowledge stored in the system and draws new inferences to answer questions.
4. Machine Learning : This unit learns new knowledge from the current input from the environment and adapts to new circumstances, thereby enhancing the knowledge base of the system.
• To pass the total Turing test, the computer also needs computer vision, to perceive objects in the environment, and robotics, to manipulate those objects.
(Fig. 1.2.2 : Natural Language Processing, Knowledge Representation, Automated Reasoning, Machine Learning, Computer Vision, Robotics)
Fig. 1.2.2 lists all the capabilities a computer needs in order to exhibit artificial intelligence. These six disciplines together make up most of artificial intelligence.
Definition 1 : “The exciting new effort to make computers think ... machines with minds, in the full and literal sense.” (Haugeland, 1985)
Definition 2 : “The automation of activities that we associate with human thinking, activities such as decision making, problem solving, learning ...” (Bellman, 1978)
• Cognitive science : It is an interdisciplinary field which combines computer models from Artificial Intelligence with techniques from psychology in order to construct precise and testable theories of how the human mind works.
• In order to make machines think like humans, we first need to understand how humans think. Research shows that there are three ways in which a human's thinking pattern can be captured :
1. Introspection, through which humans catch their own thoughts as they go by.
2. Psychological experiments, carried out by observing a person in action.
3. Brain imaging, carried out by observing the brain in action.
• Once the human thinking pattern has been captured, it can be implemented in a computer system as a program. If the program's input-output behaviour matches that of a human, it can be claimed that the system operates like a human.
1.2.3 Thinking Rationally : The “Laws of Thought” Approach
Definition 1 : “The study of mental faculties through the use of computational models”. (Charniak and
McDermott, 1985)
Definition 2 : “The study of the computations that make it possible to perceive, reason, and act”.
• The laws of thought are supposed to govern the operation of the mind, and their study initiated the field called logic. Logic provides precise notations to express facts about the real world.
• It also includes reasoning and “right thinking”, that is, an irrefutable thinking process. Computer programs based on these logic notations were developed to create intelligent systems.
There are two problems with this approach :
1. It is not suitable when complete (100%) knowledge about the problem is not available.
2. A vast number of computations is required to implement even a simple human reasoning process. In practice, not all problems can be solved this way, because even problems with just a few hundred facts can exhaust the computational resources of any computer.
1.2.4 Acting Rationally : The Rational Agent Approach
Definition 1 : “Computational Intelligence is the study of the design of intelligent agents”. (Poole et al., 1998)
Rational Agent
• Agents perceive their environment through sensors over a prolonged time period, adapt to change, create and pursue goals, and take actions through actuators to achieve those goals. A rational agent is one that does the “right” thing and acts so as to achieve the best outcome, or the best expected outcome when its knowledge is uncertain.
• The rational-agent approach has two advantages over the other approaches :
1. It is more general than the other approaches, because correct inference is only one of several possible ways of achieving rationality.
2. Rationality is mathematically well defined and completely general, and can be used to develop agent designs that achieve it. Human behaviour, on the other hand, is very subjective and cannot be proved mathematically.
• Two of the approaches, thinking humanly and thinking rationally, are based on the reasoning expected from intelligent systems, while the other two, acting humanly and acting rationally, are based on the behaviour expected from them.
In our syllabus we are going to study the acting rationally approach.
Super intelligence ranges from a machine which is just a little smarter than a human to a machine that is a trillion times smarter. Artificial super intelligence is considered the ultimate power of AI.
1.4 Components of AI
AI is a vast field of research with applications in almost every possible domain. Keeping this in mind, the components of AI can be identified as follows : (refer Fig. 1.4.1)
1. Perception
2. Knowledge representation
3. Learning
4. Reasoning
5. Problem Solving
6. Natural Language Processing (language-understanding).
(Fig. 1.4.1 : Components of AI : intelligence, linguistic intelligence, learning, perception, problem solving)
Perception
In order to work in the environment, intelligent agents need to scan the environment and the various objects in it. An agent scans the environment using various sense organs such as a camera, a temperature sensor, etc. This is called perception. After capturing various scenes, the perceiver analyses the different objects in them and extracts their features and the relationships among them.
Knowledge representation
The information obtained from the environment through sensors may not be in the format required by the system. Hence, it needs to be represented in standard formats for further processing such as learning various patterns, deducing inferences, comparing with past objects, etc. There are various knowledge representation techniques, such as propositional logic and first order logic.
Learning
Learning is a very essential part of AI and it happens in various forms. The simplest form of learning is by trial and error. In this form the program remembers the action that has given the desired output, discards the other trial actions, and thus learns by itself. Because no correct answers are supplied in advance, this is learning without a teacher; in current terminology it is closest to reinforcement learning. In rote learning, the program simply memorizes problem-solution pairs or individual items. In another case, solutions to a few of the problems are given as input to the system, on the basis of which the system or program needs to generate solutions for new problems. This is known as supervised learning.
Reasoning
Reasoning is also called logic, or generating inferences from a given set of facts. Reasoning is carried out based on strict rules of validity to perform a specified task. Reasoning can be of two types, deductive or inductive. In deductive reasoning the truth of the premises guarantees the truth of the conclusion, while in inductive reasoning the truth of the premises supports the conclusion but does not guarantee it. In programming logic, deductive inferences are generally used. Reasoning involves drawing inferences that are relevant to the given problem or situation.
Problem-solving
AI addresses a huge variety of problems, for example, finding winning moves in board games, planning actions in order to achieve a defined task, identifying various objects in given images, etc. Depending on the type of problem, there is a variety of problem solving strategies in AI. Problem solving methods are mainly divided into general purpose methods and special purpose methods. General purpose methods are applicable to a wide range of problems, while special purpose methods are customized to solve a particular type of problem.
CI concentrates on low-level cognitive function implementation, whereas AI concentrates on high-level cognitive structure design.
• Biologically inspired AI techniques :
o Neural networks
o Genetic algorithms
o Reinforcement learning
Let us understand them one by one.
1. Describe and match :
In this technique, a system's behaviour is explained in terms of a finite state model and a computation model.
• Finite state model : It consists of a set of states, a set of input events and the relations between them. Based on the current state and an input event, the next state of the model can be determined.
• Computation model : It is a finite state machine which includes a set of states, a start state, an input alphabet and a transition function. The transition function maps the current state and an input symbol to the next state.
• Transition relation : If a pair of states (S, S') is such that one move takes the system from S to S', then the transition relation is represented by S → S'. A state-transition system is called deterministic if every state has at most one successor; it is called non-deterministic if at least one state has more than one successor.
(Fig. 1.5.2 : State-transition tree)
• In this problem, the optimal solution is the sequence of transitions from the start state down the extreme left branch of the transition tree.
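The two ideas above, a transition function and the deterministic test on a transition relation, can be sketched in a few lines of Python. This is a minimal illustration only: the turnstile machine (states "locked"/"unlocked", inputs "coin"/"push") is a hypothetical example that does not appear in the text, and DELTA, run_machine and is_deterministic are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# Computation model: transition function delta(current_state, input_symbol) -> next_state.
# The turnstile states and input symbols are a hypothetical example, not from the text.
DELTA: Dict[Tuple[str, str], str] = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}


def run_machine(start: str, inputs: List[str]) -> str:
    """Feed the input events one by one and return the state reached."""
    state = start
    for symbol in inputs:
        state = DELTA[(state, symbol)]
    return state


def is_deterministic(relation: Dict[str, Set[str]]) -> bool:
    """A transition relation S -> S' is deterministic if every state has at most one successor."""
    return all(len(successors) <= 1 for successors in relation.values())


# Hypothetical transition relations, written as state -> set of successor states.
chain = {"S0": {"S1"}, "S1": {"S2"}, "S2": set()}
branching = {"S0": {"S1", "S2"}, "S1": set(), "S2": set()}
```

Here run_machine("locked", ["coin", "push", "coin"]) ends in "unlocked"; is_deterministic(chain) is True, while is_deterministic(branching) is False because S0 has two successors.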
2. Tree searching
• Tree searching is a very commonly used technique for solving many problems; goal reduction and constraint networks are a few of the areas where it is used. The tree is searched through many nodes to reach the goal node, and the search gives the path from the source node to the destination node.
• If every node of the entire tree is explored while searching for the goal node, it is called exhaustive search. All the searching techniques of AI are broadly classified as uninformed searching and informed searching.
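A minimal sketch of tree searching, assuming the tree is stored as a hypothetical parent-to-children dictionary (EXAMPLE_TREE and tree_search_bfs are illustrative names). It explores the tree level by level, which is breadth-first order, one of the uninformed strategies, and returns the path from the source node to the goal node. Because the input is a tree, no record of visited nodes is kept; searching a general graph this way would need one.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical example tree: node -> list of child nodes.
EXAMPLE_TREE: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
}


def tree_search_bfs(tree: Dict[str, List[str]], source: str, goal: str) -> Optional[List[str]]:
    """Explore the tree level by level; return the path source -> goal, or None."""
    frontier = deque([[source]])      # each entry is a path that starts at the source
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in tree.get(node, []):
            frontier.append(path + [child])
    return None                       # goal node is not in the tree
```

For example, tree_search_bfs(EXAMPLE_TREE, "A", "F") returns ["A", "C", "F"], the path from the source node to the goal node.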
3. Goal reduction
In the goal reduction technique, the main goal is divided into sub-goals. It requires some procedures to follow. Goal reduction procedures are an alternative to declarative, logic-based representations.
In the goal reduction process, goals are hierarchically subdivided into sub-goals until the sub-goals have an immediate solution. An AND-OR tree/graph structure can represent the relations between goals and sub-goals, alternative sub-goals and conjoint sub-goals.
There are different goal levels. Higher-level goals are higher in the tree, and lower-level goals are lower in the tree. Directed arcs from higher-level to lower-level nodes represent the reduction of a higher-level goal to lower-level sub-goals. The goals at the leaf nodes of the tree cannot be divided further.
Example
An AND-OR tree structure to represent facts such as “enjoyment”, “earning/save money”, “old age” etc.
Fig. 1.5.3
The AND-OR tree structure describes the following things :
• Hierarchical relationships between goals and sub-goals : “Earn more money” is a sub-goal of “Improve standard of living”, which in turn is a sub-goal of “Improve enjoyment of life”.
• Alternative ways of trying to solve a goal : “Go on strike” and “Improve productivity” are alternative ways of trying to “Earn more money”.
• Conjoint sub-goals : These are goals which depend on more than one sub-goal. To “Provide for old age”, one needs not only to “Earn more money” but also to “Save money”.
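The AND-OR relationships above can be checked with a short recursive sketch: an AND node is solved only if all of its sub-goals are solved, and an OR node if any one of them is. The tree below contains only the relations stated in the text (Fig. 1.5.3 itself is not reproduced), and which leaf goals have an immediate solution is a hypothetical assumption passed in as primitive; TREE and solvable are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# goal -> ("AND" or "OR", list of sub-goals); goals not in the dict are leaves.
TREE: Dict[str, Tuple[str, List[str]]] = {
    "Improve enjoyment of life": ("OR", ["Improve standard of living"]),
    "Improve standard of living": ("OR", ["Earn more money"]),
    "Earn more money": ("OR", ["Go on strike", "Improve productivity"]),
    "Provide for old age": ("AND", ["Earn more money", "Save money"]),
}


def solvable(goal: str, tree: Dict[str, Tuple[str, List[str]]], primitive: Set[str]) -> bool:
    """A goal is solved if it is a leaf with an immediate solution, or if its
    sub-goals are solved: all of them for AND, at least one for OR."""
    if goal not in tree:
        return goal in primitive
    kind, subgoals = tree[goal]
    results = [solvable(sub, tree, primitive) for sub in subgoals]
    return all(results) if kind == "AND" else any(results)
```

With primitive = {"Improve productivity", "Save money"}, “Provide for old age” is solvable; drop “Save money” and it is not, because both conjoint sub-goals are required.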
4. Constraint satisfaction
Constraint satisfaction is the process of generating a solution that satisfies all the specified constraints for a given problem. There are variables which need to be assigned values from a specific domain. To generate the solution, it uses a backtracking mechanism. The optimal solution is the one which is generated with the minimum number of backtracks and satisfies all the constraints, thereby assigning proper values to each of the variables.
Constraint satisfaction is applied in multiple fields; Artificial Intelligence, Programming Languages, Symbolic Computing and Computational Logic are a few of them.
Example : The N-queen problem. Here the queens are the variables, and each must be assigned a position on an n × n board, which is the value of that variable. The problem also places conditions on the placement of the queens (variables), i.e. no two queens may clash horizontally, vertically or diagonally.
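As a sketch of constraint satisfaction by backtracking, the N-queen problem can be coded with one variable per row whose value is the queen's column. This is plain chronological backtracking written for illustration, not the book's own algorithm, and solve_n_queens is a hypothetical name. Placing exactly one queen per row rules out horizontal clashes by construction, so only column and diagonal constraints are checked.

```python
from typing import List, Optional


def solve_n_queens(n: int) -> Optional[List[int]]:
    """Backtracking search: variable = row, value = column of the queen in that row."""
    columns: List[int] = []           # columns[r] is the value assigned to row r

    def consistent(row: int, col: int) -> bool:
        for r, c in enumerate(columns):
            # Vertical clash (same column) or diagonal clash (equal row and column distance).
            if c == col or abs(c - col) == abs(r - row):
                return False
        return True

    def backtrack(row: int) -> bool:
        if row == n:
            return True               # every variable has a consistent value
        for col in range(n):
            if consistent(row, col):
                columns.append(col)
                if backtrack(row + 1):
                    return True
                columns.pop()         # undo the assignment and try the next value
        return False

    return columns if backtrack(0) else None
```

For example, solve_n_queens(4) returns [1, 3, 0, 2], while solve_n_queens(3) returns None because no valid placement exists.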
5. Generate and test
It is the simplest form of searching technique. The generate-and-test method first generates a node in the search tree and then checks whether it is a goal node. It involves two processes, as shown in Fig. 1.5.4.
1. Generate the successor node.
2. Test it against each of the proposed goal nodes.
As many unfruitful nodes get generated during this process, it is not a very efficient technique.
Example : Opening a combination lock by trial and error without knowing the combination.
(Fig. 1.5.4 : Generate → Goal Test)
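The combination-lock example can be sketched as a generate-and-test loop: the generator produces every candidate combination in order, and the tester checks each one against the goal. The three-digit lock and the SECRET value are hypothetical; the counter makes the inefficiency visible, since the number of candidates grows as 10 raised to the number of digits.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Combination = Tuple[int, ...]
SECRET: Combination = (4, 0, 7)      # hypothetical combination, unknown to the generator


def generate_and_test(digits: int,
                      is_goal: Callable[[Combination], bool]) -> Tuple[Optional[Combination], int]:
    tried = 0
    # Generate: every combination 000, 001, ..., 999 (for 3 digits) in order.
    for candidate in product(range(10), repeat=digits):
        tried += 1
        # Test: does this candidate open the lock?
        if is_goal(candidate):
            return candidate, tried
    return None, tried
```

With SECRET = (4, 0, 7), generate_and_test(3, lambda c: c == SECRET) returns ((4, 0, 7), 408): 408 combinations are tried before the lock opens.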
Rule-based systems are among the simplest and most useful systems. In their simplest form they have a set of rules, an interpreter and input from the environment. Rules are of the form IF <condition> THEN <action>.
(Figure : Rule conditions matched by the interpreter)
When a particular rule is used, its conditions are first matched against the current observations, and if they are satisfied, the rule may be fired.
3. Interpreter : It matches the rules against the current percepts observed in the environment. It repeats three tasks : retrieval of a matching rule, refinement of the rule, and execution of the rule by performing the corresponding action.
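A minimal sketch of the interpreter's repeated cycle, assuming rules are stored as (set of conditions, action) pairs and that executing an action simply adds it to a working memory of facts. The rule texts are hypothetical, and the refinement step here just picks the first matching rule; real rule-based systems use richer conflict-resolution strategies.

```python
from typing import List, Set, Tuple

Rule = Tuple[Set[str], str]          # (IF all these conditions hold, THEN this action)

# Hypothetical rules used only to exercise the interpreter.
RULES: List[Rule] = [
    ({"it is raining"}, "ground is wet"),
    ({"ground is wet", "walking"}, "wear boots"),
]


def run_interpreter(rules: List[Rule], observations: Set[str]) -> List[str]:
    working_memory = set(observations)
    fired: List[str] = []
    while True:
        # 1. Retrieval: rules whose conditions all match and whose action is not yet done.
        matching = [(conds, act) for conds, act in rules
                    if conds <= working_memory and act not in working_memory]
        if not matching:
            return fired
        # 2. Refinement: choose one rule (here, simply the first match).
        _conditions, action = matching[0]
        # 3. Execution: perform the action; here it just becomes a new fact.
        working_memory.add(action)
        fired.append(action)
```

With observations {"it is raining", "walking"}, the interpreter fires "ground is wet" and then "wear boots"; with {"walking"} alone, no rule matches.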
University Question
Q. Explain how you will formulate search problem.
• Given a goal to achieve, problem formulation is the process of deciding which states to consider and which actions to take to achieve the goal. This is the first step taken by any problem solving agent.
• State space : The state space of a problem is the set of all states reachable from the initial state by executing any sequence of actions. A state is a representation of one possible configuration of the problem.
• The state space specifies the relations among the various problem states, thereby forming a directed network or graph in which the nodes are states and the links between nodes represent actions.
• State space search : Searching in the space of states of the problem under consideration is called a state space search.
• Path : A path is a sequence of states connected by a sequence of actions in a given state space.
University Question
Q. Explain steps in problem formulation with example.
1. Initial state : The state in which the agent starts.
2. Actions : The set of actions that can be executed, i.e. that are applicable, in a given state, together with a description of what each action does; the formal name for this description is the transition model.
3. Successor function : A function that returns the state that results from executing an action in the current state.
4. Goal test : A test to determine whether the current state is a goal state. In some problems the goal test can be carried out just by comparing the current state with the defined goal state; this is called an explicit goal test. In other problems the goal state cannot be defined explicitly but must be determined by carrying out some computation; this is called an implicit goal test.
For example : In Tic-Tac-Toe, a diagonal, vertical or horizontal combination declares the winning state, which can be compared explicitly; but in chess the goal state cannot be predefined. It is a scenario called “Checkmate”, which has to be evaluated implicitly.
5. Path cost : It is simply the cost associated with each step taken to reach the goal state. To determine the cost of reaching each state, there is a cost function, which is chosen by the problem solving agent.
Problem solution : A well-defined problem is specified by its initial state, goal test, successor function and path cost. It can be represented as a data structure and used to implement a program which searches for the goal state. A solution to a problem is a sequence of actions, chosen by the problem solving agent, that leads from the initial state to a goal state. Solution quality is measured by the path cost function.
Optimal solution : An optimal solution is the solution with the least path cost among all solutions.
The general sequence followed by a simple problem solving agent is : first it formulates the problem with the goal to be achieved, then it searches for a sequence of actions that would solve the problem, and then it executes the actions one at a time.
Initial state      Goal state
1 2 3              1 2 3
4 8 -              4 5 6
7 6 5              7 8 -
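The five components of a formulation can be made concrete for the 8-puzzle above. In this sketch (function names are illustrative, not from the text) a state is a tuple of nine tiles read row by row with 0 for the blank, actions move the blank, result plays the role of the successor function, goal_test is an explicit goal test, and every move costs 1, so breadth-first search returns an optimal (least path cost) solution.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, ...]               # 9 tiles read row by row; 0 stands for the blank "-"
INITIAL: State = (1, 2, 3, 4, 8, 0, 7, 6, 5)
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)
MOVES = {"Up": -3, "Down": 3, "Left": -1, "Right": 1}   # index change of the blank


def actions(state: State) -> List[str]:
    """Moves of the blank that stay on the 3 x 3 board."""
    row, col = divmod(state.index(0), 3)
    legal = []
    if row > 0:
        legal.append("Up")
    if row < 2:
        legal.append("Down")
    if col > 0:
        legal.append("Left")
    if col < 2:
        legal.append("Right")
    return legal


def result(state: State, action: str) -> State:
    """Successor function: swap the blank with the neighbouring tile."""
    blank = state.index(0)
    target = blank + MOVES[action]
    tiles = list(state)
    tiles[blank], tiles[target] = tiles[target], tiles[blank]
    return tuple(tiles)


def goal_test(state: State) -> bool:
    return state == GOAL              # explicit goal test


def breadth_first(initial: State) -> Optional[List[str]]:
    """Each move costs 1, so the path cost is len(plan) and BFS returns a least-cost plan."""
    parents: Dict[State, Tuple[Optional[State], Optional[str]]] = {initial: (None, None)}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if goal_test(state):
            plan: List[str] = []
            while parents[state][0] is not None:
                previous, action = parents[state]
                plan.append(action)
                state = previous
            return plan[::-1]
        for action in actions(state):
            nxt = result(state, action)
            if nxt not in parents:
                parents[nxt] = (state, action)
                frontier.append(nxt)
    return None
```

For the initial and goal states above, breadth_first(INITIAL) finds a 5-move plan. Five is also the sum of the Manhattan distances of tiles 8, 6 and 5 from their goal squares, so no shorter plan exists.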
• The problem statement is as discussed in the previous section. Let's formulate the problem first.
• States : In this problem, a state can be a data structure holding a triplet (i, j, k) representing the number of missionaries, cannibals and canoes on the left bank of the river, respectively.
1. Initial state : It is (3, 3, 1), as all the missionaries, cannibals and canoes are on the left bank of the river.
2. Actions : Take x missionaries and y cannibals across the river.
3. Successor function : If we take one missionary and one cannibal to the other side of the river, two missionaries and two cannibals will be left.
(3, 3, 1) → (2, 2, 0) → (3, 2, 1) → (3, 0, 0) → (3, 1, 1) → (1, 1, 0) → (2, 2, 1) → (0, 2, 0) → (0, 3, 1) → (0, 1, 0) → (0, 2, 1) → (0, 0, 0)
Cost = 11 crossings
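The formulation can be searched mechanically. This sketch assumes the usual rules of the puzzle (the canoe carries one or two people, and missionaries may never be outnumbered by cannibals on a bank where missionaries are present); the names is_safe, successors and solve are illustrative. Breadth-first search from (3, 3, 1) to (0, 0, 0) finds a shortest solution, which has 11 crossings like the sequence above.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int, int]         # (missionaries, cannibals, canoe) on the LEFT bank
LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]   # assumed canoe capacity: 1 or 2 people


def is_safe(m: int, c: int) -> bool:
    """Counts are in range and missionaries are not outnumbered on either bank."""
    if not (0 <= m <= 3 and 0 <= c <= 3):
        return False
    left_ok = m == 0 or m >= c
    right_ok = (3 - m) == 0 or (3 - m) >= (3 - c)
    return left_ok and right_ok


def successors(state: State) -> List[State]:
    m, c, canoe = state
    sign = -1 if canoe == 1 else 1    # canoe on the left: people leave the left bank
    result = []
    for dm, dc in LOADS:
        nm, nc = m + sign * dm, c + sign * dc
        if is_safe(nm, nc):
            result.append((nm, nc, 1 - canoe))
    return result


def solve(start: State = (3, 3, 1), goal: State = (0, 0, 0)) -> Optional[List[State]]:
    """Breadth-first search; returns the list of states from start to goal."""
    parent: Dict[State, Optional[State]] = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path: List[State] = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None
```

solve() returns 12 states, i.e. 11 crossings. The exact intermediate states may differ from the sequence above, since several 11-crossing solutions exist.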
States : In the vacuum cleaner problem, a state can be represented as [<block>, clean] or [<block>, dirty]. The agent can be in either of the two blocks, and each block can be either clean or dirty. Hence there are 2 × 2 × 2 = 8 states in total in the vacuum cleaner world.
1. Initial State : Any state can be considered as initial state. For example, [A, dirty]
2. Actions: The possible actions for the vacuum cleaner machine are left, right, absorb, idle.
3. Successor function : Fig. 1.6.2 shows all possible states, with the actions and the resulting next states.
4. Goal Test : The aim of the vacuum cleaner is to clean both blocks. Hence the goal test checks for [A, Clean] and [B, Clean].
5. Path Cost : Assume that each action/step costs 1 unit. The path cost is then the number of actions/steps taken.
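The vacuum world formulation can be written out directly. In this sketch a state is a hypothetical (location, status of A, status of B) triple, which makes the count of 2 × 2 × 2 = 8 states explicit; successor implements the four actions and plan runs breadth-first search, which is optimal here because every action costs 1. All names are illustrative.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Optional, Tuple

State = Tuple[str, str, str]         # (agent location, status of block A, status of block B)
ACTIONS = ["left", "right", "absorb", "idle"]


def all_states() -> List[State]:
    # 2 locations x 2 statuses for A x 2 statuses for B = 8 states
    return [(loc, a, b) for loc, a, b in product("AB", ["clean", "dirty"], ["clean", "dirty"])]


def successor(state: State, action: str) -> State:
    loc, a, b = state
    if action == "left":
        return ("A", a, b)
    if action == "right":
        return ("B", a, b)
    if action == "absorb":
        return ("A", "clean", b) if loc == "A" else ("B", a, "clean")
    return state                      # idle leaves the world unchanged


def goal_test(state: State) -> bool:
    return state[1] == "clean" and state[2] == "clean"


def plan(start: State) -> Optional[List[str]]:
    """Breadth-first search; with a cost of 1 per action this finds a least-cost plan."""
    parents: Dict[State, Tuple[Optional[State], Optional[str]]] = {start: (None, None)}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if goal_test(state):
            steps: List[str] = []
            while parents[state][0] is not None:
                previous, action = parents[state]
                steps.append(action)
                state = previous
            return steps[::-1]
        for action in ACTIONS:
            nxt = successor(state, action)
            if nxt not in parents:
                parents[nxt] = (state, action)
                frontier.append(nxt)
    return None
```

Starting from [A, dirty] with B also dirty, plan(("A", "dirty", "dirty")) returns ["absorb", "right", "absorb"], a path cost of 3.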
1.6.5 Example of Real Time Problems
• There is a variety of real-time problems that can be formulated and solved by searching. Robot Navigation, Route Finding Problem, Travelling Salesman Problem (TSP), VLSI design problem, Automatic Assembly Sequencing, etc. are a few to name.
• There are a number of applications for route finding algorithms. Web sites, car navigation systems that provide driving directions, routing video streams in computer networks, military operations planning, and airline travel-planning systems are a few to name. All these systems involve detailed and complex specifications.
• For now, let us consider a problem to be solved by a travel planning web site : the airline travel problem.
• State : A state is represented by an airport location and the current date and time. In order to calculate the path cost, the state may also record more information about previous flight segments, their fare bases and their status as domestic or international.
1. Initial state : This is specified by the user's query, stating the initial location, date and time.
2. Actions : Take any flight from the current location, in any seat and class, leaving after the current time and leaving enough time for within-airport transfer if needed.
3. Successor function : After an action is taken, i.e. a flight, location, date and time are selected, the successor function gives the next location, date and time reached. The location reached is taken as the current location, and the flight's arrival time as the current time.
4. Goal test : Is the current location the destination location?
5. Path cost : In this case the path cost is a function of monetary cost, waiting time, flight time, customs and immigration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage awards and so on.
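As a rough sketch of route finding with a path cost, uniform-cost search (Dijkstra's algorithm) always expands the cheapest partial route first. The airports and costs below are hypothetical, each cost is a single number standing in for the weighted combination of fare, waiting time, flight time and the other factors listed above, and dates and times are ignored; FLIGHTS and cheapest_route are illustrative names.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical flight network: airport -> list of (next airport, combined cost).
FLIGHTS: Dict[str, List[Tuple[str, float]]] = {
    "BOM": [("DEL", 5.0), ("BLR", 3.0)],
    "BLR": [("DEL", 4.0), ("CCU", 6.0)],
    "DEL": [("CCU", 2.0)],
    "CCU": [],
}


def cheapest_route(start: str, destination: str) -> Optional[Tuple[float, List[str]]]:
    """Uniform-cost search: pop the cheapest partial route from a priority queue."""
    frontier: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, airport, route = heapq.heappop(frontier)
        if airport == destination:                    # goal test
            return cost, route
        if cost > best.get(airport, float("inf")):
            continue                                  # stale queue entry; a cheaper one exists
        for nxt, step_cost in FLIGHTS.get(airport, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, route + [nxt]))
    return None
```

cheapest_route("BOM", "CCU") returns (7.0, ["BOM", "DEL", "CCU"]), preferring two legs costing 5 and 2 over the routes through BLR that cost 9.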
After understanding what an agent is, let's try to figure out the sensors and actuators of a robotic agent. Can you think of them?
A robotic agent uses cameras, infrared range finders, scanners, etc. as sensors, and various types of motors, screens, printing devices, etc. as actuators to perform actions on the given input.
(Figure : A human agent uses hands, legs, mouth and other body parts as actuators; a robotic agent uses cameras and infrared range finders as sensors and various motors as actuators)
• The agent function describes all the functionality the agent is supposed to provide. It maps percept sequences to the desired actions and can be represented as
f : P* → A
• An agent program is a computer program that implements the agent function in a language suitable for the architecture. Agent programs need to be installed on a device in order to run it; that device must have some form of sensors to sense the environment and actuators to act upon it. Hence an agent is a combination of the architecture (hardware) and the program (software).
Agent = Architecture + Program
• Take a simple example of a vacuum cleaner agent. You might have seen a vacuum cleaner agent in “WALL-E” (animated movie). Let's understand how to represent the percepts (inputs) and actions (outputs) used in the case of a vacuum cleaner agent.
As shown in Fig. 1.7.4, there are two blocks, A and B, having some dirt. The vacuum cleaner agent is supposed to sense the dirt and collect it, thereby making the room clean. In order to do that, the agent must have a camera to see the dirt and a mechanism to move forward, backward, left and right to reach the dirt. It should also absorb the dirt. Actions are performed based on the percepts, for example : Move left, Move right, Absorb, No Operation.
(Fig. 1.7.4 : Vacuum cleaner Agent)
Hence the sensors of the vacuum cleaner agent can be a camera and a dirt sensor, and its actuators can be a motor to make it move and an absorption mechanism. Percepts and actions can be represented as
[A, Dirty], [B, Clean], [A, absorb], [B, Nop], etc.
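The agent function f : P* → A can be implemented literally as a lookup table keyed by the whole percept sequence. This minimal table-driven sketch uses the [location, status] percepts above; the table entries and the names TABLE and TableDrivenAgent are hypothetical, and any sequence not listed maps to Nop. It also hints at why such tables are impractical: the number of possible percept sequences grows with every time step.

```python
from typing import Dict, List, Tuple

Percept = Tuple[str, str]            # (block, status), e.g. ("A", "Dirty")

# Partial, hypothetical table for f : P* -> A, keyed by the complete percept sequence.
TABLE: Dict[Tuple[Percept, ...], str] = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Absorb",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Absorb",
    (("A", "Dirty"), ("A", "Clean")): "Right",
    (("B", "Dirty"), ("B", "Clean")): "Left",
}


class TableDrivenAgent:
    """Agent program: remembers every percept and looks up the whole sequence."""

    def __init__(self, table: Dict[Tuple[Percept, ...], str]) -> None:
        self.table = table
        self.percepts: List[Percept] = []

    def __call__(self, percept: Percept) -> str:
        self.percepts.append(percept)
        # Sequences missing from the (partial) table fall back to "Nop".
        return self.table.get(tuple(self.percepts), "Nop")
```

Feeding ("A", "Dirty") and then ("A", "Clean") to a fresh agent yields "Absorb" and then "Right"; a third percept produces a sequence the table does not list, so the agent returns "Nop".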
Various definitions of an agent exist. Let's see a few of them.
IBM states that agents are software entities that carry out some set of operations on behalf of a user or another program.
FIPA : The Foundation for Intelligent Physical Agents (FIPA) states that an agent is a computational process that implements the autonomous functionality of an application.
Another definition, by Russell and Norvig, is “An agent is anything that can be viewed as perceiving its environment through sensors and acting upon the environment through effectors”.
(Figure : Examples of agents, such as solving experiments and assignments, washing clothes, and cleaning service)
F. Mills and R. Stufflebeam's definition says that “An agent is anything that is capable of acting upon information it perceives. An intelligent agent is an agent capable of making decisions about how it acts based on experience”.
From the above definitions we can understand that an agent is : (As per Terziyan, 1993)
o Goal-oriented
o Creative
o Adaptive
o Mobile
o Social
o Self-configurable
The basic abilities of an intelligent agent are to be self-governed, responsive, goal-oriented, etc.
In the case of intelligent agents, the software modules are responsible for exhibiting intelligence. The generally observed capabilities of an intelligent agent are as follows :
o Ability to remain autonomous (self-directed)
o Responsive
o Goal-oriented
• An intelligent agent is one which can take input from the environment through its sensors and act upon the environment through its actuators. Its actions are always directed towards achieving a goal.
(Figure : Input → scan the database of inputs and actions for the corresponding action → action is given as output)
So the new task will be to find a solution if the hand is burnt. Now, think about the states that will be followed in this situation. As per Wooldridge and Jennings, "An intelligent agent is one that is capable of taking flexible self-governed actions".
They say that, for an intelligent agent to meet its design objectives, flexible means three things :
1. Reactiveness
2. Pro-activeness
3. Social ability
1. Reactiveness : It means reacting to a situation within a stipulated time frame. An agent can perceive the environment and respond to the situation within a particular time frame. In the case of reactiveness, reacting within the situation's time frame is what matters most. You can understand this with the above example: if an agent takes too long to take its hand away from the hot pan, the agent's hand will be burnt.
2. Pro-activeness : It is controlling a situation rather than just responding to it. An intelligent agent shows goal-directed behaviour by taking the initiative. For example, if you are playing chess, winning the game is the main objective. So here we try to control the situation rather than just respond move by move: capturing or losing any one of the 16 pieces is not what matters; whether that move helps to checkmate your opponent is more important.
3. Social ability : Intelligent agents can interact with other agents (including humans). Take the example of an automated car driver, where the agent might have to interact with other agents or human beings while driving the car.
Following are a few more features of an intelligent agent.
o Self-learning : An intelligent agent changes its behaviour based on its previous experience. Such an agent keeps updating its knowledge base all the time.
o Movable/Mobile : An intelligent agent can move from one machine to another while performing actions.
o Self-governing : An intelligent agent has control over its own actions.
University Question
Q. Define rationality and rational agent. Give an example of rational action performed by any intelligent agent.
For problem solving, if an agent makes a decision based on some logical reasoning, the decision is called a "Rational Decision". Just as humans are able to make right decisions based on their experience and logical reasoning, an agent should also be able to make correct decisions based on what it knows from the percept sequence and the actions it has carried out.
Agents perceive their environment through sensors over a prolonged time period, adapt to change, create and pursue goals, and take actions through actuators to achieve those goals. A rational agent is one that does the "right" thing and acts so as to achieve the best outcome, even when its knowledge is uncertain.
A rational agent is an agent that has clear preferences, can model uncertainty via expected values of variables or
functions of variables, and always chooses to perform the action with the optimal expected outcome for itself
from among all feasible actions. A rational agent can be anything that makes decisions, typically a person, a
machine, or software program.
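The "optimal expected outcome" rule above can be sketched in a few lines. This is a minimal illustration, assuming each action's outcomes are given as hypothetical (probability, utility) pairs; the action names and numbers are made up and do not come from the text.

```python
from typing import Dict, List, Tuple

# Each action maps to its possible outcomes as (probability, utility) pairs.
Outcomes = List[Tuple[float, float]]

def expected_utility(outcomes: Outcomes) -> float:
    """Probability-weighted average utility of an action's outcomes."""
    return sum(p * u for p, u in outcomes)

def rational_choice(actions: Dict[str, Outcomes]) -> str:
    """Return the feasible action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical decision of a vacuum-cleaner agent standing on a dirty square.
actions = {
    "suck": [(0.9, 10.0), (0.1, -1.0)],  # usually cleans, occasionally fails
    "move": [(1.0, 2.0)],                # certain, but leaves the dirt behind
    "no-op": [(1.0, 0.0)],
}
print(rational_choice(actions))  # suck
```

Here "suck" wins because its expected utility (about 8.9) exceeds that of the other actions, even though it is the only uncertain one.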
• Rationality depends on four main criteria: first, the performance measure, which defines the criterion of success for an agent; second, the agent's prior knowledge of the environment; third, the actions that the agent can perform; and last, the agent's percept sequence to date.
• Performance measure is one of the major criteria for measuring the success of an agent. Take a vacuum-cleaner agent as an example. Its performance measure can depend upon various factors like its dirt cleaning ability, the time taken to clean the dirt, electricity consumption, etc.
• For every percept sequence a built-in knowledge base is updated, which is very useful for decision making because it stores the consequences of performing particular actions. If the consequences lead to the desired goal, we get a good performance measure; if they do not lead to the desired goal state, we get a poor performance measure.
Fig. 1.8.1 : (a) Agent's finger is hurt while using nail and hammer; (b) Agent is using nail and hammer efficiently
• For example, see Fig. 1.8.1. If the agent hurts its finger while using a nail and hammer, then the next time it uses them it will be more careful, and the probability of not getting hurt will increase. In short, the agent will be able to use the hammer and nail more efficiently.
• A rational agent can be defined as an agent that makes use of its percept sequence, experience and knowledge to maximize its performance measure for every probable action. It selects the most feasible action, which will lead to the expected results optimally.
Environmental
University Question
Q. Describe different types of environments applicable to Al agents. MU - Dec. 13, May 15
1. Fully observable vs. Partially observable
The first type of environment is based on observability. Whether or not the agent's sensors have access to the complete state of the environment at any given time decides whether it is a fully observable or a partially observable environment.
For example, in the case of an automated car driver system, the automated car cannot predict what the other drivers are thinking while driving their cars. It is only because of the information gathered by its sensors that the automated car driver is able to take actions.
2. Single agent vs. Multi-agent
The second type of environment is based on the number of agents acting in the environment. Whether the agent is operating on its own or together with other agents decides whether it is a single agent or a multi-agent environment.
For example, an agent playing Tetris by itself is in a single agent environment, whereas an agent playing checkers is in a two-agent environment. Similarly, in the vacuum cleaner world only one machine is working, so it is a single agent environment, while in the case of a car driving agent there are multiple agents driving on the road, hence it is a multi-agent environment.
A multi-agent environment is further classified as a co-operative multi-agent or a competitive multi-agent environment.
Now, you might be wondering which type of multi-agent environment we have in the case of an automated car driver system.
• Let's understand it with the help of an automated car driving example. For a car driving system 'X', another car, say 'Y', is considered an agent. 'Y' tries to maximize its performance measure, and the input taken by car 'Y' depends on car 'X'. Thus it can be said that for an automated car driving system we have a cooperative multi-agent environment.
• Whereas in the case of a chess game, when two agents are operating as opponents and trying to maximize their own performance, they are acting in a competitive multi-agent environment.
• An environment is called deterministic when the next state of the environment can be completely determined by the previous state and the action executed by the agent.
• For example, in the case of the vacuum cleaner world, the 8-puzzle problem and the chess game, the next state of the environment depends solely on the current state and the action performed by the agent.
• A stochastic environment generally means that the uncertainty about the outcomes of actions is quantified in terms of probabilities. The environment changes while the agent is taking action, hence the next state of the world does not depend only on the current state and the agent's action; some changes happen in the environment irrespective of the agent's action. An automated car driving system has a stochastic environment, as the agent cannot control the traffic conditions on the road.
• In the case of checkers, we have a multi-agent environment where an agent might be unable to predict the action of the other player. In such cases, if the environment is partially observable, it is considered to be stochastic.
• If the environment is deterministic except for the actions of other agents, then the environment is strategic. That is, in a game like chess, the next state of the environment does not depend only upon the current action of the agent; it is also influenced by the strategies developed by both opponents for future moves.
• There is one more type of environment in this category: when the environment is not fully observable or is non-deterministic, it is called an uncertain environment.
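The deterministic/stochastic distinction can be illustrated on the two-square vacuum world. This is a sketch under stated assumptions: the state is written as a hypothetical tuple (agent location, status of A, status of B), and the source of randomness (dirt reappearing with probability `p_new_dirt`) is an assumption chosen for illustration, not a model from the text.

```python
import random

def deterministic_result(state, action):
    """Next state is fully determined by the current state and the action."""
    loc, a, b = state
    if action == "Suck":
        return (loc, "Clean", b) if loc == "A" else (loc, a, "Clean")
    if action == "Right":
        return ("B", a, b)
    if action == "Left":
        return ("A", a, b)
    return state  # NoOp leaves the world unchanged

def stochastic_result(state, action, rng, p_new_dirt=0.2):
    """Same action model, but the world can change on its own.

    Assumption for illustration: after every action, each square
    independently becomes dirty again with probability p_new_dirt.
    """
    loc, a, b = deterministic_result(state, action)
    if rng.random() < p_new_dirt:
        a = "Dirty"
    if rng.random() < p_new_dirt:
        b = "Dirty"
    return (loc, a, b)

rng = random.Random(0)  # seeded so that runs are repeatable
print(deterministic_result(("A", "Dirty", "Clean"), "Suck"))
print(stochastic_result(("A", "Dirty", "Clean"), "Suck", rng))
```

Calling `deterministic_result` twice with the same arguments always gives the same state, while `stochastic_result` can return different next states for the same state and action.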
Task environment | Car driving | Part-picking robot | Crossword puzzle | Soccer game | Checkers with clock
Observable | Partially | Partially | Fully | Partially | Fully
Agents | Multi-agent (cooperative) | Single agent | Single agent | Multi-agent (competitive) | Multi-agent (competitive)
Deterministic | Stochastic | Stochastic | Deterministic | Strategic | Strategic
Episodic | Sequential | Episodic | Sequential | Sequential | Sequential
Q. Give PEAS description for a robot soccer player. Characterize its environment.
Q. What are PEAS descriptors? Give PEAS descriptors for a part-picking robot.
PEAS : PEAS stands for Performance measure, Environment, Actuators and Sensors. It is the short form used for the performance issues grouped under the task environment.
You might have seen driverless/self-driving car videos of Audi/Volvo/Mercedes, etc. To develop such driverless cars we first need to define the PEAS parameters.
Performance Measure : It is the objective function used to judge the performance of the agent. For example, in the case of a pick and place robot, the number of correct parts in a bin can be the performance measure.
Environment : It is the real environment in which the agent needs to deliberate on actions.
Actuators : These are the tools, equipment or organs using which the agent performs actions in the environment. They work as the output of the agent.
Sensors : These are the tools, equipment or organs using which the agent captures the state of the environment. They work as the input to the agent.
To understand the concept of PEAS, consider following examples.
(A) Automated Car driving agent
(ii) Traffic conditions : You will find different traffic conditions on different types of roads. An automated system should be able to drive efficiently in all types of traffic conditions. Sometimes traffic conditions are created by pedestrians, animals, etc.
(iii) Clients : Automated cars are designed depending on the client's environment. For example, some countries use left-hand drive and others use right-hand drive, and every country or state can have different weather conditions. The automated car driver should be designed according to such constraints.
Actuators : Actuators are responsible for performing actions, i.e. providing output to the environment.
In the case of the car driving agent, the following are the actuators :
(i) Steering wheel, which can be used to direct the car in the desired direction (i.e. right/left).
(ii) Accelerator, gear, etc., which can be used to increase or decrease the speed of the car.
(iii) Brake, which is used to stop the car.
(iv) Light signals and horn, which are very useful as indicators for an automated car.
Sensors : To take input from environment in car driving example cameras, sonar system, speedometer,
GPS, engine sensors, etc. are used as sensors.
1. Performance Measures
o Healthy patient : the system should make use of sterilized instruments to ensure the safety (health) of the patient.
o Minimize costs : the automated system's results should not be very costly, otherwise the overall expenses of the patient may increase.
o Lawsuits : the medical diagnosis system should be legal.
2. Environment : Patient, doctors, hospital environment.
3. Sensors : Keyboard and mouse, which are useful to enter symptoms, findings and the patient's answers to given questions; scanner to scan the reports; camera to click pictures of patients.
4. Actuators : Screen, printer.
(D) Soccer Player Robot
1. Performance Measures : Number of goals, speed, legal game.
2. Environment : Team players, opponent team players, playing ground, goal net.
3. Sensors : Camera, proximity sensors, infrared sensors.
4. Actuators : Joint angles, motors.
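A PEAS description is essentially a record with four fields, so it can be captured as a small data structure. The `PEAS` class and its `describe` method below are hypothetical names used only for illustration; the entries are the soccer player robot's descriptors listed above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PEAS:
    """Task-environment description: Performance, Environment, Actuators, Sensors."""
    agent: str
    performance: List[str] = field(default_factory=list)
    environment: List[str] = field(default_factory=list)
    actuators: List[str] = field(default_factory=list)
    sensors: List[str] = field(default_factory=list)

    def describe(self) -> str:
        """Render the four PEAS components as readable text lines."""
        rows = [("Performance measure", self.performance),
                ("Environment", self.environment),
                ("Actuators", self.actuators),
                ("Sensors", self.sensors)]
        lines = [f"PEAS for {self.agent}"]
        lines += [f"  {name}: {', '.join(items)}" for name, items in rows]
        return "\n".join(lines)

soccer_robot = PEAS(
    agent="Soccer player robot",
    performance=["Number of goals", "Speed", "Legal game"],
    environment=["Team players", "Opponent team players", "Playing ground", "Goal net"],
    actuators=["Joint angles", "Motors"],
    sensors=["Camera", "Proximity sensors", "Infrared sensors"],
)
print(soccer_robot.describe())
```

Writing the four components down explicitly like this makes it easy to check that inputs (sensors) and outputs (actuators) have not been swapped.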
[Figure: Sensors → "What action should be taken?" → output / action → Effectors]
Fig. 1.10.2 : Simple reflex agents
An agent which performs actions based only on the current input, ignoring all the previous inputs, is called a simple reflex agent.
A few possible input sequences and outputs for a vacuum cleaner world with 2 locations are considered for simplicity.
Table 1.10.1
[Figure: vacuum cleaner world with two locations, A and B]
Input sequence | Output / action
{location, content} | Right, left, suck, no-op
In the case of the above vacuum agent, only one sensor is used, namely a dirt sensor. This dirt sensor can detect whether there is dirt or not, so the possible inputs are 'dirt' and 'clean'.
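A simple reflex agent for this two-location world can be written as a handful of condition-action rules. The sketch below assumes the percept is the {location, content} pair from Table 1.10.1, and the action names "Suck", "Right", "Left" and "NoOp" are illustrative choices.

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent: the action depends only on the current percept."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    if location == "A":
        return "Right"
    if location == "B":
        return "Left"
    return "NoOp"  # unknown location: do nothing

for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", reflex_vacuum_agent(percept))
```

Note that the agent keeps no memory: even when both squares are clean it keeps moving back and forth, which is the weakness addressed by the model-based agent.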
The agent will also have to maintain a database of actions, which helps it decide what output to give. The database will contain conditions like : if there is dirt at the current location, the vacuum cleaner should suck that dirt; else, it should move left or right to the next location and check for dirt there, repeating these actions till the entire assigned area is cleaned. Once the assigned area is fully covered, no other action should be taken until further instruction.
If the vacuum cleaner agent keeps searching for dirt and clean area, it can get trapped in an infinite loop. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments. By randomizing its actions, the simple reflex agent can avoid these infinite loops. For example, on receiving {clean} as input, the vacuum cleaner agent should randomly go either left or right.
Randomized behaviour of the right kind can also be considered rational in some multi-agent environments.
University Question
Q. Explain model based reflex agent with block diagram.
A partially observable environment cannot be handled well by simple reflex agents because they do not keep track of the previous state. So one more type of agent was created, the model-based reflex agent.
An agent which performs actions based on the current input and the previous state it has stored is called a model-based agent. A partially observable environment can be handled well by a model-based agent.
From Fig. 1.10.3 it can be seen that once the sensor takes input from the environment, the agent checks the current state of the environment. After that, it checks the previous state, which shows how the world is developing and how the environment is affected by the action taken by the agent at an earlier stage. This is termed the model of the world.
[Figure: input/percept from the environment → Sensors, combined with the previous state]
Fig. 1.10.3 : Model-based reflex agents
Once this is verified, an action is decided based on the condition-action protocol. This decision is given to the effectors, and the effectors give this output to the environment.
The knowledge about "how the world is changing" is called a model of the world. An agent which uses such a model while working is called a "model-based agent".
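A model-based reflex agent for the two-square vacuum world can be sketched by adding an internal model to the simple reflex rules. The assumption here is that the model simply records the last known status of each square and the expected effect of "Suck"; the class name and action names are hypothetical.

```python
class ModelBasedVacuumAgent:
    """Model-based reflex agent for the two-square vacuum world.

    The internal model remembers the last known status of each square, so the
    agent can stop (NoOp) once it believes both squares are clean, which a
    simple reflex agent cannot do.
    """

    def __init__(self):
        # Internal state: last known status of each square (None = not yet seen).
        self.model = {"A": None, "B": None}

    def __call__(self, percept):
        location, status = percept
        self.model[location] = status          # update model from the current percept
        if status == "Dirty":
            self.model[location] = "Clean"     # model of the action's effect: sucking cleans
            return "Suck"
        if all(s == "Clean" for s in self.model.values()):
            return "NoOp"                      # whole world believed clean
        return "Right" if location == "A" else "Left"

agent = ModelBasedVacuumAgent()
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", agent(percept))  # Suck, Right, Suck, NoOp
```

The last percept produces "NoOp" only because the agent remembers that square A was already cleaned; the current percept alone could not tell it that.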
• Consider a simple example of an automated car driver system. Here, the world keeps changing all the time. You must have taken a wrong turn while driving on some day of your life; the same thing applies to an agent. Suppose some car "X" is overtaking our automated driver agent "A"; then the speed and the direction in which "X" and "A" are moving their steering wheels are important. Take a scenario where the agent missed a sign board as it was overtaking another car. The world around that agent will be different in that case.
• A model-based reflex agent should maintain an internal model based on the input history, which can reflect at least some of the unobserved aspects of the current state. Once this is done, it chooses an action in the same way as the simple reflex agent.
[Figure: Goal-based agents. The input/percept from the sensors and the previous state feed "What is the effect of my action?" and "What will be the state if some action A is performed?"; these decide "What action should be taken next?", whose output/action goes to the effectors.]
• Model-based agents are further developed based on "goal" information. This new type of agent is called a goal-based agent. As the name suggests, goal information describes the situations that are desirable. These agents are provided with goals along with the model of the world, and all the actions selected by the agent are made with reference to the specified goals. Goal-based agents can only differentiate between goal states and non-goal states; hence, their performance is either 100% or zero.
• The limitation of a goal-based agent comes from its definition itself. Once the goal is fixed, all the actions are taken to fulfil it, and the agent loses the flexibility to change its actions according to the current situation.
• Take the example of a vacuum cleaning robot agent whose goal is to keep the house clean all the time. This agent will keep searching for dirt in the house and will keep the house clean all the time. Remember M-O, the cleaning robot from the movie Wall-E, which keeps cleaning all the time no matter what the environment is, or the healthcare companion robot Baymax from Big Hero 6, which does not deactivate until the user says that he/she is satisfied with the care.
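A goal-based agent uses its model of the world to look ahead for actions that reach a goal state. The sketch below assumes the goal "both squares clean" for the vacuum world and uses breadth-first search (covered in Chapter 2) to plan the action sequence; the state encoding and function names are illustrative, not the book's own implementation.

```python
from collections import deque

ACTIONS = ["Suck", "Left", "Right"]

def result(state, action):
    """Model of the world: state = (location, status_A, status_B)."""
    loc, a, b = state
    if action == "Suck":
        return (loc, "Clean", b) if loc == "A" else (loc, a, "Clean")
    return ("A" if action == "Left" else "B", a, b)

def is_goal(state):
    """Goal test: a state either is or is not a goal state (no degrees)."""
    return state[1] == "Clean" and state[2] == "Clean"

def plan(start):
    """Breadth-first search for the shortest action sequence reaching a goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if is_goal(state):
            return actions
        for act in ACTIONS:
            nxt = result(state, act)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [act]))
    return None  # goal unreachable

print(plan(("A", "Dirty", "Dirty")))  # ['Suck', 'Right', 'Suck']
```

The binary `is_goal` test mirrors the limitation noted above: every non-goal state looks equally bad, which is what utility-based agents improve on.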
University Question
Q. Explain utility based agents with the help of a neat diagram. MU - May 13, Dec. 19
A utility function is used to map a state to a measure of the utility of that state. We can define a measure for determining how advantageous a particular state is for an agent, and a utility function can be used to obtain this measure.
The term utility is used to depict how "happy" the agent is. To obtain a generalized performance measure, various world states are compared according to exactly how happy they would make the agent.
Take one example: you might have used Google Maps to find a route which can take you from your source location to your destination in the least possible time. The same logic is followed by a utility-based automatic car driving agent.
The goals of a utility-based automatic car driving agent can be to reach a given location safely, within the least possible time, while saving fuel. So this car driving agent will check the possible routes and the traffic conditions on these routes, and will select the route which can take the car to the destination safely, in the least possible time and without consuming much fuel.
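The route choice described above can be sketched as maximizing a utility function that trades off time, safety and fuel. The route data, the risk score and the weights `W_TIME`, `W_RISK` and `W_FUEL` are all hypothetical; a real agent would estimate them from traffic and map data.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    minutes: float      # expected travel time
    risk: float         # hypothetical accident-risk score in [0, 1]
    fuel_litres: float  # expected fuel consumption

# Hypothetical weights: how strongly the agent dislikes time, risk and fuel use.
W_TIME, W_RISK, W_FUEL = 1.0, 100.0, 5.0

def utility(route: Route) -> float:
    """Higher is better: penalise travel time, risk and fuel consumption."""
    return -(W_TIME * route.minutes + W_RISK * route.risk + W_FUEL * route.fuel_litres)

routes = [
    Route("Highway", minutes=30, risk=0.20, fuel_litres=4.0),
    Route("City roads", minutes=40, risk=0.05, fuel_litres=3.0),
    Route("Ring road", minutes=35, risk=0.10, fuel_litres=5.0),
]
best = max(routes, key=utility)
print(best.name)  # City roads
```

Unlike a goal test, the utility lets the agent rank all three routes, even though each of them reaches the destination; the fastest route (Highway) loses here because of its higher risk.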
Why do you give mock tests? When you get fewer marks for some question, you come to know that you have made some mistake in your answer. Then you learn the correct answer, and when you get the same question in further examinations, you write the correct answer and avoid the mistakes made in the mock test.
The same concept is followed by the learning agent.
A learning-based agent is advantageous in many cases because, with its basic knowledge, it can initially operate in an unknown environment, then gain knowledge from the environment based on a few parameters and perform actions to give better results.
[Figure: Learning agents. Blocks: Performance standard, Sensors (input/percept), Critic, Feedback, Learning element, Performance element (changes, knowledge), Learning goals, Problem generator]
1. Critic : It compares the sensor input, which specifies the effect of the agent's action on the environment, with the performance standards and generates feedback for the learning element.
2. Learning element : This component is responsible for learning from the difference between the performance standards and the feedback from the critic. According to the current percept, it is supposed to understand the expected behaviour and enhance its standards.
3. Performance element : Based on the current percept received from the sensors and the input obtained from the learning element, the performance element is responsible for choosing the action to act upon the external environment.
4. Problem generator : Based on the new goals learnt by the learning agent, the problem generator suggests new or alternate actions which will lead to new and instructive experiences.
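The four components can be wired together in a toy agent. This is a minimal sketch under loudly stated assumptions: the "knowledge" is a running average of feedback per action, the problem generator simply proposes a random exploratory action with probability `explore_rate`, and the environment is a hypothetical hammer-and-nail task echoing Fig. 1.8.1 where careful hammering scores +1 and fast hammering (hurting the finger) scores -1.

```python
import random

class LearningAgent:
    """Toy learning agent with a critic, learning element,
    performance element and problem generator."""

    def __init__(self, actions, explore_rate=0.1, seed=0):
        self.values = {a: 0.0 for a in actions}  # learned knowledge per action
        self.counts = {a: 0 for a in actions}
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)

    def performance_element(self):
        """Choose the action currently believed to be best."""
        return max(self.values, key=self.values.get)

    def problem_generator(self):
        """Suggest an exploratory action that may lead to new experience."""
        return self.rng.choice(list(self.values))

    def choose_action(self):
        if self.rng.random() < self.explore_rate:
            return self.problem_generator()
        return self.performance_element()

    @staticmethod
    def critic(outcome, performance_standard):
        """Feedback: how far the outcome is above or below the standard."""
        return outcome - performance_standard

    def learning_element(self, action, feedback):
        """Update knowledge: incremental average of feedback for this action."""
        self.counts[action] += 1
        self.values[action] += (feedback - self.values[action]) / self.counts[action]

def environment(action):
    """Hypothetical task: careful hammering succeeds, fast hammering hurts."""
    return 1.0 if action == "careful" else -1.0

agent = LearningAgent(["fast", "careful"])
for _ in range(200):
    action = agent.choose_action()
    feedback = agent.critic(environment(action), performance_standard=0.0)
    agent.learning_element(action, feedback)
print(agent.performance_element())  # careful
```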
• Fig. 1.11.1 shows a few fields in which artificial intelligence is applied. There can be many more fields in which artificially intelligent systems can be used.
[Figure panels include: Business; Automated planning and scheduling; Voice technology]
Fig. 1.11.1 : Fields of AI Application
1. Education
Training simulators can be built using artificial intelligence techniques. Software for pre-school children is developed to enable learning through fun games. Automated grading, interactive tutoring and instructional theory are current areas of application.
2. Entertainment
Many movies, games and robots are designed to play as a character. In games they can play as an opponent when a human player is not available or not desirable.
3. Medical
AI has applications in the fields of cardiology (ECG), neurology (MRI), embryology (sonography), complex operations of internal organs, etc. It can also be used in organizing bed schedules, managing staff rotations, and storing and retrieving patient information. Many expert systems are able to predict diseases and can provide medical prescriptions.
4. Military
Training simulators can be used in military applications. Also, in areas where humans cannot reach or in life-threatening conditions, robots can be very well used to do the required jobs. When decisions have to be made quickly, taking into account an enormous amount of information, and when lives are at stake, artificial intelligence can provide crucial assistance. From developing intricate flight plans to implementing complex supply systems or creating training simulation exercises, AI is a natural partner in the modern military.
5. Business and Manufacturing
The latest generation of robots brings performance advances, growing integration of vision and an enlarging capability to transform manufacturing.
8. Heavy Industry
Huge machines involve risk in operating and maintaining them, and robots are a better replacement for human operators here. These robots are safe and efficient. Robots have proven to be more effective than humans in jobs of a repetitive nature, where humans may fail due to lack of continuous attention or laziness.
[Figure: Natural language processing, Robotics, Neural networks, Fuzzy logic]
Natural language processing : One of the applications of AI is in the field of Natural Language Processing (NLP). NLP enables interaction between computers and human (natural) language. Practical applications of NLP are machine translation, information retrieval (e.g. the LUNAR question-answering system), text categorization, etc. A few more applications are speech recognition, perception, image formation and extracting 3D information using vision.
Robotics : One more major application of AI is in robotics. A robot is an active agent whose environment is the physical world. Robots can be used in manufacturing and handling material, in the medical field, in the military, etc. for automating manual work.
Neural networks : Another application of AI is neural networks. A neural network is a system that works like a human brain/nervous system. It can be useful for stock market analysis, character recognition, image compression, security, face recognition, handwriting recognition, Optical Character Recognition (OCR), etc.
Fuzzy logic : Apart from these, AI systems are developed with the help of fuzzy logic. Fuzzy logic is useful for making approximations rather than having fixed and exact reasoning for a problem. You must have seen systems like ACs, fridges and washing machines which are based on fuzzy logic (they call it "6th sense technology!").
Artificial Intelligence has touched each and every aspect of our life. From washing machines and air conditioners to smart phones, AI is everywhere, serving to ease our life. In industry, AI is doing marvellous work as well: robots are doing sound work in factories, and driverless cars have become a reality. WiFi-enabled Barbie uses speech recognition to talk and listen to children. Companies are using AI to improve their products and increase sales, and AI has seen significant advances in machine learning. Following are the areas in which AI is showing significant advancements.
for him.
7. Ethical AI
Review Questions
Q.1 Explain various techniques for solving problems by searching.
Q.2 What are the various AI techniques?
SEARCH TECHNIQUES
Syllabus
Introduction
• Search is an indivisible part of intelligence. An intelligent agent is one that can search and select the most appropriate action in the given situation from the available set of actions. When we play a game like chess, cards, tic-tac-toe, etc., we know that we have multiple options for the next move, but the player who searches for the correct move is far more likely to win the game.
• In the case of the travelling salesman problem, a medical diagnosis system or any expert system, all they are required to do is to carry out a search which will produce the optimal path, i.e. the shortest path with minimum cost and effort.
2. Optimality : If the solution produced is the minimum cost solution, the algorithm is said to be optimal.
3. Time complexity : It depends on the time taken to generate the solution, measured as the number of nodes generated during the search.
4. Space complexity : The memory required to store the generated nodes while performing the search.
Complexity of algorithms is expressed in terms of three quantities as follows :
1. b : called the branching factor, representing the maximum number of successors a node can have in the search tree.
[Figure: a search-tree node containing an 8-puzzle state, with a pointer to its parent node]
University Questions
Q. Compare different uninformed search strategies.
Q. Write short note on Uniform. MU - Dec. 14
• The term "uninformed" means that these techniques have information only about the start state and the goal state, along with the problem definition.
• These techniques can generate successor states and can distinguish a goal state from a non-goal state.
• All these search techniques are distinguished by the order in which nodes are expanded.
• The uninformed search techniques are also called "blind search".
2.4.1 Concept
In depth-first search, the search tree is expanded depth-wise, i.e. the deepest node in the current branch of the search tree is expanded. As the leaf node is reached, the search backtracks to the previous node. The progress of the search is illustrated in Fig. 2.4.1.
The explored nodes are shown in light gray. Explored nodes with no descendants in the fringe are removed from memory. Nodes at depth three have no successors and M is the only goal node.
Process
Fig. 2.4.1 : Working of Depth first search on a binary tree
2.4.2 Implementation
• DFS uses a LIFO fringe, i.e. a stack. The most recently generated node, which is on top of the fringe, is chosen first for expansion. As the node is expanded, it is dropped from the fringe and its successors are added.
• So when there are no more successors to add to the fringe, the search "backtracks" to the next deepest node that is still unexplored. DFS can be implemented in two ways, recursive and non-recursive. Following is the algorithm for both.
2.4.3 Algorithm
(a) Non-recursive implementation of DFS
1. Push the root node onto a stack.
2. while (stack is not empty)
   (a) pop a node from the stack;
      (i) if node is a goal node then return success;
      (ii) push all children of node onto the stack;
3. return failure;
(b) Recursive implementation of DFS
DFS(node) :
1. If node is a goal, return success;
2. for each child c of node
   (a) if DFS(c) is successful,
      (i) return success;
3. return failure;
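Both versions of the algorithm above can be sketched in Python. Assumptions: the tree is a hypothetical adjacency dictionary (node → list of children), and instead of plain success/failure the functions return something more informative: the iterative version returns the order in which nodes were expanded up to the goal, and the recursive version returns the path from the root to the goal. `None` stands for failure.

```python
from typing import Dict, List, Optional

# Hypothetical example tree: node -> children (left to right).
TREE: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}

def dfs_iterative(root: str, goal: str, tree: Dict[str, List[str]]) -> Optional[List[str]]:
    """Non-recursive DFS with an explicit LIFO stack."""
    stack = [root]
    order = []
    while stack:
        node = stack.pop()              # most recently pushed node first
        order.append(node)
        if node == goal:
            return order                # success
        # Push children in reverse so the leftmost child is expanded first.
        stack.extend(reversed(tree.get(node, [])))
    return None                         # failure

def dfs_recursive(node: str, goal: str, tree: Dict[str, List[str]]) -> Optional[List[str]]:
    """Recursive DFS; the call stack plays the role of the LIFO fringe."""
    if node == goal:
        return [node]
    for child in tree.get(node, []):
        path = dfs_recursive(child, goal, tree)
        if path is not None:
            return [node] + path
    return None

print(dfs_iterative("A", "F", TREE))  # ['A', 'B', 'D', 'E', 'C', 'F']
print(dfs_recursive("A", "F", TREE))  # ['A', 'C', 'F']
```

This is tree search; on a graph with cycles, a set of visited nodes would be needed to avoid the infinite loops mentioned in Section 2.7.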
Fig. P. 2.4.1
2.5 Breadth First Search (BFS)
Q. Explain breadth first algorithm.
2.5.1 Concept
• As the name suggests, in the breadth-first search technique the tree is expanded breadth-wise.
• The root node is expanded first, then all the successors of the root node are expanded, then their successors, and so on.
• In turn, all the nodes at a particular depth in the search tree are expanded first, and then the search proceeds to the next level.
• Thus, the shallowest unexpanded node will be chosen for expansion. The search process of BFS is illustrated in Fig. 2.5.1.
2.5.2 Process
2.5.3 Implementation
• In BFS we use a FIFO queue for the fringe, because of which newly inserted nodes in the fringe are automatically placed after their parents.
• Thus, the children nodes, which are deeper than their parents, go to the back of the queue, and older nodes, which are shallower, get expanded first. Following is the algorithm for the same.
2.5.4 Algorithm
1. Put the root node on a queue.
2. while (queue is not empty)
   (a) remove a node from the queue;
      (i) if (node is a goal node) return success;
      (ii) put all children of node onto the queue;
3. return failure;
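The BFS algorithm above can be sketched with `collections.deque` as the FIFO queue. As an assumption for illustration, each queue entry is a whole path (so the solution path can be returned on success), and the tree is a hypothetical adjacency dictionary in which the goal G can be reached at depth 2 and at depth 3.

```python
from collections import deque

def bfs(root, goal, tree):
    """BFS with a FIFO queue; returns the path from root to goal, or None."""
    queue = deque([[root]])            # each entry is a path; its last node is the one to expand
    while queue:
        path = queue.popleft()         # oldest, i.e. shallowest, entry first
        node = path[-1]
        if node == goal:
            return path                # success
        for child in tree.get(node, []):
            queue.append(path + [child])
    return None                        # failure

# Hypothetical tree: G is reachable via B (depth 2) and via A, C (depth 3).
TREE = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}
print(bfs("S", "G", TREE))  # ['S', 'B', 'G']
```

Because the queue is FIFO, the shallower path through B is found before the deeper one through A and C, even though A is expanded first.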
2.6 Uniform Cost Search (UCS)
2.6.1 Concept
• Breadth first search is optimal only when all step costs are equal. To make it work in real-world conditions, where step costs differ, we can make a simple extension to the basic implementation of BFS. This results in uniform cost search, an algorithm that is optimal with any positive step costs.
• In BFS we always expand the shallowest node first; in uniform cost search, instead of the shallowest node, the node with the lowest path cost is expanded first. The implementation details are as follows.
2.6.2 Implementation
• Uniform cost search can be achieved by implementing the fringe as a priority queue ordered by path cost. The algorithm shown below is almost the same as BFS, except for the use of a priority queue and the addition of an extra check in case a shorter path to any node is discovered.
• The algorithm keeps track of the nodes inserted in the fringe for exploration by using a priority queue together with a hash table.
• The priority queue stores the total cost from the root to each node. Uniform cost search gives the minimum path cost the maximum priority. The algorithm using this priority queue is the following.
2.6.3 Algorithm
• Insert the root node into the queue.
• While the queue is not empty :
   (i) Dequeue the maximum priority node (the node with the lowest path cost) from the queue. (If priorities are the same, the alphabetically smaller node is chosen.)
   (ii) If the node is the goal node, print the path and exit.
   Else
   • Insert all the children of the dequeued node, with their total costs as priority.
• The goal test is applied when a node is dequeued, not when it is generated. Because of this, the first goal path returned is the cheapest one, and the algorithm does not need to explore the other possible paths. The solution path is optimal in terms of cost.
• The priority queue is ordered on the total path cost of each node. As a result, the algorithm never expands a node whose cost is greater than the cost of the cheapest path to the goal.
• At any given time the nodes in the priority queue have roughly similar costs, because the search expands outward in contours of uniform path cost. Hence the name “Uniform Cost Search”.
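The priority-queue version described above can be sketched with Python's `heapq`. This is a minimal sketch, assuming the graph is a dictionary of `(neighbour, step_cost)` lists. The function name and the example graph are hypothetical. The `best_g` dictionary plays the role of the hash table and implements the "shorter path discovered" check.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """graph: dict node -> list of (neighbour, step_cost). Returns (cost, path) or None."""
    frontier = [(0, start, [start])]   # priority queue keyed on path cost g(n)
    best_g = {start: 0}                # hash table: cheapest known cost per node
    while frontier:
        # Lowest path cost = highest priority; equal costs fall back to
        # comparing node names, i.e. the alphabetically smaller node first.
        g, node, path = heapq.heappop(frontier)
        if node == goal:               # goal test on dequeue, not on generation
            return g, path
        if g > best_g[node]:
            continue                   # stale entry: a cheaper path was queued later
        for nbr, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nbr, float("inf")):   # shorter path discovered
                best_g[nbr] = new_g
                heapq.heappush(frontier, (new_g, nbr, path + [nbr]))
    return None

graph = {"S": [("A", 1), ("B", 5)], "A": [("B", 2), ("G", 10)], "B": [("G", 3)]}
print(uniform_cost_search(graph, "S", "G"))  # (6, ['S', 'A', 'B', 'G'])
```

The direct edge S to B (cost 5) is generated first, but the cheaper route through A (cost 3) replaces it before B is expanded.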
2.6.4 Performance Evaluation
• Completeness : Completeness is guaranteed provided the cost of every step exceeds some small positive constant ε.
• Optimality : It produces an optimal solution, as nodes are expanded in order of their path cost.
• Time complexity : Uniform cost search considers path costs rather than depths, so its complexity does not depend merely on b and d. Let C* be the cost of the optimal solution, and assume that every action costs at least ε. Then the algorithm's worst-case time complexity is $O(b^{\lceil C^*/\epsilon \rceil})$, which can be much greater than $b^d$.
• Space complexity : $O(b^{\lceil C^*/\epsilon \rceil})$, indicating the number of nodes held in memory during execution.
2.7 Depth Limited Search (DLS)
2.7.1 Concept
• Depth limited search avoids the infinite-path problem of DFS by carrying out depth-first search with a predetermined depth limit l.
• Nodes at the depth limit are treated as if they have no successors. The depth limit solves the infinite-path problem.
• Because the search is carried out only up to a certain depth in the search tree, it introduces the problem of incompleteness: a goal lying deeper than the limit will never be found.
• Depth-first search can be viewed as a special case of depth-limited search with the depth limit equal to the depth of the tree (or l = ∞). The process of DLS is depicted in Fig. 2.7.1.
2.7.2 Process
If the depth limit is fixed to 2, DLS carries out depth first search until the second level in the search tree.
[Fig. 2.7.1 : Process of DLS with depth limit 2]
2.7.3 Implementation
• As in the case of DFS, DLS can use the same fringe, implemented as a stack (LIFO).
• Additionally, the level of each node needs to be calculated to check whether it is within the specified depth limit.
• Depth-limited search can terminate with two conditions :
  1. If the solution is found.
  2. If there is no solution within the given depth limit.
2.7.4 Algorithm
• Check if the current node is the goal node.
  If not : do nothing.
  If yes : return.
• Check if the current node is within the specified search depth.
  If not : do nothing.
  If yes : expand the node and save all of its successors in a stack.
• Call DLS recursively for all nodes of the stack, repeating the steps above for each of them.
DLS(node, limit, depth)
{
    if (depth > limit) return failure;
    if (node is a goal node) return success;
    for each child of node
    {
        if (DLS(child, limit, depth + 1))
            return success;
    }
    return failure;
}
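A runnable version of the DLS pseudocode might look like the sketch below. The adjacency-dictionary graph and the function name `dls` are assumptions, and the sketch returns the path instead of just success. Like the pseudocode, it returns the same "failure" value (`None`) whether the goal is absent or merely lies beyond the limit.

```python
def dls(graph, node, goal, limit, depth=0):
    """Depth-limited search: returns the path node..goal as a list, or None (failure)."""
    if depth > limit:                  # beyond the limit: treat as a dead end
        return None
    if node == goal:
        return [node]
    for child in graph.get(node, []):
        sub_path = dls(graph, child, goal, limit, depth + 1)
        if sub_path is not None:       # success somewhere below this child
            return [node] + sub_path
    return None

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"], "D": ["G"]}
print(dls(tree, "A", "G", limit=2))   # None, because G is at depth 3
print(dls(tree, "A", "G", limit=3))   # ['A', 'B', 'D', 'G']
```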
2.7.6 Performance Evaluation
• Completeness : It is incomplete if the shallowest goal is beyond the depth limit.
• Optimality : Non-optimal, as the chosen depth limit can be greater than d.
• Time complexity : Same as DFS with m replaced by l, i.e. $O(b^l)$, where l is the specified depth limit.
• Space complexity : Same as DFS with m replaced by l, i.e. $O(bl)$, where l is the specified depth limit.
2.8 Iterative Deepening DFS (IDDFS)
2.8.1 Concept
• Iterative deepening depth first search is a combination of BFS and DFS. In IDDFS the search proceeds depth-wise, but the depth limit is incremented by one after each iteration. Hence the search iteratively deepens down the search tree.
• It eventually behaves like breadth-first search, as it explores a complete layer of new nodes at each iteration before going on to the next layer.
• It does this by gradually increasing the depth limit (first 0, then 1, then 2, and so on) until a goal is found. It therefore finds the shallowest goal, which is optimal when all step costs are equal. Iterative deepening combines the benefits of depth-first and breadth-first search. The search process is depicted in Fig. 2.8.1.
[Fig. 2.8.1 : Search process in IDDFS]
• Fig. 2.8.1 shows four iterations of IDDFS on a binary tree, where the solution is found on the fourth iteration.
2.8.2 Process
2.8.3 Implementation
IDDFS has exactly the same implementation as DLS. Additionally, an outer loop is required that increments the depth limit by one and calls DLS again in every iteration.
2.8.4 Algorithm
• Initialize the depth limit to zero.
• Repeat until the goal node is found : call DLS with the current limit, then increment the limit by one.
IDDFS()
{
    limit = 0;
    found = false;
    while (not found)
    {
        found = DLS(root, limit, 0);
        limit = limit + 1;
    }
}
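The IDDFS loop can be sketched in Python by wrapping a depth-limited search. This is a minimal sketch with hypothetical names. `max_limit` is an added safeguard that is not in the pseudocode: without it, the loop would run forever when no goal exists in an infinite tree.

```python
def dls(graph, node, goal, limit, depth=0):
    # Depth-limited search: returns a path to goal, or None.
    if depth > limit:
        return None
    if node == goal:
        return [node]
    for child in graph.get(node, []):
        sub_path = dls(graph, child, goal, limit, depth + 1)
        if sub_path is not None:
            return [node] + sub_path
    return None

def iddfs(graph, root, goal, max_limit=50):
    """Run DLS with limit 0, 1, 2, ...; returns (limit, path) or None."""
    for limit in range(max_limit + 1):     # limit = limit + 1 on every iteration
        path = dls(graph, root, goal, limit)
        if path is not None:
            return limit, path
    return None

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"], "D": ["G"]}
print(iddfs(tree, "A", "E"))   # (2, ['A', 'C', 'E'])
```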
2.8.6 Performance Evaluation
• Completeness : It is complete when the branching factor is finite.
• Optimality : It is optimal when the path cost is a non-decreasing function of the depth of the node.
• Time complexity :
  o Do you think IDDFS wastes a lot of time and memory by regenerating the same set of nodes again and again?
  o It may appear wasteful, but it is not. In a search tree with almost the same branching factor at each level, most of the nodes are in the bottom level. These are generated very few times compared with the nodes on the upper levels.
  o The nodes on the bottom level, that is level d, are generated only once, and those on the next-to-bottom level are generated twice. This continues up to the children of the root, which are generated d times. Hence the time complexity is $O(b^d)$.
• Space complexity : Memory requirements of IDDFS are modest, i.e. $O(bd)$.
Note : The performance evaluation is quite satisfactory on all four parameters. IDDFS is therefore the preferred uninformed search method when the search space is large and the depth of the solution is not known.
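The counting argument in the time-complexity bullets can be checked numerically. This small sketch assumes a uniform tree with branching factor b whose goal sits at depth d, so that every node up to depth d is generated (the root itself is not counted). The values b = 10 and d = 5 are illustrative only.

```python
def generated_by_iddfs(b, d):
    # Level i holds b**i nodes and is regenerated in iterations i..d,
    # i.e. (d - i + 1) times: bottom level once, root's children d times.
    return sum((d - i + 1) * b**i for i in range(1, d + 1))

def generated_by_bfs(b, d):
    # BFS generates every level 1..d exactly once.
    return sum(b**i for i in range(1, d + 1))

b, d = 10, 5
print(generated_by_iddfs(b, d))   # 123450
print(generated_by_bfs(b, d))     # 111110
```

The repeated generations cost only about 11% extra here, because the bottom level dominates the total.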
2.9 Bidirectional Search
2.9.1 Concept
In bidirectional search, two simultaneous searches are run. One search starts from the initial state and is called the forward search. The other starts from the goal state and is called the backward search. The search process terminates when the searches meet at a common node of the search tree. Fig. 2.9.1 shows the general search process in bidirectional search.
2.9.2 Process
2.9.3 Implementation
• In bidirectional search, instead of checking for the goal node, one needs to check whether the fringes of the two searches intersect. As soon as they do, a solution has been found.
• The check can be done when each node is generated or selected for expansion. It can be implemented with a hash table, to guarantee constant time.
• For example, consider a problem which has a solution at depth d = 6. If we run breadth first search in each direction, then in the worst case the two searches meet when they have generated all of the nodes at depth 3.
• If b = 10, this requires a total of 2,220 node generations (2 × (10 + 100 + 1,000)), compared with 1,111,110 for a standard breadth-first search.
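A bidirectional breadth-first search with a hash-table intersection test might be sketched as follows. This sketch makes two assumptions. First, the graph is undirected, so the backward search can reuse the same adjacency dictionary. Second, the two searches alternate one full layer at a time. The sketch returns a path as soon as the fringes touch and does not attempt to guarantee that this path is the shortest one. All names are hypothetical.

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Bidirectional BFS on an undirected graph; returns a start..goal path or None."""
    if start == goal:
        return [start]
    parents_f = {start: None}          # forward tree; doubles as a hash table
    parents_b = {goal: None}           # backward tree
    frontier_f, frontier_b = deque([start]), deque([goal])

    def expand_layer(frontier, parents, other_parents):
        # Expand one whole BFS layer; return a meeting node if the fringes touch.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nbr in graph.get(node, []):
                if nbr not in parents:
                    parents[nbr] = node
                    if nbr in other_parents:   # constant-time intersection test
                        return nbr
                    frontier.append(nbr)
        return None

    while frontier_f and frontier_b:
        meet = expand_layer(frontier_f, parents_f, parents_b)
        if meet is None:
            meet = expand_layer(frontier_b, parents_b, parents_f)
        if meet is not None:
            path, node = [], meet
            while node is not None:            # walk back to start
                path.append(node)
                node = parents_f[node]
            path.reverse()
            node = parents_b[meet]
            while node is not None:            # walk forward to goal
                path.append(node)
                node = parents_b[node]
            return path
    return None

graph = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "D"],
         "C": ["A", "G"], "D": ["B"], "G": ["C"]}
print(bidirectional_search(graph, "S", "G"))   # ['S', 'A', 'C', 'G']
```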
• Completeness : Yes, if the branching factor b is finite and both directions use breadth first search.
• Optimality : Yes, if all step costs are identical and both directions use breadth first search.
• Time complexity : The time complexity of bidirectional search using breadth-first searches in both directions is $O(b^{d/2})$.
• Space complexity : At least one of the two fringes needs to be kept in memory to check for the common node, so the space complexity is $O(b^{d/2})$.
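• b : Branching factor
• d : Depth of the shallowest solution
• m : Maximum depth of the search tree
• l : Depth limit
• C* : Cost of the optimal solution; ε : Minimum step cost
Table 2.10.1 : Comparison of tree-search strategies based on performance evaluation
Criterion        | BFS      | UCS                                 | DFS      | DLS      | IDDFS    | Bidirectional
Completeness     | Yes      | Yes                                 | No       | No       | Yes      | Yes
Optimality       | Yes      | Yes                                 | No       | No       | Yes      | Yes
Time complexity  | $O(b^d)$ | $O(b^{\lceil C^*/\epsilon \rceil})$ | $O(b^m)$ | $O(b^l)$ | $O(b^d)$ | $O(b^{d/2})$
Space complexity | $O(b^d)$ | $O(b^{\lceil C^*/\epsilon \rceil})$ | $O(bm)$  | $O(bl)$  | $O(bd)$  | $O(b^{d/2})$
(Completeness of BFS, UCS, IDDFS and bidirectional search assumes a finite branching factor. For UCS it also assumes step costs of at least ε. Optimality of BFS, IDDFS and bidirectional search assumes equal step costs.)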
BFS vs DFS
3.  BFS : Breadth First Search is implemented using a queue, which is a FIFO list.
    DFS : Depth First Search is implemented using a stack, which is a LIFO list.
4.  BFS : This is a single-step algorithm, wherein the visited vertices are removed from the queue and then displayed at once.
    DFS : This is a two-step algorithm. In the first stage, the visited vertices are pushed onto the stack. Later on, when there is no vertex further to visit, they are popped out.
5.  BFS : BFS requires more memory than DFS.
    DFS : DFS requires less memory than BFS.
6.  BFS : Applications of BFS :
          • To find the shortest path
          • Single-source and all-pairs shortest paths
          • In spanning trees
          • In connectivity testing
    DFS : Applications of DFS :
          • Useful in cycle detection
          • In connectivity testing
          • Finding a path between vertices V and W in the graph
          • Useful in finding spanning trees and forests
7.  BFS : BFS always provides the shallowest path solution.
    DFS : DFS does not guarantee the shallowest path solution.
8.  BFS : No backtracking is required in BFS.
    DFS : Backtracking is implemented in DFS.
9.  BFS : BFS is complete if the branching factor is finite, and optimal when all step costs are equal.
    DFS : DFS is neither complete nor optimal, even in the case of a finite branching factor.
10. BFS : BFS can never get trapped in infinite loops.
    DFS : DFS can get trapped in infinite paths (or in loops, when repeated states are not checked), because it always goes deeper before trying siblings.
11. Example : A has children B and C, B has child D, and C has children E and F.
    BFS order : A, B, C, D, E, F
    DFS order : A, B, D, C, E, F
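The two traversal orders in row 11 differ only in the fringe data structure (rows 3 and 4). In the sketch below, which uses hypothetical function names and the example tree as an adjacency dictionary, the only change is a queue versus a stack. Children are pushed in reverse so that DFS visits the left child first.

```python
from collections import deque

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}

def bfs_order(tree, root):
    order, queue = [], deque([root])        # FIFO queue
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order

def dfs_order(tree, root):
    order, stack = [], [root]               # LIFO stack
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))   # left child ends up on top
    return order

print(bfs_order(tree, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F']
print(dfs_order(tree, "A"))   # ['A', 'B', 'D', 'C', 'E', 'F']
```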
2.12 Heuristic Function
University Questions
Q. Explain heuristic function with example.
Q. What is a heuristic function? How will you find a suitable heuristic function? Give a suitable example.
Q. Define heuristic function. Give an example heuristic function for the block world problem.
A heuristic function is an evaluation function that takes a search state as input and generates a tangible representation of that state as output.
It maps the problem state description to measures of desirability, usually represented as number weights. The value of a heuristic function at a given node gives a good estimate of whether that node is on the desired path to the solution.
It evaluates an individual problem state and determines how promising the state is. Heuristic functions are the most common way of imparting additional knowledge about the problem states to the search algorithm.
Fig. 2.12.1 shows the general representation of heuristic function.
[Fig. 2.12.1 : General representation of heuristic function]
A heuristic function can be of two types depending on the problem domain: a maximization function or a minimization function of the path cost.
In a maximization heuristic, the greater the value of the node, the better the node. In a minimization heuristic, the lower the value, the better the node. There are heuristics of very general applicability as well as domain-specific ones. The search strategies themselves are general-purpose heuristics.
In general, a good heuristic leads to a faster and better solution. There is, however, no guarantee that it will never lead the search in the wrong direction in the search tree.
The design of the heuristic plays a vital role in the performance of the search.
The purpose of a heuristic function is to guide the search process along the most profitable path among all that are available. A well-designed heuristic function can therefore provide a fairly good estimate of whether a path is good or bad.
However, in many problems, computing the value of a heuristic function would cost more than the effort saved in the search process. Hence there is generally a trade-off between the cost of evaluating a heuristic function and the savings in search that the function provides.
So, are you ready to think of your own heuristic function definitions? Here is a word of caution: see how much the function definition affects the search.
The following examples demonstrate how the design of the heuristic function completely alters the searching process.
Start state      Goal state
7 2 4            _ 1 2
5 _ 6            3 4 5
8 3 1            6 7 8
Fig. 2.12.2 : A scenario of 8-puzzle problem (_ denotes the blank)
Two simple heuristic functions are :
• h1 = the number of misplaced tiles. This is also known as the Hamming distance. In the example of Fig. 2.12.2, the start state has h1 = 8. Clearly, h1 is an admissible heuristic, because any tile that is out of place will have to be moved at least once. Quite logical, isn't it?
• h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot be moved diagonally, the distance counted is the sum of the horizontal and vertical distances. This is also known as the Manhattan distance. In Fig. 2.12.2, the start state has h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18. Clearly, h2 is also an admissible heuristic, because any move can, at best, move one tile one step closer to the goal.
As expected, neither heuristic overestimates the true number of moves required to solve the puzzle, which is 26. Additionally, the definitions make it easy to see that for any given state, h2 will always be greater than or equal to h1. Thus, we say that h2 dominates h1.
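Both heuristics are easy to compute. This minimal sketch assumes the start state 7 2 4 / 5 _ 6 / 8 3 1 and the goal _ 1 2 / 3 4 5 / 6 7 8, with 0 standing for the blank. Under those assumptions it reproduces the values h1 = 8 and h2 = 18 quoted above. The function names are hypothetical.

```python
# 0 marks the blank; states are read row by row.
START = ((7, 2, 4),
         (5, 0, 6),
         (8, 3, 1))
GOAL = ((0, 1, 2),
        (3, 4, 5),
        (6, 7, 8))

def positions(state):
    """Map each tile to its (row, column) position."""
    return {tile: (r, c) for r, row in enumerate(state) for c, tile in enumerate(row)}

def h1_misplaced(state, goal=GOAL):
    """Hamming distance: number of tiles (blank excluded) not in their goal square."""
    target = positions(goal)
    return sum(1 for tile, pos in positions(state).items()
               if tile != 0 and pos != target[tile])

def h2_manhattan(state, goal=GOAL):
    """Manhattan distance: sum of horizontal + vertical distances of all tiles."""
    target = positions(goal)
    return sum(abs(r - target[tile][0]) + abs(c - target[tile][1])
               for tile, (r, c) in positions(state).items() if tile != 0)

print(h1_misplaced(START), h2_manhattan(START))   # 8 18
```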
2.12.2 Example of Block World Problem
Q. Find the heuristic value for a particular state of the blocks world problem.
• Fig. 2.12.3 depicts a blocks world problem. The lettered bricks A, B, C and D are piled up on one another and must be arranged as shown in the goal state, by moving one brick at a time.
• As shown, the goal state with its particular arrangement of blocks needs to be attained from the given start state. Now it's time to scratch your head and define a heuristic function that will distinguish the start state from the goal state. Confused??
[Fig. 2.12.3 : Blocks world, start and goal states]
• Let's design a function which assigns +1 for each brick at the right position and -1 for each brick at a wrong position. Consider Fig. 2.12.4.
[Fig. 2.12.4 : Definition of heuristic function “h1”]
Local heuristic
• +1 for each block that is resting on the thing it is supposed to be resting on.
• -1 for each block that is resting on a wrong thing.
Fig. 2.12.5 shows the heuristic values generated by heuristic function “h1” for various states in the state space.
[Fig. 2.12.5 : State evaluations using heuristic function “h1”]
Please observe that this heuristic generates the same value for different states. With this kind of heuristic, the search may end up in limitless iterations, because the state with the most promising heuristic value may not actually be the most promising. The search may also end up finding an undesirable goal state, because the state evaluation can lead it in the wrong direction in the search tree.
Let's have another heuristic design for the same problem. Fig. 2.12.6 depicts a new heuristic function “h2”. A brick with the correct support structure contributes +1 for every brick in that support structure. A brick without the correct support structure contributes -1 for every brick in its (wrong) support structure.
[Fig. 2.12.6 : Definition of heuristic function “h2”]
This example makes it clear that the design of the heuristic plays a vital role in the search process. The whole search is carried out by using the heuristic values as the basis for selecting the next state to be explored.
The state with the most promising value for reaching the goal state is the first candidate for exploration, and this continues until we find the goal state.
Global heuristic
For each block that has the correct support structure : + 1 to every block in the support structure.
For each block that has the wrong support structure : - 1 to every block in the support structure.
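The two definitions can be compared on concrete states. This sketch uses assumed states: a start of A on D on C on B (top to bottom) and a goal of D on C on B on A, which is as much of Fig. 2.12.3 as can be recovered. It also uses one further hypothetical state. Each state maps a block to what it rests on. The function names are hypothetical.

```python
# Each state maps a block to what it rests on ("table" or another block).
START = {"B": "table", "C": "B", "D": "C", "A": "D"}     # assumed start
GOAL = {"A": "table", "B": "A", "C": "B", "D": "C"}      # assumed goal
OTHER = {"A": "table", "B": "table", "C": "B", "D": "table"}  # hypothetical state

def local_h(state, goal):
    """h1: +1 for each block on the right thing, -1 for each on a wrong thing."""
    return sum(1 if state[b] == goal[b] else -1 for b in state)

def support(state, block):
    """Blocks under `block`, nearest first (empty list if it is on the table)."""
    chain, below = [], state[block]
    while below != "table":
        chain.append(below)
        below = state[below]
    return chain

def global_h(state, goal):
    """h2: +len(support) if a block's support structure is correct, else -len(support)."""
    total = 0
    for b in state:
        structure = support(state, b)
        total += len(structure) if structure == support(goal, b) else -len(structure)
    return total

print(local_h(START, GOAL), local_h(OTHER, GOAL))     # 0 0
print(global_h(START, GOAL), global_h(OTHER, GOAL))   # -6 -1
print(local_h(GOAL, GOAL), global_h(GOAL, GOAL))      # 4 6
```

The local heuristic gives both non-goal states the same value. The global heuristic ranks OTHER higher, which matches intuition: OTHER needs 4 moves to reach the goal, while START needs 6.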
The values should be a logical indicator of how profitable the state is for reaching the goal state.
It may not guarantee to find the best solution, but it should almost always find a very good solution.
It should reduce the search time, specifically for hard problems like the travelling salesman problem, where the time required is exponential.
The main objective of a heuristic is to produce a solution that is good enough for solving the problem within a reasonable time frame, since it is an extra task added to the basic search process.
The solution produced by using a heuristic may not be the best of all the actual solutions to the problem, or it may simply approximate the exact solution. It is still valuable, because finding the solution does not require a prohibitively long time. So we invest some time in generating heuristic values for each state in the search space, but reduce the total time involved in the actual searching process.
Do we need to design a heuristic for every problem in the real world? The following trade-off criteria help decide whether to use a heuristic for solving a given problem.
o Optimality : If multiple solutions exist, does the problem require finding the optimal one?
o Completeness : If a problem has multiple solutions, is there a need to find all of them? Many heuristics are only meant to find one solution.
o Accuracy and precision : Can the heuristic guarantee to find the solution within the precision limits? Is the error bar on the solution unreasonably large?
o Execution time : Does it affect the time required to find the solution? Some heuristics converge faster than others, whereas some are only marginally quicker than classic methods.
In many AI problems, it is often hard to measure precisely the goodness of a particular solution. It is still important to keep the performance question in mind while designing the algorithm.
For real world problems, it is often useful to introduce heuristics based on relatively unstructured knowledge. It is impossible to define this knowledge in such a way that mathematical analysis can be performed.
A good heuristic function should generate a unique value for each unique state in the search space.
2.13 Best First Search
2.13.1 Concept
University Question
Q. Explain search strategy to overcome drawbacks of BFS and DFS. (MU - Dec. 10)
Depth first search is good because it allows a solution to be found without all competing branches having to be expanded. Breadth first search is good because it never gets trapped on dead-end paths. Combining these properties of DFS and BFS gives the strategy “follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one”. This is what Best First Search is.
Best-first search is a search algorithm which explores the search tree by expanding the most promising node, chosen according to the heuristic values of the nodes. Judea Pearl described best-first search as estimating the promise of node n by a “heuristic evaluation function f(n) which, in general, may depend on the description of n, the description of the goal, the information gathered by the search up to that point, and most important, on any extra knowledge about the problem domain”.
• Efficient selection of the current best candidate for extension is typically implemented using a priority queue.
• Fig. 2.13.1 depicts the search process of best first search on an example search tree. The values noted below the nodes are the estimated heuristic values of the nodes.
2.13.2 Implementation
• Best first search uses two lists in order to record the path, namely the OPEN list and the CLOSED list.
• The OPEN list stores nodes that have been generated but not yet examined. It is organized as a priority queue ordered by heuristic value, so that the most promising node is selected first: the highest value for a maximization heuristic, or the lowest for a minimization heuristic. This provides efficient selection of the current best candidate for extension.
• The CLOSED list stores nodes that have already been examined; they will not be looked at again. Whenever a new node is generated, check whether it has been generated before. If it has, check its recorded value and change its parent if the new value is better than the previous one. This prevents any node from being evaluated twice, and the search never gets stuck in an infinite loop.
  a. If it is not in CLOSED and not in OPEN : evaluate it, add it to OPEN, and record its parent.
  b. Otherwise, if it is already present in OPEN or CLOSED with a different parent node, and this new path is better than the previous one, change its recorded parent.
     i. If it is not in OPEN, add it to OPEN.
     ii. Otherwise, adjust its priority in OPEN using this new evaluation.
This best first search algorithm simply terminates when no path is found. An actual implementation would of course require special handling of this case.
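A greedy variant of this OPEN/CLOSED scheme can be sketched in Python. This is a minimal sketch that assumes a minimization heuristic (a lower h is more promising) given as a dictionary, and a graph given as an adjacency dictionary. The re-parenting step (b) is left out: with a pure heuristic value, a node's priority does not depend on the path used to reach it. All names are hypothetical.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Expand the OPEN node with the lowest h; returns the path found or None."""
    open_list = [(h[start], start)]   # OPEN: generated but not yet examined
    parent = {start: None}            # recorded parent of every generated node
    closed = set()                    # CLOSED: already examined
    while open_list:
        _, node = heapq.heappop(open_list)   # current best candidate
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        for nbr in graph.get(node, []):
            if nbr not in parent:            # never generated: evaluate, add to OPEN
                parent[nbr] = node
                heapq.heappush(open_list, (h[nbr], nbr))
    return None                              # OPEN exhausted: no path

graph = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"]}
h = {"S": 5, "A": 2, "B": 4, "C": 1, "G": 0}
print(greedy_best_first(graph, h, "S", "G"))   # ['S', 'A', 'C', 'G']
```

The path S, B, G has fewer steps, but the search follows A because h(A) < h(B). This illustrates why best first search is not guaranteed to be optimal.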
2.13.4 Performance Measures for Best First Search
Completeness : Not complete; it may follow an infinite path if the heuristic rates each state on such a path as the best option. Most reasonable heuristics will not cause this problem, however.
Optimality : Not optimal; it may not always produce the optimal solution.
Time Complexity : The worst-case time complexity is still $O(b^m)$, where m is the maximum depth.
Space Complexity : It must maintain a queue of all unexpanded states, so the space complexity is also $O(b^m)$.
2.13.6 Properties of Greedy Best-first Search
1. Completeness : It is not complete, as it can get stuck in loops. It is also sensitive to a wrong start and to the quality of the heuristic function.
2.14 A* Search
University Questions
Q. Explain A* algorithm. What is the drawback of A*? Also show that A* is optimally efficient.
Q. Describe A* algorithm with merits and demerits. (MU - Dec. 13)
Q. Explain A* algorithm with example. (MU - May 14, Dec. 14)
Q. Explain A* search with example.
2.14.1 Concept
• In A* search, the value of a node n, represented as f(n), combines two costs. g(n) is the cost of the cheapest path from the root node to the node. h(n) is the cost of the cheapest path from the node to the goal node. Hence f(n) = g(n) + h(n).
• The heuristic can provide only an estimate of the cost from the node to the goal, so we can represent it as h*(n). Similarly, g*(n) can represent the approximation of g(n), i.e. the distance from the root node observed by A*. The algorithm A* then evaluates f*(n) = g*(n) + h*(n).
• A reasonable thing to try first is the node with the lowest value of g*(n) + h*(n). This strategy turns out to be more than just reasonable. Provided that the heuristic function h*(n) satisfies certain conditions, which are discussed further in the chapter, A* search is both complete and optimal.
2.14.2 Implementation
1. Initialization : OPEN = {initial node}; CLOSED = ∅; g = 0; f = h; Found = false;
2. While (OPEN ≠ ∅ and Found = false)
{
   i. Remove the node with the lowest value of f from OPEN to CLOSED and call it Best_Node.
   ii. If Best_Node = Goal state then Found = true;
   iii. else
   {
      For each successor Succ of Best_Node, with g(Succ) = g(Best_Node) + cost(Best_Node, Succ) :
      {
         a. If Succ ∈ OPEN then /* already generated but not yet processed */
         {
            i. Call the matched node OLD and add it to the list of Best_Node's successors.
            ii. Ignore the Succ node and change the parent of OLD, if required :
                If g(Succ) < g(OLD) then make Best_Node the parent of OLD and change the values of g and f for OLD.
         }
         b. If Succ ∈ CLOSED then /* already processed */
         {
            i. Call the matched node OLD and add it to the list of Best_Node's successors.
            ii. Ignore the Succ node and change the parent of OLD, if required :
                If g(Succ) < g(OLD) then make Best_Node the parent of OLD and change the values of g and f for OLD.
         }
         c. If Succ is in neither OPEN nor CLOSED then add Succ to OPEN and to the list of Best_Node's successors.
      } /* for loop */
   } /* else */
}
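The A* procedure can be sketched compactly with a priority queue ordered on f = g + h. This is a minimal sketch under stated assumptions: the graph is a dictionary of `(successor, step_cost)` lists, and h is a dictionary of estimates. In the OLD-node case, instead of updating the existing entry in place, it pushes a new OPEN entry with the lower g and skips the outdated one later. A cheaper path to a CLOSED node reopens that node; this is a common simplification of the CLOSED branch, not the book's exact procedure. Names and example data are hypothetical.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* with OPEN ordered by f(n) = g(n) + h(n); returns (cost, path) or None."""
    open_list = [(h[start], 0, start)]   # entries are (f, g, node)
    g = {start: 0}                       # best known g for every generated node
    parent = {start: None}
    closed = set()
    while open_list:
        _, g_node, node = heapq.heappop(open_list)   # Best_Node: lowest f
        if node in closed or g_node > g[node]:
            continue                                   # outdated OPEN entry
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g_node, path[::-1]
        closed.add(node)
        for succ, cost in graph.get(node, []):
            new_g = g_node + cost
            if new_g < g.get(succ, float("inf")):     # new node, or cheaper than OLD
                g[succ] = new_g
                parent[succ] = node                    # Best_Node becomes the parent
                closed.discard(succ)                   # reopen a CLOSED node
                heapq.heappush(open_list, (new_g + h[succ], new_g, succ))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)]}
h = {"S": 5, "A": 4, "B": 2, "G": 0}
print(a_star(graph, h, "S", "G"))   # (6, ['S', 'A', 'B', 'G'])
```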
As stated already, the success of A* depends entirely on the design of the heuristic function. It depends on how well the heuristic evaluates each node by estimating its distance from the goal node. Let us understand the effect of the heuristic function on the execution of the algorithm, and how it affects optimality.
A. Underestimation
• Suppose we can guarantee that the heuristic function h never overestimates the actual cost from the current node to the goal. That is, the value generated by h is never greater than the actual cost or actual number of hops required to reach the goal state. In this case, the A* algorithm is guaranteed to find an optimal path to a goal, if one exists.
• Example : f = g + h, where h is underestimated.
[Fig. 2.14.1 : Underestimated h; E (f = 2 + 3) and F (f = 3 + 3) are each estimated to be 3 moves away from the goal]
• Consider the cost of all arcs to be 1. A is expanded to B, C and D, and the f value of each node is computed. B is chosen to be expanded to E. We notice that f(E) = f(C) = 5. Suppose we resolve the tie in favour of E, the path we are currently expanding. E is expanded to F. Expansion of node F is stopped because f(F) = 6, so we now expand node C.
• Hence, by underestimating h(B), we have wasted some effort, but eventually discovered that B was farther away than we thought. We then go back, try another path, and find the optimal path.
B. Overestimation
• Here h is overestimated; that is, the value generated for each node is greater than the actual number of steps required to reach the goal node.
• Example :
[Fig. 2.14.2 : Overestimated h; path A, B, E (f = 2 + 2), F (f = 3 + 1), G (f = 4 + 0)]
• As shown in the example, A is expanded to B, C and D. Now B is expanded to E, E to F and F to G, giving a solution path of length 4. Suppose there is also a direct path from D to G, giving a solution path of length 2. This path will never be found because h(D) is overestimated.
• Thus, some other, worse solution might be found without ever expanding D. So if h overestimates, one cannot guarantee to find the cheaper path solution.
2.14.5 Admissibility of A*
University Questions
Q. What do you mean by admissible heuristic function? Explain with example.
Q. Write short note on admissibility of A*.
• A search algorithm is admissible if, for any graph, it always terminates with an optimal path from the initial state to the goal state, whenever such a path exists. A heuristic is admissible if it never overestimates the actual cost from the current state to the goal state. Consequently, A* always terminates with the optimal path when h(n) is an admissible heuristic function.
• A heuristic h(n) is admissible if, for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal. Admissible heuristics are optimistic by nature, because they assume the cost of solving the problem is less than it actually is.
• An obvious example of an admissible heuristic is the straight line distance. The shortest path between any two points is a straight line, so the straight line distance cannot overestimate the actual road distance.
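The condition h(n) ≤ h*(n) can be checked mechanically on a small graph. This sketch first computes the true costs h*(n) by running Dijkstra's algorithm from the goal over reversed edges, then compares them against h. The graph format (a dictionary of `(successor, cost)` lists) and the function names are assumptions made for illustration.

```python
import heapq

def true_costs_to_goal(graph, goal):
    """h*(n) for every node that can reach goal: Dijkstra over reversed edges."""
    reverse = {}
    for u, edges in graph.items():
        for v, cost in edges:
            reverse.setdefault(v, []).append((u, cost))
    dist = {goal: 0}
    heap = [(0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale heap entry
        for v, cost in reverse.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

def is_admissible(graph, h, goal):
    """True if h(n) <= h*(n) for every node that can reach the goal."""
    h_star = true_costs_to_goal(graph, goal)
    return all(h[n] <= h_star[n] for n in h_star)

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)]}
print(is_admissible(graph, {"S": 5, "A": 4, "B": 2, "G": 0}, "G"))   # True
print(is_admissible(graph, {"S": 5, "A": 7, "B": 2, "G": 0}, "G"))   # False
```

In the second call h(A) = 7 overestimates the true cost h*(A) = 5, so that heuristic is not admissible.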
o Theorem : If h(n) is admissible, tree search using A* is optimal.
o Proof : Optimality of A* with an admissible heuristic.
• Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n be an unexpanded node in the fringe such that n is on a shortest path to an optimal goal G.
[Fig. 2.14.3 : Optimality of A*]
$$
\begin{aligned}
f(G_2) &= g(G_2) && \text{since } h(G_2) = 0\\
g(G_2) &> g(G) && \text{since } G_2 \text{ is suboptimal}\\
f(G) &= g(G) && \text{since } h(G) = 0\\
f(G_2) &> f(G) && \text{from above}\\
h(n) &\le h^*(n) && \text{since } h \text{ is admissible}\\
g(n) + h(n) &\le g(n) + h^*(n) = g(G) && \text{since } n \text{ is on an optimal path to } G\\
f(n) &\le f(G) && \text{since } f(G) = g(G)
\end{aligned}
$$
Hence $f(G_2) > f(G) \ge f(n)$, so A* always expands n before G2 and will never select G2 for expansion.
2.14.6 Monotonicity
University Question
2.14.7 Properties of A*
Example : Solving an 8-puzzle with A* (_ denotes the blank).

Start state      Goal state
3 7 6            5 3 6
5 1 2            7 _ 2
4 _ 8            4 1 8

f(X) = g(X) + h(X), where
h(X) = the number of tiles not in their goal position in a given state X
g(X) = the depth of node X in the search tree

Search tree (each move names the direction in which the blank moves; the value in brackets is g + h) :
Start (0 + 4) : 3 7 6 / 5 1 2 / 4 _ 8
  up (1 + 3) : 3 7 6 / 5 _ 2 / 4 1 8
    up (2 + 3) : 3 _ 6 / 5 7 2 / 4 1 8
      left (3 + 2) : _ 3 6 / 5 7 2 / 4 1 8
        down (4 + 1) : 5 3 6 / _ 7 2 / 4 1 8
          right (5 + 0) : 5 3 6 / 7 _ 2 / 4 1 8 (goal)
      right (3 + 4) : 3 6 _ / 5 7 2 / 4 1 8
    left (2 + 3) : 3 7 6 / _ 5 2 / 4 1 8
    right (2 + 4) : 3 7 6 / 5 2 _ / 4 1 8
  left (1 + 5) : 3 7 6 / 5 1 2 / _ 4 8
  right (1 + 5) : 3 7 6 / 5 1 2 / 4 8 _
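The example can be reproduced with a small A* solver that uses the misplaced-tiles heuristic h(X) and unit move costs. This is a sketch, not the book's program: 0 stands for the blank, move names describe where the blank moves, and all names are hypothetical. A* may break f-value ties differently from the tree shown above, but with this admissible heuristic it should still return a 5-move solution.

```python
import heapq
from itertools import count

START = ((3, 7, 6), (5, 1, 2), (4, 0, 8))   # 0 is the blank
GOAL = ((5, 3, 6), (7, 0, 2), (4, 1, 8))
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def h_misplaced(state):
    """Number of tiles (blank excluded) not in their goal position."""
    return sum(1 for r in range(3) for c in range(3)
               if state[r][c] != 0 and state[r][c] != GOAL[r][c])

def neighbours(state):
    """Yield (move_name, new_state) for every legal move of the blank."""
    r, c = next((r, c) for r in range(3) for c in range(3) if state[r][c] == 0)
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            grid = [list(row) for row in state]
            grid[r][c], grid[nr][nc] = grid[nr][nc], grid[r][c]   # slide tile into blank
            yield name, tuple(tuple(row) for row in grid)

def solve(start=START):
    """A* with f = depth + misplaced tiles; returns the list of blank moves."""
    tie = count()                        # tie-breaker so states are never compared
    open_list = [(h_misplaced(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while open_list:
        _, _, g, state, moves = heapq.heappop(open_list)
        if state == GOAL:
            return moves
        if g > best_g[state]:
            continue                     # outdated entry
        for name, nxt in neighbours(state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(open_list, (g + 1 + h_misplaced(nxt), next(tie),
                                           g + 1, nxt, moves + [name]))
    return None

print(solve())
```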
Ex. 2.14.1 : Consider the graph given in Fig. 1 below. Assume that the initial state is S and the goal state is 7. Find a path from the initial state to the goal state using A* search. Also report the solution cost. The straight line distance heuristic estimates for the nodes are as follows : h(1) = 14, h(2) = 10, h(3) = 8, h(4) = 12, h(5) = 10, h(6) = 10, h(S) = 15.
[Fig. 1 : Graph for Ex. 2.14.1]
Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7), 5(8 + 4), 6(10 + 10)
2.14.9 Comparison among Best First Search, A* Search and Greedy Best First Search
University Question
Q. Compare the following informed searching algorithms based on performance measures, with justification : complete, optimal, time complexity and space complexity.
(a) Greedy best first (b) A* (c) Recursive best-first (RBFS)
• In IDA* (iterative deepening A*), the threshold is initialized to the heuristic estimate of the initial state. In each successive iteration it is increased to the total cost of the lowest-cost node that was pruned during the previous iteration. The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold.
Algorithm
node      current node
g         the cost to reach the current node

        bound := t
    end loop
end procedure

function search(node, g, bound)
    f := g + h(node)
    if f > bound then return f
    if is_goal(node) then return FOUND
    min := ∞
    for succ in successors(node) do
Performance evaluation
1. Optimality : IDA* finds an optimal solution if the heuristic function is admissible.
2. Completeness : IDA* is complete. It always finds a solution if one exists within the threshold limit.
3. Time complexity : IDA* expands asymptotically the same number of nodes as A*, so it has exponential time complexity.
4. Space complexity : IDA* performs a series of depth-first searches. Hence its memory requirement is linear with respect to the maximum search depth.
From the above discussion, it is clear that IDA* is asymptotically optimal in time and space compared to all other
heuristic search algorithms that find optimal solutions on a tree. Unlike A*, IDA* does not need to manage OPEN and
CLOSED lists; hence it often runs faster than A*, and its implementation is much simpler.
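The IDA* pseudocode above can be turned into the following Python sketch. The graph and heuristic are hypothetical (the same small example used for A*), `FOUND` is a sentinel standing in for the pseudocode's FOUND, and the `succ in path` cycle check is an addition that the pseudocode does not include.

```python
import math

FOUND = "FOUND"   # sentinel returned when the goal is reached

# Hypothetical example graph (node -> [(successor, step cost)]) and consistent heuristic
GRAPH = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1)],
    "C": [("G", 3)],
    "G": [],
}
H = {"S": 6, "A": 5, "B": 3, "C": 2, "G": 0}


def ida_star(graph, h, start, goal):
    """Repeated depth-first searches, each bounded by a threshold on f = g + h."""
    path = [start]
    found_cost = None

    def search(g, bound):
        nonlocal found_cost
        node = path[-1]
        f = g + h[node]
        if f > bound:
            return f                    # pruned: report this node's f-cost
        if node == goal:
            found_cost = g
            return FOUND
        minimum = math.inf              # lowest f-cost pruned anywhere below this node
        for succ, cost in graph[node]:
            if succ in path:            # skip cycles along the current path
                continue
            path.append(succ)
            t = search(g + cost, bound)
            if t == FOUND:
                return FOUND
            minimum = min(minimum, t)
            path.pop()
        return minimum

    bound = h[start]                    # threshold starts at h(initial state)
    while True:
        t = search(0, bound)
        if t == FOUND:
            return path, found_cost
        if t == math.inf:
            return None, math.inf
        bound = t                       # next threshold: lowest pruned f-cost


print(ida_star(GRAPH, H, "S", "G"))  # (['S', 'A', 'B', 'C', 'G'], 7)
```

On this graph the first iteration uses threshold 6 and fails; the lowest pruned f-cost is 7, and the second iteration with threshold 7 reaches G. Only the current path is stored, which is where the linear memory requirement comes from.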
As IDA* retains no path history between iterations (the only thing carried over is the threshold value), it is
bound to repeat the same expansions across iterations. Hence, it cannot make optimal use of the available memory.
Let's see how SMA* handles this memory management issue.
SMA* has the following properties :
1. SMA* uses all the available memory efficiently.
2. SMA* doesn’t generate states repeatedly.
3. SMA* is complete, if the available memory is sufficient to store the shallowest solution path.
4. SMA* is optimal if the available memory is sufficient to store the shallowest optimal solution path.
Otherwise it generates the best solution possible with the available amount of memory.
5. SMA* is optimally efficient when enough memory is available for the entire tree search.
The basic procedure of SMA* is just like A*. It expands the best leaf until memory is full. When there is no
memory left to add a newly generated node, it needs to drop one of the older nodes generated earlier. SMA* always
drops the leaf node with the highest f-cost. The dropped node is called a “forgotten node”.
SMA* then backs up the value of the forgotten node to its parent. In this way, the quality of the best path in that
sub-tree is always known to the ancestor of a forgotten sub-tree. This information is used when all other paths
look worse than the path it has forgotten. In that case, SMA* regenerates the forgotten sub-tree.
What if all the leaf nodes have the same f-value? This is rare, but possible. In such a situation, to avoid selecting
the same node for deletion and expansion, SMA* expands the “newest best” leaf and deletes the “oldest worst” leaf.
If there is only one leaf, even these coincide, but in that case the current search tree must be a single path from
root to leaf that fills all of memory. If the leaf is not a goal node, then even if it is on an optimal solution path,
that solution is not reachable with the available memory.
Fig. 2.15.1 depicts a typical example where memory can store only three nodes. The original search tree is shown
at the top. The second half of the figure shows the iterations of SMA*. Each node is shown with its f-cost, i.e.
f = g + h. The goal nodes I, F, J, K are shown in squares. The numbers in parentheses stand for the f-cost of the
best forgotten sub-tree.
3. Now, to proceed further, we can explore node G, but we first need to make space for the new node. As the
algorithm is designed, we need to drop the shallowest, highest f-cost leaf, that is, B. We also record at A the
f-cost of this forgotten descendant, 15, as shown in parentheses. We can now add H, with f(H) = 18. Unfortunately,
this is not the goal node, and again memory is full. Hence, there is no path to a solution through H, so we set
f(H) = ∞.
4. Again G is expanded for its next child, and we generate I with f(I) = 24. Now, as G is explored fully, with H and I
having f-costs ∞ and 24, we drop H and f(G) becomes 24. I is also a goal node, but this might not be the optimal
solution, as A's f-cost is still only 15.
5. As A is once again the most promising node, B is generated again. We have found that the path through G is
not so great after all.
6. C is the first successor of B. Observe that it is a non-goal node at the maximum depth, so f(C) = ∞.
7. In order to expand node B further, we first need to drop C. We then generate node D with f(D) = 20, and this
value is forwarded to B and A.
8. Now the deepest, lowest-cost node D is selected for further expansion; as it turns out to be a goal node, the
search terminates.
As only a limited amount of memory is available to SMA*, on very hard problems where the state space is complex,
SMA* is forced to switch back and forth continually among many candidate solution paths, since only a small subset
of all the expanded nodes can fit in memory (remember the problem of thrashing in disk paging systems). The extra
time required for repeated regeneration of the same nodes means that problems that would be practically solvable
by A*, given unlimited memory, become intractable for SMA*. Hence, in the case of SMA*, memory limitations can make
a problem intractable with respect to computation time.
In depth-first search, the test function merely accepts or rejects a solution. In hill climbing, however, the test
function is provided with a heuristic function, which estimates how close a given state is to the goal state.
In hill climbing, each state is provided with the additional information needed to find the solution, i.e. the
heuristic value. The algorithm is memory efficient since it does not maintain the complete search tree. Rather, it
looks only at the current state and the states at the immediate next level.
For example, suppose you want to reach a mall from your current location, and there are n possible paths in
different directions. The heuristic function gives you the distance to the mall along each path, so choosing the way
becomes very simple and time efficient.
Hill climbing attempts to iteratively improve the current state by means of an evaluation function. “Consider all
the possible states laid out on the surface of a landscape. The height of any point on the landscape corresponds to
the evaluation function of the state at that point” (Russell & Norvig, 2003). Fig. 2.16.1 depicts the typical hill
climbing scenario, where multiple paths are available to reach the hill top from ground level.
Fig. 2.16.1 : Hill Climbing Scenario (labels: Start point; Goal (Hill top))
Hill climbing always attempts to make changes that improve the current state. In other words, hill climbing can
only advance if there is a higher point in the adjacent landscape.
Hill climbing is a type of local search technique. It is relatively simple to implement. In many cases where the
state space is of moderate size, hill climbing works even better than many advanced techniques.
For example, when hill climbing is applied to the travelling salesman problem, it initially produces a random tour
that visits all the cities. It then repeatedly switches the order of cities, keeping a change only when it gives a
cheaper route, in an attempt to find the tour that visits all the cities at minimum cost.
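The travelling salesman example can be sketched as follows. This is an illustrative sketch, not a prescribed method: the "switch the order" move is assumed to be swapping two cities, the first improving swap is kept (simple hill climbing), and the four-city instance `POINTS` is hypothetical.

```python
import itertools
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour, including the return to the starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def hill_climb_tsp(dist, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(dist)))
    rng.shuffle(tour)                                  # random initial tour visiting all cities
    best = tour_length(tour, dist)
    improved = True
    while improved:                                    # stop when no swap shortens the tour
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            tour[i], tour[j] = tour[j], tour[i]        # switch the order of two cities
            length = tour_length(tour, dist)
            if length < best:
                best, improved = length, True          # keep the shorter tour
            else:
                tour[i], tour[j] = tour[j], tour[i]    # undo the swap
    return tour, best


# Hypothetical instance: four cities at the corners of a unit square
POINTS = [(0, 0), (1, 1), (1, 0), (0, 1)]
DIST = [[math.dist(p, q) for q in POINTS] for p in POINTS]
print(hill_climb_tsp(DIST))
```

On this tiny instance every crossing tour is one swap away from the perimeter tour of length 4, so the climb ends at the optimum; on larger instances it may stop at a local optimum instead.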
There are two variations of hill climbing, as discussed below.
As we study the algorithm, we observe that in every pass the first node/state that is better than the current
state is considered for further exploration. This strategy does not guarantee the most optimal solution to the
problem, but it may save execution time.
2.16.1(B) Steepest Ascent Hill Climbing
Q. Write the algorithm of steepest ascent hill climbing and compare it with simple hill climbing.
As the name suggests, steepest ascent hill climbing always takes the steepest path towards the hill top. It does
so by selecting the best node among all children of the current node/state. All the states are evaluated using the
heuristic function. The time requirement of this strategy is therefore more than that of the previous one. The
algorithm for steepest ascent hill climbing is as follows.
Algorithm
1. Evaluate the initial state. If it is a goal state, return and quit; otherwise make it the current state.
2. Loop until a solution is found or a complete iteration produces no change to the current state:
   a. Let SUCC be a state such that any possible successor of the current state will be better than SUCC.
   b. For each operator that applies to the current state, evaluate the new state:
      (i) If it is a goal, then return and quit.
      (ii) If it is better than SUCC, then set SUCC to this state.
   c. If SUCC is better than the current state, set the current state to SUCC.
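To contrast the two variants, here is a small sketch on a hypothetical one-dimensional landscape `LANDSCAPE`, where a state is a position, its heuristic value is the height at that position, and (as an assumption) the successors are all positions within two steps. Neither function performs an explicit goal test; both simply stop at a peak.

```python
def neighbours(i, n, radius=2):
    """Hypothetical move set: every position within `radius` steps of i."""
    return [j for j in range(i - radius, i + radius + 1) if j != i and 0 <= j < n]


def simple_hill_climbing(heights, start):
    current = start
    while True:
        for nxt in neighbours(current, len(heights)):
            if heights[nxt] > heights[current]:   # take the FIRST better successor
                current = nxt
                break
        else:
            return current                        # no better successor: a peak


def steepest_ascent_hill_climbing(heights, start):
    current = start
    while True:
        # SUCC = the best of ALL successors of the current state
        succ = max(neighbours(current, len(heights)), key=lambda j: heights[j])
        if heights[succ] > heights[current]:
            current = succ
        else:
            return current                        # SUCC is no better: stop


LANDSCAPE = [0, 2, 4, 3, 1, 5, 8, 6]               # hypothetical heights
print(simple_hill_climbing(LANDSCAPE, 4))           # 2: stuck on a local maximum (height 4)
print(steepest_ascent_hill_climbing(LANDSCAPE, 4))  # 6: reaches the highest point (height 8)
```

From position 4, simple hill climbing accepts the first improvement it sees (position 2) and gets stuck there, while steepest ascent evaluates every successor and jumps to position 6. This is one landscape only; steepest ascent can also stop at a local maximum elsewhere.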
• As we compare simple hill climbing with steepest ascent, we find that there is a tradeoff between the time
requirement and the accuracy or optimality of the solution.
• In the case of simple hill climbing, as we go for the first better successor, time is saved because not all the
successors are evaluated; but it may lead to more nodes and branches getting explored, and in turn the solution
found may not be the optimal one.
• In the case of steepest ascent hill climbing, as the best among all the successors is selected for further
expansion every time, it takes more time to evaluate all the successors at each stage, but it climbs more directly
towards the hill top and usually finds a better solution. It is still not guaranteed to be optimal, because it can
also get stuck at a local maximum. This also makes it clear that the evaluation function, i.e. the heuristic
function definition, plays a vital role in deciding the performance of the algorithm.
2.16.1(C) Limitations of Hill Climbing
• Now let's see what impact an incorrectly designed heuristic function can have on hill climbing techniques.
• The following problems may arise in the hill climbing strategy. Sometimes the algorithm may reach a position
that is not a solution, but from which no move leads to a better place on the hill, i.e. there is no further state
that is closer to the solution. This happens if we have reached one of the following three states.
• Local Maximum : A “local maximum” is a location on the hill that is higher than the surrounding parts of the hill
but is not the actual hill top. In the search tree, it is a state better than all its neighbors, but there is no
next better state which can be chosen for further expansion. Local maxima sometimes occur within sight of a
solution; in such cases they are called “foothills”.
• Ridge : A “ridge” is an area of the hill that is higher than the surrounding areas, but from which there is no
further uphill path in a single move. In the search tree, it is the situation where all successors are either of
the same value or lesser. A suitable successor cannot be reached in a simple move.
Fig. 2.16.8 depicts all the different situations together in hill climbing.
[Fig. 2.16.8 : objective function plotted against the state space, marking the global maximum, a plateau and local maxima]
Thus, in simulated annealing, large uphill moves are much less likely than small ones. Also, the probability of an
uphill move decreases as the temperature decreases.
Hence, uphill moves are more likely at the beginning of the annealing process, when the temperature is high. As the
cooling process proceeds and the temperature comes down, uphill moves become less likely. Downhill moves are allowed
at any time in the whole process. In this way, only comparatively small upward moves are allowed towards the end,
until finally the process converges to a local minimum configuration, i.e. the desired low point in the valley.
2.16.2(A) Comparing Simulated Annealing with Hill Climbing
The hill climbing procedure chooses the best available state, or at least one better than the current state, for
further expansion. Unlike hill climbing, simulated annealing chooses a random move from the neighborhood. If the
successor state turns out to be better than the current state, simulated annealing accepts it for further
expansion. If the successor state is worse, it is accepted only with some probability.
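A minimal simulated annealing sketch for a minimisation problem (the "valley" view above) follows. Several choices are assumptions, not taken from the text: the acceptance rule e^(-delta/T) for a worse move, the geometric cooling schedule, the parameter values, and the hypothetical landscape `valley` over the integers -10..10.

```python
import math
import random


def simulated_annealing(cost, neighbour, start, t0=10.0, cooling=0.95,
                        t_min=1e-3, moves_per_t=20, seed=0):
    """Minimise `cost`; a worse ("uphill") move is accepted with probability e^(-delta / T)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            candidate = neighbour(current, rng)        # a RANDOM neighbour, not the best one
            delta = cost(candidate) - current_cost     # delta > 0 means a worse move
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling                                   # cool down: uphill moves get rarer
    return best, best_cost


def valley(x):
    """Hypothetical landscape whose lowest point is at x = 3."""
    return (x - 3) ** 2


def step(x, rng):
    """Hypothetical move: one step left or right, staying within -10..10."""
    return max(-10, min(10, x + rng.choice((-1, 1))))


print(simulated_annealing(valley, step, start=-10))
```

At high temperature almost any move is accepted; as `t` shrinks, `math.exp(-delta / t)` approaches zero for worse moves, so the search behaves more and more like hill climbing (here, valley descending).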
{
  Get NODE from W_OPEN;
  if NODE = Goal state then Found = true
  else
  {
    Find SUCCs of NODE, if any, with their estimated costs and
    store them in the OPEN list;
• It smooths the weight changes and suppresses cross-stitching, that is, it cancels side-to-side oscillations
across the error valley.
• When all the weight changes are in the same direction, the momentum amplifies the learning rate, causing faster
convergence.
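The two momentum effects listed above can be seen in a short gradient descent sketch. The update rule velocity = beta * velocity - lr * gradient is the common heavy-ball form, assumed here rather than taken from the text, and the two-weight error surface E(w1, w2) = 0.5 * (w1^2 + 10 * w2^2), a long narrow valley, is hypothetical.

```python
def gd_momentum(grad, w0, lr=0.05, beta=0.9, steps=300):
    """Gradient descent in which each step also carries a fraction `beta` of the previous step."""
    w = list(w0)
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            # gradients with the same sign keep adding up (a larger effective step);
            # gradients that flip sign partly cancel (side-to-side oscillation is damped)
            velocity[i] = beta * velocity[i] - lr * g[i]
            w[i] += velocity[i]
    return w


def grad_valley(w):
    """Gradient of the hypothetical error surface E = 0.5 * (w1**2 + 10 * w2**2)."""
    return [w[0], 10 * w[1]]


print(gd_momentum(grad_valley, [5.0, 1.0]))   # both weights end up very close to 0
```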
One of the possible solutions is I = 3, W = 8, T = 5, S = 9, M = 6, B = 1, E = 0 :
     34
  +  85
  + 916
  -----
   1035
The above example has multiple solutions. Ideally, a good crypt-arithmetic puzzle must have only one solution.
How do we solve the problem?
We follow the classical three-stage method.
Stage 1 : Describe
• In the describe stage, we explain the problem and the goal in natural language.
• For a complete problem description, we need to answer the following three questions :
1. What is the goal of the problem?
In this example, the goal is to replace letters by digits such that the sum IS + WT + SBM = BEIT is verified.
2. Are there any unknowns or decision variables?
Yes: the digits that the letters represent. In other words, for each letter we have one decision variable that can
take any digit as the value of that letter.
3. What are the constraints?
The obvious constraint is that the sum has to be verified and all the variables must have different values in a
feasible solution. But as we observe the given equation, we notice that there are other implicit constraints as
well :
• The letters I, W, S and B cannot represent the digit 0, as they are the first digits of numbers.
• There are 9 distinct letters, so we need at least 9 different digits in the answer.
By observation, we have B = 1. Consider three auxiliary (carry) variables X, Y, Z. Hence, we can write the
constraint equations as :
S + T + M = 10X + T
I + W + B + X = 10Y + I
S + Y = 10B + E
Solve : As we have got the linear equations representing the problem, we can solve them simultaneously, together
with the all-different constraint, to get the solution.
Example 2 :
    TWO
  + TWO
  -----
   FOUR
E.g., setting F = 1, O = 4, R = 8, T = 7, W = 3, U = 6 gives 734 + 734 = 1468.
Trick : introduce auxiliary (carry) variables X, Y :
O + O = 10X + R
W + W + X = 10Y + U
T + T + Y = 10F + O
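Instead of solving the carry equations by hand, a small puzzle like TWO + TWO = FOUR can simply be searched exhaustively. The sketch below is a brute-force generate-and-test over assignments of distinct digits. It is a swapped-in technique for illustration, not the equation-solving method described above, and the helper names `word_value` and `solve_cryptarithm` are hypothetical.

```python
from itertools import permutations


def word_value(word, assign):
    """Read a word as a number under the letter -> digit assignment."""
    value = 0
    for letter in word:
        value = value * 10 + assign[letter]
    return value


def solve_cryptarithm(addends, result):
    """Try every assignment of distinct digits to the letters; keep those that satisfy the sum."""
    words = addends + [result]
    letters = sorted(set("".join(words)))
    leading = {w[0] for w in words}                  # leading letters cannot be 0
    solutions = []
    for digits in permutations(range(10), len(letters)):
        assign = dict(zip(letters, digits))
        if any(assign[c] == 0 for c in leading):
            continue
        if sum(word_value(w, assign) for w in addends) == word_value(result, assign):
            solutions.append(assign)
    return solutions


solutions = solve_cryptarithm(["TWO", "TWO"], "FOUR")
print(len(solutions), "solutions, for example", solutions[0])
```

Each solution found satisfies all three carry equations at once; the search simply does not name the carries. With six letters there are only 151,200 digit permutations to test, which is why brute force is acceptable here.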
We also need pairwise constraints between the original variables, as they are supposed to be different.
Ex. 2.17.1 : Solve the crypt-arithmetic problem
     EAT
  + THAT
  ------
   APPLE
Soln. :
In the units column, 2T = E + 10 × C1, where C1 is the carry into the tens column. In the tens column, L = 2A + C1.
As A is the final carry, A = 1, so we have L = 2 + C1.
Since T = 9 (derived below), 2T = 18 produces a carry, so C1 = 1 and L = 3.
So we have,
     E1T
  + TH1T
  ------
   1PP3E
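We observe that T is the only letter in the thousands column of the addends, and it must generate the carry A = 1
(with a carry of 1 coming in from the hundreds column). Hence T = 9 and, since 2T = 18, E = 8.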
So we have,
     819
  + 9H19
  ------
   1PP38
This makes P = 0.
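The column-by-column reasoning above can be checked mechanically. The sketch below adds the words right to left and records the digit and carry produced in each column. `column_check` is a hypothetical helper, and H = 2 is an assumption here: the derivation above stops at P = 0, and 2 is the digit that makes the hundreds column work (8 + 2 = 10, giving P = 0 with a carry).

```python
def column_check(addends, result, assign):
    """Add right to left, column by column, recording (result letter, digit, carry out)."""
    width = len(result)
    padded = [w.rjust(width) for w in addends]        # right-align the addends
    carry, trace = 0, []
    for col in range(width - 1, -1, -1):
        total = carry + sum(assign[w[col]] for w in padded if w[col] != " ")
        digit, carry = total % 10, total // 10
        if digit != assign[result[col]]:
            return False, trace                        # this column contradicts the assignment
        trace.append((result[col], digit, carry))
    return carry == 0, trace


ASSIGN = {"E": 8, "A": 1, "T": 9, "P": 0, "L": 3, "H": 2}   # H = 2 is assumed, see above
ok, trace = column_check(["EAT", "THAT"], "APPLE", ASSIGN)
print(ok)       # True
print(trace)    # [('E', 8, 1), ('L', 3, 0), ('P', 0, 1), ('P', 0, 1), ('A', 1, 0)]
```

The recorded carries are the C1 = 1 of the units column, no carry out of the tens column, and the carries that make P = 0 and A = 1 in the last two columns.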
The sum of the currents flowing into a node must equal zero.
Some popular constraint satisfaction puzzles, like the map coloring problem, Latin Square, Eight Queens and Sudoku,
are stated below.
1. Map coloring problem :
In this example, the variables are WA, NT, Q, NSW, V, SA, T.
Domains Di = {red, green, blue}
Constraints : adjacent regions must have different colors,
e.g. WA ≠ NT, or (WA, NT) in {(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)}
[Map of Australia : Western Australia, Northern Territory, Queensland, South Australia, New South Wales, Victoria, Tasmania]
Fig. 2.18.1
Solutions are complete and consistent assignments, e.g., WA = red, NT = green, Q = red, NSW = green, V = red,
SA = blue, T = green.
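The definition of a solution (complete and consistent) can be written as a small check. The adjacency table below is read off the standard map of Australia in Fig. 2.18.1; the names `NEIGHBOURS`, `DOMAIN` and `is_solution` are hypothetical.

```python
# Adjacency of the regions in Fig. 2.18.1 (T, Tasmania, has no neighbours)
NEIGHBOURS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
DOMAIN = {"red", "green", "blue"}


def is_solution(assignment):
    """Complete: every variable gets a value from the domain.
    Consistent: no two adjacent regions share a colour."""
    complete = set(assignment) == set(NEIGHBOURS) and set(assignment.values()) <= DOMAIN
    consistent = all(assignment[a] != assignment[b]
                     for a in NEIGHBOURS for b in NEIGHBOURS[a]
                     if a in assignment and b in assignment)
    return complete and consistent


EXAMPLE = {"WA": "red", "NT": "green", "Q": "red", "NSW": "green",
           "V": "red", "SA": "blue", "T": "green"}
print(is_solution(EXAMPLE))  # True
```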
2. Latin Square Problem : How can one fill an n × n table with n different symbols such that each symbol occurs
exactly once in each row and each column?
Solutions : The Latin squares for n = 1, 2, 3 and 4 are :
Fig. 2.18.2
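One simple way to produce a Latin square for any n is the cyclic construction, in which each row is the previous row shifted by one position. The sketch below builds such squares and checks the row and column constraints. It is an illustrative construction, not necessarily the squares drawn in Fig. 2.18.2, and the function names are hypothetical.

```python
def cyclic_latin_square(n):
    """Row i holds the symbols 1..n shifted left by i positions."""
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """Each symbol 1..n must occur exactly once in every row and every column."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


for n in range(1, 5):
    print(cyclic_latin_square(n))
```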
3. Eight Queens Puzzle Problem : How can one put 8 queens on an (8 × 8) chess board such that no queen attacks any
other queen?
Solutions : The puzzle has 92 distinct solutions. If rotations and reflections of the board are counted as one, the
puzzle has 12 fundamental solutions.
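The figure of 92 solutions can be reproduced with a short backtracking count that places one queen per row and tracks the columns and diagonals already under attack. The function name is hypothetical.

```python
def count_n_queens(n):
    """Count placements of n non-attacking queens on an n x n board (one queen per row)."""
    cols, diag, anti_diag = set(), set(), set()    # attacked columns and diagonals

    def place(row):
        if row == n:
            return 1                               # all n queens placed safely
        total = 0
        for col in range(n):
            if col in cols or row - col in diag or row + col in anti_diag:
                continue                           # square is attacked: skip it
            cols.add(col)
            diag.add(row - col)
            anti_diag.add(row + col)
            total += place(row + 1)
            cols.remove(col)                       # backtrack: undo this placement
            diag.remove(row - col)
            anti_diag.remove(row + col)
        return total

    return place(0)


print(count_n_queens(8))  # 92
```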
Real-world CSPs :
2. Discrete variables and infinite domains : integers, strings, etc., i.e. an infinite range of values.
How do we solve CSPs? Is there any standard method? Let's start with the straightforward approach and then improve it.
States are defined by the values assigned so far :
1. Initial state : The empty assignment {}.
2. Successor function : Assign a value to an unassigned variable that does not conflict with the current
assignment; fail if there are no legal assignments.
3. Goal test : The current assignment is complete.
• Variable assignments are commutative, i.e. [WA = red then NT = green] is the same as [NT = green then WA = red].
• We only need to consider assignments to a single variable at each node.
• Depth-first search for CSPs with single-variable assignments is called backtracking search. Backtracking search
is the basic uninformed algorithm for CSPs. Using this technique, one can solve n-queens for n ≈ 25.
• The backtracking algorithm is as follows :
function BACKTRACKING-SEARCH(csp) returns a solution, or failure
  return RECURSIVE-BACKTRACKING({}, csp)
function RECURSIVE-BACKTRACKING(assignment, csp) returns a solution, or failure
  if assignment is complete then return assignment
  var ← SELECT-UNASSIGNED-VARIABLE(Variables[csp], assignment, csp)
  for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
    if value is consistent with assignment according to Constraints[csp] then
      add {var = value} to assignment
      result ← RECURSIVE-BACKTRACKING(assignment, csp)
      if result ≠ failure then return result
      remove {var = value} from assignment
  return failure
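The BACKTRACKING-SEARCH pseudocode above maps almost line for line onto Python. In this sketch, which uses the map coloring problem as a hypothetical test case, SELECT-UNASSIGNED-VARIABLE simply takes the first unassigned variable, ORDER-DOMAIN-VALUES keeps the listed order, and `None` stands for failure.

```python
NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
COLOURS = ["red", "green", "blue"]


def backtracking_search(variables, domains, neighbours):
    return recursive_backtracking({}, variables, domains, neighbours)


def recursive_backtracking(assignment, variables, domains, neighbours):
    if len(assignment) == len(variables):             # assignment is complete
        return assignment
    # SELECT-UNASSIGNED-VARIABLE: here, simply the first unassigned variable
    var = next(v for v in variables if v not in assignment)
    # ORDER-DOMAIN-VALUES: here, simply the order in which the values are listed
    for value in domains[var]:
        if all(assignment.get(n) != value for n in neighbours[var]):   # consistent?
            assignment[var] = value                    # add {var = value}
            result = recursive_backtracking(assignment, variables, domains, neighbours)
            if result is not None:                     # None plays the role of "failure"
                return result
            del assignment[var]                        # remove {var = value}
    return None


print(backtracking_search(VARIABLES, {v: COLOURS for v in VARIABLES}, NEIGHBOURS))
```

With three colours this prints WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red. With only two colours it returns `None`, because WA, NT and SA are mutually adjacent.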
The following general-purpose methods can give huge gains in speed by minimizing the number of backtracks. While
choosing a variable and assigning values to it, the following questions inevitably arise. By answering these
questions, we can design a strategy that improves the efficiency of the backtracking process.
1. Which variable should be assigned next?
The answer to this question leads to two simple strategies to select the next variable for assignment, called
“Most Constrained Variable” and “Most Constraining Variable”.
2. In what order should its values be tried?
This question designs a strategy called “Least Constraining Value”, which helps us in choosing a value among all
possible values from the domain.
3. Can we detect inevitable failure early?
This is the key question, which has a significant impact on the speed of generating the solution. These strategies
can foresee failure if the current path of assignment is followed, and thereby tell us whether we are on the right
track. There are three strategies under this heading, namely “Forward checking”, “Constraint Propagation” and
“Arc Consistency”.
Let's study all these strategies one by one.
1. Most constrained variable :
It chooses the variable with the fewest legal values. By assigning this variable first, we get a fair idea of the
other variable assignments. Also, most of the constraints get satisfied by this assignment, hence the chances of
getting on the wrong track are lowered. This is also called the minimum remaining values (MRV) heuristic or the
fail-first heuristic.
Fig. 2.18.3
2. Most constraining variable :
This is the tie-breaker among most constrained variables. In the most constraining variable strategy, we first
choose the variable with the most constraints on the remaining variables; for example, the region surrounded by the
maximum number of regions. Once we assign a color to this part of the graph, the colors available for all the
surrounding parts are largely decided.
Fig. 2.18.4
3. Least constraining value :
Given a variable, choose the least constraining value, that is, the one that rules out the fewest values for the
remaining variables.
Fig. 2.18.5
4. Forward checking :
It keeps track of the remaining legal values for the unassigned variables and terminates the search when any
variable has no legal values left. As shown in the graph, as we go on assigning colors to different regions, we
notice that SA is left without any valid color to assign. Hence, this assignment is terminated and the process
backtracks to the previous state.
Fig. 2.18.6
In Fig. 2.18.1, as WA and Q are assigned red and green respectively, only blue is left for NT, and if it is
assigned, there is no consistent value left for SA.
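The sketch below combines the MRV heuristic with forward checking for the map coloring problem. It is one possible arrangement of these strategies; least-constraining-value ordering and arc consistency are not included. The second call reproduces the situation just described. With WA restricted to red and Q to green, forward checking leaves NT only blue, assigning blue then wipes out SA's domain, and the search reports failure without ever trying to colour SA.

```python
NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]


def forward_checking_search(variables, domains, neighbours):
    """Backtracking with MRV variable selection and forward checking."""
    domains = {v: list(vals) for v, vals in domains.items()}   # remaining legal values

    def search(assignment):
        if len(assignment) == len(variables):
            return dict(assignment)
        # MRV: the unassigned variable with the fewest remaining legal values
        var = min((v for v in variables if v not in assignment),
                  key=lambda v: len(domains[v]))
        for value in list(domains[var]):
            # forward checking: delete `value` from the unassigned neighbours' domains
            pruned, wiped_out = [], False
            for n in neighbours[var]:
                if n not in assignment and value in domains[n]:
                    domains[n].remove(value)
                    pruned.append(n)
                    wiped_out = wiped_out or not domains[n]
            if not wiped_out:                  # otherwise some variable has no legal value
                assignment[var] = value
                result = search(assignment)
                if result is not None:
                    return result
                del assignment[var]
            for n in pruned:                   # restore the deleted values before the next try
                domains[n].append(value)
        return None

    return search({})


FULL = {v: ["red", "green", "blue"] for v in VARIABLES}
print(forward_checking_search(VARIABLES, FULL, NEIGHBOURS))
STUCK = {**FULL, "WA": ["red"], "Q": ["green"]}
print(forward_checking_search(VARIABLES, STUCK, NEIGHBOURS))   # None
```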
Soln. :
Hence, the minimum number of colours required to colour this map is 4.
Fig. P. 2.18.1
2.18.6 Water Jug
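You are given two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump
that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?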
* The state space for this problem can be described as the set of ordered pairs of integers (x, y), such that
x = 0, 1, 2, 3 or 4, representing the number of gallons of water in the 4-gallon jug, and y = 0, 1, 2 or 3,
representing the quantity of water in the 3-gallon jug. The start state is (0, 0). The goal state is (2, n) for any
value of n, since the problem does not specify how many gallons need to be in the 3-gallon jug.
The operators used to solve the problem can be described as shown below. They are represented as rules whose left
sides are matched against the current state and whose right sides describe the new state that results from applying
the rule.
Rule set :
1. (x, y) → (4, y), if x < 4 : Fill the 4-gallon jug.
2. (x, y) → (x, 3), if y < 3 : Fill the 3-gallon jug.
3. (x, y) → (x - d, y), if x > 0 : Pour some water out of the 4-gallon jug.
4. (x, y) → (x, y - d), if y > 0 : Pour some water out of the 3-gallon jug.
5. (x, y) → (0, y), if x > 0 : Empty the 4-gallon jug on the ground.
6. (x, y) → (x, 0), if y > 0 : Empty the 3-gallon jug on the ground.
7. (x, y) → (4, y - (4 - x)), if x + y ≥ 4 and y > 0 : Pour water from the 3-gallon jug into the 4-gallon jug
   until the 4-gallon jug is full.
8. (x, y) → (x - (3 - y), 3), if x + y ≥ 3 and x > 0 : Pour water from the 4-gallon jug into the 3-gallon jug
   until the 3-gallon jug is full.
9. (x, y) → (x + y, 0), if x + y ≤ 4 and y > 0 : Pour all the water from the 3-gallon jug into the 4-gallon jug.
10. (x, y) → (0, x + y), if x + y ≤ 3 and x > 0 : Pour all the water from the 4-gallon jug into the 3-gallon jug.
11. (0, 2) → (2, 0) : Pour the 2 gallons from the 3-gallon jug into the 4-gallon jug.
12. (2, y) → (0, y) : Empty the 2 gallons in the 4-gallon jug on the ground.
Search tree produced by applying these production rules to the water jug problem :
[Tree rooted at (0, 0); its children are (4, 0) and (0, 3)]
Fig. 2.18.9
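Searching this state space breadth-first gives a shortest sequence of rule applications. The sketch below is one way to do it. It implements rules 1, 2, 5 and 6 directly, and folds rules 7 to 10 into two "pour until one jug is full or the other is empty" moves. Rules 3 and 4 ("pour some water") are left out because an unmeasured amount d cannot be reached exactly, and rules 11 and 12 are special cases of rules 9 and 5. The function name is hypothetical.

```python
from collections import deque


def water_jug_bfs(cap_x=4, cap_y=3, goal_x=2):
    """Breadth-first search over states (x, y); returns the shortest list of states to the goal."""
    def successors(x, y):
        yield (cap_x, y)                         # rule 1: fill the 4-gallon jug
        yield (x, cap_y)                         # rule 2: fill the 3-gallon jug
        yield (0, y)                             # rule 5: empty the 4-gallon jug
        yield (x, 0)                             # rule 6: empty the 3-gallon jug
        pour = min(y, cap_x - x)                 # rules 7 and 9: pour 3-gallon into 4-gallon
        yield (x + pour, y - pour)
        pour = min(x, cap_y - y)                 # rules 8 and 10: pour 4-gallon into 3-gallon
        yield (x - pour, y + pour)

    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == goal_x:                   # goal: (2, n) for any n
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(*state):
            if nxt not in parent:                # skip states already generated
                parent[nxt] = state
                queue.append(nxt)
    return None


print(water_jug_bfs())
# [(0, 0), (4, 0), (1, 3), (1, 0), (0, 1), (4, 1), (2, 3)]
```

BFS finds a six-move solution here. The equally short sequence (0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0) also exists; which one is returned depends only on the order in which successors are generated.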
2.19 Adversarial Search
* An adversarial search problem involves a competitive activity among 'n' players, played according to a certain
set of protocols.
* A game is called adversarial because there are agents with conflicting goals, and the surrounding environment in
the game is competitive, as there are 'n' players or agents participating.
* We say that the goals are conflicting and the environment is competitive because every participant wants to win
the game.
* From the above explanation, it is clear that we are dealing with a competitive multi-agent environment.
* As the actions of every agent are unpredictable, there are many possible moves/actions.
* In order to play a game successfully, every agent in the environment has to first analyze the actions of the
other agents and how they contribute to its own winning or losing. After performing this analysis, the agent acts.
There can be two types of multi-agent environments : competitive and cooperative.
1. Competitive environment :
• In this type of environment, every agent makes an effort to win the game by defeating, or gaining superiority
over, the other agents, who are also trying to win the game.
• Chess is an example of a competitive environment.
2. Cooperative environment :
• In this type of environment, all the agents jointly perform activities in order to achieve the same goal.
• A car driving agent is an example of a cooperative environment.
In the context of artificial intelligence, there are a few special features, shown in Table 5.3.1, that make games
more interesting.
• The “zero-sum game” concept is associated with the payoffs assigned to each player when an instance of the game
is over. It is a mathematical representation of a situation in which each player's gain is exactly balanced by the
losses of the other players, so the payoffs always add up to zero.
• For example, if player 1 wins a chess game, it is marked with, say, +1 point, and at the same time the loss of
player 2 is marked with -1 point; thus the sum is zero. In a drawn game, players 1 and 2 are both marked with zero
points. (Here +1, -1 and 0 are called payoffs.)
2.21.2 Non-Zero Sum Game
In non-zero-sum games, the algebraic sum of the payoffs is not zero. In this type of game, one player winning the
game does not necessarily mean that the other player has lost the game.
There are two types of non-zero sum games :
1. Positive sum game
2. Negative sum game
It is also called a competitive game. Here, every player has a different goal, so no one really wins the game;
everybody loses. A war is the most fitting real-world example.
To understand game playing, we will first take a look at all the relevant aspects of a game, which give an overview
of the stages in game play. See Fig. 2.22.1.
• Accessible environments : Games with accessible environments have all the necessary information handy. For
example : chess.
• Search : There are also games that require search functionality, which illustrates how players have to search
through possible game positions to play a game. For example : minesweeper, battleships.
• Unpredictable opponent : In AI games, opponents can be unpredictable; this introduces uncertainty into the game.
[Fig. 2.22.1 : Relevant aspects of a game]
Fig. 2.23.1 shows examples of two main varieties of problems faced in artificial intelligence games. First type is
“Toy Problems” and the other type is “Real World Problems”.
[Fig. 2.23.1 entry : Traveling Salesperson (NP-hard)]
Game play follows some strategies in order to mathematically analyze the game and generate possible outcomes.
A two-player strategy table can be seen in Fig. 2.23.2.
Games can be classified into deterministic or probabilistic categories. Let's see what we mean by deterministic
and probabilistic.
Deterministic :
• It is a fully observable environment with no element of chance : the result of every move is completely
determined by the players' choices. Typically, two agents play the game alternately, and the final results of the
game are equal and opposite.
• Take the example of tic-tac-toe, where two players play alternately, and when one player wins the game, the
other player loses it.
Probabilistic :
• Probabilistic games are also called non-deterministic games. They are the opposite of deterministic games : there
can be multiple players, and you cannot determine the next action of a player in advance.
• You can only predict the probability of the next action. To understand the probabilistic type, you can take the
example of card games.
• Another way to classify games is based on exact/perfect information or inexact/approximate information. Let us
now understand these terms.
1. Exact/perfect information : Games in which all the actions are known to the other player are called games of
exact or perfect information, for example tic-tac-toe, or board games like chess and checkers.
2. Inexact/approximate information : Games in which not all the actions are known to the other players (or the
actions are unpredictable) are called games of inexact or approximate information. In this type of game, a
player's next action depends upon who played last, who won the last hand, etc. For example, card games like
hearts.
Consider the following games and see how they are classified, based on the parameters we have learnt in the above
sections :
Exact / perfect information        Inexact / approximate information
• Monopoly                          • Poker
2.23.1(B) Checkers
[Figure : partial game tree for tic-tac-toe, with levels alternating between MAX (X) and MIN (O)]
Initial state : It gives the starting position of the game board.
Terminal state : It indicates that the game is over.
Utility : It gives a number indicating whether the game was won, lost or drawn.
From the tic-tac-toe example, you can see that even for a 3 × 3 grid, two-player game, where the game tree is
relatively small (fewer than 9! = 362,880 terminal nodes), we still cannot draw it completely on a single page.
Imagine how difficult it would be to create a game tree for multiplayer games or for games with a bigger grid.
Many games have a huge search space. Games also impose limits on the time and the amount of memory a program can
consume. Finding an optimal solution is not feasible most of the time, so there is a need for approximation.
Therefore, we need an algorithm that reduces the tree size and thereby helps reduce the processing time and save
the memory space of the machine.
One method is pruning, where only the parts of the tree that improve the quality of the output are kept and the
remaining parts are removed.
Another method is the heuristic method (it makes use of an evaluation function), which does not require exhaustive
search. This method depends upon readily available information that can be used to control problem solving.
2.24 MiniMax Algorithm
Fig. 2.24.2
[Figure : tic-tac-toe board positions generated at O's turn]
Fig. 2.24.4
Step 3 : Apply the MIN and MAX operators on the nodes of the present stage and propagate the utility values upward.
Fig. 2.24.5
Step 4 : Using the minimax decision, select the action at the root node that leads to the maximum (of the minimum)
utility value (payoff value).
Fig. 2.24.6
Fig. 2.24.7
(In case of Steps 2 and 3 we are assuming that the opponent will play
perfectly as per our expectation)
2.24.2 Properties of Minimax Algorithm
• It is complete if the game tree is finite.
• It is optimal when played against an optimal opponent.
• The time complexity of the minimax algorithm is O(b^m).
• The space complexity of the minimax algorithm is O(bm) (using a depth-first exploration approach).
• For chess, b ≈ 35 and m ≈ 100 for “reasonable” games.
• An exact solution is therefore completely infeasible in most games.
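For a game as small as tic-tac-toe, the complete minimax search described above is still feasible, especially if positions already evaluated are cached. The sketch below is an illustration with hypothetical function names. It stores the board as a 9-character string, scores terminal states +1 (X wins), -1 (O wins) or 0 (draw), and backs values up with MAX for X and MIN for O.

```python
from functools import lru_cache

# The eight winning lines of a 3 x 3 board; cells are indexed 0..8 row by row
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Utility for MAX (X) with perfect play: +1 win, -1 loss, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:                      # terminal state: draw
        return 0
    other = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)


def minimax_decision(board, player):
    """Choose the move whose backed-up minimax value is best for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def value(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    return max(moves, key=value) if player == "X" else min(moves, key=value)


print(minimax(" " * 9, "X"))                 # 0: perfect play from the empty board is a draw
print(minimax_decision("XX OO    ", "X"))    # 2: X completes the top row
```

The cache is what keeps this fast: tic-tac-toe has only a few thousand distinct reachable positions, even though the full game tree has far more nodes.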
Q. What is α-β pruning?
Q. Show the use of α-β pruning for a two-person game with an example.
• Pruning means cutting off. In game search, it refers to clipping a branch of the search tree that is probably not
fruitful.
• α-β pruning is an extension of the minimax algorithm in which the decision-making process need not consider each
and every node of the game tree.
• Only the nodes that matter for the quality of the output are considered in decision making. Pruning makes the
search more efficient.
Fig. 2.25.1 : Game tree with a MAX root and MIN nodes
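• For this tree we have to calculate the minimax value of the root node :
Minimax value of root node = max(min(4, 10, 6), min(3, A, B), min(13, 7, 2)) = max(4, min(3, A, B), 2) = 4
Whatever the values of A and B are, min(3, A, B) ≤ 3 < 4, so the leaves A and B never need to be examined.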
(Figure : step-by-step α-β pruning of the tree in Fig. 2.25.1, showing the [α, β] bounds at the MAX and MIN nodes.)
So in this example we have pruned 2 β branches and 0 α branches. As the tree is very small, you may not appreciate the effect of pruning; but in any real game tree, pruning has a significant impact on the search as far as time and space are concerned.
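The pruning on the tree of Fig. 2.25.1 can be reproduced with a short α-β sketch in Python. This is a minimal sketch, assuming the tree is given as nested lists; since the text leaves A and B unspecified, they are kept as strings, so the run would fail with a TypeError if the algorithm ever tried to evaluate them.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if not isinstance(node, list):            # leaf: record it and return its utility
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:                 # the MIN parent will never choose this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if beta <= alpha:                     # the MAX parent already has something better
            break
    return value


# Tree of Fig. 2.25.1; A and B are unknown, so they stay as strings.
tree = [[4, 10, 6], [3, "A", "B"], [13, 7, 2]]
visited = []
root_value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(root_value)   # 4
print(visited)      # [4, 10, 6, 3, 13, 7, 2]
```

Only 7 of the 9 leaves are visited: as soon as the second MIN node sees 3 (which is below the α value 4 already secured by MAX), its remaining children A and B are cut off.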
Example of α-β Pruning
Fig. P. 2.25.1 : Game Tree
Total pruned branches :
α-cuts = 2
β-cuts = 3
Fig. P. 2.25.2
Soln. :
Fig. P. 2.25.2(a)
No. of α-cuts = 1
No. of β-cuts = 2
Review Questions
Q. 1 Why is it called uninformed search? What has not been informed about the search?
Q. 2 Write a note on BFS.
Q. 3 Write a note on : Uniform cost search.
Q. 18 Write a short note on IDA*.
Q. 19 Explain SMA* algorithm with example. When should we choose SMA* given options?
Q. 20 Give α-β pruning algorithm properties.
Q. 21 Write a short note on the behaviour of A* in case of underestimating and overestimating heuristics.
Q. 22 Discuss admissibility of A* in case of optimality.
Q. 23 Compare and contrast A*, SMA* and IDA*.
Q. 26 Give the α-β pruning algorithm with an example and its properties; also explain why it is called α-β pruning.
Q. 27 Write a note on : Simulated Annealing.
Q. 28 What is CSP?
Introduction
• As shown in Fig. 3.1.1, a knowledge-based agent can be described at two levels : the Knowledge Base (KB), which holds the domain-specific content, and the Inference Engine, which holds the domain-independent algorithms.
o First, the agent TELLs the knowledge base what it perceives. The TELL mechanism is similar to taking input for a system.
o Then the agent can ASK itself what action should be carried out to get the desired output. The ASK mechanism is similar to producing output for a system. However, the ASK mechanism makes use of the knowledge base to decide what it should do.
• TELL and ASK mechanisms involve inference. When you run the ASK function, the answer is generated with the help of the knowledge base, based on the knowledge which was added previously with the TELL function.
o TELL(K) : is a function that adds knowledge K to the knowledge base.
o ASK(K) : is a function that queries the agent about the truth of K.
• An agent carries out the following operations : First, it TELLs the knowledge base about the facts/information it perceives with the help of sensors. Then, it ASKs the knowledge base what action should be carried out based on the input it has received. Lastly, it performs the selected action with the help of effectors.
Knowledge-based agents can be implemented at three levels, namely the knowledge level, the logical level and the implementation level.
1. Knowledge level 2. Logical level 3. Implementation level
1. Knowledge Level :
It is the most abstract level of agent implementation. The knowledge level describes the agent by saying what it knows, that is, what knowledge the agent has as its initial knowledge.
Basic data structures and procedures to access that knowledge are defined in this level. The initial knowledge of the knowledge base is called background knowledge.
Agents at the knowledge level can be viewed as agents for which one only needs to specify what the agent knows and what its goals are in order to specify its behaviour, regardless of how it is to be implemented.
For example : A taxi-driving agent might know that the Golden Gate Bridge connects San Francisco with Marin County.
2. Logical Level :
At the logical level, the knowledge is encoded into sentences. This level uses some formal language to represent the knowledge the agent has. The two types of representations we have are propositional logic and first order (predicate) logic. Both these representation techniques are discussed in detail in the further sections.
For example : Links(Golden Gate Bridge, San Francisco, Marin County)
3. Implementation Level :
At the implementation level, the physical representation of the logical level sentences is done. This level also describes the data structures used in the knowledge base and the algorithms that are used for data manipulation.
For example : Links(Golden Gate Bridge, San Francisco, Marin County)
function KB-AGENT(percept) returns an action
   static : KB, a knowledge base
            t, a counter, initially 0, indicating time
   TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
   action ← ASK(KB, MAKE-ACTION-QUERY(t))
   TELL(KB, MAKE-ACTION-SENTENCE(action, t))
   t ← t + 1
   return action
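The KB-AGENT loop above can be sketched in Python. This is a minimal sketch under loud assumptions: `ToyKB` and `KBAgent` are hypothetical names, sentences are plain tuples standing in for MAKE-PERCEPT-SENTENCE / MAKE-ACTION-SENTENCE, and ASK does no real logical inference; it only applies a hypothetical percept-to-action table to the percept told at time t.

```python
class ToyKB:
    """Hypothetical knowledge base: sentences are stored as tuples and ASK
    applies a table of percept -> action rules to the percept told at time t."""

    def __init__(self, rules):
        self.sentences = []          # everything TELL has added so far
        self.rules = rules           # percept tuple -> recommended action

    def tell(self, sentence):
        self.sentences.append(sentence)

    def ask(self, query):
        _, t = query                 # query looks like ("action?", t)
        for kind, content, stamp in reversed(self.sentences):
            if kind == "percept" and stamp == t:
                return self.rules.get(content, "Forward")
        return "NoOp"


class KBAgent:
    def __init__(self, kb):
        self.kb = kb
        self.t = 0                   # time counter, initially 0

    def __call__(self, percept):
        self.kb.tell(("percept", percept, self.t))   # stands in for MAKE-PERCEPT-SENTENCE
        action = self.kb.ask(("action?", self.t))     # stands in for MAKE-ACTION-QUERY
        self.kb.tell(("action", action, self.t))      # stands in for MAKE-ACTION-SENTENCE
        self.t += 1
        return action


glitter = ("None", "None", "Glitter", "None", "None")
agent = KBAgent(ToyKB({glitter: "Grab"}))
print(agent(("None",) * 5))   # Forward
print(agent(glitter))         # Grab
```

Each call TELLs one percept sentence and one action sentence, so after two percepts the knowledge base holds four sentences and the counter t is 2.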
There are various sprites in the game like pit, stench, breeze, gold, and arrow. Every sprite has some features. Let us understand them one by one :
• A few rooms have bottomless pits which trap the player (agent) if he comes to that room. You can see in Fig. 3.2.1 that rooms (1,3), (3,3) and (4,4) have a bottomless pit. Note that even the WUMPUS can fall into a pit.
• Stench is experienced in a room which has the WUMPUS in its neighbourhood room. See Fig. 3.2.1; here rooms (2,1), (3,2) and (4,1) have Stench.
• Breeze is experienced in a room which has a pit in its neighbourhood room. Fig. 3.2.1 shows that rooms (1,2), (1,4), (2,3), (3,2), (3,4) and (4,3) have Breeze.
• The player (agent) has arrows and can shoot these arrows in a straight line to kill the WUMPUS.
• One of the rooms contains gold; this room glitters. Fig. 3.2.1 shows that room (3,2) has Gold.
Apart from the above features, the player (agent) can accept two more types of percepts : Bump and Scream. A bump is generated if the player (agent) walks into a wall, while a sad scream is heard everywhere in the cave when the WUMPUS is killed.
Fig. 3.2.1 : The WUMPUS World
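The breeze and stench rooms listed above follow mechanically from the pit and WUMPUS positions: a room has a breeze if a pit is in an adjacent room, and a stench if the WUMPUS is adjacent. A small sketch to check this; the pit rooms come from the text, while placing the WUMPUS in room (3,1) is an assumption inferred from the three stench rooms.

```python
N = 4                                   # 4 x 4 cave
pits = {(1, 3), (3, 3), (4, 4)}         # from Fig. 3.2.1
wumpus = (3, 1)                         # assumption: the only room adjacent to all stench rooms


def neighbours(room):
    """Rooms sharing a wall with room (no diagonals), inside the cave."""
    r, c = room
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return {(i, j) for i, j in candidates if 1 <= i <= N and 1 <= j <= N}


breeze = set().union(*(neighbours(p) for p in pits))
stench = neighbours(wumpus)
print(sorted(breeze))  # [(1, 2), (1, 4), (2, 3), (3, 2), (3, 4), (4, 3)]
print(sorted(stench))  # [(2, 1), (3, 2), (4, 1)]
```

Both lists match the rooms named in the text, which supports the assumed WUMPUS position.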
3.2.1 Description of the WUMPUS World
• An agent receives percepts while exploring the rooms of the cave. Every percept can be represented with the help of a five-element list, which is [stench, breeze, glitter, bump, scream]. Note that the player (agent) cannot perceive its own location.
• If the player (agent) gets the percept [Stench, Breeze, None, None, None], then it means that there is a stench and a breeze, but no glitter, no bump, and no scream at that position in the WUMPUS world.
• Let's take a look at the actions which can be performed by the player (agent) in the WUMPUS World :
• These actions are repeated till the player (agent) kills the WUMPUS or the player (agent) is killed. If the WUMPUS is killed, then it is a winning condition; if the player (agent) is killed, then it is a losing condition and the game is over.
• The game developer can put a restriction on the number of arrows which can be used by the player (agent). So if we allow the agent to have only one arrow, then only the first shoot action will have some effect. If this shoot action kills the WUMPUS then you win the game; otherwise it reduces the probability of winning the game.
• Lastly, there is a die action : it takes place automatically if the agent enters a room with a bottomless pit or a room with the WUMPUS. The die action is irreversible.
Goal of the game :
• The main aim of the game is that the player (agent) should grab the gold and return to the starting room (here it is (1,1)) without being killed by the monster (WUMPUS).
• Award and punishment points are assigned to a player (agent) based on the actions it performs. Points can be given as follows :
o 100 points are awarded if the player (agent) comes out of the cave with the gold.
o 1 point is taken away for every action taken.
o 10 points are taken away if the arrow is used.
o 200 points are taken away if the player (agent) gets killed.
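The scoring scheme above is simple arithmetic, sketched below with a hypothetical `wumpus_score` helper. It assumes every action (including shooting) costs 1 point and that the arrow penalty is charged once.

```python
def wumpus_score(actions_taken, arrow_used, escaped_with_gold, killed):
    """Score for one game using the points listed in the text (hypothetical helper)."""
    score = -1 * actions_taken          # 1 point taken away per action
    if arrow_used:
        score -= 10                     # arrow penalty
    if escaped_with_gold:
        score += 100                    # came out of the cave with the gold
    if killed:
        score -= 200                    # killed by a pit or the WUMPUS
    return score


print(wumpus_score(12, True, True, False))   # 78
print(wumpus_score(5, False, False, True))   # -205
```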
The WUMPUS world agent has the following characteristics :
Effectors (assuming a robotic agent) :
o Motor to move left and right
o Robot arm to grab the gold
o Robot mechanism to shoot the arrow
Let's try to understand the WUMPUS world problem in a step-by-step manner. Keep Fig. 3.2.2 as a reference figure.
(Legend for Fig. 3.2.2 : B - Breeze, G - Glitter, Gold, OK - Safe square, P - Pit, S - Stench, V - Visited, W - Wumpus)
• The knowledge base initially contains only the rules (facts) of the WUMPUS world environment.
Step 1 : Initially the player (agent) is in room (1,1). See Fig. 3.2.2(a).
The first percept received by the player is [none, none, none, none, none] (remember, a percept consists of [stench, breeze, glitter, bump, scream]).
The player can move to room (1,2) or (2,1) as they are safe cells.
Step 2 : Let us move to room (1,2). See Fig. 3.2.2(b).
Fig. 3.2.2(b) : WUMPUS world with player in room (1,2)
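As room (1,1) is visited, you can see a "V" mark in that room. The player receives the following percept : [none, breeze, none, none, none]. The breeze means that a neighbouring room, (1,3) or (2,2), may have a pit (marked P?), so the player goes back.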
Step 3 :
Fig. 3.2.2(c) : WUMPUS world with player moving back to room (1,1) and then moving to the other safe room (2,1)
As seen in Fig. 3.2.2(c), the player is now in room (2,1), where it receives the percept [stench, none, none, none, none], which means that there is a WUMPUS in a neighbouring room (i.e. either room (2,2) or room (3,1) has the WUMPUS).
As we did not get a breeze percept in this room, room (2,2) cannot have any pit; and from Step 2 we know that room (2,2) cannot have the WUMPUS, because room (1,2) did not give a stench percept.
Thus room (2,2) is safe to move in.
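The reasoning of Steps 1 to 3 can be sketched as a small program: an unvisited room is provably free of pits if some visited neighbour had no breeze, and provably free of the WUMPUS if some visited neighbour had no stench. This is a simplified sketch, not a full logical inference engine; the percept table records only the (stench, breeze) parts observed in Steps 1 to 3.

```python
N = 4


def neighbours(room):
    r, c = room
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return {(i, j) for i, j in candidates if 1 <= i <= N and 1 <= j <= N}


# Percepts observed so far: room -> (stench, breeze)
percepts = {(1, 1): (False, False), (1, 2): (False, True), (2, 1): (True, False)}


def provably_safe(room):
    """Safe if a visited neighbour rules out a pit and another (or the same) rules out the WUMPUS."""
    seen = [percepts[n] for n in neighbours(room) if n in percepts]
    no_pit = any(not breeze for _, breeze in seen)
    no_wumpus = any(not stench for stench, _ in seen)
    return no_pit and no_wumpus


frontier = {n for visited in percepts for n in neighbours(visited)} - set(percepts)
print(sorted(r for r in frontier if provably_safe(r)))  # [(2, 2)]
```

Room (1,3) is not proved safe (its only visited neighbour had a breeze) and room (3,1) is not proved safe (its only visited neighbour had a stench), which matches the conclusion in the text.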
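Step 4 : The player receives the [none, none, none, none, none] percept when it comes to room (2,2). Since there is neither a breeze nor a stench here, no neighbouring room has a pit or the WUMPUS; from Fig. 3.2.2(d) you can understand that rooms (2,3) and (3,2) are safe to move in.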
Fig. 3.2.2(d) : WUMPUS world with player moving to room (2,2)
Step 5 : The goal of this game is to grab the gold and go back to the starting position without being killed by the WUMPUS.
Step 6 : As can be seen in Fig. 3.2.2(f), we go from room (2,2) to room (2,1) and from room (2,1) to room (1,1). Thus we win the WUMPUS World game!
Fig. 3.2.2(f) : WUMPUS world with player moving back to room (1,1) with gold
3.3 Logic
• Logic can be called reasoning which is carried out, or it is a review based on strict rules of validity, to perform a specified task.
• In the case of intelligent systems, logical representation and reasoning are not bound to any particular form of logic; they are independent of any particular form of logic.
• Make a note that logic is beneficial only when the knowledge is represented on a small scale; when knowledge is represented in large quantity, logic is not considered valuable.
• Fig. 3.3.1 depicts that sentences are physical configurations of an agent, and that new sentences are derived from existing sentences. This means that reasoning is a process of forming new physical configurations from old ones.
• Logical reasoning should make sure that the new configurations represent features of the world that actually follow from the features of the world that the old configurations represented.
• Propositional logic can be considered at the fuzzy logic level, where rules take values in the range 0 to 1. The next level is the probabilistic logic level, at which first order predicate logic is implemented.
• Fig. 3.3.2 also shows two more levels above higher order logic, namely the multi-valued and non-monotonic logic levels, which consist of modal logic and temporal logic respectively. All these types of logic are basic building blocks of intelligent systems, and they all use reasoning in order to represent sentences. Hence reasoning plays a very important role in AI.
2. First Order Predicate Logic : These are much more expressive and make use of variables, constants, predicates, functions and quantifiers along with the connectives explained already in the previous section.
3. Higher Order Predicate Logic : Higher order predicate logic is distinguished from first order predicate logic by using additional quantifiers and stronger semantics.
4. Fuzzy Logic : These indicate the existence of values in between TRUE and FALSE, i.e. fuzziness in all logics.
5. Other Logic : These include multiple-valued logic, modal logics and temporal logics.
• Example :
o IF pressure is high, THEN volume is small.
o IF the road is slippery, THEN driving is dangerous.
Some of the benefits of IF-THEN rules are that they are modular, each defining a relatively small and, at least in principle, independent piece of knowledge. New rules may be added and old ones deleted, usually independently of other rules.
Production rules are simple but powerful forms of representing knowledge; they provide flexibility for combining procedural and declarative representations in a unified manner. The major advantage of production rules is that they are modular and independent of other rules, with provision for adding new rules and deleting older ones.
(c) Semantic networks
These represent knowledge in the form of graphical networks, since graphs are easy to store inside programs as they are concisely represented by nodes and edges.
A semantic network basically comprises nodes that are named and represent concepts, and labelled links representing relations between concepts. Nodes represent both types and tokens.
For example, the semantic network in Fig. 3.4.1 expresses the knowledge to represent the following data :
Tom is a cat.
Fig. 3.4.1
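A semantic network can be stored as labelled edges, i.e. (node, relation, node) triples. The sketch below assumes that representation; only "Tom is a cat" comes from the text, while the `cat is_a mammal` and `mammal has fur` links are hypothetical additions used to show how properties are inherited along `is_a` links.

```python
# (subject, relation, object) triples; only ("Tom", "is_a", "cat") comes from the text.
triples = {
    ("Tom", "is_a", "cat"),
    ("cat", "is_a", "mammal"),      # hypothetical link
    ("mammal", "has", "fur"),       # hypothetical link
}


def related(node, relation):
    return {o for s, r, o in triples if s == node and r == relation}


def is_a(node, category):
    """Follow is_a links transitively through the network."""
    frontier, seen = [node], set()
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current not in seen:
            seen.add(current)
            frontier.extend(related(current, "is_a"))
    return False


def has(node, prop):
    """A node has a property if it or any is_a ancestor has it (inheritance)."""
    if prop in related(node, "has"):
        return True
    return any(has(parent, prop) for parent in related(node, "is_a"))


print(is_a("Tom", "mammal"), has("Tom", "fur"))   # True True
```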
Conceptual Graph : It is a more recent scheme used for semantic networks, introduced by John Sowa; it is a finite, connected, bipartite graph. The nodes represent either concepts or conceptual relations. It differs from the previous method in that it does not use labelled arcs. For example, "Ram, Laxman and Bharat are brothers" or "cat colour is grey" can be represented as shown in Fig. 3.4.2.
Fig. 3.4.2
(d) Frame Representation
This concept was introduced by Marvin Minsky in 1975. Frames are mostly used when the task becomes quite complex and needs a more structured representation. The more structured the system becomes, the more beneficial frames prove to be.
Generally, frames are record-like structures that consist of a collection of slots or attributes and the corresponding slot values.
• Slots can be of any size and type. The slots have names and values (subfields) called facets. Facets can have names or numbers too. A simple frame for a person Ram is shown in Fig. 3.4.2 :
o (PROFESSION (VALUE Professor))
o (AGE (VALUE 50))
o (WIFE (VALUE Sita))
o (CHILDREN (VALUE Luv Kush))
o (ADDRESS
     (STREET (VALUE 4C Bb road))
     (CITY (VALUE Banaras))
     (STATE (VALUE mh))
     (ZIP (VALUE 400615)))
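The frame for Ram maps naturally onto nested dictionaries: each slot holds a dictionary of facets, and the ADDRESS slot holds sub-slots. A minimal sketch, assuming only the VALUE facet is used (the STREET slot is omitted); `get_value` is a hypothetical helper name.

```python
# Frame for Ram: slot -> {facet -> value}; ADDRESS contains nested slots.
ram = {
    "PROFESSION": {"VALUE": "Professor"},
    "AGE": {"VALUE": 50},
    "WIFE": {"VALUE": "Sita"},
    "CHILDREN": {"VALUE": ["Luv", "Kush"]},
    "ADDRESS": {
        "CITY": {"VALUE": "Banaras"},
        "STATE": {"VALUE": "mh"},
        "ZIP": {"VALUE": "400615"},
    },
}


def get_value(frame, *slots, facet="VALUE"):
    """Walk down the nested slots and return the requested facet."""
    node = frame
    for slot in slots:
        node = node[slot]
    return node[facet]


print(get_value(ram, "AGE"))              # 50
print(get_value(ram, "ADDRESS", "CITY"))  # Banaras
```

Other facets (for example a default or a type constraint) would simply be further keys next to "VALUE" in a slot's dictionary.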
3.5.1 Syntax
• A literal is an atomic sentence or the negation of an atomic sentence (A, ¬A).
• If A is a sentence, then ¬A is a sentence.
• Propositional logic makes use of relationships between propositions, which are denoted by connectives, if A and B are propositions. The connectives used in propositional logic can be seen in Table 3.5.1.
Table 3.5.1 : Connectives used in Propositional logic
Symbol | Word | Example | Name
¬ | Not | ¬A | Negation
∧ | And | A ∧ B | Conjunction
∨ | Or | A ∨ B | Disjunction
⇒ | Implies | A ⇒ B | Implication / conditional
⇔ | Is equivalent / if and only if | A ⇔ B | Biconditional
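• Truth tables are used to define the logical connectives. Truth Table 3.5.2 shows the five logical connectives.
Table 3.5.2 : Truth table for the logical connectives
A | B | ¬A | A ∧ B | A ∨ B | A ⇒ B | A ⇔ B
False | False | True | False | False | True | True
False | True | True | False | True | True | False
True | False | False | False | True | False | False
True | True | False | True | True | True | True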
• Take an example : find the value of A ∧ B where A is true and B is false. The third row of Table 3.5.2 shows this condition; in that row the A ∧ B column shows the result as false. Similarly, the other logical connectives can be mapped in the truth table.
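A truth table like Table 3.5.2 can be generated mechanically by evaluating each connective for every combination of truth values. A short sketch; the row order F F, F T, T F, T T is an assumption chosen so that the third row is the case "A true, B false" discussed above.

```python
from itertools import product

# Each connective as a function of the truth values of A and B.
connectives = {
    "¬A": lambda a, b: not a,
    "A ∧ B": lambda a, b: a and b,
    "A ∨ B": lambda a, b: a or b,
    "A ⇒ B": lambda a, b: (not a) or b,
    "A ⇔ B": lambda a, b: a == b,
}

# Rows in the order (F, F), (F, T), (T, F), (T, T).
rows = [(a, b) + tuple(f(a, b) for f in connectives.values())
        for a, b in product([False, True], repeat=2)]

print(" | ".join(["A", "B", *connectives]))
for row in rows:
    print(" | ".join(str(v) for v in row))
```

Note that A ⇒ B is computed as ¬A ∨ B, which is why an implication with a false premise is true.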
3.5.2 Semantics
• The world is a set of facts which we want to represent in propositional logic. In order to represent these facts, propositional symbols can be used, where each propositional symbol's interpretation can be mapped to a real-world feature.
• The semantics of a sentence is the meaning of the sentence. Semantics determines the interpretation of a sentence. For example, you can define the semantics of each propositional symbol in the following manner :
1. A means "It is hot"
2. B means "It is humid", etc.
• A sentence is considered true when its interpretation in the real world is true. Every sentence results from a finite number of usages of the rules. For example, if A and B are sentences, then (A ∧ B), (A ∨ B), (B ⇒ A) and (A ⇔ B) are sentences. The knowledge base is a set of sentences, as we have seen in the previous section.
• Thus we can say that the real world is a model of the knowledge base when the knowledge base is true for that world. In other words, a model can be thought of as a truth assignment to the symbols.
• If the truth values of all symbols in a sentence are given, then it can be evaluated to determine its truth value (i.e. we can say whether it is true or false).
3.5.3 What is Propositional Logic ?
• A ∧ B and B ∧ A should have the same meaning, but in natural language, words and sentences may have different meanings. Say, for example :
1. Radha started feeling feverish and Radha went to the doctor.
2. Radha went to the doctor and Radha started feeling feverish.
• Here, sentence 1 and sentence 2 have different meanings.
• In artificial intelligence, propositional logic is a relationship between the truth value of one statement and the truth value of another statement.
3.5.4 PL Sentence - Example
Take the example of a weather problem.
• The semantics of each propositional symbol can be defined as follows :
o Symbol A is a sentence which means "It is hot".
o Symbol B is a sentence which means "It is humid".
o Symbol C is a sentence which means "It is raining".
• We can also choose symbols which are easy to understand, like :
o HT for "It is hot",
o HM for "It is humid",
o RN for "It is raining".
HT | HM | RN | Model of KB?
True | True | False | Not Valid
True | True | True | Valid
• Now, if the knowledge base is [HM, HM ⇒ HT, (HT ∧ HM) ⇒ RN] (i.e. ["It is humid", "If it is humid, then it is hot", "If it is hot and humid, then it is raining"]), then "True - True - True" is the only possible valid model.
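The claim that "True - True - True" is the only model can be checked by enumerating all 2³ truth assignments to HT, HM and RN and keeping those in which every sentence of the knowledge base is true. A small sketch:

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def kb_holds(ht, hm, rn):
    """[HM, HM ⇒ HT, (HT ∧ HM) ⇒ RN] evaluated in one model."""
    return hm and implies(hm, ht) and implies(ht and hm, rn)


models = [m for m in product([True, False], repeat=3) if kb_holds(*m)]
print(models)  # [(True, True, True)]
```

HM must be true, which forces HT through the second sentence, and then RN through the third, so exactly one of the eight assignments survives.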
Tautology and Contradiction
• A tautology means a valid sentence. It is a sentence which is true for all interpretations. For example :
(A ∨ ¬A) ("A or not A") : "It is hot or it is not hot."
• A contradiction means an inconsistent sentence. It is a sentence which is false for all interpretations.
For example : A ∧ ¬A ("A and not A") : "It is hot and it is not hot."
• X entails Y is shown as X |= Y. It means that whenever sentence X is true, sentence Y will also be true.
For example : if X = Priya is Pooja's mother's sister and Y = Priya is Pooja's aunt, then X |= Y (X entails Y).
3.5.5 Inference Rules
• New sentences are formed with logical inference. For example : if A ⇒ B and B ⇒ C, then A ⇒ C. You must have come across this example many times; it means that if the knowledge base has "A ⇒ B" and "B ⇒ C", then we can infer that "A ⇒ C".
• In short, an inference rule says that a new sentence can be created by logically following from the set of sentences of the knowledge base.
Table 3.5.3 : Inference Rules
Premises | Conclusion | Rule name
X ⇔ Z, Y ⇔ Z | X ⇔ Y | Substitution
X ⇒ Y, Y ⇒ Z | X ⇒ Z | Chain rule
X ⇒ Y | ¬Y ⇒ ¬X | Transposition
• Entailment is represented as KB |= Q and derivation is represented as KB |- Q.
• There are two types of inference rules :
1. Sound inference
2. Complete inference
1. Sound inference
• The soundness property of inference says that if "X is derived from the knowledge base" using the given set of inference protocols, then "X is entailed by the knowledge base". The soundness property can be represented as : "If KB |- X then KB |= X".
• For the Modus Ponens (MP) rule, we assume that the knowledge base has [A, A ⇒ B]; from this we can conclude that the knowledge base entails B. See the following truth table :
A | B | A ⇒ B
False | False | True
False | True | True
True | False | False
True | True | True
Only in the last row are both A and A ⇒ B true, and in that row B is also true.
• In general, from A and A ⇒ B we can infer B.
Example :
A : It is rainy.
B : I will stay at home.
A ⇒ B : If it is rainy, I will stay at home.
Modus Tollens
• When B is known to be false, and there is a rule "If A, then B", it is valid to conclude that A is also false.
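Soundness of Modus Ponens and Modus Tollens can be checked by model checking: KB |= Q holds exactly when Q is true in every truth assignment where all premises are true. A small sketch over the two symbols A and B; the third call shows that the invalid pattern "B, A ⇒ B, therefore A" (affirming the consequent) is rejected.

```python
from itertools import product


def entails(premises, conclusion):
    """KB |= Q iff Q is true in every model (a, b) where all premises are true."""
    for a, b in product([False, True], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False
    return True


def A(a, b): return a
def B(a, b): return b
def not_A(a, b): return not a
def not_B(a, b): return not b
def A_implies_B(a, b): return (not a) or b


print(entails([A, A_implies_B], B))          # Modus Ponens: True
print(entails([not_B, A_implies_B], not_A))  # Modus Tollens: True
print(entails([B, A_implies_B], A))          # affirming the consequent: False
```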
2. Complete inference
• Complete inference is the converse of soundness. The completeness property of inference says that if "X is entailed by the knowledge base", then "X can be derived from the knowledge base" using the inference protocols.
• The completeness property can be represented as : "If KB |= Q then KB |- Q".
3.5.6 Horn Clause
• Horn clauses can be written as sets of literals. A Horn clause is also called a Horn sentence. In a Horn clause, a conjunction of 0 or more symbols is to the left of "⇒" and 0 or 1 symbols are to the right. See the following formula :
$A_1 \wedge A_2 \wedge A_3 \wedge \cdots \wedge A_n \Rightarrow B$, where $n \ge 0$ and the right-hand side contains $m \in \{0, 1\}$ symbols
• There can be the following special cases of the above formula :
o For n = 0 and m = 1 : $\Rightarrow A$ (this condition asserts that A is true)
o For n > 0 and m = 0 : $A \wedge B \Rightarrow$ (this constraint shows that A and B cannot both be true)
o For n = 0 and m = 0 : $\Rightarrow$ (this condition shows the empty clause)
• Conjunctive normal form is a conjunction of clauses, and by its set of clauses it is determined up to equivalence. For a Horn clause, conjunctive normal form can be used where each clause is a disjunction of literals with at most one positive (non-negated) literal, as shown in the following formula : $\neg A_1 \vee \neg A_2 \vee \cdots \vee \neg A_n \vee B$
• This follows from the equivalence $(A \Rightarrow B) \equiv (\neg A \vee B)$.
Significance of Horn logic
• Horn sentences can be used in first order logic. The reasoning process is simpler with Horn clauses. Satisfiability of a general propositional knowledge base is NP-complete. (Satisfiability means finding values for the symbols which make the knowledge base true.)
• If the knowledge base is restricted to Horn sentences, satisfiability is in P, i.e. it can be decided in polynomial time. For this reason, first order Horn sentences are the basis of the Prolog and Datalog languages.
• Let's take one example which shows entailment for Horn formulas.
• Find out if the following Horn formula is satisfiable :
$(\text{true} \Rightarrow X) \wedge (X \wedge Y \Rightarrow Z) \wedge (Z \Rightarrow W) \wedge (Z \wedge W \Rightarrow \text{false}) \wedge (\text{true} \Rightarrow Y)$
• To answer this, we check whether the atom false is entailed. The formula contains the clauses true ⇒ X and true ⇒ Y, so we can assign X and Y the value true (i.e. true ⇒ X ∧ Y).
• Then all premises of X ∧ Y ⇒ Z are true, so we assign Z the value true. After that, the premise of Z ⇒ W is true, so we assign W the value true.
• Now all premises of Z ∧ W ⇒ false are true, so false is entailed. Therefore, the Horn formula is not satisfiable.
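The marking procedure used above (start from the facts, fire every clause whose premises are all true, and report unsatisfiability if false gets derived) can be sketched in Python. This is a simple quadratic version of the usual Horn-SAT idea; each clause is represented as a (premises, head) pair, with head `None` standing for "⇒ false".

```python
def horn_satisfiable(clauses):
    """clauses: list of (premises, head); head None means '=> false'.
    Returns (satisfiable, atoms forced to be true)."""
    true_atoms = set()
    changed = True
    while changed:
        changed = False
        for premises, head in clauses:
            if set(premises) <= true_atoms:       # every premise already true
                if head is None:
                    return False, true_atoms      # false is entailed
                if head not in true_atoms:
                    true_atoms.add(head)
                    changed = True
    return True, true_atoms


formula = [
    ((), "X"),              # true => X
    (("X", "Y"), "Z"),      # X ∧ Y => Z
    (("Z",), "W"),          # Z => W
    (("Z", "W"), None),     # Z ∧ W => false
    ((), "Y"),              # true => Y
]
sat, atoms = horn_satisfiable(formula)
print(sat, sorted(atoms))  # False ['W', 'X', 'Y', 'Z']
```

Dropping the clause Z ∧ W ⇒ false makes the formula satisfiable, with X, Y, Z and W all true.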
4. | (HT ∧ HM) ⇒ RN | Premise (initial sentence) | "If it's hot and humid, it's raining"
5. | HT ∧ HM | And introduction (1, 3) | "It's hot and humid"
• Propositional logic forms the foundation for higher logics like First Order Logic (FOL), etc.
• Satisfiability in propositional logic is NP-complete, but reasoning is decidable.
• The process of inference can be illustrated by PL.
Q. Short note on predicate logic.
• Because of the inadequacy of PL discussed above, there was a need for a more expressive type of logic. Thus First-Order Logic (FOL) was developed. FOL is more expressive than PL; it can represent information using relations, variables and quantifiers, which was not possible with propositional logic. For example :
o "Gorilla is Black" can be represented as :
Gorilla(x) ⇒ Black(x)
o "It is Sunday today" can be represented as :
today(Sunday)
• First Order Logic (FOL) is also called First Order Predicate Logic (FOPL). Since FOPL is much more expressive as a knowledge representation language than PL, it is more commonly used in artificial intelligence.
3.6.1 Syntactic Elements, Semantics and Syntax
• Assuming that "x" is a domain of values, we can define a term with the following rules :
1. Constant term : It is a term with a fixed value which belongs to the domain.
2. Variable term : It is a term which can be assigned values in the domain.
3. Function : Say "f" is a function of "n" arguments. If we assume that $t_1, t_2, \ldots, t_n$ are terms, then $f(t_1, t_2, \ldots, t_n)$ is also called a term.
o A formula in which all the variables are quantified is called a "well-formed formula".
o Every ground term is mapped to an object.
o Every condition (predicate) is mapped to a relation.
o A ground atom is considered true if the predicate's relation holds between the terms' objects.
• Rules in FOL : In predicate logic, a rule has two parts, a predecessor and a successor. If the predecessor evaluates to TRUE, the successor will be true. It uses the implication symbol ⇒. Rules represent if-then types of sentences.
• Example : The sentence "If the bag is of blue colour, I will buy it." will be represented as colour(bag, blue) ⇒ buy(bag)
Quantifiers
Apart from these connectives, FOPL makes use of quantifiers. As the name suggests, they quantify the number of variables taking part in the relation or obeying the rule.
1. Universal Quantifier "∀"
• It is pronounced as "for all", and it applies to all the values of the variable in the predicate.
• "∀x A" means A is true for every replacement of x.
• Example : "Every Gorilla is Black" can be represented as :
∀x (Gorilla(x) ⇒ Black(x))
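Over a finite domain, "∀x A" can be checked by testing A for every individual. A small sketch with a hypothetical three-element domain (two gorillas and one cat); it also shows why the universal quantifier is paired with ⇒ rather than ∧: the ∧ version wrongly demands that everything in the domain is a black gorilla.

```python
# Hypothetical finite domain: individual -> set of properties it has.
domain = {
    "g1": {"gorilla", "black"},
    "g2": {"gorilla", "black"},
    "tom": {"cat", "grey"},
}


def gorilla(x):
    return "gorilla" in domain[x]


def black(x):
    return "black" in domain[x]


def forall(predicate):
    """∀x predicate(x), checked over every individual in the finite domain."""
    return all(predicate(x) for x in domain)


every_gorilla_black = forall(lambda x: (not gorilla(x)) or black(x))  # ∀x (Gorilla(x) ⇒ Black(x))
everything_black_gorilla = forall(lambda x: gorilla(x) and black(x))  # ∀x (Gorilla(x) ∧ Black(x))
print(every_gorilla_black, everything_black_gorilla)  # True False
```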
Sr. No. | Propositional Logic (PL) | First Order Logic (FOL)
1. | PL cannot represent small worlds like the vacuum cleaner world. | FOL can very well represent small worlds' problems.
3. | Propositional language uses propositions in which the complete sentence is denoted by a symbol. | FOL uses predicates which involve constants, variables, functions and relations.
4. | PL cannot directly represent properties of individual entities or relations between individual entities, e.g. Meera is short. | FOL can directly represent properties of individual entities or relations between individual entities using predicates and functions, e.g. Short(Meera).
5. | PL cannot express specialization, generalizations, patterns, etc., e.g. All rectangles have 4 sides. | FOL can express specialization, generalizations, patterns, etc. using relations, e.g. no_of_sides(rectangle, 4).
8. | PL assumes the world contains facts. | FOL assumes the world contains objects, relations and functions, like natural language.
9. | In PL, the meaning of the facts is context-independent, unlike natural language. | In FOL, the meaning of the sentences is context-dependent, like natural language.
This continues until no more rules can be applied or some cycle limit is met.
• For example, "If it is raining, then we will take an umbrella". Here, "it is raining" is the data and "we will take an umbrella" is a decision. This means it was already known that it is raining; that is why it was decided to take an umbrella. This process is forward chaining.
Given :
o Rule : human(A) ⇒ mortal(A)
o Data : human(Mandela)
• To prove : mortal(Mandela)
Forward Chaining Solution
o human(Mandela) matches the left-hand side of the rule, so we get A = Mandela.
o Based on the rule statement, we get : mortal(Mandela)
• Forward chaining is used by "design expert systems", as it performs operations in a forward direction (i.e. from start to end).
Example
• Consider the following example. Let us understand how the same example can be solved using both forward and backward chaining.
• Given facts are as follows :
1. It is a crime for an American to sell weapons to the enemy nations of America.
2. Country Nono is an enemy of America.
3. Nono has some missiles.
4. All the missiles were sold to Nono by Colonel West.
5. Missile is a weapon.
6. Colonel West is American.
• We have to prove that West is a criminal.
• Let's see how to represent these facts in FOL.
1. It is a crime for an American to sell weapons to the enemy nations :
American(x) ∧ Weapon(y) ∧ Sell(x, y, z) ∧ Enemy(z, America) ⇒ Criminal(x)
Missile(x) ⇒ Weapon(x)
6. Colonel West is American.
American(West)
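Forward chaining on this example can be sketched in Python. This is a minimal sketch, not the book's algorithm: atoms are tuples, variables are strings starting with "?", and because the FOL for facts 2 to 4 is not shown above, the encoding of those facts is an assumption (the usual textbook form, with the unnamed missile given the constant M1, Owns(Nono, M1), and the rule Missile(x) ∧ Owns(Nono, x) ⇒ Sell(West, x, Nono)).

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, theta):
    """Extend substitution theta so that pattern equals the ground fact, or return None."""
    if len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if theta.get(p, f) != f:      # already bound to something else
                return None
            theta[p] = f
        elif p != f:
            return None
    return theta


def substitute(atom, theta):
    return tuple(theta.get(term, term) for term in atom)


def matches(premises, facts, theta):
    """Yield every substitution that makes all premises known facts."""
    if not premises:
        yield theta
        return
    for fact in facts:
        extended = match(premises[0], fact, theta)
        if extended is not None:
            yield from matches(premises[1:], facts, extended)


def forward_chain(facts, rules):
    """Fire rules until no new fact appears (a fixed point)."""
    facts = set(facts)
    while True:
        new = {substitute(conclusion, theta)
               for premises, conclusion in rules
               for theta in matches(premises, list(facts), {})} - facts
        if not new:
            return facts
        facts |= new


facts = {("American", "West"), ("Enemy", "Nono", "America"),
         ("Owns", "Nono", "M1"), ("Missile", "M1")}            # M1, Owns: assumed encoding
rules = [
    ([("American", "?x"), ("Weapon", "?y"), ("Sell", "?x", "?y", "?z"),
      ("Enemy", "?z", "America")], ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sell", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
]
derived = forward_chain(facts, rules)
print(("Criminal", "West") in derived)   # True
```

The first pass derives Weapon(M1) and Sell(West, M1, Nono); the second pass can then fire the crime rule and adds Criminal(West); the third pass adds nothing and the loop stops.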
University Question
Q. Describe backward chaining algorithm with an example. (May 14, Dec. 19)
• If the initial data is fetched based on the decision, then it is called backward chaining. Backward chaining, or goal-driven inference, works towards a final state by looking at the working memory to see if the goal is already there. If not, it looks at the actions (THEN-parts) of the rules that will establish the goal, and sets up sub-goals for achieving the premises of those rules (IF-parts). This continues until some rule can be applied, which is then applied to achieve the goal state.
• For example, if while going out one has taken an umbrella, then based on this decision it can be guessed that it is raining. Here, "taking an umbrella" is a decision based on which the data "it's raining" is generated. This process is backward chaining. Backward chaining is called a decision-driven or goal-driven inference technique.
Given :
o Rule : human(A) ⇒ mortal(A)
o Data : human(Mandela)
• To prove : mortal(Mandela)
Backward Chaining Solution
• mortal(Mandela) will be matched with mortal(A), which gives human(A), i.e. human(Mandela), which is also a given fact. Hence proved.
• It makes use of right-hand side matching. Backward chaining is used by "diagnostic expert systems", because it performs operations in a backward direction (i.e. from end to start).
Example
Proof by backward chaining
The proof starts from the fact to be proved. As we can map it to the given facts, it leads us to the solution.
Please refer to Fig. 3.8.4. As we observe, all leaf nodes of the proof are given facts, which means that "West is a criminal" is proved.
Fig. 3.8.4 : Proof by backward Chaining (root Criminal(West); sub-goals include Sell(West, x, z), Enemy(Nono, America), Missile(x) and Owns(Nono, x), with every leaf marked True)
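Backward chaining for the same example can be sketched as a recursive goal prover. This is a minimal sketch under the same assumed encoding as in forward chaining (constant M1, Owns(Nono, M1), and the rule Missile(x) ∧ Owns(Nono, x) ⇒ Sell(West, x, Nono)); rules are renamed apart on each use so their variables do not clash, and there is no loop check, which is fine here because no rule calls itself.

```python
import itertools

_counter = itertools.count()


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def walk(term, theta):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in theta:
        term = theta[term]
    return term


def unify(a, b, theta):
    """Unify two flat atoms (tuples) under theta; return the extended theta or None."""
    if len(a) != len(b):
        return None
    theta = dict(theta)
    for x, y in zip(a, b):
        x, y = walk(x, theta), walk(y, theta)
        if x == y:
            continue
        if is_var(x):
            theta[x] = y
        elif is_var(y):
            theta[y] = x
        else:
            return None
    return theta


def standardize_apart(rule):
    """Give the rule fresh variable names so repeated uses do not clash."""
    premises, conclusion = rule
    n = next(_counter)

    def rename(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)

    return [rename(p) for p in premises], rename(conclusion)


def prove(goals, facts, rules, theta):
    """Yield substitutions under which every goal can be proved."""
    if not goals:
        yield theta
        return
    first, rest = goals[0], goals[1:]
    for fact in facts:                         # goal matches a given fact
        extended = unify(first, fact, theta)
        if extended is not None:
            yield from prove(rest, facts, rules, extended)
    for rule in rules:                         # goal matches a rule's THEN-part
        premises, conclusion = standardize_apart(rule)
        extended = unify(first, conclusion, theta)
        if extended is not None:
            yield from prove(premises + rest, facts, rules, extended)


facts = {("American", "West"), ("Enemy", "Nono", "America"),
         ("Owns", "Nono", "M1"), ("Missile", "M1")}            # assumed encoding
rules = [
    ([("American", "?x"), ("Weapon", "?y"), ("Sell", "?x", "?y", "?z"),
      ("Enemy", "?z", "America")], ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sell", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
]
proof = next(prove([("Criminal", "West")], facts, rules, {}), None)
print(proof is not None)   # True
```

Starting from the goal Criminal(West), the prover sets up the sub-goals American(West), Weapon(y), Sell(West, y, z) and Enemy(z, America), and every branch ends in a given fact, mirroring the leaves of Fig. 3.8.4.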
Ex. 3.8.1 : Using predicate logic, find the course of Anish's liking for the following :
(i) Anish only likes easy courses.
(ii) Computer courses are hard.
(iii) All electronics courses are easy.
(iv) DSP is an electronics course.
Soln. :
(Figure: backward chaining proof tree rooted at likes(Anish, x); the leaves are True.)
Fig. P. 3.8.1
3.10.1 Unification
Unification is the process of finding legal substitutions that make different logical expressions look identical. The unification algorithm is recursive. The problem of unification is : given two atoms, find whether they unify and, if they do, return an MGU (Most General Unifier) of them.
procedure Unify(t1, t2)
  Inputs :
    t1, t2 : atoms
  Output :
    most general unifier of t1 and t2 if it exists, or ⊥ otherwise
  Local :
    E : a set of equality statements
    S : substitution
  E ← {t1 = t2}
  S ← { }
  while E ≠ { } do
    select and remove x = y from E
    if y is not identical to x then
      if x is a variable then
        replace x with y everywhere in E and S
        S ← {x/y} ∪ S
      else if y is a variable then
        replace y with x everywhere in E and S
        S ← {y/x} ∪ S
      else if x is f(x1, ..., xn) and y is f(y1, ..., yn) then
        E ← E ∪ {x1 = y1, ..., xn = yn}
      else
        return ⊥
  return S
Given the facts
King(Ram)
Brave(Ram)
we want an 'x' such that 'x' is a king and 'x' is brave (then x is noble). What we want is a substitution set θ, i.e. θ = {x/Ram}.
Hence, x unifies with Ram.
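A runnable version of unification is sketched below in Python. It is one possible implementation, not the book's: variables are assumed to be strings starting with a lowercase letter, constants are capitalized strings, compound terms and atoms are tuples (functor, arg1, ...), and None plays the role of ⊥. It also performs an occurs check, so a variable is never bound to a term that contains it.

```python
def is_var(t):
    """Assumption: variables are strings that start with a lowercase letter."""
    return isinstance(t, str) and t[:1].islower()


def substitute(t, theta):
    """Apply substitution theta to term t, following chains of bindings."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):                       # (functor, arg1, arg2, ...)
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t                                       # constant


def occurs(v, t):
    """True if variable v appears inside term t (t already substituted)."""
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t[1:])


def unify(x, y, theta=None):
    """Return a most general unifier of x and y (extending theta), or None."""
    theta = {} if theta is None else theta
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_var(x):
        return None if occurs(x, y) else {**theta, x: y}
    if is_var(y):
        return None if occurs(y, x) else {**theta, y: x}
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[0] == y[0] and len(x) == len(y)):
        for a, b in zip(x[1:], y[1:]):             # unify arguments pairwise
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None                                    # clash of constants/functors


print(unify(("King", "x"), ("King", "Ram")))                           # {'x': 'Ram'}
print(unify(("Knows", "John", "x"), ("Knows", "y", ("Mother", "y"))))  # {'y': 'John', 'x': ('Mother', 'John')}
print(unify(("Knows", "x", "x"), ("Knows", "John", "Mary")))           # None
```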
3.10.2 Lifting
• For atomic sentences p_i, p_i′ and q, where there is a substitution θ such that SUBST(θ, p_i′) = SUBST(θ, p_i) for all i :
$$\frac{p_1',\; p_2',\; \ldots,\; p_n',\; (p_1 \land p_2 \land \cdots \land p_n \Rightarrow q)}{\mathrm{SUBST}(\theta, q)}$$
• N + 1 premises = N atomic sentences + one implication.
• Applying SUBST(θ, q) produces the conclusion we seek.
p1′ = King(Ram)    p2′ = Brave(y)
p1 = King(x)    p2 = Brave(x)
θ = {x/Ram, y/Ram}    q = Noble(x)
SUBST(θ, q) is Noble(Ram).
• Generalized Modus Ponens is a lifted version of Modus Ponens. It raises Modus Ponens from ground (variable-free) propositional logic to first-order logic; hence it is called lifting.
• Here "lifted" means that the rule is transformed from its propositional form into a first-order form.
• The major advantage of lifted inference rules over propositional logic is that only those substitutions are made that are required to allow particular inferences to proceed.
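Generalized Modus Ponens for the King/Brave example can be sketched as follows. The sketch assumes function-free atoms written as tuples, variables written as lowercase strings, facts listed in the same order as the premises they should match, and rule variables already standardized apart from fact variables; the function name generalized_modus_ponens is illustrative.

```python
def is_var(t):
    return t[:1].islower()          # assumption: variables start lowercase


def walk(t, theta):
    """Follow bindings until a constant or an unbound variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def subst(theta, atom):
    """SUBST(theta, atom) for a function-free atom (predicate, args...)."""
    return (atom[0],) + tuple(walk(a, theta) for a in atom[1:])


def unify_atoms(p, q, theta):
    """Extend theta so that SUBST(theta, p) == SUBST(theta, q), or return None."""
    if p[0] != q[0] or len(p) != len(q):
        return None
    theta = dict(theta)
    for a, b in zip(p[1:], q[1:]):
        a, b = walk(a, theta), walk(b, theta)
        if a == b:
            continue
        if is_var(a):
            theta[a] = b
        elif is_var(b):
            theta[b] = a
        else:
            return None
    return theta


def generalized_modus_ponens(facts, premises, conclusion):
    """From p1'..pn' and (p1 ^ ... ^ pn => q), return (theta, SUBST(theta, q))."""
    theta = {}
    for fact, premise in zip(facts, premises):
        theta = unify_atoms(premise, fact, theta)
        if theta is None:
            return None             # some premise cannot be matched
    return theta, subst(theta, conclusion)


facts = [("King", "Ram"), ("Brave", "y")]          # p1', p2'
premises = [("King", "x"), ("Brave", "x")]         # p1, p2
print(generalized_modus_ponens(facts, premises, ("Noble", "x")))
# ({'x': 'Ram', 'y': 'Ram'}, ('Noble', 'Ram'))
```

The resulting θ = {x/Ram, y/Ram} and the conclusion Noble(Ram) match the worked example above.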
Ex. 3.10.1 : Represent the following sentences in FOL using a consistent vocabulary.
(i) Every person who buys a policy is smart.
(ii) No person buys an expensive policy.
(iii) There is an agent who sells policies only to people who are not insured.
(iv) There is a barber who shaves all men in town who do not shave themselves.
Soln. :
(i) ∀x ∀y : person(x) ∧ policy(y) ∧ buys(x, y) → smart(x)
(ii) ∀x ∀y : person(x) ∧ policy(y) ∧ expensive(y) → ¬buys(x, y)
Ex. 3.10.2 : Represent …
Soln. :
Ex. 3.10.3 : Write first order logic statements for the following statements :
(i) If a perfect square is divisible by a prime p, then it is also divisible by the square of p.
(ii) Every perfect square is divisible by some prime.
(iii) Alice does not like chemistry and history.
(iv) If it is Saturday and warm, then Sam is in the park.
(v) Anything anyone eats and is not killed by is food.
Soln. :
(i) ∀x ∀p : square(x) ∧ prime(p) ∧ divides(p, x) → [∃z : square_of(z, p) ∧ divides(z, x)]
(ii) ∀x : square(x) → ∃p : prime(p) ∧ divides(p, x)
(iii) ¬likes(Alice, History) ∧ ¬likes(Alice, Chemistry)
(iv) day(Saturday) ∧ weather(warm) → in_park(Sam)
(v) ∀x ∀y : person(x) ∧ eats(x, y) ∧ ¬killed(x) → food(y)
3.11 Resolution
• We have seen that a literal is an atomic symbol or the negation of an atomic symbol (i.e. A, ¬A).
• Resolution is the only inference rule you need in order to build a sound (soundness means that every sentence produced by the procedure is true) and complete (completeness means that every true sentence can be produced by the procedure) theorem prover.
• Take an example where we are given that :
o A clause X contains the literal Z
o A clause Y contains the literal ¬Z
• Based on resolution and the information given above, we can conclude :
(X − {Z}) ∪ (Y − {¬Z})
• Take a generalized version of the above problem :
Given :
o A clause X containing the literal Z
o A clause Y containing the literal ¬W
o A most general unifier G of Z and W
we can conclude ((X − {Z}) ∪ (Y − {¬W}))G, i.e. the union with the substitution G applied to it.
• Let the knowledge base be a set of true sentences which do not contain any contradictions, and let Z be a sentence that we want to prove.
• The idea is based on proof by negation. We assume ¬Z and then try to find a contradiction (you must have followed such methods while solving geometry proofs). The intuition is that if all the knowledge base sentences are true, and assuming ¬Z creates a contradiction, then Z must be inferred from the knowledge base.
• We convert knowledge base ∪ {¬Z} to clause form.
• If there is a contradiction in the knowledge base, Z is proved; terminate the process.
• Otherwise, select two clauses and add their resolvents to the current knowledge base. If we do not find any resolvable clauses, the procedure fails and we terminate. Else, we again check whether there is a contradiction in the knowledge base, and so on.
3.11.2 Conversion from FOL to Clausal Normal Form (CNF)
Soln. :
FOL : A → (B ⇔ C)
Normalizing the given statement :
(i) A → ((B → C) ∧ (C → B))
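The remaining normalization steps can be sketched with the standard equivalences (eliminate →, then distribute ∨ over ∧). The two resulting clauses are {¬A, ¬B, C} and {¬A, ¬C, B}.

```latex
\begin{aligned}
A \rightarrow (B \Leftrightarrow C)
  &\equiv A \rightarrow \big((B \rightarrow C) \land (C \rightarrow B)\big) \\
  &\equiv \lnot A \lor \big((\lnot B \lor C) \land (\lnot C \lor B)\big) \\
  &\equiv (\lnot A \lor \lnot B \lor C) \land (\lnot A \lor \lnot C \lor B)
\end{aligned}
```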
o Our goal is to show, with the help of resolution, that X always wins.
1. H ⇒ Win(X), i.e. {¬H, Win(X)}
2. T ⇒ Lose(Y), i.e. {¬T, Lose(Y)}
3. ¬H ⇒ T, i.e. {H, T}
4. {¬Lose(Y), Win(X)}
5. {¬Win(X)} (negated goal)
6. {¬T, Win(X)} (From 2 and 4)
7. {T, Win(X)} (From 1 and 3)
8. {Win(X)} (From 6 and 7)
9. { } (empty clause) (From 5 and 8)
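The refutation procedure can be sketched for the propositional case in Python. This is a simple saturation loop rather than an efficient prover, and the string encoding of literals (with "~" for negation) is an assumption. The clauses are the ones from the Win(X) example above, with the negated goal {¬Win(X)} added.

```python
from itertools import combinations


def negate(literal):
    """'~P' <-> 'P' (assumption: '~' marks a negated literal)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]


def refute(clauses):
    """Return True if the empty clause can be derived (the set is unsatisfiable)."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:                 # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:                  # nothing new can be derived
            return False
        known |= new


KB = [
    {"~H", "Win(X)"},        # 1. H => Win(X)
    {"~T", "Lose(Y)"},       # 2. T => Lose(Y)
    {"H", "T"},              # 3. ~H => T
    {"~Lose(Y)", "Win(X)"},  # 4.
]
print(refute(KB + [{"~Win(X)"}]))   # True: Win(X) follows from the KB
print(refute(KB))                   # False: the KB alone is consistent
```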
3.11.4 Example
Let's take the same example used for forward and backward chaining to learn how to write proofs by resolution.
Step 1 :
• Owns(Nono, x)
• Missile(x)
3. Owns(Nono, x)
4. Missile(x)
6. ¬Missile(x) ∨ Weapon(x)
7. American(West)
Step 3 :
To prove that West is criminal using resolution.
(Figure: resolution tree; nodes include ¬American(West) ∨ ¬Weapon(y) ∨ ¬Sell(West, y, z) ∨ ¬Enemy(z, America), with substitutions x/West and z/Nono; the tree ends in NIL.)
Hence, West is criminal.
Soln. :
In this case we have to add a few common-sense predicates which are always true.
(g) ∀x : ¬killed(x) → alive(x)
(Figure: resolution tree; nodes include ¬food(Peanuts), ¬eats(x, y) ∨ killed(x) ∨ food(y), killed(Ajay), ¬alive(Ajay) and alive(Ajay), with substitutions y/Peanuts and x/Ajay.)
Hence proved that "Ravi likes Peanuts".
(Figure: proof tree for eats(Rita, x) through eats(Ajay, x) with x/Peanuts; both leaves are True.)
(d) …
(e) A log is across the right road.
One needs to jump across the log to go ahead.
Soln. :
(b1a) ¬At(x, temple) ∨ take_left(x)
(b3a) ¬At(x, temple) ∨ At(x, PostBox)
(b2a) ¬At(x, temple) ∨ take_right(x)
(Figure: resolution tree. ¬At(Ram, log) is resolved with ¬take_right(x) ∨ At(x, log) using x/Ram to give ¬take_right(Ram); this is resolved with ¬At(x, temple) ∨ take_right(x) to give ¬At(Ram, temple), which contradicts At(Ram, temple).)
Hence proved.
(Figure: resolution tree with nodes including ¬Barks(Rimi), Barks(Rimi) and ¬hungry(Rimi).)
This shows that our assumption is wrong. Hence proved that Maya is angry.
To prove that :
4. guilty(butler) → got_cream(butler)
Step 2 : Converting FOL to CNF.
1. ¬steal(maid, jewellery) ∨ ¬guilty(butler)
2. steal(maid, jewellery) ∨ milk(maid, cow)
3. ¬milk(maid, cow) ∨ got_cream(butler)
4. ¬guilty(butler) ∨ got_cream(butler)
Step 3 : Negate the proof sentence.
Step 4 : Proof by resolution
(Figure: resolution tree with nodes ¬got_cream(butler), guilty(butler), ¬milk(maid, cow) ∨ got_cream(butler) and ¬guilty(butler) ∨ ¬steal(maid, jewellery).)
Hence proved.
(ii) ∀x : happy(x) → smile(x)
Proof by resolution
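(Figure: resolution tree. ¬smile(x3) is resolved with ¬happy(x1) ∨ smile(x1) (x3/x1) to give ¬happy(x1); this is resolved with ¬graduating(x) ∨ happy(x) (x1/x) to give ¬graduating(x); this is resolved with graduating(x2) (x/x2) to give the empty clause.)
Hence our assumption is wrong.
Hence proved.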
• Planning in Artificial Intelligence can be defined as a problem that needs decision making by intelligent systems (a robot or a computer program) to accomplish the given target.
• Sometimes even a human being cannot perform two tasks at the same time if the tasks have the same importance level. In that case, you have to put the tasks in a sequence to accomplish the target.
• Take the example of a driver who has to pick up and drop people from one place to another. Say he has to pick up two people from two different places; then he has to follow some sequence, since he cannot pick up both passengers at the same time.
• Based on these facts, there is one more definition of planning : planning is an activity where an agent has to come up with a sequence of actions to accomplish a target.
• Now, let us see what information is available while formulating a planning problem and what results are expected.
• We have information about the initial state of the agent, the goal conditions of the agent and the set of actions of the agent.
• The aim of the agent is to find the proper sequence of actions which will lead from the starting state to the goal state and produce an efficient solution.
• Say we have an agent which can act as a coffee maker, a printer and a mailing system, and assume that there are 3 people who have access to this agent.
• Suppose all 3 users of the agent give commands at the same time to execute 3 different tasks : making coffee, printing and sending a mail.
• Then, as per the definition of planning, the agent has to decide the sequence of these actions.
Fig. 3.12.2 depicts a general diagrammatic representation of a planning agent that interacts with the environment through its sensors and effectors/actuators. When a task comes to this agent, it has to decide the sequence of actions to be taken and then execute these actions accordingly.
(Figure: planning agent connected to the Environment through Sensors and Effectors.)
Fig. 3.12.2 : Planning agent
• Example : You must have solved word problems in school. We can create an agent which generates solutions for word problems : it splits a sentence and follows a logical representation.
• We plan activities in order to achieve some goals, and to achieve a goal we should select appropriate actions. We can also divide the main goal into sub-goals to make planning more efficient.
• Take the example of grocery shopping : suppose you want to buy milk, bread and eggs from a supermarket. Then your initial state will be "at home" and your goal state will be "get milk, bread and eggs".
Now if you look at Fig. 3.14.1, you will understand that the branching factor can be enormous, depending upon the set of actions available at that point of time (for example watch TV, read book, etc.).
(Figure: possible actions from the current state, such as Go to school, Attend lecture, Go to supermarket, Buy apple and Watch TV.)
Fig. 3.14.1
o Conditional planning.
o Planning with operators.
o Planning with graphs.
o Planning with propositional logic.
o Reactive planning.
• Out of these major approaches, we will be learning about the following approaches in detail :
o Planning with state space search
o Partial ordered planning
o Conditional planning
o Goal(Have(Apple) ∧ Ate(Apple))
o EFFECT : Have(Apple))
(Figure: planning graph levels L0, A0, L1.)
• Start at level L0 and determine action level A0 and the next level L1.
o A0 → all actions whose preconditions are satisfied at the previous level.
o Level L1 contains all literals that could result from picking any subset of the actions in level A0.
o Conflicts between literals which cannot occur together (as an effect of the selected actions) are represented by mutex (mutual exclusion) links, for example when :
• one action cancels out the effect of another action, OR
• one literal is the negation of the other literal.
• We have seen the example of an agent that can perform three tasks, namely printing, sending a mail and making coffee; let's call this agent the office agent.
• When this office agent gets orders from three people at the same time to perform these three different tasks, let us see how the planning with state space search problem will look if we have a finite space.
• You can understand from Fig. 3.16.1 that the office agent is at location 2500 in the state space grid. When it gets a task, it has to decide which task can be performed more efficiently, in less time.
• If it finds the input and output locations nearer on the state space grid (for example, in the case of the printing task), then the probability of performing that task increases.
• But to do this, it should be aware of its own current location, the locations of the people who are assigning the tasks and the locations of the required devices.
• State space search is unfavourable for solving real-world problems because it requires a complete description of every searched state, and the search should be carried out locally.
• There can be two ways of representing a state :
1. Complete world description
University Question
Q. Explain Water Jug problem with State Space Search Method.
• Let us take the example of a water jug problem. We have two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
• The state space for this problem can be described as the set of ordered pairs of integers (x, y), such that x = 0, 1, 2, 3 or 4, representing the number of gallons of water in the 4-gallon jug, and y = 0, 1, 2 or 3, representing the quantity of water in the 3-gallon jug. The start state is (0, 0). The goal state is (2, n) for any value of n, since the problem does not specify how many gallons need to be in the 3-gallon jug.
• The operators to be used to solve the problem can be described as shown below. They are represented as rules whose left sides are matched against the current state and whose right sides describe the new state that results from applying the rule.
Rule set :
1. (x, y) → (4, y) if x < 4 : fill the 4-gallon jug
2. (x, y) → (x, 3) if y < 3 : fill the 3-gallon jug
3. (x, y) → (x − d, y) if x > 0 : pour some water out of the 4-gallon jug
4. (x, y) → (x, y − d) if y > 0 : pour some water out of the 3-gallon jug
5. (x, y) → (0, y) if x > 0 : empty the 4-gallon jug on the ground
6. (x, y) → (x, 0) if y > 0 : empty the 3-gallon jug on the ground
7. (x, y) → (4, y − (4 − x)) if x + y ≥ 4 and y > 0 : pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full
8. (x, y) → (x − (3 − y), 3) if x + y ≥ 3 and x > 0 : pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full
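Gallons in the 4-gallon jug | Gallons in the 3-gallon jug | Rule applied
0 | 0 |
0 | 3 | 2
3 | 0 | 9
3 | 3 | 2
4 | 2 | 7
0 | 2 | 5 or 12
2 | 0 | 9 or 11
One solution to the water jug problem.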
(Figure: tree of water jug states (x, y) expanded from the start state (0, 0).)
Fig. 3.16.3
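The rule set can be turned into a breadth-first state space search, sketched below in Python. It uses only the deterministic rules (rules 3 and 4 are left out because the amount d is unspecified) and adds two assumed "pour all the water from one jug into the other" moves (the table above refers to such a rule as rule 9). Because breadth-first search explores states level by level, it returns a shortest sequence; it may differ from the solution in the table while having the same number of steps.

```python
from collections import deque


def successors(state):
    """Apply the water jug rules to (x, y): x = 4-gallon jug, y = 3-gallon jug."""
    x, y = state
    moves = []
    if x < 4:
        moves.append((4, y))                    # rule 1: fill the 4-gallon jug
    if y < 3:
        moves.append((x, 3))                    # rule 2: fill the 3-gallon jug
    if x > 0:
        moves.append((0, y))                    # rule 5: empty the 4-gallon jug
    if y > 0:
        moves.append((x, 0))                    # rule 6: empty the 3-gallon jug
    if x + y >= 4 and y > 0:
        moves.append((4, y - (4 - x)))          # rule 7: pour 3 -> 4 until 4 is full
    if x + y >= 3 and x > 0:
        moves.append((x - (3 - y), 3))          # rule 8: pour 4 -> 3 until 3 is full
    if x + y <= 4 and y > 0:
        moves.append((x + y, 0))                # assumed: pour all of 3 into 4
    if x + y <= 3 and x > 0:
        moves.append((0, x + y))                # assumed: pour all of 4 into 3
    return moves


def solve(start=(0, 0)):
    """Breadth-first search from start to any goal state (2, n)."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[0] == 2:                       # goal test: 2 gallons in the 4-gallon jug
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:               # skip states already generated
                parent[nxt] = state
                frontier.append(nxt)
    return None


print(solve())
# [(0, 0), (4, 0), (1, 3), (1, 0), (0, 1), (4, 1), (2, 3)]
```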
3.17 Classification of Planning with State Space Search
• As the name suggests, state space search planning techniques are based on searching the state space.
• Planning with state space search can be done by both forward and backward state-space search techniques.
(Figure: Planning with State Space Search; example literal : Flight 1 at Location A.)
Progression planner algorithm :
1. Formulate the state space search problem :
• The initial state is the first state of the planning problem. It is a set of positive literals; the literals which don't appear are considered false.
• An action is applicable if its preconditions are satisfied. When it is applied, the positive effect literals of that action are added to the state and the negative effect literals are deleted.
• Perform goal testing by checking whether the state satisfies the goal.
• Lastly, keep the step cost of each action as 1.
2. Since function symbols are not used, the state space is finite, so any complete graph search algorithm (for example A*) is a complete planning algorithm.
3. The progression planner algorithm is considered inefficient because of the irrelevant action problem and the requirement of good heuristics for efficient search.
Regression algorithm :
1. The action should not undo desired (goal) literals.
2. The positive effects of the action which appear in the goal must be deleted. Each precondition literal of the action is added, unless it already appears.
3. The main advantage of this method is that only relevant actions are taken into consideration. Compared to forward search, the backward search method has a much lower branching factor.
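One regression step can be sketched as follows, assuming STRIPS-style actions given as sets of precondition, add and delete literals. The stack(X, Y) definition used in the example is the usual blocks world operator and is an assumption here, as is the function name regress.

```python
def regress(goal, pre, add, delete):
    """Return the predecessor goal of `goal` through an action, or None.

    The action must be relevant (it adds some goal literal) and consistent
    (it deletes no goal literal). Positive effects that appear in the goal
    are removed and every precondition literal is added.
    """
    if not (add & goal):
        return None                      # irrelevant action
    if delete & goal:
        return None                      # action would undo a desired literal
    return (goal - add) | pre


# Assumed STRIPS definition of stack(X, Y) from the blocks world.
stack_pre = {"holding(X)", "clear(Y)"}
stack_add = {"on(X,Y)", "clear(X)", "handempty"}
stack_del = {"holding(X)", "clear(Y)"}

goal = {"on(X,Y)", "on(Y,Z)"}
print(sorted(regress(goal, stack_pre, stack_add, stack_del)))
# ['clear(Y)', 'holding(X)', 'on(Y,Z)']
```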
• Progression and regression are not very efficient with complex problems; they need good heuristics to achieve better efficiency. Finding the best solution is NP-hard (NP stands for Non-deterministic Polynomial time).
• There are two ways to make state space search efficient :
o Use the linear method : add steps which build on their immediate successors or predecessors.
o Use the partial planning method : ordering constraints are imposed on the agent as per the requirement at execution time.
• For example : Set of Ordering = {Right-sock < Right-shoe; Left-sock < Left-shoe}, that is, in order to wear a shoe, we should first wear a sock.
• So the ordering constraints can be Wear Left-sock < Wear Left-shoe (the Wear Left-sock action should be taken before wearing the Left-shoe) and Wear Right-sock < Wear Right-shoe (the Wear Right-sock action should be taken before wearing the Right-shoe).
• If the constraints are cyclic, this represents an inconsistency.
• If we want to have a consistent plan, there should not be any cycle in the ordering constraints.
3. Set of causal links :
• Action A achieves effect "E" for action B.
(Figure: (a) Action A --E--> Action B; (b) a causal link example starting from Buy Apple.)
Fig. 3.21.2(a) : Causal Link Partial Order Planning (b) Causal Link Example
• From Fig. 3.21.2(b) you can understand that if you buy an apple, its effect can be eating the apple, and the precondition of eating an apple is cutting the apple.
• There can be a conflict if there is an action C that has an effect ¬E and, according to the ordering constraints, comes after action A and before action B.
• Say we don't want to eat the apple; instead, we want to make a decorative apple swan. This action can come between A and B, and it does not have the effect "E".
o For example : Set of Causal Links = {Right-sock --Right-sock-on--> Right-shoe, Left-sock --Left-sock-on--> Left-shoe, Right-shoe --Right-shoe-on--> Finish, Left-shoe --Left-shoe-on--> Finish}.
o To have a consistent plan, there should not be any conflicts with the causal links.
4. Set of open preconditions :
• A precondition is called open if it is not achieved by some action in the plan. A least commitment strategy can be used by delaying the choice during search.
• To have a consistent plan, there should not be any open precondition.
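Checking the ordering constraints of a partial-order plan for cycles, and producing one linearization when they are consistent, can be sketched with a topological sort (Kahn's algorithm). The Start/Finish steps and the sock/shoe constraints follow the example above; the function name linearize is illustrative.

```python
from collections import defaultdict, deque


def linearize(steps, before):
    """Return one total order satisfying every (a, b) meaning a < b, or None if cyclic."""
    indegree = {s: 0 for s in steps}
    after = defaultdict(list)
    for a, b in before:
        after[a].append(b)
        indegree[b] += 1
    ready = deque(s for s in steps if indegree[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t in after[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    # If some steps were never freed, the constraints contain a cycle.
    return order if len(order) == len(steps) else None


steps = ["Start", "RightSock", "RightShoe", "LeftSock", "LeftShoe", "Finish"]
before = [("Start", "RightSock"), ("Start", "LeftSock"),
          ("RightSock", "RightShoe"), ("LeftSock", "LeftShoe"),
          ("RightShoe", "Finish"), ("LeftShoe", "Finish")]
print(linearize(steps, before))
# ['Start', 'RightSock', 'LeftSock', 'RightShoe', 'LeftShoe', 'Finish']
print(linearize(steps, before + [("RightShoe", "RightSock")]))   # None (cycle)
```

Any order returned is one valid linearization; a partial-order planner keeps the constraints themselves and only commits to a total order when required.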
3.21.2 Consistent Plan is a Solution for POP Problem
• A consistent plan does not have cycles of constraints, does not have conflicts in the causal links and does not have open preconditions, so it can provide a solution for the POP problem.
• While solving a POP problem, operators can add links and steps from existing plans to open preconditions in order to fulfil them, and then steps can be ordered with respect to the other steps to remove potential conflicts. If an open precondition is unattainable, then backtrack the steps and try solving the problem with POP.
• Partial ordered planning is a more efficient method, because with the help of POP we can progress from a vague plan to a complete and correct solution in a faster way. Also, we can solve a huge state space plan in fewer steps, because search takes place only when sub-plans interact.
For example:
(Figure: hierarchical decomposition of move(X, Y, Z); nodes include Take-train, Take-Bus, Goto(Train, Source), Buy-Ticket(Train), Catch(Train), Leave(Train, dest.), Start web browser, Open Indian Railways website and Select date.)
3.22.3 Planner
• Patch major levels as detail actions become visible.
• Finally, demonstrate.
Example :
Actions required for "Travelling to Rajasthan" can be given as follows :
• Opening yatra.com (1)
• Finding train (2)
• Buy Ticket (3)
• Get taxi (2)
• Reach railway station (3)
• Pay-driver (1)
• Check-in (1)
• Boarding train (2)
• Reach Rajasthan (3)
(Figure: planner flowchart with an EXTERNAL INTERFACE and a PLANNING EXECUTIVE. Boxes include : Goal Wff; Criticality ← Maximum; Preconditions of Dummy ← Goal Wff; Skeleton Plan ← Dummy; Set State to Initial World Model; Is Skeleton Plan null?; Step ← First step of Skeleton Plan; Is Criticality minimum?; Skeleton Plan ← Rest of Skeleton Plan; Lower Criticality; Determine Plan to Achieve a State in which Preconditions of the Operator that was Applied in Step are True; Collect Steps Along Successful Path into New Skeleton Plan; Resume Process in Higher Abstraction Space, Forbidding the Choice of Step.)
1st level plan :
Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Boarding train (2), Reach Rajasthan (3).
2nd level plan (final) :
Opening yatra.com (1), Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Pay-driver (1), Check-in (1), Boarding train (2), Reach Rajasthan (3).
3.23 Planning Languages
University Question
MU - May 13, Dec.
Q. Explain STRIPS representation of planning problem.
• The language should be expressive enough to describe a wide variety of problems and restrictive enough to allow efficient algorithms to operate on it.
• Planning languages are known as action languages.
Sr. No. | STRIPS | ADL
2. | Makes use of the closed-world assumption (i.e. unmentioned literals are false) | Makes use of the open-world assumption (i.e. unmentioned literals are unknown)
3. | We can only find ground literals in goals. For example : Intelligent ∧ Beautiful. | We can find quantified variables in goals. For example : ∃x At(P1, x) ∧ At(P2, x) is the goal of having P1 and P2 in the same place in the example of the blocks.
5. | Effects are conjunctions. | Conditional effects are allowed : when P : E means E is an effect only if P is satisfied.
6. | Does not support equality. | The equality predicate (x = y) is built in.
7. | Does not have support for types. | Supports types. For example : the variable p : Person.
(Figure: blocks world start and goal configurations.)
Fig. 3.23.1
Standard sequence of actions is :
1. Grab Z and pick up Z
2. Then place Z on the table
3. Grab Y and pick up Y
4. Then stack Y on Z
5. Grab X and pick up X
6. Stack X on Y
• The elementary problem is the frame problem, which in AI is concerned with the question of what piece of knowledge or information is pertinent to the situation.
• To solve this problem, we have to make an elementary assumption, which is the closed world assumption (i.e. if something is not asserted in the knowledge base, then it is assumed to be false; this is also called "negation as failure").
• The start and goal states for the block world problem can be given as :
Start | Goal
on(Y, table) | on(Z, table)
on(X, table) | on(Y, Z)
on(Z, X) | on(X, Y)
hand empty | hand empty
clear(Z) | clear(X)
clear(Y) |
(Figure: blocks world start and goal configurations.)
Fig. 3.23.2
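We can write 4 main rules for the block world as follows :
Rule | Action | Precondition and Deletion List | Add List
Rule 1 | pickup(X) | hand empty, on(X, table), clear(X) | holding(X)
Rule 2 | putdown(X) | holding(X) | hand empty, on(X, table), clear(X)
Rule 3 | stack(X, Y) | holding(X), clear(Y) | hand empty, on(X, Y), clear(X)
Rule 4 | unstack(X, Y) | hand empty, on(X, Y), clear(X) | holding(X), clear(Y)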
Based on the above rules, the plan for the block world problem (Start → Goal) can be specified as follows :
1. unstack(Z, X)
2. putdown(Z)
3. pickup(Y)
4. stack(Y, Z)
5. pickup(X)
6. stack(X, Y)
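The plan can be checked by simulating it with STRIPS operators. The sketch below assumes the usual blocks world definitions in which the precondition list is also the deletion list (as in Rule 1); the string encoding of literals and the helper names (operator, execute) are illustrative.

```python
def operator(pre, add):
    """STRIPS operator whose deletion list equals its precondition list."""
    return frozenset(pre), frozenset(add)


def pickup(x):
    return operator({"handempty", f"on({x},table)", f"clear({x})"}, {f"holding({x})"})


def putdown(x):
    return operator({f"holding({x})"}, {f"on({x},table)", f"clear({x})", "handempty"})


def stack(x, y):
    return operator({f"holding({x})", f"clear({y})"},
                    {f"on({x},{y})", f"clear({x})", "handempty"})


def unstack(x, y):
    return operator({"handempty", f"on({x},{y})", f"clear({x})"},
                    {f"holding({x})", f"clear({y})"})


def execute(state, plan):
    """Apply each operator in turn; raise ValueError if a precondition fails."""
    state = frozenset(state)
    for step, (pre, add) in enumerate(plan, start=1):
        missing = pre - state
        if missing:
            raise ValueError(f"step {step}: missing {sorted(missing)}")
        state = (state - pre) | add          # delete list = precondition list
    return state


START = {"on(Y,table)", "on(X,table)", "on(Z,X)", "handempty", "clear(Z)", "clear(Y)"}
GOAL = {"on(Z,table)", "on(Y,Z)", "on(X,Y)", "handempty", "clear(X)"}
PLAN = [unstack("Z", "X"), putdown("Z"), pickup("Y"),
        stack("Y", "Z"), pickup("X"), stack("X", "Y")]

print(GOAL <= execute(START, PLAN))          # True
```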
• Execution of this plan can be done by making use of a data structure called a "Triangular Table".
(Figure: triangular table for the plan unstack(C, A), putdown(C), pickup(B), stack(B, C), pickup(A), stack(A, B), with rows 1 to 7 and columns 0 to 6; the last row holds on(C, table), on(B, C), on(A, B) and clear(A).)
Fig. 3.23.3
• With the help of the triangular table, a tree is formed as shown below to achieve the goal state :
Fig. 3.23.4
• An agent (in this case a robotic arm) can have some amount of fault tolerance. Fig. 3.23.5 shows one such example.
(Figure: blocks world example from the Start state showing a wrong move that is not allowed.)
Fig. 3.23.5
3.23.2 Example of the Spare Tire Problem
Q. Explain planning problem for spare tyre problem.
• Consider the problem of changing a flat tire. More precisely, the goal is to have a good spare tire properly mounted onto the car's axle, where the initial state has a flat tire on the axle and a good spare tire in the trunk. To keep it simple, our version of the problem is a very abstract one, with no sticky lug nuts or other complications.
• There are just four actions : removing the spare from the trunk, removing the flat tire from the axle, putting the spare on the axle, and leaving the car unattended overnight. We assume that the car is in a particularly bad neighborhood, so that the effect of leaving it overnight is that the tires disappear.
• The ADL description of the problem is shown. Notice that it is purely propositional. It goes beyond STRIPS in that it uses a negated precondition, ¬At(Flat, Axle), for the PutOn(Spare, Axle) action. This could be avoided by using Clear(Axle) instead, as we will see in the next example.
Solution using STRIPS :
Init(At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(Spare, Trunk),
  PRECOND : At(Spare, Trunk)
  EFFECT : ¬At(Spare, Trunk) ∧ At(Spare, Ground))
Action(Remove(Flat, Axle),
  PRECOND : At(Flat, Axle)
  EFFECT : ¬At(Flat, Axle) ∧ At(Flat, Ground))
Action(PutOn(Spare, Axle),
  PRECOND : At(Spare, Ground) ∧ ¬At(Flat, Axle)
  EFFECT : ¬At(Spare, Ground) ∧ At(Spare, Axle))
Action(LeaveOvernight,
  PRECOND :
  EFFECT : ¬At(Spare, Ground) ∧ ¬At(Spare, Axle) ∧ ¬At(Spare, Trunk) ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Axle))
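A progression (forward state-space) planner, as described in Section 3.17, can be run directly on this description. The sketch below uses breadth-first search with step cost 1 and treats the negated precondition ¬At(Flat, Axle) as a set of literals that must be absent; the Action tuple and the function names are illustrative assumptions.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset        # literals that must be true
    neg_pre: frozenset    # literals that must be false (ADL-style)
    add: frozenset
    delete: frozenset


def action(name, pre=(), neg_pre=(), add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(neg_pre), frozenset(add), frozenset(delete))


ACTIONS = [
    action("Remove(Spare,Trunk)", pre={"At(Spare,Trunk)"},
           add={"At(Spare,Ground)"}, delete={"At(Spare,Trunk)"}),
    action("Remove(Flat,Axle)", pre={"At(Flat,Axle)"},
           add={"At(Flat,Ground)"}, delete={"At(Flat,Axle)"}),
    action("PutOn(Spare,Axle)", pre={"At(Spare,Ground)"}, neg_pre={"At(Flat,Axle)"},
           add={"At(Spare,Axle)"}, delete={"At(Spare,Ground)"}),
    action("LeaveOvernight",
           delete={"At(Spare,Ground)", "At(Spare,Axle)", "At(Spare,Trunk)",
                   "At(Flat,Ground)", "At(Flat,Axle)"}),
]


def forward_plan(init, goal, actions):
    """Breadth-first progression search; returns a list of action names or None."""
    start = frozenset(init)
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if goal <= state:                                    # goal test
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for a in actions:
            if a.pre <= state and not (a.neg_pre & state):   # applicable?
                nxt = (state - a.delete) | a.add             # progression
                if nxt not in parent:
                    parent[nxt] = (state, a.name)
                    frontier.append(nxt)
    return None


print(forward_plan({"At(Flat,Axle)", "At(Spare,Trunk)"}, {"At(Spare,Axle)"}, ACTIONS))
# ['Remove(Spare,Trunk)', 'Remove(Flat,Axle)', 'PutOn(Spare,Axle)']
```

LeaveOvernight is applicable everywhere but only leads to the empty state, from which the goal is unreachable, so breadth-first search never includes it in the returned plan.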
3.24 Representing Real World Problems as Planning
(i) Conditional planning :
Conditional planning is sometimes termed contingency planning and deals with bounded indeterminacy, discussed earlier. The agent makes a plan, evaluates the plan and then executes it fully or partly depending on the condition.
(ii) Execution monitoring and replanning :
• Whatever planning we have discussed so far belongs to a single-agent environment, in which the agent acts alone.
• When the environment consists of multiple agents, the way a single agent plans its actions changes.
• We have had a glimpse of environments where multiple agents have to take actions based on the current state. The environment could be co-operative or competitive. In both cases, the agents' actions influence each other.
• A few of the multi-agent planning strategies are listed below :
(i) Co-operation :
In the co-operation strategy, agents have joint goals and plans. Goals can be divided into sub-goals, but they are ultimately combined to achieve the overall goal.
(ii) Multibody planning :
Multibody planning is the strategy of implementing a correct joint plan.
(iii) Co-ordination mechanisms :
These strategies specify the co-ordination between co-operating agents. Co-ordination mechanisms are used in several co-operative planning approaches.
(iv) Competition :
Competition strategies are used when agents are not co-operating but competing with each other. Every agent wants to achieve the goal first.
Left Suck
a a
sl #8 |) a} ws [AQ]
|as| 7 7a 5 |= ol Tel aa] ©
LOOP
GOAL
example
Fig. 3.26.1 : Conditional Planning - vacuum world
• In the vacuum agent example, if the dirt is at Right and the agent knows about Right but not about Left, then dirt might be left behind when the agent leaves a clean square. The initial state is also called a state set or a belief state.
• Sensors play an important role in conditional planning for partially observable environments. Automatic sensing can be useful : with automatic sensing, an agent gets all the available percepts at every step. Another method is active sensing, with which percepts are obtained only by executing specific sensory actions.
(Figure: vacuum world conditional plan; labels include CleanL.)
Fig. 3.26.2 : Conditional Planning - vacuum world example (condition 2)
Review Questions
Review Questions
Q.3 Specify PEAS properties and type of environment for the same.
Q.10 Write syntax and semantics and example sentences for propositional logic.
Q.11 Write syntax and semantics and example sentences for propositional logic.
Q.12 Explain the inference process in case of propositional logic with suitable examples.
Q.18 Explain the inference process in FOL using Forward Chaining and Backward Chaining.
INTRODUCTION TO DS
Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics, Lifecycle, Roles in Data Science Projects.
Self-Learning Topics : Applications and Case Studies of Data Science in various Industries.
4.1 Introduction of Data Science
Data science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning data. This umbrella term includes various techniques that are used when extracting insights and information from data.
Data science is the practice of mining large sets of raw data, both structured and unstructured, to identify patterns and extract actionable insight from them. This is an interdisciplinary field, and the foundations of data science include statistics, inference, computer science, predictive analytics and machine learning algorithms.
The data science pipeline workflow involves capture : acquiring data, sometimes extracting it, and entering it into the system. The next stage is maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.
Data processing follows and constitutes one of the data science fundamentals. It is during data exploration and processing that data scientists stand apart from data engineers. This stage involves data mining, data classification and clustering, data modelling, and summarizing insights gleaned from the data : the processes that create effective data.
Next step is data analysis, an equally critical stage. Here data scientists conduct exploratory and confirmatory
work, regression, predictive analysis, qualitative analysis, and text mining.
During the final stage, the data scientist communicates insights. This involves data visualization, data reporting,
and the use of various business intelligence tools, and assisting businesses, policymakers, and others in smarter
decision making.
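The stages described above (capture, maintenance, processing, analysis and communication) can be traced in a few lines of base R. This is a minimal sketch rather than a production pipeline: the inline CSV, its column names (hours, score) and the choice of a linear regression for the analysis stage are all assumptions made for illustration.

```r
# Capture: read raw records (an inline CSV string stands in for a real source;
# the data and column names are hypothetical)
raw <- read.csv(text = "hours,score
1,52
2,58
NA,61
3,65
4,71
5,74")

# Maintenance: cleanse the data by dropping incomplete records
clean <- raw[complete.cases(raw), ]

# Processing: summarise each variable
summary_stats <- sapply(clean, function(col) c(mean = mean(col), sd = sd(col)))
print(summary_stats)

# Analysis: fit a simple regression of score on hours (assumed model choice)
fit <- lm(score ~ hours, data = clean)

# Communication: report the key findings in plain language
cat("Records kept:", nrow(clean), "of", nrow(raw), "\n")
cat("Estimated score gain per extra hour:", round(coef(fit)[["hours"]], 2), "\n")
```

In a real project each stage would be far larger (a warehouse instead of a string, dashboards instead of cat()), but the order of the stages is the same.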
The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to
make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by
scientists, statisticians, librarians, computer scientists and others for years.
The following timeline traces the evolution of the term “Data Science”, its use, attempts to define it, and related terms:
• 1947 Tukey coined the term “bit”, which Claude Shannon used in his 1948 paper “A Mathematical Theory of Communication.” In 1977, Tukey published Exploratory Data Analysis, arguing that more emphasis needed to be placed on using data to suggest hypotheses to test and that Exploratory Data Analysis and Confirmatory Data Analysis “can and should proceed side by side.”
• 1974 Peter Naur published Concise Survey of Computer Methods in Sweden and the United States. The survey studies contemporary data processing methods that are used in a wide range of applications. It is organized around the concept of data as defined in the IFIP Guide to Concepts and Terms in Data Processing. Naur offered the following definition of data science: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
• 1996 Members of the International Federation of Classification Societies (IFCS) met in Kobe, Japan, for their biennial conference. For the first time, the term “data science” is included in the title of the conference (“Data science, classification, and related methods”). The classification societies have variously used the terms data analysis, data mining, and data science in their publications.
• 2001 William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” It was a plan “to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field was called ‘data science.’”
• May 2005 Thomas H. Davenport, Don Cohen, and Al Jacobson published “Competing on Analytics,” a Babson College Working Knowledge Research Centre report, describing “the emergence of a new form of competition based on the extensive use of analytics, data, and fact-based decision making... Instead of competing on traditional factors, companies began to employ statistical and quantitative analysis and predictive modelling as primary elements of competition.” The research was later published by Davenport in the Harvard Business Review (January 2006) and was expanded (with Jeanne G. Harris) into the book Competing on Analytics: The New Science of Winning (March 2007).
• January 2009 Harnessing the Power of Digital Data for Science and Society is published. This report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council stated that “The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data. Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science. These individuals are key to the current and future success of the scientific enterprise.”
• September 2011 Harlan Harris wrote in “Data Science, Moore's Law, and Moneyball”: “‘Data Science’ is defined as what ‘Data Scientists’ do. What Data Scientists do has been very well covered, and it runs the gamut from data collection and munging, through application of statistics and machine learning and related techniques, to interpretation, communication, and visualization of the results.”
| Business Analytics | Data Science | Big Data |
|---|---|---|
| Business analytics is the statistical study of business data to gain insights. | Data science is the study of data using statistics, algorithms and technology. | Big data is the raw material used in the field of data science. It is characterized by its velocity, variety, and volume (the 3Vs). |
| Does not involve much coding. It is more statistics oriented. | Coding is widely used. This field is a combination of traditional analytics practice with good computer science knowledge. | Data comes from various sources, such as online purchases, multimedia forms, instruments, financial logs, sensors, text files, and others. |
| The whole analysis is based on statistical concepts. | Statistics is used at the end of analysis, following coding. | Big data is the raw material for data science, which affords the techniques for analyzing the data. |
| Studies trends and patterns specific to business. | Studies almost every trend and pattern. | Contains numerous trends and patterns. |
| Top industries where business analytics is used: finance, healthcare, marketing, retail, supply chain, telecommunications. | Top industries/applications where data science is used: e-commerce, finance, machine learning, manufacturing. | Big data is used in all the industries to produce insights. |
Data mining is a technique used in both business and data science, while data science is an actual field of scientific study or discipline. Data mining's goal is to render data more usable for a specific business purpose. Data science, in contrast, aims to create data-driven products and outcomes, usually in a business context.
Data mining deals mostly with structured data, whereas exploring huge amounts of raw, unprocessed data is within the bounds of data science. However, data mining is part of what a data scientist might do, and it's a skill that's part of the science.
A data scientist is more likely to tackle larger masses of both structured and unstructured data. They will also formulate, test, and assess the performance of data questions in the context of an overall strategy. A data scientist is more likely to look ahead, predicting or forecasting as they look at data.
4.4 Data Analytics
Data analytics is the science of examining raw data to reach certain conclusions. Data analytics involves applying an algorithmic or mechanical process to derive insights and running through several data sets to look for meaningful correlations. It is used in several industries, enabling organizations and data analytics companies to make more informed decisions, as well as verify and disprove existing theories or models. The focus of data analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows.
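As a small illustration of the correlation checks mentioned above, the sketch below tests an existing belief (heavier cars travel fewer miles per gallon) against data. It assumes R's built-in mtcars dataset, which the text does not mention, and uses Pearson correlation via cor() and cor.test() from the stats package.

```r
# Assumed example data: R's built-in mtcars (32 car models)
data(mtcars)

# Pearson correlation between weight (wt, in 1000 lbs) and fuel economy (mpg)
r <- cor(mtcars$wt, mtcars$mpg)

# Test whether the observed correlation could plausibly be zero
ct <- cor.test(mtcars$wt, mtcars$mpg)

cat("Correlation between weight and mpg:", round(r, 3), "\n")
cat("p-value:", signif(ct$p.value, 3), "\n")
```

A strongly negative r with a very small p-value supports the belief; a value near zero would count as evidence against it.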
4.5 Lifecycle
• Data analytics architecture is a cyclic structure that maps out such steps for data science professionals. It encompasses all the data life cycle phases, where each stage has its significance and characteristics.
• The lifecycle's circular form guides data professionals to proceed with data analytics in one direction, either forward or backward. Based on the newly received information, professionals can scrap the entire research and move back to the initial step to redo the complete analysis as per the lifecycle diagram.
• However, while there are talks of the data analytics lifecycle among the experts, there is still no defined structure of the mentioned stages. You're unlikely to find a concrete data analytics architecture that is uniformly followed by every data analysis expert. Such ambiguity gives rise to the probability of adding extra phases (when necessary) and removing the basic steps. There is also the possibility of working on different stages at once or skipping a phase entirely.
4.5.1 Phases of Data Analytics Lifecycle
• A scientific method that helps give the data analysis process a structured framework is divided into six phases of data analytics architecture.
Phase 1 : Data Discovery and Formation
Everything begins with a defined goal. In this phase, you'll define your data's purpose and how to achieve it by the time you reach the end of the data analytics lifecycle.
• Data Entry : Formulating recent data points using digital systems or manual data entry techniques within the enterprise.
• Signal Reception : Capturing information from digital devices, such as control systems and the Internet of Things.
Phase 3 : Design a Model
After mapping out your business goals and collecting a glut of data (structured, unstructured, or semi-structured), it is time to build a model that utilizes the data to achieve the goal.
There are several techniques available to load data into the system and start studying it:
• ETL (Extract, Transform, and Load) transforms the data first using a set of business rules, before loading it into a sandbox.
• ELT (Extract, Load, and Transform) first loads raw data into the sandbox and then transforms it.
• ETLT (Extract, Transform, Load, Transform) is a mixture; it has two transformation levels.
This step also includes teamwork to determine the methods, techniques, and workflow to build the model in the subsequent phase. Model building starts with identifying the relationships between data points to select the key variables and eventually find a suitable model.
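The difference between ETL and ELT is only the order of the transform and load steps. The sketch below is a minimal illustration under stated assumptions: an R list stands in for the sandbox, the business rule (keep positive amounts and convert cents to units) is hypothetical, and the function names are invented for the example rather than taken from any ETL tool.

```r
# Hypothetical extract step: returns raw source records
extract <- function() {
  data.frame(id = 1:4, amount_cents = c(1250, -300, 990, 0))
}

# Hypothetical business rule applied in the transform step
transform_rule <- function(df) {
  df <- df[df$amount_cents > 0, ]        # drop non-positive amounts
  df$amount <- df$amount_cents / 100     # convert cents to units
  df[, c("id", "amount")]
}

# ETL: transform first, then load only the cleaned result into the sandbox
sandbox_etl <- list()
sandbox_etl$sales <- transform_rule(extract())

# ELT: load the raw data first, then transform inside the sandbox
sandbox_elt <- list()
sandbox_elt$raw_sales <- extract()
sandbox_elt$sales <- transform_rule(sandbox_elt$raw_sales)

# Both routes yield the same table; ELT also keeps the raw copy
print(sandbox_etl$sales)
print(names(sandbox_elt))
```

An ETLT flow would split transform_rule() into a light clean-up before loading and heavier reshaping afterwards.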
Phase 4 : Model Building
This step of data analytics architecture comprises developing data sets for testing, training, and production
purposes. The data analytics experts meticulously build and operate the model that they had designed in the
previous step.
They rely on tools and several techniques like decision trees, regression techniques and neural networks for
building and executing the model. The experts also perform a trial run of the model to observe if the model
corresponds to the datasets.
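A minimal sketch of this phase, assuming the built-in mtcars data, a roughly 70/30 train/test split and a linear regression (one of the regression techniques named above); a decision tree or neural network would slot into the same fit-then-check pattern. The trial run is scored here with root mean squared error (RMSE), a metric chosen for the example.

```r
# Assumed example data: R's built-in mtcars
data(mtcars)
set.seed(42)  # make the random split reproducible

# Split rows into training (about 70%) and testing (about 30%) sets
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train    <- mtcars[train_idx, ]
test_set <- mtcars[-train_idx, ]

# Build the model on the training set only
model <- lm(mpg ~ wt + hp, data = train)

# Trial run: predict on rows the model has not seen and measure the error
pred <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$mpg - pred)^2))

cat("Training rows:", nrow(train), " Test rows:", nrow(test_set), "\n")
cat("Test RMSE:", round(rmse, 2), "\n")
```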
Fig. 4.6.1
Data Scientist
Data Scientists find and interpret rich data sources, merge data sources, create visualizations, and use machine learning to build models that aid in creating actionable insight from the data. They know the end-to-end process of data exploration and can present and communicate data insights and findings to a range of team members.
• In short, they apply the scientific discovery process, including hypothesis testing, to obtain actionable knowledge related to a scientific or business problem.
Data Engineer
• Data engineers make the appropriate data accessible and available for data science efforts. They design, develop, and code data-focused applications that capture data, as well as clean the data.
• This role also helps to ensure consistency of datasets (e.g., meaning of attributes across datasets).
• Data Science Developers design, develop, and code large data (science) analytics applications to support scientific or enterprise/business processes. This role enables models to be deployed (i.e., to use a model in production) and requires some expertise in data science, as well as knowledge of how to effectively develop software applications.
• Sometimes this role is known as a machine learning engineer. Regardless, they help bridge the worlds of data science and software development.
• The product owner is responsible for prioritizing what work gets done, ensuring that each work item is clearly defined from a business context, and that the upcoming work and priorities of the team are visible and transparent.
• In addition, the product owner must agree that the tasks in the done column are actually done. In short, the product owner represents all the stakeholders for the project. While a product owner is often the product manager, it is possible to separate these roles: the product manager has a more strategic focus on the product's vision, company objectives, and the market, whereas the product owner is more tactical and directly involved with the day-to-day data science team, translating the product manager's strategy into actionable tasks.
Data/Business Analyst
• Data/Business Analysts analyze a large variety of data to extract information about system, service, or organization performance and present it in usable/actionable form.
• They help better shape a problem for the data scientist to explore. Note the difference between a data analyst and a data scientist.
Subject matter experts are people with extensive knowledge of how to apply the analytics within a specific organizational context. This role is accountable for ensuring that the desired insights are actionable.
The financial industry is one of the most numbers-driven in the world, and one of the first industries to adopt data science. As is well known, financial companies are information-driven, and data science is the perfect helper to get actionable insights and obtain sustainable development for financial institutions such as banks. Data science helps in risk assessment and monitoring, analysis of potential fraudulent behavior, payments, and customer experience, among many other uses. The ability to make data-driven decisions creates a more stable financial environment, and data scientists make up the backbone of the industry.
2. Healthcare
By connecting pattern recognition, analytics, statistics, and deep learning algorithms, data science makes healthcare more efficient. According to research published by the Journal of the American Medical Informatics Association, the demand for data scientists in the healthcare area is growing rapidly. With the ability to quickly process large volumes of data from clinical and laboratory reports, data scientists enable a more precise diagnosis process by utilizing deep learning techniques. There are also many companies that market smart wearables, used to track and detect health conditions, and data science is at the heart of the process. This allows data scientists to reduce the risk of health issues and directly impact the state of human wellbeing, not just in the US, but in the entire world.
3. Travel industry
Travel personalization has become a much deeper process than it used to be. The possibility to create customer profiles based on segmentation, offering personalized experiences according to customers' needs and preferences, has its foundations in data science. Forecasting the behavior of travelers (where they want to go next and what prices they are ready to pay) and deciding when to launch special promotions depend heavily on how well data scientists' skills and abilities are applied.
4. Energy
• The energy industry experiences major fluctuations in prices and higher costs of projects, so obtaining high-quality information has never been so important. Data scientists help in cutting costs, reducing risks, optimizing investments and improving equipment maintenance. They use predictive models to monitor compressors, which, in turn, can reduce the number of downtime days.
• “A day's production at a small site, 1,000 barrels of oil, represents $30,000 of revenue,” stated Francisco Sanchez, president of Houston Energy Data Science. The (data science) tools used in extracting and evaluating data range from Oracle, Hadoop, NoSQL and Python to various other software and solutions that can manipulate and analyze large datasets.
5. Manufacturing
• Often referred to as Industry 4.0 (with the introduction of robotization and automation as the 4th industrial revolution), the manufacturing industry has a growing need for data scientists, who can apply their knowledge of broad data management solutions to quality assurance, tracking defects, and improving the quality of supplier relations.
• Similar to the energy industry, utilizing preventive maintenance to troubleshoot potential future equipment issues is another focus where data scientists can put their skills to good use. To avoid delays in the production process, artificial intelligence and predictive analytics make it possible to manage frequent manufacturing issues: overproduction of products, logistics or inventory. In short, data scientists help in identifying inefficiencies and tuning the production process.
6. Gaming
There are 2.5 billion gamers across the world, and the industry is becoming the heart of entertainment. Data science is used in the industry to build models, analyze optimization points, make predictions or identify patterns to ultimately improve gaming models. Data scientists also work in monetization, where they need to identify the most valuable players and analyze general consumer behavior to increase the profitability of the company (the more the players spend, the higher the profitability).
Another area where data scientists can put their skills to use is fraud detection: security levels in the gaming industry must be of the highest standards, and machine learning algorithms allow faster identification of suspicious account activities.
7. Pharmaceuticals
Connected to human health, the pharma industry has also emerged as an industry where data science is
increasing its application. For example, a pharmaceutical company can utilize data science to ensure a more
stable approach for planning clinical trials. The patent exclusivity “starts roughly at the same time of its first
clinical trial,” therefore, companies need to resort to data science in order to build precision into their
calculations of the potential success or failure of the clinical trials.
Another application can be seen before the trial even starts, by identifying suitable candidates based on
their body structure such as chemical structure, medical history or other important characteristics. Data
scientists read, evaluate, monitor and perform these analyses.
These are just some of the industries where we see active applications of data science and its benefits. The future will certainly bring even more uses of this exciting field and, whether you are an aspiring data scientist or have already been in the field for years, the wealth of career choices benefits all the inquisitive data explorers out there.
The human genome is composed of four building blocks: A, T, C and G. Our looks and characteristics are determined by the sequence of roughly three billion of these building blocks. Genetic defects, whether inherited or acquired through lifestyle, can lead to chronic diseases.
Identifying such defects at an early stage can help doctors and diagnostic teams take preventive measures.
Helix is one of the genome analysis companies that provide customers with their genomic details. Also, several medicines tailored for specific genetic designs have become increasingly popular due to the advent of new computational methodologies.
Due to the explosion in data, we can understand complex genomic sequences and analyze them on a large scale.
Data scientists can use contemporary computing power to handle large datasets and understand patterns of genomic sequences to identify defects and provide insights to physicians and researchers.
With the usage of wearable devices, data scientists can use the relationship between genetic characteristics and medical visits to develop a predictive modeling system.
3. Predictive Modeling for Maintaining Oil and Gas Supply
Crude oil and gas industries face a major problem of equipment failures, which usually occur due to the inefficiency of oil wells and their performance at a subpar level.
With the adoption of a successful strategy that advocates for predictive maintenance, well operators can be alerted to crucial stages for shutdown as well as notified of maintenance periods. This will lead to a boost in oil production and prevent further loss.
Data scientists can apply a Predictive Maintenance (PdM) strategy, using data to optimize high-value machinery for manufacturing and refining oil products. With telemetry data extracted through sensors, a steady stream of historical data can be used to train a machine learning model.
This machine learning model will predict the failure of machine parts and notify the operators of timely maintenance in order to avert oil losses.
A data scientist assigned to the development of a PdM strategy will help to avoid hazards and predict machine failures, prompting the operators to take precautionary steps.
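A minimal sketch of the predictive maintenance idea, under loudly stated assumptions: the telemetry below (temperature and vibration readings) is simulated rather than taken from any real well, the failure mechanism is invented, and logistic regression (glm() with a binomial family) stands in for whatever model a real PdM system would use. The 0.5 alert threshold is also an arbitrary choice.

```r
set.seed(1)

# Simulated historical telemetry for 500 machine-days (hypothetical data)
n <- 500
telemetry <- data.frame(
  temperature = rnorm(n, mean = 70, sd = 8),
  vibration   = rnorm(n, mean = 3, sd = 1)
)

# Assumed ground truth: failure risk rises with temperature and vibration
risk <- plogis(-20 + 0.2 * telemetry$temperature + 1.5 * telemetry$vibration)
telemetry$failed <- rbinom(n, size = 1, prob = risk)

# Train a logistic regression on the historical records
model <- glm(failed ~ temperature + vibration, data = telemetry,
             family = binomial)

# Score today's readings and raise an alert when failure looks likely
today  <- data.frame(temperature = c(65, 85), vibration = c(2.5, 5.0))
p_fail <- predict(model, newdata = today, type = "response")
alert  <- p_fail > 0.5
print(data.frame(today, p_fail = round(p_fail, 3), alert))
```

The design choice worth noting is that the model is trained only on past records and then applied to new readings, which mirrors how operators would be notified ahead of a failure.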
Review Questions
Q.1 Explain Difference between Data Science and Business Analytics.
EXPLORATORY DATA
ANALYSIS
Introduction to exploratory data analysis, Typical data formats, Types of EDA, Graphical/Non-graphical methods, Univariate/multivariate methods, Correlation and covariance, Degree of freedom, Statistical Methods for Evaluation including ANOVA.
Self-Learning Topics: Implementation of graphical EDA methods.
• Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, to spot anomalies, to test hypotheses and to check assumptions with the help of summary statistics and graphical representations.
• In Exploratory Data Analysis (EDA), data scientists use visualisation and transformation to explore the data in a systematic way. EDA is an iterative cycle.
• Following is the general process of EDA:
o Generate questions about your data.
o Search for answers by visualising, transforming, and modelling your data.
o Use what you learn to refine your questions and/or generate new questions.
Fig. 5.1.1 : EDA Process (Explore, Visualise, Program)
EDA is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA you should feel free to investigate every idea that occurs to you.
Some of these ideas will pan out, and some will be dead ends. As the data exploration continues, you will home in on a few particularly productive areas that you will eventually write up and communicate to others.
EDA is an important part of any data analysis. Data cleaning is just one application of EDA. To do data cleaning, we need to deploy all the tools of EDA: visualisation, transformation, and modelling.
• The goals of the EDA process : A proper EDA hopes to accomplish several goals:
o To question the data and determine if there are problems inherent in the dataset;
o To determine if the data on hand is sufficient to answer a particular research question or whether additional feature engineering is required;
o To develop a framework for answering the research question;
o To refine the questions and/or research problem based on what you have learned about the data.
5.1.1 Typical Data Formats
EDA is fundamentally a creative process. And like most creative processes, the key to asking quality questions is to generate a large quantity of questions. It is difficult to ask revealing questions at the start of your analysis because you do not know what insights are contained in your dataset.
On the other hand, each new question that you ask will expose you to a new aspect of your data and increase your chance of making a discovery. You can quickly drill down into the most interesting parts of your data, and develop a set of thought-provoking questions, if you follow up each question with a new question based on what you find.
There is no rule about which questions you should ask to guide your research. However, two types of questions
will always be useful for making discoveries within your data.
• You can loosely word these questions as:
1. What type of variation occurs within my variables?
2. What type of co-variation occurs between my variables?
• Let's define a few of the important terms before we proceed.
o A variable is a quantity, quality, or property that you can measure.
o A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.
o An observation is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. I'll sometimes refer to an observation as a data point.
o Tabular data is a set of values, each associated with a variable and an observation. Tabular data is tidy if each value is placed in its own “cell”, each variable in its own column, and each observation in its own row.
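To make the definition of tidy data concrete, the sketch below rearranges a hypothetical wide table, where the year is hidden in the column names, into tidy form with base R's stack(): one column per variable (country, year, cases) and one row per observation. The countries and counts are made up.

```r
# Hypothetical wide table: one column per year, so "year" is not a column
wide <- data.frame(country = c("A", "B"),
                   y2019   = c(10, 40),
                   y2020   = c(15, 55))

# stack() turns the year columns into (values, ind) pairs, column by column
long <- stack(wide[, c("y2019", "y2020")])

# Tidy table: each variable is a column, each observation is a row
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = as.integer(sub("^y", "", as.character(long$ind))),
  cases   = long$values
)
print(tidy)
```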
Variation is the tendency of the values of a variable to change from measurement to measurement. You can
see variation easily in real life; if you measure any continuous variable twice, you will get two different
results. This is true even if you measure quantities that are constant, like the speed of light. Each of your
measurements will include a small amount of error that varies from measurement to measurement.
Categorical variables can also vary if you measure across different subjects (e.g. the eye colors of different
people), or different times (e.g. the energy levels of an electron at different moments). Every variable has its
own pattern of variation, which can reveal interesting information. The best way to understand that pattern
is to visualise the distribution of the variable’s values.
• How you visualise the distribution of a variable will depend on whether the variable is categorical or continuous. A variable is categorical if it can only take one of a small set of values.
• In R, categorical variables are usually saved as factors or character vectors. To examine the distribution of a categorical variable, use a bar chart:
library(ggplot2)  # provides ggplot() and the diamonds dataset
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
Fig. 5.1.2 : Bar chart of the number of diamonds for each cut (x-axis: cut, y-axis: count)
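The bar heights in such a chart are simply per-category counts, which base R can compute directly with table(). The sketch assumes the built-in mtcars data (so it runs without ggplot2 installed) and treats the number of cylinders as the categorical variable; the text's own example uses diamonds and cut.

```r
# Assumed example data: R's built-in mtcars
data(mtcars)

# Treat the number of cylinders as a categorical variable
cyl <- factor(mtcars$cyl)

# Count the observations in each category: these counts are the bar heights
counts <- table(cyl)
print(counts)
```

Calling barplot(counts) would draw the same counts as a bar chart with base graphics.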
• The height of the bars displays how many observations occurred with each x value.
• A variable is continuous if it can take any of an infinite set of ordered values. Numbers and date-times are two examples of continuous variables. To examine the distribution of a continuous variable, use a histogram:
library(ggplot2)
ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)
Fig. 5.1.3 : Histogram of carat (x-axis: carat, y-axis: count)
A histogram divides the x-axis into equally spaced bins and then uses the height of a bar to display the number of observations that fall in each bin. In the graph above, the tallest bar shows that almost 30,000 observations have a carat value between 0.25 and 0.75, which are the left and right edges of the bar.
If you wish to overlay multiple histograms in the same plot, I recommend using geom_freqpoly() instead of geom_histogram(). geom_freqpoly() performs the same calculation as geom_histogram(), but instead of displaying the counts with bars, it uses lines. It's much easier to understand overlapping lines than bars.
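library(ggplot2)
# `smaller` is assumed to hold the diamonds under 3 carats
smaller <- subset(diamonds, carat < 3)
ggplot(data = smaller, mapping = aes(x = carat, colour = cut)) +
  geom_freqpoly(binwidth = 0.1)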
Fig. 5.1.4 : Frequency polygons of carat, one line per cut (Fair, Good, Very Good, Premium, Ideal)
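Before drawing anything, geom_histogram() and geom_freqpoly() count how many observations fall into each equal-width bin. A minimal base R sketch of that counting step, assuming a small made-up vector of carat-like values and a bin width of 0.5; ggplot2 may place its default bin edges differently, so counts for real data need not match this exactly.

```r
# Hypothetical carat-like measurements
x <- c(0.30, 0.31, 0.50, 0.70, 0.72, 1.00, 1.01, 1.50, 2.02)
binwidth <- 0.5

# Equally spaced bin edges covering the whole range of the data
breaks <- seq(from = 0, to = ceiling(max(x) / binwidth) * binwidth, by = binwidth)

# Assign each value to a bin [a, b) and count the observations per bin
bins <- cut(x, breaks = breaks, right = FALSE)
counts <- table(bins)
print(counts)
```

Each entry of counts is the height of one histogram bar; a frequency polygon joins the same counts with lines instead.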
5.1.2 Typical Values
In both bar charts and histograms, tall bars show the common values of a variable, and shorter bars show less-
common values. Places that do not have bars reveal values that were not seen in your data. To turn this information
into useful questions, look for anything unexpected :
Which values are the most common? Why?
Which values are rare? Why? Does that match your expectations?
Can you see any unusual patterns? What might explain them?
As an example, the histogram below suggests several interesting questions:
Why are there more diamonds at whole carats and common fractions of carats?
Why are there more diamonds slightly to the right of each peak than there are slightly to the left of each peak?
Why are there no diamonds bigger than 3 carats?
ggplot(data = smaller, mapping = aes(x = carat)) +
  geom_histogram(binwidth = 0.01)
Fig. 5.1.5 : Histogram of carat with binwidth = 0.01 (x-axis: carat, y-axis: count)
Clusters of similar values suggest that subgroups exist in your data. To understand the subgroups, ask:
• How are the observations within each cluster similar to each other?
• How are the observations in separate clusters different from each other?
• How can you explain or describe the clusters?
• Why might the appearance of clusters be misleading?
The histogram below shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone
National Park. Eruption times appear to be clustered into two groups: there are short eruptions (of around 2
minutes) and long eruptions (4-5 minutes), but little in between.
ggplot(data = faithful, mapping = aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25)
Fig. 5.1.6 : Histogram of Old Faithful eruption lengths (x-axis: eruptions, in minutes; y-axis: count)
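The two clusters in the Old Faithful histogram can also be checked numerically. This sketch uses base R's built-in faithful data; the 3-minute cut-off separating short from long eruptions is an assumption read off the gap in the histogram, not a value given in the text.

```r
# Built-in dataset: 272 Old Faithful eruptions (eruption length in minutes)
data(faithful)

# Split eruptions at an assumed 3-minute gap between the two clusters
is_long <- faithful$eruptions >= 3
groups  <- split(faithful$eruptions, ifelse(is_long, "long", "short"))

# Size and typical length of each cluster
print(sapply(groups, length))
print(sapply(groups, median))
```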
Many of the questions above will prompt you to explore a relationship between variables, for example, to see if the values of one variable can explain the behavior of another variable. We'll get to that shortly.
5.1.3 Unusual Values
• Outliers are observations that are unusual; data points that don't seem to fit the pattern. Sometimes outliers are data entry errors; other times outliers suggest important new science.
When you have a lot of data, outliers are sometimes difficult to see in a histogram. For example, take the
distribution of the y variable from the diamonds dataset. The only evidence of outliers is the unusually wide
limits on the x-axis.
ggplot(diamonds) +
geom_histogram(mapping = aes(x = y), binwidth = 0.5)
Fig. 5.1.7 : Histogram of y from the diamonds dataset with binwidth = 0.5 (x-axis: y, y-axis: count)
• There are so many observations in the common bins that the rare bins are too short to see (although maybe if you stare intently at 0 you'll spot something).
• To make it easy to see the unusual values, we need to zoom to small values of the y-axis with coord_cartesian():
ggplot(diamonds) +
  geom_histogram(mapping = aes(x = y), binwidth = 0.5) +
  coord_cartesian(ylim = c(0, 50))
Fig. 5.1.8 : The same histogram of y, zoomed to counts between 0 and 50
(coord_cartesian() also has an xlim() argument for when you need to zoom into the x-axis. ggplot2 also has xlim() and ylim() functions that work slightly differently: they throw away the data outside the limits.)
If you've encountered unusual values in your dataset, and simply want to move on to the rest of your analysis,
you have two options.
1. Drop the entire row with the strange values:
2. diamonds2 <- diamonds %>%
filter(between(y, 3, 20))
I don't recommend this option: just because one measurement is invalid doesn't mean all the measurements are. Additionally, if you have low quality data, by the time that you've applied this approach to every variable you might find that you don't have any data left!
2. Instead, I recommend replacing the unusual values with missing values. The easiest way to do this is to use mutate() to replace the variable with a modified copy. You can use the ifelse() function to replace unusual values with NA:
diamonds2 <- diamonds %>%
mutate(y = ifelse(y < 3 | y > 20, NA, y))
ifelse() has three arguments. The first argument, test, should be a logical vector. The result will contain the value of the second argument, yes, when test is TRUE, and the value of the third argument, no, when it is FALSE.
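To see what ifelse() does without loading the diamonds data, here is a minimal sketch on a small made-up vector (the vector y is hypothetical; the 3 to 20 validity range mirrors the example above). Note that any later summary has to skip the new NA values explicitly:

```r
# Hypothetical measurements; 0 and 58.9 play the role of implausible y values
y <- c(4.1, 0, 5.7, 58.9, 6.3)

# Keep plausible values, replace anything outside [3, 20] with NA
y_clean <- ifelse(y < 3 | y > 20, NA, y)

print(y_clean)                 # 4.1 NA 5.7 NA 6.3
sum(is.na(y_clean))            # how many values became missing
mean(y_clean, na.rm = TRUE)    # summaries must now drop the NAs explicitly
```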
1. Univariate non-graphical: This is the simplest form of data analysis, as it uses just one variable to research the information. The standard goal of univariate non-graphical EDA is to know the underlying sample distribution/data and make observations about the population. Outlier detection is additionally part of the analysis. The characteristics of population distribution include:
• Central tendency: The central tendency or location of a distribution has to do with typical or middle values. The commonly useful measures of central tendency are statistics called mean, median, and sometimes mode, of which the most common is the mean. For skewed distributions, or when there is concern about outliers, the median may be preferred.
• Spread: Spread is an indicator of how far from the middle we are likely to find the data values. The standard deviation and variance are two useful measures of spread. The variance is the mean of the squares of the individual deviations, and the standard deviation is the square root of the variance.
• Skewness and kurtosis: Two more useful univariate descriptors are the skewness and kurtosis of the distribution. Skewness is the measure of asymmetry, and kurtosis is a more subtle measure of peakedness compared to a normal distribution.
2. Multivariate non-graphical: Multivariate non-graphical EDA techniques are usually used to show the connection between two or more variables in the form of either cross-tabulation or statistics.
• For categorical data, an extension of tabulation called cross-tabulation is extremely useful. For two variables, cross-tabulation is performed by making a two-way table with column headings that match the levels of one variable and row headings that match the levels of the other variable, then filling in the counts of all subjects that share an equivalent pair of levels.
• For one categorical variable and one quantitative variable, we create statistics for the quantitative variable separately for every level of the categorical variable, then compare the statistics across the levels of the categorical variable.
• Comparing the means is an informal version of ANOVA, and comparing medians is a robust version of one-way ANOVA.
3. Univariate graphical: Non-graphical methods are quantitative and objective, but they do not give the complete picture of the data; therefore, graphical methods, which are more qualitative and involve a degree of subjective analysis, are also required. Common sorts of univariate graphics are:
• Histogram: The most basic graph is a histogram, which is a barplot in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values. Histograms are one of the simplest ways to quickly learn a lot about your data, including central tendency, spread, modality, shape and outliers.
• Stem-and-leaf plots: An easy substitute for a histogram is the stem-and-leaf plot. It shows all data values and the shape of the distribution.
• Boxplots: Another very useful univariate graphical technique is the boxplot. Boxplots are excellent at presenting information about central tendency and show robust measures of location and spread, as well as providing information about symmetry and outliers, although they can be misleading about aspects like multimodality. One of the best uses of boxplots is in the form of side-by-side boxplots.
• Quantile-normal plots: The final univariate graphical EDA technique is the most intricate. It is called the quantile-normal or QN plot, or more generally the quantile-quantile or QQ plot. It is used to see how well a specific sample follows a specific theoretical distribution. It allows detection of non-normality and diagnosis of skewness and kurtosis.
4. Multivariate graphical: Common sorts of multivariate graphics are:
• Scatterplot: For two quantitative variables, the scatterplot places one variable on the x-axis and the other on the y-axis, with one point for every case.
• Run chart: It's a line graph of data plotted over time.
• Heat map: It's a graphical representation of data where values are depicted by color.
• Multivariate chart: It's a graphical representation of the relationships between factors and response.
• Bubble chart: It's a data visualization that displays multiple circles (bubbles) in a two-dimensional plot.
In a nutshell: You ought to always perform appropriate EDA before further analysis of your data. Perform whatever steps are necessary to become more familiar with your data, check for obvious mistakes, learn about variable distributions, and study the relationships between variables. EDA is not an exact science; it is a very important art!
If we are focusing on data from observation of a single variable on n subjects, i.e., a sample of size n, then in addition to looking at the various sample statistics, we also need to look graphically at the distribution of the sample. Non-graphical and graphical methods complement each other.
While the non-graphical methods are quantitative and objective, they do not give a full picture of the data;
therefore, graphical methods, which are more qualitative and involve a degree of subjective analysis, are also
required.
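The univariate non-graphical descriptors listed earlier (central tendency, spread, skewness and kurtosis) each take a line or two in base R. A minimal sketch on a small made-up sample x; the skewness and kurtosis lines use the simple moment-based definitions, which are only one of several conventions in use:

```r
# Hypothetical sample of one variable (n = 8)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Central tendency
mean(x)                       # arithmetic mean
median(x)                     # middle value, robust to outliers
names(which.max(table(x)))    # mode: most frequent value (returned as a string)

# Spread: var() and sd() use the n - 1 denominator
var(x)
sd(x)                         # square root of the variance

# Moment-based skewness and excess kurtosis (one common definition)
m  <- mean(x)
m2 <- mean((x - m)^2)
m3 <- mean((x - m)^3)
m4 <- mean((x - m)^4)
skewness <- m3 / m2^1.5             # > 0 means a longer right tail
excess_kurtosis <- m4 / m2^2 - 3    # 0 for a normal distribution
c(skewness = skewness, excess_kurtosis = excess_kurtosis)
```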
5.3.1 Multivariate Non-Graphical EDA
Multivariate non-graphical EDA techniques generally show the relationship between two or more variables in the form of either cross-tabulation or statistics.
Cross-tabulation :
• For categorical data (and quantitative data with only a few different values), an extension of tabulation called cross-tabulation is very useful. For two variables, cross-tabulation is performed by making a two-way table with column headings that match the levels of one variable and row headings that match the levels of the other variable, then filling in the counts of all subjects that share a pair of levels.
• The two variables might be both explanatory, both outcome, or one of each. Depending on the goals, row percentages (which add to 100% for each row), column percentages (which add to 100% for each column) and/or cell percentages (which add to 100% over all cells) are also useful.
• Cross-tabulation is the basic bivariate non-graphical EDA technique.
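A two-way table and its row, column and cell percentages can be sketched with base R's table() and prop.table(). The variables sex and smoker below are hypothetical data invented for illustration:

```r
# Hypothetical data: two categorical variables measured on 8 subjects
smoker <- c("yes", "no", "no", "yes", "no", "no", "yes", "no")
sex    <- c("M",   "F",  "M",  "M",   "F",  "F",  "F",   "M")

tab <- table(sex, smoker)       # two-way table of counts (rows = sex)
tab

prop.table(tab, margin = 1)     # row percentages: each row sums to 1
prop.table(tab, margin = 2)     # column percentages: each column sums to 1
prop.table(tab)                 # cell percentages: all cells together sum to 1
```

Multiply any of the proportion tables by 100 to read them as percentages.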
Histogram :
• The most basic graph is the histogram, which is a barplot in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values. Typically the bars run vertically, with the count (or proportion) axis running vertically. To manually construct a histogram, define the range of data for each bar (called a bin), count how many cases fall in each bin, and draw the bars high enough to indicate the count.
• It is often worthwhile to try a few different bin sizes/numbers because, especially with small samples, the shape of the histogram may change when the bin size changes. But usually the difference is small.
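Trying a few bin widths is cheap because hist() can compute the bins without drawing anything. A minimal sketch on a simulated sample (the normal sample and the widths 1 and 2 are arbitrary choices for illustration):

```r
# Hypothetical sample; set.seed makes the draw reproducible
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

# Explicit break points so that every bin has the chosen width
lo <- floor(min(x))
coarse <- hist(x, breaks = seq(lo, ceiling(max(x)) + 2, by = 2), plot = FALSE)
fine   <- hist(x, breaks = seq(lo, ceiling(max(x)) + 1, by = 1), plot = FALSE)

coarse$counts   # counts per bin of width 2
fine$counts     # counts per bin of width 1

# Drop plot = FALSE (or call plot(coarse)) to draw the histogram itself
```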
Stem-and-leaf plots :
• A simple substitute for a histogram is a stem-and-leaf plot. A stem-and-leaf plot is sometimes easier to make by hand than a histogram, and it tends not to hide any information.
• Nevertheless, a histogram is generally considered better for appreciating the shape of a sample distribution than the stem-and-leaf plot. A stem-and-leaf plot shows all data values and the shape of the distribution.
Boxplots :
• Another very useful univariate graphical technique is the boxplot. The boxplot will be described here in its vertical format, which is the most common, but a horizontal format also is possible. Boxplots are very good at presenting information about the central tendency, symmetry and skew, as well as outliers, although they can be misleading about aspects such as multimodality. One of the best uses of boxplots is in the form of side-by-side boxplots.
• The term fat tails is used to describe the situation where a histogram has a lot of values far from the mean relative to a Gaussian distribution. This corresponds to positive kurtosis. In a boxplot, many outliers (more than the 1/150 expected for a Normal distribution) suggest fat tails (positive kurtosis), or possibly many data entry errors. Also, short whiskers suggest negative kurtosis, at least if the sample size is large.
• Boxplots are excellent EDA plots because they rely on robust statistics like the median and IQR rather than more sensitive ones such as the mean and standard deviation. With boxplots it is easy to compare distributions (usually for one variable at different levels of another) with a high degree of reliability because of the use of these robust statistics.
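The "1/150" figure can be checked directly from the Normal distribution. This sketch computes the probability that a Normal observation lands beyond the usual 1.5 × IQR boxplot fences; the exact value is about 0.7%, roughly one observation in 143, which the rule of thumb rounds to 1/150:

```r
# Quartiles and IQR of a standard Normal distribution
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Upper boxplot fence: 1.5 * IQR above the top of the box
upper_fence <- q3 + 1.5 * iqr

# Probability of falling outside either fence (the Normal is symmetric)
p_outlier <- 2 * pnorm(upper_fence, lower.tail = FALSE)
p_outlier        # about 0.007
1 / p_outlier    # about 143, i.e. roughly 1 outlier per 150 observations
```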
Quantile-normal plots :
The final univariate graphical EDA technique is the most complicated. It is called the quantile-normal or QN plot, or more generally the quantile-quantile or QQ plot. It is used to see how well a particular sample follows a particular theoretical distribution.
Although it can be used for any theoretical distribution, we will limit our attention to seeing how well a sample
of data of size n matches a Gaussian distribution with mean and variance equal to the sample mean and variance.
By examining the quantile-normal plot we can detect left or right skew, positive or negative kurtosis, and
bimodality.
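A QQ plot can also be inspected numerically: if the sample follows the Normal shape, its sorted values line up with the theoretical Normal quantiles, so the two sets of quantiles are almost perfectly correlated. A minimal sketch comparing a simulated Normal sample with a simulated right-skewed (exponential) sample; both samples are made up for illustration:

```r
set.seed(42)
normal_sample <- rnorm(200)   # should track the Normal reference line closely
skewed_sample <- rexp(200)    # right-skewed, should bend away from it

# qqnorm(..., plot.it = FALSE) returns theoretical (x) and sample (y) quantiles
qq_norm   <- qqnorm(normal_sample, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_sample, plot.it = FALSE)

# A straight-line pattern shows up as a correlation very close to 1
cor(qq_norm$x, qq_norm$y)
cor(qq_skewed$x, qq_skewed$y)

# To draw the plot itself: qqnorm(skewed_sample); qqline(skewed_sample)
```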
For two quantitative variables, the basic statistics of interest are the sample covariance and/or sample
correlation, which correspond to and are estimates of the corresponding population parameters. The sample
covariance is a measure of how much two variables “co-vary”, i.e., how much (and in what direction) should we
expect one variable to change when the other changes. Sample covariance is calculated by computing (signed)
deviations of each measurement from the average of all measurements for that variable.
Then the deviations for the two measurements are multiplied together separately for each subject. Finally these
values are averaged (actually summed and divided by n-1, to keep the statistic unbiased). Note that the units on
sample covariance are the products of the units of the two variables.
Positive covariance values suggest that when one measurement is above the mean the other will probably also
be above the mean, and vice versa. Negative covariances suggest that when one variable is above its mean, the
other is below its mean. And covariances near zero suggest that the two variables vary independently of each
other. Technically, independence implies zero correlation, but the reverse is not necessarily true.
Covariances tend to be hard to interpret, so we often use correlation instead. The correlation has the nice
property that it is always between -1 and +1, with -1 being a “perfect” negative linear correlation, +1 being a
perfect positive linear correlation, and 0 indicating that X and Y are uncorrelated. The symbol r or $r_{XY}$ is often used for sample correlations.
The general formula for sample covariance is:
$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
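The recipe for sample covariance (signed deviations, multiplied per subject, summed and divided by n - 1) and its rescaling into a correlation can be checked against R's built-in cov() and cor(). The paired values x and y below are hypothetical:

```r
# Hypothetical paired measurements on 5 subjects
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)
n <- length(x)

# Signed deviations from each mean, multiplied subject by subject,
# summed and divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation divides the covariance by both standard deviations,
# which removes the units and keeps the result between -1 and +1
cor_manual <- cov_manual / (sd(x) * sd(y))

c(cov_manual, cov(x, y))   # the same value twice
c(cor_manual, cor(x, y))   # the same value twice
```

Here the covariance is 4 (in "x units times y units"), while the unitless correlation is 0.8, a fairly strong positive linear relationship.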
Fig. 5.3.1 : Frequency polygons of price (count) for each cut
Fig. 5.3.2 : Bar chart of the number of diamonds in each cut (Fair, Good, Very Good, Premium, Ideal)
To make the comparison easier we need to swap what is displayed on the y-axis. Instead of displaying count, we'll display density, which is the count standardised so that the area under each frequency polygon is one.
ggplot(data = diamonds, mapping = aes(x = price, y = after_stat(density))) +
geom_freqpoly(mapping = aes(colour = cut), binwidth = 500)
Fig. 5.3.3 : Frequency polygons of price density for each cut (Fair, Good, Very Good, Premium, Ideal)
There’s something rather surprising about this plot - it appears that fair diamonds (the lowest quality) have the
highest average price! But maybe that’s because frequency polygons are a little hard to interpret - there’s a lot going
on in this plot.
Another alternative to display the distribution of a continuous variable broken down by a categorical variable is
the boxplot. A boxplot is a type of visual shorthand for a distribution of values that is popular among statisticians.
Each boxplot consists of:
• A box that stretches from the 25th percentile of the distribution to the 75th percentile, a distance known as the interquartile range (IQR). In the middle of the box is a line that displays the median, i.e. 50th percentile, of the distribution. These three lines give you a sense of the spread of the distribution and whether or not the distribution is symmetric about the median or skewed to one side.
• Visual points that display observations that fall more than 1.5 times the IQR from either edge of the box. These outlying points are unusual, so they are plotted individually.
• A line (or whisker) that extends from each end of the box and goes to the farthest non-outlier point in the distribution.
Fig. 5.3.4 : Anatomy of a boxplot (outliers beyond 1.5 × IQR, whisker to the farthest non-outlier point, box spanning the inter-quartile range (IQR) up to the 75th percentile)
Fig. 5.3.5 : Boxplots of price by cut
• We see much less information about the distribution, but the boxplots are much more compact, so we can more easily compare them (and fit more on one plot). It supports the counterintuitive finding that better quality diamonds are cheaper on average! In the exercises, you'll be challenged to figure out why.
• cut is an ordered factor: fair is worse than good, which is worse than very good and so on. Many categorical variables don't have such an intrinsic order, so you might want to reorder them to make a more informative display. One way to do that is with the reorder() function.
Fig. 5.3.7
If you have long variable names, geom_boxplot() will work better if you flip it 90°. You can do that
with coord_flip().
ggplot(data = mpg) +
geom_boxplot(mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
coord_flip()
Fig. 5.3.8 : Horizontal boxplots of hwy by vehicle class (e.g. midsize, compact, subcompact, 2seater, minivan, suv), ordered by median hwy
ggplot(data = diamonds) +
geom_count(mapping = aes(x = cut, y = color))
Fig. 5.3.9 : Count plot of color against cut; circle size shows the number of observations (n)
The size of each circle in the plot displays how many observations occurred at each combination of values. Covariation will appear as a strong correlation between specific x values and specific y values.
Another approach is to compute the counts with count(), then visualise them with geom_tile() and the fill aesthetic:
diamonds %>%
count(color, cut) %>%
ggplot(mapping = aes(x = color, y = cut)) +
geom_tile(mapping = aes(fill = n))
Fig. 5.3.10 : Tile plot of cut against color, filled by count (n)
If the categorical variables are unordered, you might want to use the seriation package to simultaneously
reorder the rows and columns in order to more clearly reveal interesting patterns. For larger plots, you might want to
try the d3heatmap or heatmaply packages, which create interactive plots.
• You've already seen one great way to visualise the covariation between two continuous variables: draw a scatterplot with geom_point().
• You can see covariation as a pattern in the points. For example, you can see an exponential relationship between the carat size and price of a diamond.
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price))
Fig. 5.3.11 : Scatterplot of price against carat
• Scatterplots become less useful as the size of your dataset grows, because points begin to overplot and pile up into areas of uniform black (as above). You've already seen one way to fix the problem: using the alpha aesthetic to add transparency.
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price), alpha = 1 / 100)
Fig. 5.3.12 : Scatterplot of price against carat with alpha = 1 / 100
But using transparency can be challenging for very large datasets. Another solution is to use binning. Previously you used geom_histogram() and geom_freqpoly() to bin in one dimension. Now you'll learn how to use geom_bin2d() and geom_hex() to bin in two dimensions.
• geom_bin2d() and geom_hex() divide the coordinate plane into 2d bins and then use a fill color to display how many points fall into each bin. geom_bin2d() creates rectangular bins. geom_hex() creates hexagonal bins. You will need to install the hexbin package to use geom_hex().
# assumes `smaller` was created earlier, e.g. smaller <- diamonds %>% filter(carat < 3)
ggplot(data = smaller) +
geom_bin2d(mapping = aes(x = carat, y = price))
# install.packages("hexbin")
ggplot(data = smaller) +
geom_hex(mapping = aes(x = carat, y = price))
Fig. 5.3.13 : price against carat binned with geom_bin2d() (left) and geom_hex() (right)
• Another option is to bin one continuous variable so it acts like a categorical variable. Then you can use one of the techniques for visualising the combination of a categorical and a continuous variable that you learned about. For example, you could bin carat and then, for each group, display a boxplot:
ggplot(data = smaller, mapping = aes(x = carat, y = price)) +
geom_boxplot(mapping = aes(group = cut_width(carat, 0.1)))
Fig. 5.3.14 : Boxplots of price for carat binned with cut_width(carat, 0.1)
cut_width(x, width), as used above, divides x into bins of width width. By default, boxplots look roughly the
same (apart from number of outliers) regardless of how many observations there are, so it's difficult to tell that
each boxplot summarises a different number of points. One way to show that is to make the width of the boxplot
proportional to the number of points with varwidth = TRUE.
When we have many quantitative variables the most common non-graphical EDA technique is to calculate all of
the pairwise covariances and/or correlations and assemble them into a matrix. Note that the covariance of X
with X is the variance of X and the correlation of X with X is 1.0.
For example, the covariance matrix of Table 5.3.2 tells us that the variances of X, Y, and Z are 5, 7, and 4 respectively, the covariance of X and Y is 1.77, the covariance of X and Z is -2.24, and the covariance of Y and Z is 3.17.
Similarly, the correlation matrix in figure 4.6 tells us that the correlation of X and Y is 0.3, the correlation of X and Z is -0.5, and the correlation of Y and Z is 0.6.
Table 5.3.1: Covariance Calculation
JA 62 15 +12 -4 - 48
Total 0 0 ~1106
Table 5.3.2: A Covariance Matrix
        X        Y        Z
X       5        1.77     -2.24
Y       1.77     7        3.17
Z       -2.24    3.17     4
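The correlations quoted above follow from the covariance matrix alone: each correlation is a covariance divided by the product of the two standard deviations (the square roots of the diagonal). A sketch using the values from Table 5.3.2 and base R's cov2cor():

```r
# Covariance matrix entered from Table 5.3.2
cov_mat <- matrix(
  c( 5.00, 1.77, -2.24,
     1.77, 7.00,  3.17,
    -2.24, 3.17,  4.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z"))
)

# cov2cor() divides every entry by sqrt(var_i * var_j); the diagonal becomes 1
cor_mat <- cov2cor(cov_mat)
round(cor_mat, 1)   # X-Y 0.3, X-Z -0.5, Y-Z 0.6

# The same conversion written out for the X-Y pair
cov_mat["X", "Y"] / sqrt(cov_mat["X", "X"] * cov_mat["Y", "Y"])
```

Note that the diagonal of cov_mat holds the variances, while the diagonal of cor_mat is always 1.0, as stated in the text.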
Degrees of freedom refers to the maximum number of logically independent values, which are values that have
the freedom to vary, in the data sample.
• Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such as a chi-square test.
• Calculating degrees of freedom is key when trying to understand the importance of a chi-square statistic and the validity of the null hypothesis.
• Consider a data sample consisting of, for the sake of simplicity, five positive integers. The values could be any number with no known relationship between them. This data sample would, theoretically, have five degrees of freedom.
• Four of the numbers in the sample are {3, 8, 5, 4} and the average of the entire data sample is revealed to be 6.
• This must mean that the fifth number has to be 10. It can be nothing else. It does not have the freedom to vary. So the data sample has 4 degrees of freedom.
The formula for degrees of freedom equals the size of the data sample minus one:
Df = N-1
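The five-number example can be replayed in a few lines: once the mean is fixed, the total is fixed too, so only N - 1 values are free to vary. A minimal sketch using the numbers from the text (the variable names are illustrative):

```r
known <- c(3, 8, 5, 4)   # the four freely chosen values from the example
n <- 5
sample_mean <- 6

# The mean fixes the total (n * mean), which forces the last value
fifth <- n * sample_mean - sum(known)
fifth                    # 10

dof <- n - 1             # Df = N - 1
dof                      # 4
```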
The F-test in ANOVA also tests group means. It uses the F-distribution, which is defined by the DF. However, you
calculate the ANOVA degrees of freedom differently because you need to find the numerator and denominator
DF.
• Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or more groups.
Analysis of variance (ANOVA) assesses the differences between group means. It is a statistical hypothesis test that determines whether the means of at least two populations are different. At a minimum, you need a continuous dependent variable and a categorical independent variable that divides your data into comparison groups to perform ANOVA.
The simplest type of ANOVA test is one-way ANOVA. This method is a generalization of t-tests that can assess the differences between more than two group means.
Fig. 5.5.1 : Boxplot of Score by Teaching Method (Method 1 to Method 4)
ANOVA tells you whether the differences between group means are statistically significant.
Statisticians consider ANOVA to be a special case of least squares regression, which is a specialization of the
general linear model. All these models minimize the sum of the squared errors.
5.5.2 F Test
• The term F-test is based on the fact that these tests use F-values to test the hypotheses. An F-statistic is the ratio of two variances, and it was named after Sir Ronald Fisher. Variances measure the dispersion of the data points around the mean. Higher variances occur when the individual data points tend to fall further from the mean.
• An F-value is the ratio of two variances, or technically, two mean squares. Mean squares are simply variances that account for the degrees of freedom (DF) used to estimate the variance. F-values are the test statistic for F-tests.
• Consider that a sum of squares is the sum of the squared deviations from the mean. If you have a bigger sample, there are more squared deviations to add up, so the sum becomes larger and larger as you add in more observations. By dividing by the DF, mean squares account for the differing numbers of measurements behind each estimate of the variance. Otherwise, the variances are not comparable, and the ratio for the F-statistic is meaningless.
The F-test in One-Way ANOVA
We want to determine whether a set of means are all equal. To evaluate this with
an F-test, we need to use the
proper variances in the ratio. Here’s the F-statistic ratio for one-way ANOVA.
$$F = \frac{\text{between-groups variance}}{\text{within-group variance}}$$
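To make the ratio concrete, here is a minimal sketch of the one-way ANOVA F-statistic computed by hand and then checked against R's built-in aov(). The data (three hypothetical groups M1 to M3 with four scores each) are made up and are not the data behind Fig. 5.5.1. Both variances are mean squares: a sum of squares divided by its degrees of freedom (k - 1 between groups, N - k within groups):

```r
# Hypothetical scores for three groups, 4 observations per group
score  <- c(20, 22, 19, 23,   25, 27, 26, 24,   30, 28, 31, 29)
method <- factor(rep(c("M1", "M2", "M3"), each = 4))

k <- nlevels(method)   # number of groups
N <- length(score)     # total number of observations
grand_mean  <- mean(score)
group_means <- tapply(score, method, mean)
group_sizes <- tapply(score, method, length)

# Between-groups mean square: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group mean square: spread of each score around its own group mean
# (indexing by a factor uses its integer codes, i.e. the matching group)
ss_within <- sum((score - group_means[method])^2)
ms_within <- ss_within / (N - k)

F_manual <- ms_between / ms_within

# The same statistic from R's one-way ANOVA
fit   <- aov(score ~ method)
F_aov <- summary(fit)[[1]][["F value"]][1]
c(F_manual = F_manual, F_aov = F_aov)
```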
Let's consider the following example.
Analysis of Variance
Total 39 202.09
Model Summary
Means
To be able to conclude that not all group means are equal, we need a large F-value to reject the null hypothesis.
Is ours large enough?
A tricky thing about F-values is that they are a unitless statistic, which makes them hard to interpret. Our F-value
of 3.30 indicates that the between-groups variance is 3.3 times the size of the within-group variance. The null
hypothesis value is that variances are equal, which produces an F-value of 1. Is our F-value of 3.3 large enough to
reject the null hypothesis?
We don’t know exactly how uncommon our F-value is if the null hypothesis is correct. To interpret individual F-
values, we need to place them in a larger context. F-distributions provide this broader context and allow us to
calculate probabilities.
First, let's assume that the null hypothesis is true for the population. At the population level, all four group
means are equal. Now, we repeat our study many times by drawing many random samples from this population
using the same one-way ANOVA design (four groups with 10 samples per group). Next, we perform one-way
ANOVA on all of the samples and plot the distribution of the F-values. This distribution is known as a sampling
distribution, which is a type of probability distribution.
If we follow this procedure, we produce a graph that displays the distribution of F-values for a population where
the null hypothesis is true. We use sampling distributions to calculate probabilities for how unlikely our sample
statistic is if the null hypothesis is true. F-tests use the F-distribution.
Fortunately, we don’t need to go to the trouble of collecting numerous random samples to create this graph!
Statisticians understand the properties of F-distributions so we can estimate the sampling distribution using the
F-distribution and the details of our one-way ANOVA design.
Our goal is to evaluate whether our sample F-value is so rare that it justifies rejecting the null hypothesis for the
entire population. We'll calculate the probability of obtaining an F-value that is at least as high as our study's
value (3.30).
This probability has a name—the P value! A low probability indicates that our sample data are unlikely when
the null hypothesis is true.
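This P value can be read straight off the F-distribution instead of simulating many studies. Assuming the design from the example (four groups of 10 observations, so 3 and 36 degrees of freedom, which agrees with the total of 39 degrees of freedom in the output), a sketch in base R:

```r
f_value    <- 3.30     # F-value from the example
df_between <- 4 - 1    # four groups
df_within  <- 40 - 4   # 40 observations in total, 10 per group

# Upper-tail area: P(F >= 3.30) when the null hypothesis is true
p_value <- pf(f_value, df1 = df_between, df2 = df_within, lower.tail = FALSE)
p_value

# The F-value that must be exceeded to reject at the 5% significance level
qf(0.95, df1 = df_between, df2 = df_within)
```

The resulting P value lies between 0.01 and 0.05, so at the conventional 5% significance level an F-value of 3.30 is large enough to reject the null hypothesis that all four group means are equal.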
You use box plots to show some of the most important features of a dataset, such as the following:
• Minimum value
• Maximum value
• Quartiles
Quartiles separate a dataset into four equal sections.
The first quartile (Q1) is a value such that 25 percent of the observations in a dataset are less than the first quartile and 75 percent of the observations are greater than it.
The second quartile (Q2) is a value such that 50 percent of the observations in a dataset are less than the second quartile and 50 percent of the observations are greater than it. The second quartile is also known as the median.
The third quartile (Q3) is a value such that 75 percent of the observations in a dataset are less than the third quartile and 25 percent of the observations are greater than it.
• You can also use box plots to identify outliers. These are values that are substantially different from the rest of the dataset. Outliers can cause problems for traditional statistical tests, so it's important to identify them before performing any type of statistical analysis.
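Quartiles and the 1.5 × IQR outlier rule used by box plots can be computed directly with quantile() and IQR(). A minimal sketch on a hypothetical dataset with one suspicious value; note that quantile() offers several interpolation types (the default, type 7, is used here) and that R's boxplot() uses Tukey's hinges, so its box edges can differ slightly on small samples:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 40)   # hypothetical data, 40 looks suspicious

q   <- quantile(x, probs = c(0.25, 0.50, 0.75))   # Q1, Q2 (median), Q3
iqr <- IQR(x)                                     # Q3 - Q1

# Box-plot fences: 1.5 * IQR beyond each edge of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

q          # 25%: 3.25, 50%: 5.5, 75%: 7.75
outliers   # 40
```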
5.6.2 Histograms
• You use histograms to gain insight into the probability distribution that a dataset follows. With a histogram, the dataset is organized into a series of individual values or ranges of values, each represented by a vertical bar.
• The height of the bar shows how frequently a value or range of values occurs. With a histogram, it's easy to see how the data is distributed.
In the linear equation Y = mX + b, X is the independent variable, and Y is the dependent variable. m is the slope, which represents the change in Y due to a given change in X. b is the intercept, which shows the value of Y when X equals zero. Fig. 5.6.1 shows a scatter plot between two variables in which the relationship appears to be linear.
Fig. 5.6.1 : Scatter plot with a linear relationship between X and Y
• The points on the scatter plot very nearly form a straight line. It bends a little to the left and a little to the right, but it's roughly straight. This shows that the relationship is linear, with a positive slope.
• The following fig. 5.6.2 shows a scatter plot between two variables in which Y appears to be rising more rapidly than X.
Fig. 5.6.2 : Scatter plot in which Y rises more rapidly than X
• See the curve? This relationship is clearly not linear. It is in fact a quadratic relationship. A quadratic relationship takes the form $Y = aX^2 + bX + c$.
• The following fig. 5.6.3 shows a scatter plot in which there doesn't appear to be any relationship between X and Y.
Fig. 5.6.3 : Scatter plot with no relationship between the variables X and Y.
The variables in the scatter plot shown are unrelated or independent; you can see this by the lack of any pattern
in the data.
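One way to tell a linear pattern like Fig. 5.6.1 from a quadratic one like Fig. 5.6.2 is to fit both model forms with lm(). The sketch below uses noise-free made-up data (the coefficients 3, 2 and 2, 1, 5 are arbitrary choices) so that the fitted coefficients come back exactly; with real data they would only be estimates:

```r
x <- 1:10
y_linear    <- 3 * x + 2          # straight line: slope m = 3, intercept b = 2
y_quadratic <- 2 * x^2 + x + 5    # curve: a = 2, b = 1, c = 5

# Fit Y = mX + b
lin_fit <- lm(y_linear ~ x)
coef(lin_fit)                     # (Intercept) 2, x 3

# Fit Y = aX^2 + bX + c; I() makes ^ mean "square" inside a model formula
quad_fit <- lm(y_quadratic ~ x + I(x^2))
coef(quad_fit)                    # (Intercept) 5, x 1, I(x^2) 2
```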
• In addition to showing the relationship between two variables, a scatter plot can also show the presence of outliers. The following fig. 5.6.4 shows a dataset with one observation that is substantially different from the other observations.
Fig. 5.6.4 : Scatter plot with one outlying observation
Review Questions
Q.9 What are the tools required for exploratory data analysis?
Q.10 Write a short note on graphical/non-graphical methods.
INTRODUCTION TO
MACHINE LEARNING
Introduction to Machine Learning, Types of Machine Learning: Supervised (Logistic Regression, Decision Tree, Support Vector Machine) and Unsupervised (K Means Clustering, Hierarchical Clustering, Association Rules), Issues in Machine Learning, Applications of Machine Learning, Steps in Developing a Machine Learning Application.
Self-Learning Topics : Real world case studies on machine learning
Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning generally is to
understand the structure of data and fit that data into models that can be understood and utilized for deriving
business decisions.
Although machine learning is a field within computer science, it differs from traditional computational approaches. In traditional computing, algorithms are sets of explicitly programmed instructions used by computers to calculate or solve problems.
Machine learning algorithms instead allow computers to train on data inputs and use statistical analysis in order to output values that fall within a specific range. Because of this, machine learning helps computers build models from sample data in order to automate decision-making processes based on data inputs.
Machine learning is a continuously developing field.
Any technology user today has benefitted from machine learning. Facial recognition technology allows social media platforms to help users tag and share photos of friends. Optical character recognition (OCR) technology converts images of text into movable type.
Recommendation engines, powered by machine learning, suggest what movies or television shows to watch next
based on user preferences. Self-driving cars that rely on machine learning to navigate may soon be available to
consumers.
• In supervised learning, the computer is provided with example inputs that are labeled with their desired outputs. The purpose of this method is for the algorithm to be able to "learn" by comparing its actual output with the "taught" outputs to find errors, and modify the model accordingly. Supervised learning therefore uses patterns to predict label values on additional unlabeled data.
For example, with supervised learning, an algorithm may be fed data with images of sharks labeled as fish and images of oceans labeled as water. By being trained on this data, the supervised learning algorithm should be able to later identify unlabeled shark images as fish and unlabeled ocean images as water.
A cornman use case of supervised learning is to use historical data to predict statistically likely future events. It
may use historical stock market information to anticipate upcoming fluctuations, or be employed to filter out
nk emails. In supervised learning, tagged photos of dogs can be used as input data to classify untagged photos
of dogs.
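The error-correcting loop described above (compare the actual output with the taught output, then adjust the model) can be sketched with a perceptron, one of the simplest supervised learners. This is a minimal illustration only: the two-feature dataset, the learning rate `lr` and the number of `epochs` are made-up assumptions, not values from the text.

```python
def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Learn weights w and bias b so that step(w . x + b) reproduces the labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            predicted = predict(w, b, x)
            error = target - predicted          # compare actual output with the taught output
            # modify the model only when it made a mistake (error is -1 or +1)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


def predict(w, b, x):
    """Step activation: class 1 if the weighted sum is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical labeled examples: small coordinates are class 0, large ones class 1
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 7.0), (7.0, 6.5), (8.0, 8.0)]
y = [0, 0, 0, 1, 1, 1]

w, b = train_perceptron(X, y)
# The trained model can now label new, unlabeled points
print(predict(w, b, (1.2, 1.1)), predict(w, b, (7.5, 7.0)))  # prints: 0 1
```

Other supervised learners (regression, decision trees, SVMs) follow the same pattern of fitting to labeled examples, but adjust their parameters in different ways.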
Correlation and regression are commonly used techniques for investigating the relationship among quantitative
variables. Correlation is a measure of association between two variables that are not designated as either
dependent or independent. Regression at a basic level is used to examine the relationship between one
dependent and one independent variable. Because regression statistics can be used to anticipate the dependent
variable when the independent variable is known, regression enables prediction capabilities.
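The two ideas above can be made concrete with a short sketch: Pearson's correlation coefficient measures the association, while an ordinary least-squares line y = b0 + b1 * x lets us predict the dependent variable. The hours/score numbers are invented purely for illustration, and least squares is assumed as the fitting method (the standard choice for simple linear regression).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: strength of linear association, between -1 and 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1


# Hypothetical data: hours studied (independent) vs. exam score (dependent)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
b0, b1 = fit_line(hours, score)
print(f"r = {r:.3f}")                              # close to +1: strong positive association
print(f"score = {b0:.1f} + {b1:.1f} * hours")
print("predicted score for 6 hours:", round(b0 + b1 * 6, 1))
```

Correlation treats both variables symmetrically, whereas the regression line is fitted specifically to predict score from hours.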
• Logistic regression is another technique borrowed by machine learning from the field of statistics.
• It is the go-to method for binary classification problems (problems with two class values).
• Logistic regression is named for the function used at the core of the method, the logistic function.
• The logistic function, also called the sigmoid function, was developed by statisticians to describe properties of
population growth in ecology, rising quickly and maxing out at the carrying capacity of the environment. It is an
S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly
at those limits.
$$\text{logistic}(\text{value}) = \frac{1}{1 + e^{-\text{value}}}$$
where e is the base of the natural logarithms and value is the actual numerical value that you want to transform.
Below is a plot of the numbers between -5 and 5 transformed into the range 0 and 1 using the logistic function.
Fig. 6.2.1 : Logistic Regression
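A minimal sketch of the logistic function from the formula above, evaluated on the integers from -5 to 5 (the same range as the plot). The function name `logistic` is our own choice.

```python
import math


def logistic(value):
    """Map any real number into the open interval (0, 1)."""
    # note: math.exp overflows for value below about -709; production code would guard this
    return 1.0 / (1.0 + math.exp(-value))


for v in range(-5, 6):
    print(f"{v:>3} -> {logistic(v):.4f}")
```

The output rises from about 0.0067 at -5, through exactly 0.5 at 0, to about 0.9933 at 5, tracing the S-shaped curve.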
• Logistic regression uses an equation as the representation, very much like linear regression.
• Input values (x) are combined linearly using weights or coefficient values (referred to as the Greek capital letter
Beta) to predict an output value (y).
• A key difference from linear regression is that the output value being modelled is a binary value (0 or 1) rather
than a numeric value.
Below is an example logistic regression equation :
$$y = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$
where y is the predicted output, b0 is the bias or intercept term and b1 is the coefficient for the single input
value (x).
• Each column in your input data has an associated b coefficient (a constant real value) that must be learned from
your training data.
• The actual representation of the model that you would store in memory or in a file is the coefficients in the
equation (the beta values or b's).
• Logistic regression models the probability of the default class (e.g. the first class).
• For example, if we are modeling people's sex as male or female from their height, then the first class could be
male and the logistic regression model could be written as the probability of male given a person's height, or
more formally:
$$P(\text{sex} = \text{male} \mid \text{height})$$
• Written another way, we are modeling the probability that an input (X) belongs to the default class (Y = 1). We can
write this formally as:
$$p(X) = P(Y = 1 \mid X)$$
• Logistic regression is a linear method, but the predictions are transformed using the logistic function. The
impact of this is that we can no longer understand the predictions as a linear combination of the inputs as we
can with linear regression. For example, continuing on from above, the model can be stated as:
$$p(X) = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$$
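The sketch below evaluates p(X) for a single input and estimates b0 and b1 from data. The text does not say how the coefficients are learned; here we assume batch gradient descent on the mean log-loss (equivalent to maximum likelihood), with a made-up learning rate, epoch count and dataset.

```python
import math


def predict_proba(b0, b1, x):
    """p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), written as 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Estimate b0 and b1 by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        # for log-loss, the gradient is the mean of (p - y) for b0 and of (p - y) * x for b1
        errors = [predict_proba(b0, b1, x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical single-input data with binary labels (classes overlap, so b1 stays finite)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (1, 8):
    p = predict_proba(b0, b1, x)
    print(f"x={x}: p(default class)={p:.3f} -> predicted class {1 if p >= 0.5 else 0}")
```

A threshold of 0.5 on p(X) turns the probability into a binary class; other thresholds can be chosen when one kind of error is costlier than the other.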
• In general use, decision trees are employed to visually represent decisions and to show or inform decision
making. When working with machine learning and data mining, decision trees are used as a predictive model.
These models map observations about data to conclusions about the data's target value.
• The goal of decision tree learning is to create a model that will predict the value of a target based on input
variables.
• In the predictive model, the data's attributes that are determined through observation are represented by the
branches, while the conclusions about the data's target value are represented in the leaves.
• When "learning" a tree, the source data is divided into subsets based on an attribute value test, which is repeated
on each of the derived subsets recursively. The recursion is complete when all the examples in the subset at a
node have the same value of the target variable.
• Let's look at an example of various conditions that can determine whether or not someone should go fishing.
This includes weather conditions as well as barometric pressure conditions.
• In the simple decision tree depicted in Fig. 6.2.2, an example is classified by sorting it through the tree to
the appropriate leaf node. This then returns the classification associated with the particular leaf, which in this
case is either a Yes or a No. The tree classifies a day's conditions based on whether or not it is suitable for going
fishing.
• A true classification tree data set would have many more features than what is outlined above, but the
relationships should be straightforward to determine.
• When working with decision tree learning, several determinations need to be made, including which features to
choose, what conditions to use for splitting, and when the decision tree has reached a clear ending.
Fig. 6.2.2 : Decision Tree Example
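To illustrate the recursive splitting described above, here is a small ID3-style learner that picks the split attribute by information gain (entropy is one common splitting criterion; the text does not fix one). The fishing data below is hypothetical and is not the data behind Fig. 6.2.2. The `classify` function sorts an example down the tree to a leaf, as described for that figure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Drop in entropy obtained by splitting the rows on attribute attr."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Recursively split on the attribute with the highest information gain."""
    if len(set(labels)) == 1:          # every example at this node has the same target value
        return labels[0]
    if not attrs:                      # nothing left to test: use the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return {best: branches}


def classify(tree, row):
    """Sort an example down the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical fishing days: weather and barometric pressure -> go fishing?
days = [
    {"weather": "sunny", "pressure": "low"},
    {"weather": "sunny", "pressure": "high"},
    {"weather": "cloudy", "pressure": "low"},
    {"weather": "cloudy", "pressure": "high"},
    {"weather": "rainy", "pressure": "low"},
    {"weather": "rainy", "pressure": "high"},
]
go_fishing = ["Yes", "Yes", "Yes", "No", "No", "No"]

tree = build_tree(days, go_fishing, ["weather", "pressure"])
print(tree)
print(classify(tree, {"weather": "cloudy", "pressure": "low"}))  # Yes
```

Here weather is chosen first because it separates the labels best; pressure is only needed to decide the cloudy days. This sketch raises a `KeyError` for attribute values never seen during training.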
(Figure: the best hyperplane separating two classes of data points)
However, you can create many different ML models to separate the two classes, and not all of them are of equal
value.
• In some datasets, the classes are not "linearly separable," which means they can't be separated with a straight
line. In such cases, ML engineers use "kernel tricks," mathematical transformations that can make the data
linearly separable.
• Similarly, the polynomial kernel can be applied to other datasets where the boundaries between classes are
more complicated. For example, the dataset in Fig. 6.2.4 was generated with the make_moons() function of Python's
Scikit-Learn machine learning library. Evidently, the two classes are not linearly separable. But an SVM with a
third-degree polynomial kernel can clearly separate the two classes with a smooth boundary (right figure).
Fig. 6.2.4 : The kernel trick can help SVMs classify classes that are not linearly separable
• Despite the advent of more advanced machine learning algorithms such as deep neural networks, support vector
machines remain popular because of their fast training time, low compute requirements, and ability to learn with
fewer training examples.
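The make_moons experiment described above can be reproduced roughly as follows. This assumes scikit-learn is installed; the sample count, noise level, `coef0=1` and `C=5` are our own illustrative choices, not values given in the text. Standardizing the features first is common practice with SVMs.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: the classes are not linearly separable
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# SVM with a third-degree polynomial kernel; features are standardized first
poly_svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1, C=5))
poly_svm.fit(X, y)

# For comparison, a linear-kernel SVM can only draw a straight boundary
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=5))
linear_svm.fit(X, y)

print("polynomial kernel, training accuracy:", poly_svm.score(X, y))
print("linear kernel, training accuracy:", linear_svm.score(X, y))
```

Training accuracy is shown only to compare how well each boundary fits the moons; a held-out set is needed for a fair estimate of generalization.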
The goal of unsupervised learning may be as straightforward as discovering hidden patterns within a dataset,
but it may also have a goal of feature learning, which allows the computational machine to automatically
discover the representations that are needed to classify raw data.
Unsupervised learning is commonly used for transactional data. You may have a large dataset of customers and
their purchases, but as a human you will likely not be able to make sense of what similar attributes can be drawn
from customer profiles and their types of purchases.
With this data fed into an unsupervised learning algorithm, it may be determined that women of a certain age
range who buy unscented soaps are likely to be pregnant, and therefore a marketing campaign related to
pregnancy and baby products can be targeted to this audience in order to increase their number of purchases.
Without being told a “correct” answer, unsupervised learning methods can look at complex data that is more
expansive and seemingly unrelated in order to organize it in potentially meaningful ways.
Unsupervised learning is often used for anomaly detection, including detecting fraudulent credit card purchases, and for
recommender systems that recommend what products to buy next. In unsupervised learning, untagged photos
of dogs can be used as input data for the algorithm to find likenesses and classify dog photos together.
The k-means algorithm searches for a predetermined number of clusters in an unlabelled multidimensional dataset. It
does this using a simple interpretation of what an optimal cluster looks like:
◦ Firstly, the cluster centre is the arithmetic mean (AM) of all the data points associated with the cluster.
◦ Secondly, each point is closer to its own cluster centre than to other cluster centres.
These two interpretations are the foundation of the k-means clustering model.
In simple terms, k-means clustering lets us group the data into several clusters by discovering the distinct
groups in an unlabelled dataset by itself, without the need for labelled training data.
It is a centroid-based algorithm: each cluster is associated with a centroid, and the objective is to minimize the
sum of (squared) distances between the data points and the centroids of their clusters.
As input, the algorithm takes an unlabelled dataset, splits it into k clusters, and repeats the process until it finds
the best clusters. The value of k must be predetermined.
Specifically, the k-means algorithm performs two tasks:
◦ It determines the best positions for the k centres (centroids) through an iterative process.
◦ It assigns every data point to its nearest centre; the data points closest to a particular centre form a
cluster. Therefore, the data points in each cluster share some similarities and are far apart from the points in other clusters.
By specifying the value of k, you are telling the algorithm how many means or centres you are looking for.
For example, if k = 3, the algorithm looks for 3 clusters.
Next, k-means recomputes each centre as the mean of all the data points assigned to that cluster. This reduces
the total intra-cluster variance compared with the previous step. Here, "means" refers to the average of the data
points, which becomes the new centre in k-means clustering.
The assignment and update steps are repeated until a stopping condition is reached, such as: the sum of
distances between the data points and their respective centres is minimized, a suitable number of
iterations is reached, the cluster centres no longer change, or no data point changes cluster.
In short, three criteria are used to stop the k-means clustering algorithm:
1. Centroids do not change : the algorithm can be stopped if the centroids of the newly formed clusters are not
changing. If, even after multiple iterations, the obtained centroids are the same for all the clusters, it can be
concluded that the algorithm is not learning any new pattern, which signals it to stop training on the dataset.
2. Points stay in the same cluster : training can also be halted if the data points stay in the same cluster even
after training the algorithm for multiple iterations.
3. Maximum number of iterations reached : finally, training can be stopped once the maximum number of
iterations is reached. For example, if the number of iterations is set to 200, the process is repeated 200 times
(200 iterations) before it ends.
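Putting the assignment step, the mean-update step and the three stopping criteria together gives the minimal k-means sketch below. It assumes numeric points, Euclidean distance via `math.dist` (Python 3.8+), initial centres sampled at random from the data, and a `max_iter` default of 200 chosen to match the example above; it is not an optimized implementation.

```python
import math
import random


def kmeans(points, k, max_iter=200, seed=0):
    """Cluster numeric points into k groups; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)            # initial centres: k data points chosen at random
    labels = None
    for _ in range(max_iter):                  # criterion 3: maximum number of iterations
        # assignment step: each point joins the cluster of its nearest centre
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:               # criterion 2: no point changed cluster
            break
        labels = new_labels
        # update step: each centre moves to the mean of the points assigned to it
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centres.append(centres[c])  # empty cluster: keep its old centre
        if new_centres == centres:             # criterion 1: centroids did not change
            break
        centres = new_centres
    return centres, labels


# Hypothetical 2-D data with two obvious groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
centres, labels = kmeans(points, k=2)
print(centres)
print(labels)
```

Because the result can depend on the initial centres, k-means is often run several times with different seeds and the clustering with the smallest within-cluster sum of squares is kept.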
• For datasets with a large number of variables, k-means runs faster than hierarchical clustering.
• When the cluster centres are recomputed, an instance can move to a different cluster.
• Moreover, it is fast, robust and easy to understand, and yields the best outcomes when datasets are …
• It is not directly applicable to categorical data, since it only works when a mean can be computed.
• Also, Euclidean distance can weight the underlying factors unequally.
• The algorithm is not invariant to non-linear transformations, i.e. it gives different results for different
representations of the same data.
6.2.2(B) Hierarchical Clustering
Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in a data set. In
contrast to k-means, hierarchical clustering creates a hierarchy of clusters and therefore does not require us
to pre-specify the number of clusters.
Furthermore, hierarchical clustering has an added advantage over k-means clustering in that its results can be
easily visualized using an attractive tree-based representation called a dendrogram.
Fig. 6.2.5 : AGNES (bottom-up) versus DIANA (top-down) clustering (Cluster Dendrogram)
1. Agglomerative clustering : Commonly referred to as AGNES (AGglomerative NESting), it works in a bottom-up
manner. That is, each observation is initially considered as a single-element cluster (leaf). At each step of the
algorithm, the two clusters that are the most similar are combined into a new bigger cluster (node). This
procedure is iterated until all points are members of just one single big cluster (root). The result is a tree which
can be displayed using a dendrogram.
2. Divisive hierarchical clustering : Commonly referred to as DIANA (DIvisive ANAlysis), it works in a top-down
manner. DIANA is like the reverse of AGNES. It begins with the root, in which all observations are included in a
single cluster. At each step of the algorithm, the current cluster is split into the two clusters that are considered
most heterogeneous. The process is iterated until all observations are in their own cluster.
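A minimal AGNES-style sketch: every point starts as its own cluster, and the two closest clusters are merged repeatedly until one root cluster remains. The text does not specify how cluster similarity is measured; single linkage (the distance between the closest members of two clusters) is assumed here. The returned merge list is exactly the information a dendrogram draws.

```python
import math


def agnes(points):
    """Agglomerative (bottom-up) clustering with single linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]   # every observation starts as a leaf

    def single_link(a, b):
        # cluster distance = distance between their two closest members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        # find the two most similar (closest) clusters
        dist, a, b = min((single_link(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]           # new, bigger cluster (a dendrogram node)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


# Hypothetical 2-D observations
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agnes(points):
    print(a, "+", b, "at distance", round(d, 2))
```

Cutting this merge sequence at any distance gives a flat clustering, which is why the number of clusters need not be fixed in advance. A DIANA sketch would run in the opposite direction, repeatedly splitting clusters instead of merging them.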
1. K-means clustering produces a specified number of clusters for an unstructured, flat dataset, whereas
hierarchical clustering builds a hierarchy of clusters rather than just a single partition of the objects, and is used
in various clustering methods and applications.
2. K-means can be used for categorical data once the data is first converted into numeric form by assigning ranks,
whereas hierarchical clustering has been chosen for categorical data, but because of its complexity a new
technique is considered for assigning rank values to categorical features.
3. K-means is highly sensitive to noise in the dataset, whereas hierarchical clustering is less sensitive to noise.
4. The performance of the k-means algorithm improves as the RMSE decreases, and the RMSE decreases as the number of
clusters increases, although the execution time then increases. In contrast, the performance of hierarchical
clustering is lower.
5. K-means is well suited to large datasets, whereas hierarchical clustering is better suited to small datasets.
• Association rule learning is a rule-based machine learning method for discovering relations between variables in
large databases. The goal is to identify strong rules discovered in datasets using measures such as
confidence or lift.
• An association rule is an implication expression of the form X → Y, where X and Y are disjoint itemsets. A more
concrete example based on consumer behaviour would be {Diapers} → {Beer}, suggesting that people who buy
diapers are also likely to buy beer. To evaluate the "interest" of such an association rule, different metrics have
been developed; commonly used ones are the confidence and lift metrics mentioned above.
• If a customer buys bread, he is 70% likely to buy milk.
• In the above association rule, bread is the antecedent and milk is the consequent. Simply put, it can be
understood as a retail store's association rule for targeting its customers better. If the above rule is the result of a
thorough analysis of some data sets, it can be used not only to improve customer service but also to improve the
company's revenue.
• In addition, association rule learning techniques are used in many applications such as recommendation
systems, medical diagnosis, protein sequence analysis, census data analysis and even crime prevention.
• The Apriori algorithm is one of the main techniques of association rule mining; it can be basically described
as finding the most frequent itemsets in a dataset.
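The frequent-itemset search at the heart of Apriori can be sketched as a level-wise loop, where the support of an itemset is the fraction of transactions containing it. Frequent k-itemsets are joined into (k+1)-item candidates, candidates with an infrequent subset are pruned, and the rest are kept if their support reaches a threshold. The basket data and the `min_support` value are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a frequent itemset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


# Hypothetical market-basket data; itemsets must appear in at least 3 of 5 baskets
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, 0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step relies on the Apriori property: if an itemset is infrequent, every superset of it is also infrequent, so such candidates never need their support counted.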
Main Concepts of Association Rules / Apriori Algorithm
1. Support
Support is an indication of how frequently the itemset appears in the dataset. In other words, it indicates
how popular an itemset is in a dataset.
$$\text{Support}(\{X\} \rightarrow \{Y\}) = \frac{\text{Transactions containing both } X \text{ and } Y}{\text{Total number of transactions}}$$
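2. Confidence
Confidence is an indication of how often the rule has been found to be true. In other words, confidence says how
likely item Y is to be purchased when item X is purchased.
$$\text{Confidence}(\{X\} \rightarrow \{Y\}) = \frac{\text{Transactions containing both } X \text{ and } Y}{\text{Transactions containing } X}$$
3. Lift
Lift is a metric that measures the ratio of how often X and Y occur together to how often they would occur together
if they were statistically independent. In other words, lift illustrates how likely item Y is to be purchased when item X
is purchased, while controlling for how popular item Y is.
$$\text{Lift}(\{X\} \rightarrow \{Y\}) = \frac{\text{Confidence}(\{X\} \rightarrow \{Y\})}{\text{Support}(\{Y\})}$$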
A Lift score close to 1 indicates that the antecedent and the consequent are independent, and the occurrence
of the antecedent has no impact on the occurrence of the consequent.
A Lift score greater than 1 indicates that the antecedent and the consequent are dependent on each other, and
the occurrence of the antecedent has a positive impact on the occurrence of the consequent.
A Lift score smaller than 1 indicates that the antecedent and the consequent are substitutes for each other, which
means the presence of the antecedent has a negative impact on the consequent, or vice versa.
4. Conviction
Conviction measures the implication strength of the rule relative to statistical independence. The conviction score is
the ratio between the probability that X occurs without Y if X and Y were independent and the actual (observed)
probability of X occurring without Y. For instance, if the {French fries} → {beer} association has a conviction score
of 1.8, the rule would be incorrect 1.8 times as often (80% more often) if the two items were totally independent.
$$\text{Conviction}(A \rightarrow C) = \frac{1 - \text{support}(C)}{1 - \text{confidence}(A \rightarrow C)}, \qquad \text{range: } [0, \infty]$$
The IF component of an association rule is known as the antecedent. The THEN component is known as the
consequent. The antecedent and the consequent are disjoint; they have no items in common.
Fig. 6.2.6 : Antecedents → Consequents
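The four metrics above can be computed directly from transaction counts. This sketch assumes transactions are given as Python sets and uses a small hypothetical basket dataset. Conviction is reported as infinity when confidence equals 1, since the formula then divides by zero.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule antecedent -> consequent."""
    n = len(transactions)
    x, y = frozenset(antecedent), frozenset(consequent)
    count_x = sum(1 for t in transactions if x <= t)
    count_y = sum(1 for t in transactions if y <= t)
    count_xy = sum(1 for t in transactions if (x | y) <= t)

    support = count_xy / n                  # fraction of transactions containing X and Y
    confidence = count_xy / count_x         # how often Y appears when X appears
    lift = confidence / (count_y / n)       # confidence relative to Y's own popularity
    # conviction = (1 - support(Y)) / (1 - confidence); infinite when confidence is 1
    conviction = float("inf") if confidence == 1 else (1 - count_y / n) / (1 - confidence)
    return support, confidence, lift, conviction


# Hypothetical market-basket data
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
s, c, lift, conv = rule_metrics(baskets, {"diapers"}, {"beer"})
print(f"support={s:.2f} confidence={c:.2f} lift={lift:.2f} conviction={conv:.2f}")
```

For {diapers} → {beer} this gives a lift of 1.25, i.e. greater than 1, so in this toy data buying diapers has a positive impact on buying beer.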
We don't want our algorithm to make inaccurate or faulty predictions. Hence the quality of data is essential
to enhance the output. Therefore, we need to ensure that the process of data preprocessing, which includes
removing outliers, filtering missing values, and removing unwanted features, is done with the utmost level
of perfection.
2. Underfitting of Training Data
This occurs when the model is unable to establish an accurate relationship between the input and output
variables. It is like trying to fit into undersized jeans.
It signifies that the model is too simple to capture a precise relationship. To overcome this issue:
◦ Maximize the training time
◦ Enhance the complexity of the model
Suppose one day you decide to explain to a child how to distinguish between an apple and a watermelon. You
take an apple and a watermelon and show him the difference between them based on their color, shape, and
taste. In this way, he will soon become perfect at differentiating between the two.
On the other hand, a machine learning algorithm needs a lot of data to learn the distinction. For complex
problems, it may even require millions of training examples. Therefore, we need to ensure that machine
learning algorithms are trained with sufficient amounts of data.
6. Slow Implementation
This is one of the common issues faced by machine learning professionals. Machine learning models can be
highly efficient at providing accurate results, but they may take a tremendous amount of time to do so. Slow programs,
data overload, and excessive requirements usually delay accurate results.
Further, models require constant monitoring and maintenance to deliver the best output.
7. Imperfections in the Algorithm When Data Grows
Suppose you have found quality data, trained the model well, and the predictions are concise and accurate.
The model may still become useless in the future as data grows: the best model of the present may become
inaccurate in the coming future and require further rearrangement.
So the algorithm needs regular monitoring and maintenance to keep working. This is one of the most
exhausting issues faced by machine learning professionals.
(Figure: Applications of Machine Learning, e.g. Automatic Language Translation, Product Recommendations, and Email Spam and Malware Filtering)
• If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.
• It predicts whether traffic is clear, slow-moving, or heavily congested in two ways :
◦ The real-time location of vehicles from the Google Maps app and sensors
◦ The average time taken on past days at the same time of day.
• Everyone who uses Google Maps helps make the app better. It takes information from the user
and sends it back to its database to improve performance.
4. Product recommendations
• Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some product
on Amazon, we start seeing advertisements for the same product while browsing the internet on the
same browser, and this is because of machine learning.
• Google understands the user's interests using various machine learning algorithms and suggests products
as per customer interest.
• Similarly, when we use Netflix, we see recommendations for entertainment series, movies, etc., and
this is also done with the help of machine learning.
5. Self-driving cars
• One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a
significant role in self-driving cars. Tesla, a well-known car manufacturer, is working on self-driving cars.
• It uses an unsupervised learning method to train the car models to detect people and objects while driving.
6. Email Spam and Malware Filtering
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always
receive important mail in our Inbox with the important symbol and spam emails in our spam box, and the
technology behind this is machine learning. Below are some spam filters used by Gmail :
◦ Content filter
◦ Header filter
◦ General blacklists filter
◦ Rules-based filters
◦ Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naive Bayes
classifier are used for email spam filtering and malware detection.
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana and Siri. As the name
suggests, they help us find information using our voice instructions. These assistants can help us in
various ways just by voice instructions, such as playing music, calling someone, opening an email, scheduling an
appointment, etc.
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways a fraudulent transaction
can take place, such as fake accounts, fake IDs, and stealing money in the middle of a transaction. To detect
this, a Feed Forward Neural Network helps us by checking whether it is a genuine transaction or a fraudulent
transaction.
For each genuine transaction, the output is converted into some hash values, and these values become the
input for the next round. Each genuine transaction follows a specific pattern, which changes for a
fraudulent transaction; hence, the network detects it and makes our online transactions more secure.
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and
downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used to
predict stock market trends.
In medical science, machine learning is used for disease diagnosis. With this, medical technology is
growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
11. Automatic Language Translation
• Nowadays, if we visit a new place and we are not aware of the language, it is not a problem at all, as machine
learning helps here too by converting the text into languages we know. Google's GNMT (Google Neural Machine
Translation) provides this feature; it is a neural machine learning system that translates the text into our familiar
language, and this is called automatic translation.
• The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is
used with image recognition and translates the text from one language to another language.
6.3.2 Steps in developing a Machine Learning Application
Building a machine learning application is an iterative process and follows a set sequence of steps. Below are the
steps involved in developing machine learning applications:
1. Problem framing
The first step is to frame a machine learning problem in terms of what we want to predict and what kind of
observation data we have to make those predictions. Predictions are generally a label or a target value: it may
be a yes/no label (binary classification), a category (multiclass classification) or a real number (regression).
2. Collect and clean the data
• Once we frame the problem and identify what kind of historical data we have for prediction modeling, the
next step is to collect the data from a historical database, from open datasets or from any other data
sources.
• Not all the collected data is useful for a machine learning application. We may need to clean out irrelevant
data, which may reduce the accuracy of prediction or take additional computation without aiding the
result.
3. Prepare data for ML application
Once the data is ready for the machine learning algorithm, we need to transform it into a form that the ML
system can understand. Machines cannot directly understand an image or text, so we need to convert it into numbers. This
step may also require building a data pipeline, depending on the needs of the machine learning application.
4. Feature engineering
• Sometimes raw data may not reveal all the facts about the targeted label. Feature engineering is a
technique for creating additional, more relevant and sensible features by combining two or more existing features
with an arithmetic operation.
• For example, in a compute engine it is common for both RAM and CPU usage to reach 95%, but something is
wrong when RAM usage is at 5% and CPU usage is at 93%. We can use the ratio of RAM to CPU usage as a new
feature, which may provide a better prediction. If we are using deep learning, the model builds features
automatically, so explicit feature engineering is often unnecessary.
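A tiny sketch of the RAM-to-CPU ratio feature from the example above. The monitoring samples are invented, and the function assumes CPU usage is never zero.

```python
def ram_cpu_ratio(ram_pct, cpu_pct):
    """Engineered feature: RAM usage divided by CPU usage (assumes cpu_pct > 0)."""
    return ram_pct / cpu_pct


# Hypothetical monitoring samples: (RAM %, CPU %)
samples = [(95, 95), (90, 92), (5, 93), (50, 48)]

# Append the new feature to each raw row
rows = [(ram, cpu, round(ram_cpu_ratio(ram, cpu), 2)) for ram, cpu in samples]
for row in rows:
    print(row)
```

In the output, the (5, 93) sample stands out with a ratio near 0.05 while the others sit near 1, a contrast that neither raw column shows on its own.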
5. Training a model
• Before we train the model, we need to split the data into training and evaluation sets, as we need to monitor
how well a model generalizes to unseen data. The algorithm then learns the patterns and the mapping
between the features and the label.
• The learning can be linear or non-linear depending upon the activation function and algorithm. A few
hyperparameters affect the learning as well as the training time, such as learning rate, regularization,
batch size, number of passes (epochs), optimization algorithm, and more.
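A minimal sketch of a random train/evaluation split. The 80/20 proportion and the fixed `seed` are common but arbitrary assumptions; ML libraries such as scikit-learn offer equivalent ready-made helpers.

```python
import random


def train_eval_split(rows, eval_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out eval_fraction of them for evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_eval = int(len(rows) * eval_fraction)
    return rows[n_eval:], rows[:n_eval]   # (training set, evaluation set)


data = list(range(100))                   # stand-in for 100 labeled examples
train, evaluation = train_eval_split(data)
print(len(train), len(evaluation))        # prints: 80 20
```

Shuffling before splitting avoids an evaluation set that only contains the most recent or otherwise unrepresentative examples; the fixed seed makes the split reproducible.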
6. Evaluating and improving model accuracy
Accuracy is a measure of how good or bad a model is doing on an unseen validation set. Based on the
current learnings, we need to evaluate how a model is doing on a validation set. Depending on the
application, we can use different accuracy metrics. For example, for classification we may use precision and
recall or the F1 score; for object detection, we may use IoU (intersection over union).
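The metrics named above can be computed from scratch as follows. Precision, recall and F1 are computed for one positive class, and IoU for two axis-aligned boxes given as (x1, y1, x2, y2); that box format, and the sample labels and boxes, are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical validation labels and predictions for a binary classifier
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(precision_recall_f1(y_true, y_pred))   # (0.75, 0.75, 0.75)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))      # 1/7, about 0.143
```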
If a model is not doing well, we may classify the problem as either 1) over-fitting or 2) under-fitting.
• When a model does well on the training data but not on the validation data, it is the over-fitting
scenario: the model is not generalizing well. Solutions include regularizing the algorithm,
reducing the number of input features, eliminating redundant features, and using resampling techniques
like k-fold cross-validation.
• In the under-fitting scenario, a model does poorly on both the training and validation datasets. Solutions
may include training with more data, evaluating different algorithms or architectures, using more passes,
and experimenting with the learning rate or optimization algorithm.
• After iterative training, the algorithm learns a model that maps input data to those labels, and
this model can be used to make predictions on unseen data.
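k-fold cross-validation, listed above as a resampling technique, rotates which part of the data is held out for validation. The sketch below only generates the index folds, without shuffling (a simplifying assumption); in practice a fresh model is trained on each fold's training indices and the validation scores are averaged.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, k=3):
    print(len(train_idx), "training samples, validating on", val_idx)
```

Because every sample is used for validation exactly once, the averaged score is a steadier estimate of generalization than a single train/validation split, which helps reveal over-fitting.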
Review Questions
Q.1 What are the types of Machine Learning?
Q.2 Write a short note on linear regression.
Q.3 Explain decision tree with a suitable diagram.
Q.4 Write a short note on Support Vector Machine.
Q.5 What are the steps for working of the k-means algorithm?
Q.6 Explain the stopping criteria for k-means clustering.
Q.7 What are the key features of k-means clustering?
Q.8 Explain the limitations of k-means clustering.
Q.9 What are the disadvantages of k-means clustering?
Q.10 Write a short note on hierarchical clustering.