
Artificial Intelligence and Data Science - 1 (AI and DS - 1)

(Code - ITC604)

Semester VI : Information Technology (Mumbai University)

Strictly as per the New Revised Syllabus (Rev-2019 'C' Scheme) of Mumbai University w.e.f. academic year 2021-2022
(As per Choice Based Credit and Grading System)

Prof. Purva Raut


M. Tech. (Computer Engineering)
Ex. Assistant Professor,
Department of Information Technology,
D.J. Sanghvi College of Engineering, Mumbai,
Maharashtra, India

Price : ₹ 275/-

TechKnowledge Publications
Mumbai University
Third Year of Information Technology (2019 Course)

Subject Code : ITC604
Subject Name : Artificial Intelligence and Data Science - 1
Credits : 03
Course Objectives :

1. To introduce the students to the different issues involved in trying to define and simulate intelligence.
2. To familiarize the students with specific, well-known Artificial Intelligence methods, algorithms and knowledge representation schemes.
3. To introduce students to different techniques which will help them build simple intelligent systems based on AI/IA concepts.
4. To introduce students to data science and problem solving with data science and statistics.
5. To enable students to choose appropriately from a wider range of exploratory and inferential methods for analyzing data, and interpret the results contextually.
6. To enable students to apply types of machine learning methods to real world problems.
Course Outcomes : On successful completion of the course, the learner/student will be able to :

1. Develop a basic understanding of the building blocks of AI as presented in terms of intelligent agents.
2. Apply an appropriate problem-solving method and knowledge-representation scheme.
3. Develop an ability to analyze and formalize a problem (as a state space, graph, etc.) and to evaluate and select the appropriate search method.
4. Apply problem solving concepts with data science and be able to tackle problems from a statistical perspective.
5. Choose and apply appropriately from a wider range of exploratory and inferential methods for analyzing data, and be able to evaluate and interpret the results contextually.
6. Understand and apply types of machine learning methods to real world problems.
Module | Course Module / Contents | Periods

0  Prerequisite : Nil

1  Introduction to AI
   Introduction : Introduction to AI, AI techniques, Problem Formulation.
   Intelligent Agents : Structure of Intelligent Agents, Types of Agents, Agent Environments, PEAS representation for an Agent.
   Self-Learning Topics : Identify application areas of AI. (Refer Chapter 1)
   Periods : 04

2  Search Techniques
   Uninformed Search Techniques : Uniform Cost Search, Depth Limited Search, Iterative Deepening, Bidirectional Search.
   Informed Search Methods : Heuristic functions, Best First Search, A*, Hill Climbing, Simulated Annealing.
   Constraint Satisfaction Problem Solving : Crypto-Arithmetic Problem, Water Jug, Graph Coloring.
   Adversarial Search : Game Playing, Min-Max Search, Alpha Beta Pruning. Comparing Different Techniques.
   Self-Learning Topics : IDA*, SMA*. (Refer Chapter 2)
   Periods : 09

3  Knowledge Representation using First Order Logic
   Knowledge and Reasoning : A Knowledge Based Agent, WUMPUS WORLD Environment, Propositional Logic, First Order Predicate Logic, Forward and Backward Chaining, Resolution. Planning as an application of a knowledge based agent. Concepts of Partial Order Planning, Hierarchical Planning and Conditional Planning.
   Self-Learning Topics : Representing real world problems as planning problems. (Refer Chapter 3)
   Periods : 06

4  Introduction to DS
   Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics, Lifecycle, Roles in Data Science Projects.
   Self-Learning Topics : Applications and Case Studies of Data Science in various Industries. (Refer Chapter 4)
   Periods : 04

5  Exploratory Data Analysis
   Introduction to exploratory data analysis, Typical data formats, Types of EDA, Graphical/Non graphical Methods, Univariate/multivariate methods, Correlation and covariance, Degree of freedom, Statistical Methods for Evaluation including ANOVA.
   Self-Learning Topics : Implementation of graphical EDA methods. (Refer Chapter 5)
   Periods : 08

6  Introduction to ML
   Introduction to Machine Learning, Types of Machine Learning : Supervised (Logistic Regression, Decision Tree, Support Vector Machine) and Unsupervised (K Means Clustering, Hierarchical Clustering, Association Rules), Issues in Machine Learning, Application of Machine Learning, Steps in developing a Machine Learning Application.
   Self-Learning Topics : Real world case studies on machine learning. (Refer Chapter 6)
   Periods : 08

Total : 39

AI and DS - 1 (MU) : Table of Contents

Chapter 1 : INTRODUCTION TO AI                                    1-1 to 1-33

Syllabus : Introduction : Introduction to AI, AI techniques, Problem Formulation. Intelligent Agents : Structure of Intelligent Agents, Types of Agents, Agent Environments, PEAS representation for an Agent.
Self-Learning Topics : Identify application areas of AI.
1.1    Introduction to Artificial Intelligence                         1-1
1.2    Foundations of AI                                               1-1
1.2.1  Acting Humanly : The Turing Test Approach                       1-1
1.2.2  Thinking Humanly : The Cognitive Modelling Approach             1-3
1.2.3  Thinking Rationally : The "Laws of Thought" Approach            1-3
1.2.4  Acting Rationally : The Rational Agent Approach                 1-3
1.3    Categorization of Intelligent Systems                           1-4
1.4    Components of AI                                                1-4
1.4.1  Computational Intelligence Vs Artificial Intelligence           1-6
1.5    Artificial Intelligence Techniques                              1-6
1.6    Problem Formulation                                             1-10
1.6.1  Components of Problem Formulation                               1-10
1.6.2  Example of 8-Puzzle Problem                                     1-11
1.6.3  Example of Missionaries and Cannibals Problem                   1-11
1.6.4  Vacuum-Cleaner Problem                                          1-12
1.6.5  Example of Real Time Problems                                   1-13
1.7    Intelligent Agents                                              1-13
1.7.1  What is an Agent ?                                              1-13
1.7.2  Definitions of Agent                                            1-15
1.7.3  Intelligent Agent                                               1-16
1.7.3(A) Structure of Intelligent Agents                               1-16
1.8    Rational Agent                                                  1-17
1.9    Environment Types and PEAS Properties of Agent                  1-19
1.9.1  Environment Types                                               1-19
1.9.2  PEAS Properties of Agent                                        1-22
1.10   Types of Agents                                                 1-24
1.10.1 Simple Reflex Agents                                            1-24
1.10.2 Model-based Reflex Agents                                       1-26
1.10.3 Goal-based Agents                                               1-27
1.10.4 Utility-Based Agents                                            1-27
1.10.5 Learning Agents                                                 1-28

1.11   Self-Learning Topics : Identify Application Areas of AI
1.11.1 Application Areas of Artificial Intelligence
1.11.2 Sub Areas / Domains of Artificial Intelligence
1.11.3 Current Trends in Artificial Intelligence
Chapter 2 : SEARCH TECHNIQUES                                     2-1 to 2-64

Syllabus : Uninformed Search Techniques : Uniform cost search, Depth Limited Search, Iterative Deepening, Bidirectional Search. Informed Search Methods : Heuristic functions, Best First Search, A*, Hill Climbing, Simulated Annealing. Constraint Satisfaction Problem Solving : Crypto-Arithmetic Problem, Water Jug, Graph Coloring. Adversarial Search : Game Playing, Min-Max Search, Alpha Beta Pruning. Comparing Different Techniques.
Self-Learning Topics : IDA*, SMA*.
2.1    Measuring Performance of Problem Solving Algorithm / Agent      2-1
2.2    Node Representation in Search Tree
2.3    Uninformed Search
2.4    Depth First Search (DFS)
2.4.1  Concept
2.4.2  Implementation
2.4.3  Algorithm
2.4.4  Performance Evaluation
2.5    Breadth First Search (BFS)                                      2-4
2.5.1  Concept                                                         2-5
2.5.2  Process                                                         2-5
2.5.3  Implementation                                                  2-5
2.5.4  Algorithm                                                       2-5
2.5.5  Performance Evaluation                                          2-5
2.6    Uniform Cost Search (UCS)                                       2-6
2.6.1  Concept                                                         2-6
2.6.2  Implementation                                                  2-6
2.6.3  Algorithm                                                       2-6
2.6.4  Performance Evaluation                                          2-6
2.7    Depth Limited Search (DLS)                                      2-7
2.7.1  Concept                                                         2-7
2.7.2  Process                                                         2-7
2.7.3  Implementation                                                  2-7
2.7.4  Algorithm                                                       2-7
2.7.5  Performance Evaluation                                          2-8
2.8    Iterative Deepening DFS (IDDFS)                                 2-8
2.8.1  Concept                                                         2-8
2.8.2  Process                                                         2-8
2.8.3  Implementation                                                  2-10
2.8.4  Algorithm                                                       2-10
2.8.5  Pseudo Code                                                     2-10
2.8.6  Performance Evaluation                                          2-10
2.9    Bidirectional Search                                            2-11
2.9.1  Concept                                                         2-11
2.9.2  Process                                                         2-11
2.9.3  Implementation                                                  2-11
2.9.4  Performance Evaluation                                          2-11
2.9.5  Pros of Bidirectional Search                                    2-12
2.9.6  Cons of Bidirectional Search                                    2-12
2.10   Comparing Different Techniques                                  2-12
2.10.1 Difference between BFS and DFS                                  2-12
2.11   Informed Search Techniques                                      2-13
2.12   Heuristic Function                                              2-14
2.12.1 Example of 8-Puzzle Problem                                     2-15
2.12.2 Example of Block World Problem                                  2-15
2.12.3 Properties of Good Heuristic Function                           2-17
2.13   Best First Search                                               2-17
2.13.1 Concept                                                         2-17
2.13.2 Implementation                                                  2-18
2.13.3 Algorithm : Best First Search                                   2-18
2.13.4 Performance Measures for Best First Search                      2-19
2.13.5 Greedy Best First Search                                        2-19
2.13.6 Properties of Greedy Best-First Search                          2-20
2.14   A* Search                                                       2-20
2.14.1 Concept                                                         2-20
2.14.2 Implementation                                                  2-20
2.14.3 Algorithm (A*)                                                  2-21
2.14.4 Behaviour of A* Algorithm                                       2-22
2.14.5 Admissibility of A*                                             2-23
2.14.6 Monotonicity                                                    2-24
2.14.7 Properties of A*                                                2-24
2.14.8 Example : 8-Puzzle Problem using A* Algorithm                   2-24


2.14.9 Comparison among Best First Search, A* Search and Greedy Best First Search   2-27
2.15   Memory Bounded Heuristic Searches                               2-27
2.15.1 Iterative Deepening A* (IDA*) {Self Study Topic}                2-27
2.15.2 Simplified Memory-Bounded A* (SMA*) {Self Study Topic}
2.15.3 Advantages of SMA* over A* and IDA*
2.15.4 Limitations of SMA*
2.16   Local Search Algorithms and Optimization Problems               2-31
2.16.1 Hill Climbing                                                   2-31
2.16.1(A) Simple Hill Climbing
2.16.1(B) Steepest Ascent Hill Climbing                                2-33
2.16.1(C) Limitations of Hill Climbing
2.16.1(D) Solutions to Problems in Hill Climbing                       2-35
2.16.2 Simulated Annealing                                             2-35
2.16.2(A) Comparing Simulated Annealing with Hill Climbing             2-36
2.16.3 Local Beam Search                                               2-37
2.17   Crypto-Arithmetic Problem                                       2-39
2.18   Constraint Satisfaction Problem                                 2-41
2.18.1 Graph Coloring                                                  2-42
2.18.2 Varieties of CSPs                                               2-43
2.18.3 Varieties of Constraints                                        2-43
2.18.4 Backtracking in CSPs                                            2-43
2.18.5 Improving Backtracking Efficiency                               2-44
2.18.6 Water Jug                                                       2-47
2.19   Adversarial Search                                              2-49
2.20   Environment Types                                               2-49
2.21   AI Game - Features                                              2-49
2.21.1 Zero Sum Game                                                   2-50
2.21.2 Non-Zero Sum Game                                               2-50
2.21.2(A) Positive Sum Game                                            2-50
2.21.2(B) Negative Sum Game                                            2-51
2.22   Relevant Aspects of AI Game                                     2-51
2.23   Game Playing                                                    2-51
2.23.1 Types of Games                                                  2-52
2.23.1(A) Chess                                                        2-53
2.23.1(B) Checkers                                                     2-54
2.23.2 What is Game Tree ?                                             2-54
2.24   MiniMax Algorithm
2.24.1 Minimax Algorithm                                               2-55
2.24.2 Properties of Minimax Algorithm
2.25   Alpha Beta Pruning                                              2-58
2.25.1 Example of α-β Pruning
2.25.2 Properties of α-β
Chapter 3: KNOWLEDGE REPRESENTATION USING FIRST ORDER LOGIC 3-1 to 3-60
Syllabus : Knowledge and Reasoning : A Knowledge Based Agent, WUMPUS WORLD Environment, Propositional Logic, First
Order Predicate Logic, Forward and Backward Chaining, Resolution. Planning as an application of a knowledge based agent.
Concepts of Partial Order planning, Hierarchical Planning and Conditional Planning. Self-Learning Topics: Representing real
world problems as planning problems.

3.1 A Knowledge Based Agent 3-1


3.1.1 Architecture of a KB Agent 3-2
3.2 The WUMPUS World Environment 3-3
3.2.1 Description of the WUMPUS World 3-4
3.2.2 PEAS Properties of WUMPUS World 3-5
3.2.3 Exploring a WUMPUS World 3-6
3.3 Logic 3-8
3.3.1 Role of Reasoning in Al 3-9
3.4 Representation of Knowledge using Rules 3-9
3.5 Propositional Logic (PL) 3-12
3.5.1 Syntax 3-12

3.5.2 Semantics 3-13


3.5.3 What is Propositional Logic ? 3-13
3.5.4 PL Sentence - Example 3-13

3.5.5 Inference Rules 3-14

3.5.6 Horn Clause 3-16


3.5.7 Propositional Theorem Proving 3-16

3.5.8 Advantages of Propositional Logic 3-17

3.5.9 Disadvantages of Propositional Logic 3-17

3.6 First Order Predicate Logic 3-17


3.6.1 Syntactic Elements, Semantic and Syntax 3-18
3.7 Comparison between Propositional Logic and First Order Logic 3-19
3.8 Inference in FOL 3-20

3.8.1 Forward Chaining 3-20

3.8.2 Backward Chaining 3-21

3.9 Difference Between Forward Chaining and Backward Chaining 3-23



3.10   Unification and Lifting                                         3-23
3.10.1 Unification                                                     3-23
3.10.2 Lifting                                                         3-23
3.11   Resolution                                                      3-24
3.11.1 The Resolution Procedure                                        3-26
3.11.2 Conversion from FOL to Clausal Normal Form                      3-26
3.11.3 Facts Representation                                            3-27
3.11.4 Examples                                                        3-27
3.12   Planning as an Application of Knowledge Based Agent             3-28
3.12.1 Simple Planning Agent
3.13   Planning Problem                                                3-35
3.13.1 Why Planning ?                                                  3-36
3.13.1(A) Problem Solving and Planning
3.14   Goal of Planning
3.14.1 Major Approaches
3.15   Planning Graphs                                                 3-39
3.16   Planning as State-Space Search                                  3-41
3.16.1 Example of State Space Search                                   3-42
3.17   Classification of Planning with State Space Search              3-44
3.18   Progression Planners                                            3-44
3.19   Regression Planners                                             3-45
3.19.1 Heuristics for State-Space Search                               3-46
3.20   Total Order Planning (TOP)                                      3-46
3.21   Partial Order Planning                                          3-47
3.21.1 POP as a Search Problem                                         3-47
3.21.2 Consistent Plan is a Solution for POP Problem                   3-48
3.22   Hierarchical Planning                                           3-48
3.22.1 POP One Level Planner                                           3-49
3.22.2 Hierarchy of Actions                                            3-50
3.22.3 Planner
3.23   Planning Languages
3.23.1 Example of Block World Puzzle
3.23.2 Example of the Spare Tire Problem
3.24   Self-Learning Topics : Representing Real World Problems as Planning Problems   3-51
3.25   Multi-Agent Planning                                            3-57
3.26   Conditional Planning
Chapter 4 : INTRODUCTION TO DS                                    4-1 to 4-10

Syllabus : Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics, Lifecycle, Roles in Data Science Projects.
Self-Learning Topics : Applications and Case Studies of Data Science in various Industries.

4.1    Introduction to Data Science                                    4-1
4.2    Evolution of Data Science                                       4-1
4.3    Data Science Vs. Business Analytics Vs. Big Data                4-3
4.3.1  Data Mining vs Data Science                                     4-3
4.4    Data Analytics                                                  4-4
4.5    Lifecycle                                                       4-4
4.5.1  Phases of Data Analytics Lifecycle                              4-4
4.6    Roles in Data Science Projects                                  4-6
4.7    Applications of Data Science                                    4-8
4.8    Case Studies of Data Science in various Industries              4-9
Chapter 5 : EXPLORATORY DATA ANALYSIS                             5-1 to 5-26

Syllabus : Introduction to exploratory data analysis, Typical data formats. Types of EDA, Graphical/Non graphical Methods, Univariate/multivariate methods, Correlation and covariance, Degree of freedom, Statistical Methods for Evaluation including ANOVA.
Self-Learning Topics : Implementation of graphical EDA methods.

5.1    Introduction to Exploratory Data Analysis                       5-1
5.1.1  Typical Data Formats                                            5-2
5.1.1(A) Visualising Distributions                                     5-2
5.1.2  Typical Values                                                  5-4
5.1.3  Unusual Values                                                  5-6
5.1.4  Missing Values                                                  5-7
5.2    Types of EDA                                                    5-7
5.2.1  Types of Exploratory Data Analysis                              5-7
5.2.2  Tools Required for Exploratory Data Analysis                    5-9
5.3    Graphical / Non Graphical Methods                               5-9
5.3.1  Multivariate Non-Graphical EDA                                  5-10
5.3.2  Univariate Graphical EDA                                        5-10
5.3.3  Correlation and Covariance                                      5-11
5.3.4  A Categorical and Continuous Variable                           5-12
5.3.5  Two Categorical Variables                                       5-16
5.3.6  Two Continuous Variables                                        5-17
5.3.7  Covariance and Correlation Matrices                             5-20
5.4    Degree of Freedom                                               5-21
5.5    Statistical Methods for Evaluation Including ANOVA              5-21
5.5.1  Simplest Form and Basic Terms of ANOVA Tests                    5-21
5.5.2  F Test                                                          5-22
5.6    Self-Learning Topics : Implementation of Graphical EDA Methods  5-24
5.6.1  Box Plots                                                       5-24
5.6.2  Histograms                                                      5-24
5.6.3  Scatter Plots                                                   5-24
5.6.4  Normal Probability Plots                                        5-26

Chapter 6 : INTRODUCTION TO MACHINE LEARNING                      6-1 to 6-17

Syllabus : Introduction to Machine Learning, Types of Machine Learning : Supervised (Logistic Regression, Decision Tree, Support Vector Machine) and Unsupervised (K Means Clustering, Hierarchical Clustering, Association Rules), Issues in Machine Learning, Application of Machine Learning, Steps in developing a Machine Learning Application.
Self-Learning Topics : Real world case studies on machine learning.

6.1    Introduction to Machine Learning                                6-1
6.2    Types of Machine Learning                                       6-1
6.2.1  Supervised Learning                                             6-2
6.2.1(A) Logistic Regression                                           6-2
6.2.1(B) Decision Tree                                                 6-3
6.2.1(C) Support Vector Machine                                        6-4
6.2.2  Unsupervised Learning                                           6-5
6.2.2(A) K Means Clustering                                            6-6
6.2.2(B) Hierarchical Clustering                                       6-8
6.2.2(C) Association Rules                                             6-9
6.3    Issues in Machine Learning                                      6-10
6.3.1  Application of Machine Learning                                 6-12
6.3.2  Steps in Developing a Machine Learning Application              6-15
6.4    Self-Learning Topics : Real World Case Studies on Machine Learning   6-16
Chapter 1 : INTRODUCTION TO AI

Syllabus : Introduction : Introduction to AI, AI techniques, Problem Formulation. Intelligent Agents : Structure of Intelligent Agents, Types of Agents, Agent Environments, PEAS representation for an Agent.
Self-Learning Topics : Identify application areas of AI.

1.1 Introduction to Artificial Intelligence

John McCarthy, who coined the term "Artificial Intelligence" in 1956, defined AI as "the science and engineering of making intelligent machines", especially intelligent computer programs.

Artificial Intelligence (AI) is relevant to any intellectual task where a machine needs to take a decision or choose its next action based on the current state of the system; in short, to act intelligently or rationally. As it has a very wide range of applications, it is truly a universal field.

In simple words, an artificially intelligent system works like a human brain: a machine or software shows intelligence while performing given tasks. Such systems are called intelligent systems or expert systems. You can say that these systems can "think" while generating output!

AI is one of the newest fields in science and engineering and has a wide variety of application fields. AI applications range from general fields like learning, perception and prediction to specific fields such as writing stories, proving mathematical theorems, driving a bus on a crowded street, diagnosing diseases, and playing chess.

AI is the study of how to make machines do things which, at the moment, people do better.

1.2 Foundations of AI

University Question

Q. Explain different definitions of artificial intelligence according to different categories.

In general, artificial intelligence is the study of how to make machines do things which, at the moment, humans do better. Following are the four approaches to define AI. Historically, all four approaches have been followed by different groups of people with different methods.

1.2.1 Acting Humanly : The Turing Test Approach

University Question

Q. Explain the Turing test designed for a satisfactory operational definition of AI. (MU - May 16)

Definition 1 : "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)

Definition 2 : "The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

To judge whether a system can act like a human, Sir Alan Turing designed a test known as the Turing test. As shown in Fig. 1.2.1, in the Turing test a computer needs to interact with a human interrogator by answering his questions in written format. The computer passes the test if the human interrogator cannot identify whether the written responses come from a person or a computer. The Turing test remains valid even after 60 years of research.

Fig. 1.2.1 : Turing Test Environment (a human interrogator interacting with an AI system)

For this test, the computer would need to possess the following capabilities :

1. Natural Language Processing (NLP) : This unit enables the computer to interpret the English language and communicate successfully.
2. Knowledge Representation : This unit is used to store knowledge gathered by the system through input devices.
3. Automated Reasoning : This unit analyzes the knowledge stored in the system and makes new inferences to answer questions.
4. Machine Learning : This unit learns new knowledge by taking current input from the environment and adapts to new circumstances, thereby enhancing the knowledge base of the system.

To pass the total Turing test, the computer would also need Computer Vision, which is required to perceive objects from the environment, and Robotics, to manipulate those objects.

Fig. 1.2.2 : Capabilities a Computer needs to Possess (Natural Language Processing, Knowledge Representation, Automated Reasoning, Machine Learning, Computer Vision, Robotics)

Fig. 1.2.2 lists all the capabilities a computer needs to have in order to exhibit artificial intelligence. These are the six disciplines which implement most of artificial intelligence.

1.2.2 Thinking Humanly : The Cognitive Modelling Approach

Definition 1 : "The exciting new effort to make computers think ... machines with minds, in the full and literal sense." (Haugeland, 1985)

Definition 2 : "The automation of activities that we associate with human thinking, activities such as decision making, problem solving, learning ..." (Bellman, 1978)

Cognitive science : It is an interdisciplinary field which combines computer models from Artificial Intelligence with techniques from psychology in order to construct precise and testable theories of the working of the human mind.

In order to make machines think like humans, we first need to understand how humans think. Research has shown that there are three ways in which the human thinking pattern can be captured :

1. Introspection, through which humans can catch their own thoughts as they go by.
2. Psychological experiments, carried out by observing a person in action.
3. Brain imaging, done by observing the brain in action.

By capturing the human thinking pattern, it can be implemented in a computer system as a program; if the program's input-output behaviour matches that of a human, then it can be claimed that the system can operate like humans.
1.2.3 Thinking Rationally : The "Laws of Thought" Approach

Definition 1 : "The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)

Definition 2 : "The study of the computations that make it possible to perceive, reason, and act."

The laws of thought are supposed to govern the operation of the mind, and their study initiated the field called logic. It provides precise notations to express facts of the real world. It also covers reasoning and "right thinking", that is, an irrefutable thinking process. Computer programs based on these logic notations were developed to create intelligent systems.

There are two problems with this approach :

1. This approach is not suitable when 100% knowledge is not available for a problem.
2. A vast number of computations is required even to implement a simple human reasoning process; practically, not all problems were solvable, because even problems with just a few hundred facts can exhaust the computational resources of any computer.
1.2.4 Acting Rationally : The Rational Agent Approach

Definition 1 : "Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)

Definition 2 : "AI ... is concerned with intelligent behaviour in artifacts." (Nilsson, 1998)

Rational Agent

Agents perceive their environment through sensors over a prolonged time period, adapt to change, create and pursue goals, and take actions through actuators to achieve those goals. A rational agent is one that does the "right" thing and acts rationally so as to achieve the best outcome even when there is uncertainty in knowledge.

The rational-agent approach has two advantages over the other approaches :

1. Compared to the other approaches this is the more general approach, as rationality can be achieved by selecting the correct inference from the several available.
2. Rationality has specific standards: it is mathematically well defined and completely general, and can be used to develop agent designs that achieve it. Human behaviour, on the other hand, is very subjective and cannot be proved mathematically.

The two approaches thinking humanly and thinking rationally are based on the reasoning expected from intelligent systems, while the other two, acting humanly and acting rationally, are based on the intelligent behaviour expected from them.

In our syllabus we are going to study the acting rationally approach.

1.3 Categorization of Intelligent Systems

As AI is a very broad concept, there are different types or forms of AI. The critical categories of AI can be based on the capacity of the intelligent program, i.e. what the program is able to do. Under this consideration there are three main categories :

1. Artificial Narrow Intelligence / Weak AI

Weak AI is AI that specializes in one area. It is not a general purpose intelligence. An intelligent agent that is built to solve a particular problem or to perform a specific task is termed narrow intelligence or weak AI. For example, it took years of AI development to be able to beat the chess grandmaster, and since then we have not been able to beat the machines at chess. But that is all such a system can do, which it does extremely well.

2. Artificial General Intelligence / Strong AI

Strong AI or general AI refers to intelligence demonstrated by machines in performing any intellectual task that a human can perform. Developing strong AI is much harder than developing weak AI. Using artificial general intelligence, machines can demonstrate human abilities like reasoning, planning, problem solving, comprehending complex ideas, learning from their own experiences, etc. Many companies and corporations are working on developing a general intelligence, but they are yet to achieve it.

3. Artificial Super Intelligence

As defined by the leading AI thinker Nick Bostrom, "Superintelligence is an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills." Superintelligence ranges from a machine which is just a little smarter than a human to a machine that is a trillion times smarter. Artificial super intelligence is the ultimate power of AI.

1.4 Components of AI

AI is a vast field for research and it has applications in almost all possible domains. Keeping this in mind, the components of AI can be identified as follows (refer Fig. 1.4.1) :

1. Perception
2. Knowledge representation
3. Learning
4. Reasoning
5. Problem solving
6. Natural Language Processing (language understanding)

Fig. 1.4.1 : Components of AI

1. Perception

In order to work in the environment, intelligent agents need to scan the environment and the various objects in it. An agent scans the environment using various sensors, like a camera, temperature sensor, etc. This is called perception. After capturing various scenes, the perceiver analyses the different objects in them and extracts their features and the relationships among them.

2. Knowledge representation

The information obtained from the environment through sensors may not be in the format required by the system. Hence, it needs to be represented in standard formats for further processing, like learning various patterns, deducing inferences, comparing with past objects, etc. There are various knowledge representation techniques, like propositional logic and first order logic.

3. Learning

Learning is a very essential part of AI, and it happens in various forms. The simplest form of learning is by trial and error. In this form the program remembers the action that has given the desired output, discards the other trial actions, and learns by itself; this is also called unsupervised learning. In rote learning, the program simply remembers the problem-solution pairs or individual items. In another case, solutions to a few of the problems are given as input to the system, on the basis of which the system or program needs to generate solutions for new problems; this is known as supervised learning.
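The rote learning described above can be sketched in a few lines of Python. This is a minimal illustrative example (the class name and the addition problems are assumptions, not from the text): the learner simply memorizes problem-solution pairs and, unlike a supervised learner, cannot generalize to unseen problems.

```python
# Rote learning sketch: the program simply remembers problem-solution
# pairs it has seen (a memo table). Names and data are illustrative.

class RoteLearner:
    def __init__(self):
        self.memory = {}

    def learn(self, problem, solution):
        # Store the pair exactly as given; no generalization happens.
        self.memory[problem] = solution

    def solve(self, problem):
        # Returns None for problems never seen before: rote learning
        # cannot generate solutions for new problems, which is what
        # distinguishes it from supervised learning.
        return self.memory.get(problem)

learner = RoteLearner()
learner.learn((2, 3), 5)               # remember that 2 + 3 = 5
assert learner.solve((2, 3)) == 5      # recalled from memory
assert learner.solve((4, 1)) is None   # unseen problem, no answer
```

A supervised learner, by contrast, would fit a function to the stored pairs so it could also answer `(4, 1)`.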

4. Reasoning

Reasoning is also called logic, or generating inferences from a given set of facts. Reasoning is carried out based on strict rules of validity to perform a specified task. Reasoning can be of two types, deductive or inductive. In deductive reasoning the truth of the premises guarantees the truth of the conclusion, while in inductive reasoning the truth of the premises supports the conclusion, but the conclusion cannot be fully dependent on the premises. In programming logic, generally deductive inferences are used. Reasoning involves drawing inferences that are relevant to the given problem or situation.
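Deductive inference of the kind described above can be sketched as simple forward chaining over if-then rules (repeated modus ponens). The rule set and fact names below are illustrative assumptions, not taken from the text.

```python
# Deductive reasoning sketch: derive new facts from given facts by
# repeatedly applying if-then rules (modus ponens) until nothing
# new can be concluded. Rules and facts are illustrative.

def forward_chain(facts, rules):
    """rules is a list of (premises, conclusion) pairs; returns the
    closure of facts under the rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If every premise is already known, the conclusion is
            # guaranteed to be true (deductive step).
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (('rain',), 'wet_ground'),
    (('wet_ground',), 'slippery'),
]
derived = forward_chain({'rain'}, rules)
assert 'slippery' in derived  # rain -> wet ground -> slippery
```

Note how the truth of each conclusion is fully guaranteed by its premises; an inductive learner would instead only gather supporting evidence.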

5. Problem solving

AI addresses a huge variety of problems, for example, finding the winning moves in board games, planning actions in order to achieve a defined task, identifying various objects from given images, etc. As per the type of problem, there is a variety of problem solving strategies in AI. Problem solving methods are mainly divided into general purpose methods and special purpose methods. General purpose methods are applicable to a wide range of problems, while special purpose methods are customized to solve a particular type of problem.


6. Natural Language Processing

Natural Language Processing involves machines or robots understanding and interpreting the natural language that humans speak, and inferring knowledge from the speech input. It also involves the active participation of the machine in the form of dialog, i.e., NLP aims at text or verbal output from the machine or robot. The input and output of an NLP system can be speech and written text respectively.

1.4.1 Computational Intelligence Vs Artificial Intelligence

| Computational Intelligence (CI) | Artificial Intelligence (AI) |
| Computational Intelligence is the study of the design of intelligent agents. | Artificial Intelligence is the study of making machines do things which, at present, humans do better. |
| CI involves numbers and computations. | AI involves designs and symbolic knowledge representations. |
| CI constructs the system starting from bottom-level computations, hence follows a bottom-up approach. | AI analyses the overall structure of an intelligent system, following a top-down approach. |
| CI concentrates on low-level cognitive function implementation. | AI concentrates on high-level cognitive structure design. |

1.5 Artificial Intelligence Techniques


*  Following are the artificial intelligence techniques that behave intelligently and can handle the limitations of knowledge :
1. Describe and match    4. Constraint satisfaction
2. Goal reduction        5. Generate and test
3. Tree searching        6. Rule based system

*  Biologically-inspired AI Techniques :
o Neural networks
o Genetic algorithms
o Reinforcement learning
Let us understand them one by one.
1. Describe and match:
In this technique, the system's behaviour is explained in terms of a finite state model and a computation model.
* Finite state model : It consists of a set of states, a set of input events and the relations between them. Based on the current state and an input event, the next state of the model can be determined.
* Computation model : It is a finite state machine which includes a set of states, a start state, a transition function and an input alphabet. The transition function provides a mapping from input symbols and current states to a next state.
* Transition relation : If a pair of states (S, S') is such that one move takes the system from S to S', then the transition relation is represented by S → S'. A state-transition system is called deterministic if every state has at most one successor; it is called non-deterministic if at least one state has more than one successor.

* Representation of the computational system includes start and end state descriptions and a set of possible transition rules that might be applied. The problem is to find the appropriate transition rules.
* Example : Problem of Towers of Hanoi with two discs. The state transitions are shown for the same.
Problem Definition
Move the disks from the leftmost post to the rightmost post with the following conditions :
* A large disk can never be put on top of a smaller one;
* Only one disk can be moved at a time, from one peg to another;
* The middle post is only to be used for intermediate storage;
* Complete the task in the smallest number of moves possible.
Fig. 1.5.1 : Initial state and goal state
* Possible state transitions in the Towers of Hanoi puzzle with 2 disks are shown in Fig. 1.5.2.
Fig. 1.5.2 : State transition tree for the Towers of Hanoi puzzle with 2 disks
* In this problem, the optimal solution is the sequence of transitions from the start state down the extreme left branch of the transition tree.
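The describe-and-match view treats the two-disc puzzle as a state-transition system. The sketch below is illustrative, not from the text: a state is a tuple of three pegs, each a tuple of discs from top to bottom; the legal transitions S → S' are enumerated and searched breadth-first for the optimal move sequence.

```python
from collections import deque

def moves(state):
    # Legal transitions: move the top disc of one peg onto another
    # peg whose top disc (if any) is larger.
    for i, src in enumerate(state):
        if not src:
            continue
        disc = src[0]
        for j, dst in enumerate(state):
            if i != j and (not dst or dst[0] > disc):
                nxt = list(state)
                nxt[i] = src[1:]
                nxt[j] = (disc,) + dst
                yield tuple(nxt)

def solve(start, goal):
    # Breadth-first search over the transition relation S -> S'.
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))

start = ((1, 2), (), ())  # both discs on the leftmost peg; 1 is smaller
goal = ((), (), (1, 2))
path = solve(start, goal)
print(len(path) - 1)  # minimum number of moves -> 3
```

For two discs the minimum is 3 moves; in general, n discs take 2^n - 1 moves.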
2. Tree searching
* Tree searching is a very commonly used technique to solve many problems; goal reduction and constraint networks are a few of the areas. The tree is searched through many nodes to obtain the goal node, and the search gives the path from the source node to the destination node.
* If each node of the entire tree is explored while searching for the goal node, the search is called exhaustive. The searching techniques of AI are broadly classified as uninformed searching and informed searching.

Un-informed Searching Techniques | Informed Searching Techniques

Depth First Search | Hill Climbing
Breadth First Search | Best First Search
Uniform Cost Search | A* Search
Depth Limited Search | Iterative Deepening A*
Iterative Deepening DFS | Beam Search
Bidirectional Search | AO* Search
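The uninformed strategies in the table share one expansion loop and differ mainly in how the frontier is managed. A minimal sketch (the graph and its node names are hypothetical) showing that a FIFO frontier gives Breadth First Search while a LIFO frontier gives Depth First Search:

```python
graph = {  # a small hypothetical state graph, for illustration only
    'S': ['A', 'B'], 'A': ['C'], 'B': ['C', 'G'], 'C': ['G'], 'G': []
}

def search(start, goal, frontier_pop):
    # One generic loop: frontier_pop decides which node to expand
    # next, and therefore which search strategy this is.
    frontier, explored = [(start, [start])], set()
    while frontier:
        node, path = frontier_pop(frontier)
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for child in graph[node]:
            frontier.append((child, path + [child]))

bfs_path = search('S', 'G', lambda f: f.pop(0))  # FIFO queue -> BFS
dfs_path = search('S', 'G', lambda f: f.pop())   # LIFO stack -> DFS
print(bfs_path, dfs_path)
```

The informed techniques follow the same pattern with a priority frontier ordered by a heuristic.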

3. Goal reduction

In the goal reduction technique, the main goal is divided into sub-goals. It requires some procedures to follow. Goal reduction procedures are an alternative to declarative and logic-based representations.
The goal reduction process carries out hierarchical subdivision of goals into sub-goals until the sub-goals have an immediate solution. An AND-OR tree/graph structure can represent relations between goals and sub-goals, alternative sub-goals and conjoint sub-goals.
There are different goal levels. Higher-level goals are higher in the tree, and lower-level goals are lower in the tree. Directed arcs from a higher level to lower-level nodes represent the reduction of a higher-level goal to lower-level sub-goals. Goals at the leaf nodes of the tree cannot be divided further.

Example
An AND-OR tree structure to represent facts such as "enjoyment", "earning/saving money", "old age", etc. The root goal "Improve enjoyment of life" reduces (OR) to "Provide for old age", "Improve standard of living" and "Work less hard"; "Provide for old age" requires (AND) both "Earn more money" and "Save money"; and "Earn more money" reduces (OR) to "Go on strike" and "Improve productivity".
Fig. 1.5.3 : AND-OR tree/graph structure
The AND-OR tree structure describes the following things :
Hierarchical relationships between goals and sub-goals
"Earn more money" is a sub-goal of "Improve standard of living", which in turn is a sub-goal of "Improve enjoyment of life".
Alternative ways of trying to solve a goal
"Go on strike" and "Improve productivity" are alternative ways of trying to "Earn more money".
Conjoint sub-goals
These are goals which depend on more than one sub-goal. To "Provide for old age", one needs not only to "Earn more money", but also to "Save money".
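Goal reduction can be checked mechanically: an OR node is solvable if any of its sub-goals is, an AND node only if all of them are. The sketch below is a hedged illustration; the goal names and their leaf truth values are assumptions chosen to echo Fig. 1.5.3, not taken from it.

```python
# Each node is ('AND', [...]), ('OR', [...]) or ('LEAF', solvable).
tree = ('OR', [                      # Improve enjoyment of life
    ('AND', [('LEAF', True),         # Earn more money (assumed solvable)
             ('LEAF', True)]),       # Save money -> Provide for old age
    ('LEAF', False),                 # Improve standard of living
    ('LEAF', False),                 # Work less hard
])

def solved(goal):
    # An OR goal needs any sub-goal solved; an AND goal needs all.
    kind, body = goal
    if kind == 'LEAF':
        return body
    results = [solved(sub) for sub in body]
    return all(results) if kind == 'AND' else any(results)

print(solved(tree))  # True: both conjoint sub-goals are solvable
```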
4. Constraint satisfaction
Constraint satisfaction is the process of generating a solution that satisfies all the specified constraints for a given problem. There are variables which need to be assigned values from a specific domain.
To generate the solution it uses a backtracking mechanism. The optimal solution is the one which is generated in the minimum number of backtracks and satisfies all the constraints, thereby assigning a proper value to each of the variables.

There are multiple fields with applications of constraint satisfaction: Artificial Intelligence, Programming Languages, Symbolic Computing, and Computational Logic, to name a few.
Example : The N-queen problem. In this problem the queens are the variables, to each of which a position in an n×n matrix must be assigned as a value. The problem also states conditions on the placement of those queens (variables), i.e. no two queens can clash horizontally, vertically or diagonally.
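The N-queen formulation maps directly onto a backtracking search: assign one row at a time, test the constraints, and undo any assignment that cannot be extended. A minimal sketch (the column-per-row encoding is an implementation choice, not from the text):

```python
def solve_n_queens(n, placed=()):
    # placed[r] holds the column of the queen in row r. Each call
    # assigns the next row; a dead end returns None, which makes the
    # caller try the next column (backtracking).
    if len(placed) == n:
        return placed
    row = len(placed)
    for col in range(n):
        if all(col != c and abs(col - c) != row - r     # no vertical
               for r, c in enumerate(placed)):          # or diagonal clash
            solution = solve_n_queens(n, placed + (col,))
            if solution:
                return solution
    return None

print(solve_n_queens(4))  # one queen per row, e.g. (1, 3, 0, 2)
```

For n = 3 there is no valid placement, so the search backtracks all the way and returns None.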
5. Generate and test

It is the simplest form of searching technique. The generate-and-test method first generates a node in the search tree and then checks whether it is a goal node. It involves two processes, as shown in Fig. 1.5.4 :
1. Generate a successor node.
2. Test whether it matches any of the proposed goal nodes.
As many unfruitful nodes get generated during this process, it is not a very efficient technique.
Example : The problem of opening a combination lock by trial and error, without knowing the combination.

Generate solution node → Goal test
Fig. 1.5.4 : Generate and Test

6. Rule based system

Rule-based systems are the simplest and most useful systems. In their simplest form they have a set of rules, an interpreter and input from the environment. Rules are of the form IF <condition> THEN <action>.
Fig. 1.5.5 : Rule Based System (conditions feed the interpreter, which produces actions)

Rule based system components are as follows:


1. Working memory or knowledge base : It stores the facts of the outside world that are currently observed. A fact may take the form of a triplet <Object, Attribute, Value>. For example, to represent the fact that "The colour of my car is black", the knowledge base will store the triplet <car, colour, black>.
2. Rule base : The rule base is the basic part of the system. It contains all the rules. Rules are generated from the domain knowledge and can be used or modified based on the current observations.
For example : IF <(temperature, over, 20)>
THEN <add (ocean, swimmable, yes)>

While using a particular rule, the conditions are first matched against the current observations; if they are satisfied, the rule may be fired.
3. Interpreter : It performs the matching of rules against the current percepts observed in the environment. It has three repetitive tasks to follow : retrieval of the matching rule, refinement of the rule, and execution of the rule by performing the corresponding action.
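The three components can be sketched as a tiny forward-chaining interpreter. The first rule is the ocean example from the text; the second rule is invented here purely to show how one rule's action can enable another rule's condition.

```python
# Working memory holds <Object, Attribute, Value> triplets.
working_memory = {("temperature", "over", 20)}

# Rule base: condition triplet -> triplet that the action adds.
rules = [
    {"if": ("temperature", "over", 20),
     "then": ("ocean", "swimmable", "yes")},
    {"if": ("ocean", "swimmable", "yes"),       # invented rule, for
     "then": ("beach", "crowded", "likely")},   # illustration only
]

def interpret(memory, rules):
    # Repeat match -> select -> execute until no rule adds a new fact.
    fired = True
    while fired:
        fired = False
        for rule in rules:
            if rule["if"] in memory and rule["then"] not in memory:
                memory.add(rule["then"])   # execute the rule's action
                fired = True
    return memory

facts = interpret(working_memory, rules)
print(sorted(facts))
```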

1.6 Problem Formulation

University Question
Q. Explain how you will formulate a search problem.
* Given a goal to achieve, problem formulation is the process of deciding what states are to be considered and what actions are to be taken to achieve the goal. This is the first step to be taken by any problem-solving agent.
* State space : The state space of a problem is the set of all states reachable from the initial state by executing any sequence of actions. A state is a representation of one possible configuration of the problem.
* The state space specifies the relation among various problem states, thereby forming a directed network or graph in which the nodes are states and the links between nodes represent actions.
* State space search : Searching in a given space of states pertaining to the problem under consideration is called a state space search.
* Path : A path is a sequence of states connected by a sequence of actions, in a given state space.

1.6.1 Components of Problem Formulation

University Question
Q. Explain steps in problem formulation with example.

A problem can be defined formally using five components as follows :
1. Initial state        2. Actions
3. Successor function   4. Goal test
5. Path cost

1. Initial state : The initial state is the one in which the agent starts.
2. Actions : The set of actions that can be executed or are applicable in all possible states, together with a description of what each action does; the formal name for this is the transition model.
3. Successor function : A function that returns the state resulting from executing an action on the current state.
4. Goal test : A test to determine whether the current state is a goal state. In some problems the goal test can be carried out just by comparing the current state with the defined goal state; this is called an explicit goal test. In other problems the goal state cannot be defined explicitly but needs to be evaluated by carrying out some computations; this is called an implicit goal test.
For example : In the Tic-Tac-Toe game, making a diagonal, vertical or horizontal combination declares the winning state, which can be compared explicitly; but in the case of chess, the goal state cannot be predefined. It is a scenario called "checkmate", which has to be evaluated implicitly.

W Tech Knowledge
Puplicacions
*

¥F Aland DS-1(MU) 111 Introduction to Al

5. Path cost : It is simply the cost associated with each step taken to reach the goal state. To determine the cost to reach each state, there is a cost function, which is chosen by the problem-solving agent.
Problem solution : A well-defined problem has a specification of the initial state, goal test, successor function and path cost. It can be represented as a data structure and used to implement a program which can search for the goal state. A solution to a problem is a sequence of actions chosen by the problem-solving agent that leads from the initial state to a goal state. Solution quality is measured by the path cost function.
Optimal solution : An optimal solution is the solution with the least path cost among all solutions.
A general sequence followed by a simple problem-solving agent is: first it formulates the problem with the goal to be achieved, then it searches for a sequence of actions that would solve the problem, and then it executes the actions one at a time.
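The five components above can be captured in a small abstract class. The sketch below is illustrative, not from the text; the Countdown subclass is a toy problem showing how a concrete formulation plugs in.

```python
class Problem:
    """The five components of a formally defined problem."""
    def __init__(self, initial, goal):
        self.initial, self.goal = initial, goal
    def actions(self, state):            # actions applicable in a state
        raise NotImplementedError
    def result(self, state, action):     # successor function
        raise NotImplementedError
    def goal_test(self, state):          # explicit goal test by default
        return state == self.goal
    def step_cost(self, state, action):  # uniform path cost by default
        return 1

class Countdown(Problem):
    # Toy problem: decrement a number until it reaches zero.
    def actions(self, state):
        return ["dec"] if state > 0 else []
    def result(self, state, action):
        return state - 1

p = Countdown(initial=3, goal=0)
state, cost = p.initial, 0
while not p.goal_test(state):
    action = p.actions(state)[0]
    cost += p.step_cost(state, action)
    state = p.result(state, action)
print(cost)  # path cost of the solution
```

Any of the search strategies from Section 1.5 can operate on such an object, since they only need `actions`, `result` and `goal_test`.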

1.6.2 Example of 8-Puzzle Problem

University Question
Q. Formulate the 8-puzzle problem.
* Fig. 1.6.1 depicts a typical scenario of the 8-puzzle problem. It has a 3 × 3 board with tiles numbered 1 through 8. There is a blank tile which can be moved forward, backward, left and right. The aim is to arrange all the tiles in the goal-state form by moving the blank tile the minimum number of times.

1 2 3 1 2 3

4 8 - 4 5 6

7 6 5 7 8 -

Initial State Goal State


Fig. 1.6.1: A scenario of 8-Puzzle Problem
This problem can be formulated as follows :
* States : States can be represented by a 3 × 3 matrix data structure with the blank denoted by 0.
1. Initial state : {{1, 2, 3}, {4, 8, 0}, {7, 6, 5}}
2. Actions : The blank space can move in the Left, Right, Up and Down directions, specifying the actions.
3. Successor function : If we apply the "Down" operator to the start state in Fig. 1.6.1, the resulting state has the 5 and the blank switching their positions.
4. Goal test : {{1, 2, 3}, {4, 5, 6}, {7, 8, 0}}
5. Path cost : Number of steps to reach the final state.
* Solution :
{{1, 2, 3}, {4, 8, 0}, {7, 6, 5}} → {{1, 2, 3}, {4, 8, 5}, {7, 6, 0}} → {{1, 2, 3}, {4, 8, 5}, {7, 0, 6}} → {{1, 2, 3}, {4, 0, 5}, {7, 8, 6}} → {{1, 2, 3}, {4, 5, 0}, {7, 8, 6}} → {{1, 2, 3}, {4, 5, 6}, {7, 8, 0}}
Path cost = 5 steps
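The formulation above can be run directly: a breadth-first search over the 3 × 3 states (one possible uninformed strategy, not prescribed by the text) recovers the path cost of 5 steps.

```python
from collections import deque

def neighbours(state):
    # Slide the blank (0) Left, Right, Up or Down.
    s = [list(row) for row in state]
    r, c = next((r, c) for r in range(3) for c in range(3) if s[r][c] == 0)
    for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            t = [row[:] for row in s]
            t[r][c], t[nr][nc] = t[nr][nc], t[r][c]
            yield tuple(map(tuple, t))

def bfs(start, goal):
    # Breadth-first search returns the minimum number of moves.
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in neighbours(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))

start = ((1, 2, 3), (4, 8, 0), (7, 6, 5))
goal = ((1, 2, 3), (4, 5, 6), (7, 8, 0))
steps = bfs(start, goal)
print(steps)  # path cost in steps -> 5
```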

1.6.3 Example of Missionaries and Cannibals Problem

* The problem statement is as discussed in the previous section. Let's formulate the problem first.

* States : In this problem, a state can be a data structure holding a triplet (i, j, k) representing the number of missionaries, cannibals and canoes on the left bank of the river respectively.
1. Initial state : It is (3, 3, 1), as all missionaries, cannibals and canoes are on the left bank of the river.
2. Actions : Take x missionaries and y cannibals across the river.
3. Successor function : If we take one missionary and one cannibal to the other side of the river, the left bank will have two missionaries and two cannibals left.
4. Goal test : Reached state (0, 0, 0).
5. Path cost : Number of crossings to attain the goal state.
* Solution :
The sequence of states along the path :
(3,3,1) → (2,2,0) → (3,2,1) → (3,0,0) → (3,1,1) → (1,1,0) → (2,2,1) → (0,2,0) → (0,3,1) → (0,1,0) → (0,2,1) → (0,0,0)
Cost = 11 crossings
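The same breadth-first pattern verifies this result: searching over (missionaries, cannibals, canoe) triplets, with the safety constraint applied to both banks, confirms the 11-crossing minimum. This is a sketch of one possible implementation, not the book's.

```python
from collections import deque

def safe(m, c):
    # Missionaries must never be outnumbered on either bank.
    return (m == 0 or m >= c) and (m == 3 or 3 - m >= 3 - c)

def successors(state):
    m, c, b = state              # counts on the left bank; b = canoe side
    d = -1 if b == 1 else 1      # the canoe carries people away from its side
    for dm, dc in ((1, 0), (2, 0), (0, 1), (0, 2), (1, 1)):
        nm, nc = m + d * dm, c + d * dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, 1 - b)

def min_crossings(start=(3, 3, 1), goal=(0, 0, 0)):
    # Breadth-first search: first time we reach the goal is optimal.
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, crossings = frontier.popleft()
        if state == goal:
            return crossings
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, crossings + 1))

crossings = min_crossings()
print(crossings)  # -> 11
```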

1.6.4 Vacuum-Cleaner Problem

States : In the vacuum cleaner problem, a state can be represented as [<block>, clean] or [<block>, dirty]. The agent can be in one of the two blocks, each of which can be either clean or dirty. Hence there are a total of 8 states in the vacuum cleaner world.

1. Initial State : Any state can be considered as the initial state, for example, [A, dirty].
2. Actions : The possible actions for the vacuum cleaner machine are Left, Right, Absorb and Idle.
3. Successor function : Fig. 1.6.2 indicates all possible states with actions and the next state.
Fig. 1.6.2 : The state space for the vacuum world

4. Goal Test : The aim of the vacuum cleaner is to clean both the blocks. Hence the goal test is [A, Clean] and [B, Clean].
5. Path Cost : Assuming that each action/step costs 1 unit, the path cost is the number of actions/steps taken.

1.6.5 Example of Real Time Problems

* There is a variety of real-time problems that can be formulated and solved by searching: robot navigation, the route-finding problem, the Travelling Salesman Problem (TSP), the VLSI design problem, automatic assembly sequencing, etc., to name a few.
* There are a number of applications of route-finding algorithms: web sites, car navigation systems that provide driving directions, routing video streams in computer networks, military operations planning, and airline travel-planning systems, to name a few. All these systems involve detailed and complex specifications.
* For now, let us consider a problem to be solved by a travel-planning web site: the airline travel problem.
* State : A state is represented by an airport location and the current date and time. In order to calculate the path cost, a state may also record more information about previous flight segments, their fare bases and their status as domestic or international.
1. Initial state : This is specified by the user's query, stating the initial location, date and time.
2. Actions : Take any flight from the current location, in any seat and class, leaving after the current time, leaving enough time for within-airport transfer if needed.
3. Successor function : After taking the action, i.e. selecting a flight, location, date and time, the next location, date and time reached is denoted by the successor function. The location reached is considered as the current location and the flight's arrival time as the current time.
4. Goal test : Is the current location the destination location?
5. Path cost : In this case the path cost is a function of monetary cost, waiting time, flight time, customs and immigration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage awards and so on.

1.7 Intelligent Agents


1.7.1 What is an Agent ?
* An agent is something that perceives its environment through sensors and acts upon that environment through effectors or actuators. Fig. 1.7.1 shows an agent and its environment.
* Take the simple example of a human agent. It has five senses: eyes, ears, nose, skin and tongue. These senses, which sense the environment, are called sensors. Sensors collect percepts or inputs from the environment and pass them to the processing unit.
* Actuators or effectors are the organs or tools using which the agent acts upon the environment. Once a sensor senses the environment, it gives this information to the nervous system, which takes an appropriate action with the help of the actuators.
* In case of human agents we have hands and legs as actuators or effectors.
Fig. 1.7.1 : Agent and Environment


Fig. 1.7.2 shows a generic robotic agent structure.
Fig. 1.7.2 : Generic Robotic Agent Architecture
After understanding what an agent is, let's try to figure out the sensors and actuators for a robotic agent. Can you think of sensors and actuators in case of a robotic agent?
The robotic agent has cameras, infrared range finders, scanners, etc. used as sensors, while various types of motors, screens, printing devices, etc. are used as actuators to perform actions on the given input.

Human agent : eyes, ears and other organs for sensors; hands, legs, mouth and other body parts for actuators.
Robotic agent : cameras and infrared range finders for sensors; various motors for actuators.
Fig. 1.7.3 : Sensors and Actuators in Human and Robotic Agents

* The agent function is the description of all the functionalities the agent is supposed to perform. The agent function provides a mapping from percept sequences to the desired actions. It can be represented as
[f : P* → A]
* An agent program is a computer program that implements the agent function in a language suitable for the architecture. Agent programs need to be installed on a device in order to run the device accordingly. That device must have some form of sensors to sense the environment and actuators to act upon it. Hence an agent is a combination of the architecture (hardware) and the program (software):
Agent = Architecture + Program
* Take the simple example of a vacuum cleaner agent. You might have seen a vacuum cleaner agent in "WALL-E" (an animated movie). Let's understand how to represent the percepts (inputs) and actions (outputs) used in case of a vacuum cleaner agent.

As shown in Fig. 1.7.4, there are two blocks A and B having some dirt. The vacuum cleaner agent is supposed to sense the dirt and collect it, thereby making the room clean. In order to do that, the agent must have a camera to see the dirt and a mechanism to move forward, backward, left and right to reach the dirt. It should also absorb the dirt. Based on the percepts, actions will be performed, for example: Move left, Move right, Absorb, No Operation.
Fig. 1.7.4 : Vacuum cleaner Agent
Hence the sensors for the vacuum cleaner agent can be a camera and a dirt sensor, and the actuators can be a motor to make it move and an absorption mechanism. Percepts and actions can be represented as
[A, Dirty], [B, Clean], [A, absorb], [B, NoP], etc.
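The percept-to-action mapping above can be sketched as a table-driven agent program. The movement policy below (go Right when A is clean, Left when B is clean) is an assumption chosen so that the agent eventually cleans both blocks; the two-block world simulation is likewise illustrative.

```python
# The agent function as an explicit table from percepts to actions.
agent_function = {
    ("A", "Dirty"): "Absorb", ("B", "Dirty"): "Absorb",
    ("A", "Clean"): "Right",  ("B", "Clean"): "Left",   # assumed policy
}

def agent(percept):
    # percept is a [location, status] pair, as in the text.
    return agent_function[percept]

# A tiny simulation of the two-block vacuum world.
world = {"A": "Dirty", "B": "Dirty"}
location = "A"
for _ in range(4):                   # four time steps suffice here
    action = agent((location, world[location]))
    if action == "Absorb":
        world[location] = "Clean"
    elif action == "Right":
        location = "B"
    elif action == "Left":
        location = "A"
print(world)  # both blocks end up clean
```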

1.7.2 Definitions of Agent

There are various definitions for an agent. Let's see a few of them.
IBM states that agents are software entities that carry out some set of operations on behalf of a user or another program.
FIPA : The Foundation for Intelligent Physical Agents (FIPA) states that an agent is a computational process that implements the autonomous functionality of an application.
Another definition is given as "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon the environment through effectors".

Fig. 1.7.5 : Interactive Intelligent Agent (the agent performs tasks on the user's behalf, such as washing clothes, cleaning, and solving experiments and assignments)

Russell and Norvig's and F. Mills and R. Stufflebeam's definition says that "An agent is anything that is capable of acting upon information it perceives. An intelligent agent is an agent capable of making decisions about how it acts based on experience".
From the above definitions we can understand that an agent is (as per Terziyan, 1993) :
o Goal-oriented    o Creative
o Adaptive         o Mobile
o Social           o Self-configurable


1.7.3 Intelligent Agent


* In the human agent example, we read that there is something called the "nervous system" which helps in deciding an action, based on the input given by sensors, and carried out with the assistance of effectors. In a robotic agent, we have software which demonstrates the functionality of the nervous system.
* An intelligent agent is one which can take input from the environment through its sensors and act upon the environment through its actuators. Its actions are always directed towards achieving a goal.
* The basic abilities of an intelligent agent are to be self-governed, responsive, goal-oriented, etc. In case of intelligent agents, the software modules are responsible for exhibiting intelligence. Generally observed capabilities of an intelligent agent can be given as follows :
o Ability to remain autonomous (self-directed)
o Responsive
o Goal-oriented

1.7.3(A) Structure of Intelligent Agents


* Fig. 1.7.6 shows the general structure of an intelligent agent: observe the input, scan the database of inputs and actions for the corresponding action, give the action as output and set the internal state to the appropriate action.
Fig. 1.7.6 : General Structure of an Intelligent Agent


From Fig. 1.7.6 it can be observed how the agent and environment interact with each other. Every time the environment changes, the agent first observes the environment through its sensors and gets the input, then scans the database of inputs and actions for the corresponding action for the given input, and lastly sets its internal state to the appropriate action.
Let's understand this working with a real-life example. Consider that you are an agent and your surroundings are the environment. Now take a situation where you are cooking in the kitchen and by mistake you touch a hot pan. We will see what happens in this situation step by step. Your touch sensors take input from the environment (i.e. you have touched some hot element), then they ask your brain if it knows "what action should be taken when you go near hot elements?" Now the brain will inform your hands (actuators) that you should immediately take them away from the hot element, otherwise it will burn. Once this signal reaches your hand, you will take your hand away from the hot pan.
The agent keeps taking input from the environment and goes through these states every time. In the above example, if your action takes more time, your hand will be burnt.

So the new task will be to find a solution if the hand is burnt. Now, think about the states which will be followed in this situation. As per Wooldridge and Jennings, "An intelligent agent is one that is capable of taking flexible self-governed actions".
They say that for an intelligent agent to meet design objectives, flexible means three things :
1. Reactiveness    2. Pro-activeness    3. Social ability

1. Reactiveness : It means giving a reaction to a situation in a stipulated time frame. An agent can perceive the environment and respond to the situation in a particular time frame. In case of reactiveness, reacting within the situation's time frame is most important. You can understand this with the above example: if an agent takes more time to take its hand away from the hot pan, then the agent's hand will be burnt.
2. Pro-activeness : It is controlling a situation rather than just responding to it. Intelligent agents show goal-directed behaviour by taking the initiative. For example, if you are playing chess then winning the game is the main objective. So here we try to control the situation rather than just responding to each individual action, which means that capturing or losing any of the 16 pieces is not important in itself; whether that action can be helpful to checkmate your opponent is more important.
3. Social ability : Intelligent agents can interact with other agents (and humans). Take the automatic car driver example, where the agent might have to interact with other agents or human beings while driving the car.
Following are a few more features of an intelligent agent :
o Self-learning : An intelligent agent changes its behaviour based on its previous experience. This agent keeps updating its knowledge base all the time.
o Movable/Mobile : An intelligent agent can move from one machine to another while performing actions.
o Self-governing : An intelligent agent has control over its own actions.

1.8 Rational Agent

University Question

Q. Define rationality and rational agent. Give an example of rational action performed by any intelligent agent.

For problem solving, if an agent makes a decision based on some logical reasoning, then the decision is called a "rational decision". The way humans have the ability to make right decisions based on their experience and logical reasoning, an agent should also be able to make correct decisions, based on what it knows from the percept sequence and the actions carried out by that agent from its knowledge.
Agents perceive their environment through sensors over a prolonged time period, adapt to change, create and pursue goals, and take actions through actuators to achieve those goals. A rational agent is one that does the "right" things and acts rationally so as to achieve the best outcome even when there is uncertainty in knowledge.
A rational agent is an agent that has clear preferences, can model uncertainty via expected values of variables or functions of variables, and always chooses to perform the action with the optimal expected outcome for itself from among all feasible actions. A rational agent can be anything that makes decisions: typically a person, a machine, or a software program.

* Rationality depends on four main criteria : first, the performance measure, which defines the criterion of success for an agent; second, the agent's prior knowledge of the environment; third, the actions the agent can perform; and last, the agent's percept sequence to date.
* The performance measure is one of the major criteria for measuring the success of an agent's performance. Take a vacuum-cleaner agent's example: its performance measure can depend upon various factors like its dirt-cleaning ability, the time taken to clean that dirt, consumption of electricity, etc.
* For every percept sequence a built-in knowledge base is updated, which is very useful for decision making, because it stores the consequences of performing some particular action. If the consequences lead to the desired goal, we get a good performance measure factor; if the consequences do not lead to the desired goal state, we get a poor performance measure factor.

(a) The agent's finger is hurt while using a nail and hammer    (b) The agent is using the nail and hammer efficiently
Fig. 1.8.1
* For example, see Fig. 1.8.1. If the agent hurts its finger while using a nail and hammer, then while using them the next time the agent will be more careful and the probability of not getting hurt will increase. In short, the agent will be able to use the hammer and nail more efficiently.
* A rational agent can be defined as an agent which makes use of its percept sequence, experience and knowledge to maximize its performance measure for every probable action. It selects the most feasible action, which will lead to the expected results optimally.
Fig. 1.8.2 : Rational Agent (environmental input and the goal feed the agent, which generates alternative possible actions and selects the actions leading to optimal expected results)

1.9 Environments Types and PEAS Properties of Agent


1.9.1 Environments Types

University Question
Q. Describe different types of environments applicable to AI agents. (MU - Dec. 13, May 15)
1. Fully observable vs. Partially observable
* The first type of environment is based on observability. Whether the agent's sensors have access to the complete state of the environment at any given time or not decides if it is a fully observable or partially observable environment.
* In fully observable environments, agents are able to gather all the necessary information required to take actions. Also, in fully observable environments, agents don't have to keep records of internal states. For example, in the word-block problem, the 8-puzzle problem, Sudoku puzzles, etc., the state is completely visible at any point of time.
* Environments are called partially observable when sensors cannot provide errorless information at any given time for every internal state, as the environment is not seen completely at any point of time.
* There can also be unobservable environments, where the agent's sensors fail to provide information about internal states.
Fig. 1.9.1 : Environment types (fully observable vs. partially observable; deterministic vs. stochastic; episodic vs. sequential; static vs. dynamic; discrete vs. continuous; single agent vs. multi-agent; known vs. unknown)

For example, in case of an automated car-driving system, the automated car cannot predict what the other drivers are thinking while driving their cars. Only because of the sensors' information-gathering expertise is it possible for an automated car driver to take the actions.
2. Single agent vs. Multi-agent
* The second type of environment is based on the number of agents acting in the environment. Whether the agent is operating on its own or in collaboration with other agents decides if it is a single-agent or a multi-agent environment.
* For example, an agent playing Tetris by itself is in a single-agent environment, whereas an agent playing checkers is in a two-agent environment. In the vacuum cleaner world only one machine is working, so it is a single-agent environment, while in the case of a car-driving agent there are multiple agents driving on the road, hence it is a multi-agent environment.
* A multi-agent environment is further classified as co-operative multi-agent and competitive multi-agent. Now, you might be wondering, in case of an automated car-driving system, which type of agent environment do we have?

* Let's understand it with the help of the automated car-driving example. For car-driving system 'X', another car, say 'Y', is considered as an agent. 'Y' tries to maximize its performance measure, and the input taken by car 'Y' depends on car 'X'. Thus it can be said that for an automated car-driving system we have a cooperative multi-agent environment.
* Whereas in case of a chess game, when two agents are operating as opponents and trying to maximize their own performance, they are acting in a competitive multi-agent environment.

3. Deterministic vs. Stochastic
• An environment is called a deterministic environment when the next state of the environment can be completely determined by the previous state and the action executed by the agent.
• For example, in case of the vacuum cleaner world, the 8-puzzle problem, or the chess game, the next state of the environment depends solely on the current state and the action performed by the agent.
• A stochastic environment generally means that the uncertainty about actions is quantified in terms of probabilities. The environment changes while the agent is taking an action, hence the next state of the world does not depend merely on the current state and the agent's action; there are changes happening in the environment irrespective of the agent's action. An automated car driving system has a stochastic environment, as the agent cannot control the traffic conditions on the road.
• In case of checkers we have a multi-agent environment where an agent might be unable to predict the action of the other player. In such cases, if we have a partially observable environment, then the environment is considered to be stochastic.
• If the environment is deterministic except for the actions of other agents, then the environment is strategic. That is, in a game like chess, the next state of the environment does not depend only upon the current action of the agent; it is also influenced by the strategy developed by both opponents for future moves.
• We have one more type of environment in this category: when the environment is not fully observable or not deterministic, it is called an uncertain environment.
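The difference between deterministic and stochastic transitions can be sketched in a few lines of Python. This is a minimal illustration, not from the text: the integer state encoding and the 0.1 "environment interferes" probability are invented for the example.

```python
import random

def deterministic_step(state, action):
    """Next state depends only on the current state and the action
    (as in the 8-puzzle or chess)."""
    return state + 1 if action == "forward" else state - 1

def stochastic_step(state, action):
    """With probability 0.1 the environment interferes and the intended
    transition does not happen (like traffic conditions an automated
    car driver cannot control)."""
    intended = deterministic_step(state, action)
    return intended if random.random() < 0.9 else state  # action may "fail"

print(deterministic_step(0, "forward"))  # always 1
```

Running `stochastic_step(0, "forward")` many times yields 1 most of the time but occasionally 0, which is exactly why the next state cannot be predicted from the current state and action alone.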

4. Episodic vs. Sequential
• An episodic task environment is one where each of the agent's actions is divided into atomic incidents or episodes. The current incident is different from the previous incident, and there is no dependency between the current and the previous incident. In each incident the agent receives an input from the environment and then performs a corresponding action.
• Generally, classification tasks are considered episodic. Consider the example of a pick-and-place robot agent, which is used to detect defective parts on the conveyor belt of an assembly line. Here, every time the agent makes a decision based on the current part, there is no dependency between the current and the previous decision.
• In sequential environments, as the name suggests, the previous decision can affect all future decisions. The next action of the agent depends on what action it has taken previously and what action it is supposed to take in future.
• For example, in checkers the previous move can affect all the following moves. A sequential environment can also be understood with the help of the automated car driving example, where the current decision can affect the next decisions. If the agent is applying the brakes, then it has to press the clutch and lower the gear as the next consequent actions.
5. Static vs. Dynamic
• You have learnt about static and dynamic terms in previous semesters with respect to web pages. In the same way, we have static (vs. dynamic) environments. If an environment remains unchanged while the agent is performing the given tasks, then it is called a static environment. For example, the Sudoku puzzle or the vacuum cleaner environment is static in nature.
• If the environment is not changing over time but the agent's performance measure is changing, then it is called a semi-dynamic environment. That means a timer exists in the environment that affects the performance of the agent.
• For example, in a chess game, or any puzzle like the block world problem or the 8-puzzle, if we introduce a timer, and if the agent's performance is calculated by the time taken to play the move or to solve the puzzle, then it is called a semi-dynamic environment.
• Lastly, if the environment changes while the agent is performing some task, then it is called a dynamic environment.
• In this type of environment the agent's sensors have to continuously keep sending signals to the agent about the current state of the environment, so that the appropriate action can be taken with immediate effect.
• The automated car driver example comes under dynamic environment, as the environment keeps changing all the time.

6. Discrete vs. Continuous


• You have seen discrete and continuous signals in earlier semesters. When a signal has distinct, quantized, clearly defined values, it is considered a discrete signal.
• In the same way, when there are distinct and clearly defined inputs and outputs, or percepts and actions, it is called a discrete environment. For example, the chess environment has a finite number of distinct inputs and actions.
• When a continuous input signal is received by an agent, and all the percepts and actions cannot be defined beforehand, it is called a continuous environment. For example, an automated car driving system.
7. Known vs. Unknown
• In a known environment, the output for all probable actions is given. Obviously, in case of an unknown environment, for an agent to make a decision, it has to gain knowledge about how the environment works.
• Table 1.9.1 summarizes a few task environments and their characteristics.

Table 1.9.1 : Task Environments

| Task environment | Car driving | Part-picking robot | Crossword puzzle | Soccer game | Checkers with clock |
|---|---|---|---|---|---|
| Observable | Partially | Partially | Fully | Partially | Fully |
| Agents | Multi-agent (cooperative) | Single agent | Single agent | Multi-agent (competitive) | Multi-agent (competitive) |
| Deterministic | Stochastic | Stochastic | Deterministic | Strategic | Strategic |
| Episodic | Sequential | Episodic | Sequential | Sequential | Sequential |
| Static | Dynamic | Dynamic | Static | Dynamic | Semi-dynamic |
| Discrete | Continuous | Discrete | Discrete | Continuous | Discrete |
| Known | Unknown | Known | Known | Known | Known |

1.9.2 PEAS Properties of Agent

University Questions
Q. Give a PEAS description for a robot soccer player. Characterize its environment.
Q. What are PEAS descriptors? Give PEAS descriptors for a part-picking robot.
PEAS : PEAS stands for Performance measure, Environment, Actuators, and Sensors. It is the short form used for the performance issues grouped under the task environment.
You might have seen driverless/self-driving car videos from Audi, Volvo, Mercedes, etc. To develop such driverless cars we need to first define the PEAS parameters.
Performance measure : It is the objective function to judge the performance of the agent. For example, in case of a pick-and-place robot, the number of correct parts in a bin can be the performance measure.
Environment : It is the real environment in which the agent needs to deliberate and act.
Actuators : These are the tools, equipment or organs using which the agent performs actions in the environment. They work as the output of the agent.
Sensors : These are the tools, equipment or organs using which the agent captures the state of the environment. They work as the input to the agent.
To understand the concept of PEAS, consider the following examples.
(A) Automated Car driving agent

1. Performance measures which should be satisfied by the automated car driver :


i) Safety : The automated system should be able to drive the car safely without crashing anywhere.
ii) Optimum speed : The automated system should be able to maintain the optimal speed depending upon the surroundings.
iii) Comfortable journey : The automated system should be able to give a comfortable journey to the end user, i.e. depending upon the road it should ensure the comfort of the end user.
iv) Maximize profits : The automated system should provide good mileage on various roads, the amount of energy consumed to automate the system should not be very high, etc. Such features ensure that the user benefits from the automated features of the system, which helps maximize profits.
2. Environment

i) Roads : The automated car driver should be able to drive on any kind of road, ranging from city roads to highways.

ii) Traffic conditions : You will find different sets of traffic conditions on different types of roads. The automated system should be able to drive efficiently in all types of traffic conditions. Sometimes traffic conditions are caused by pedestrians, animals, etc.
iii) Clients : Automated cars are created depending on the client's environment. For example, in some countries you will see left-hand drive and in some countries right-hand drive. Every country/state can have different weather conditions. The automated car driver should be designed depending upon such constraints.
3. Actuators are responsible for performing actions / providing output to the environment. In case of the car driving agent, the following are the actuators :
(i) Steering wheel, which can be used to direct the car in the desired direction (i.e. right/left).
(ii) Accelerator, gear, etc., which can be used to increase or decrease the speed of the car.
(iii) Brake, which is used to stop the car.
(iv) Light signals and horn, which can be very useful as indicators for an automated car.
4. Sensors : To take input from the environment in the car driving example, cameras, a sonar system, a speedometer, GPS, engine sensors, etc. are used as sensors.

(B) Part-picking ARM robot

1. Performance measures : Number of parts in correct container.


2. Environment : Conveyor belt used for handling parts, containers used to keep parts, and the parts themselves.
3. Actuators : Arm with tooltips, to pick and drop parts from one place to another.
4. Sensors : Camera to scan the position from where a part should be picked, and joint angle sensors which are used to sense obstacles and move to the appropriate place.
(C) Medical diagnosis system

1. Performance measures
Healthy patient : the system should make use of sterilized instruments to ensure the safety (health) of the patient.
Minimize costs : the automated system's results should not be very costly, otherwise the overall expenses of the patient may increase.
Minimize lawsuits : the medical diagnosis system should be legal.
2. Environment : Patient, doctors, hospital environment.
3. Actuators : Screen and printer, to display and print questions, tests, diagnoses and prescriptions.
4. Sensors : Keyboard and mouse, useful to enter symptoms, findings, and the patient's answers to given questions; scanner to scan reports; camera to capture pictures of patients.
(D) Soccer player robot
1. Performance measures : Number of goals, speed, legal game.
2. Environment : Team players, opponent team players, playing ground, goal net.
3. Sensors : Camera, proximity sensors, infrared sensors.
4. Actuators : Joint angles, motors.
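The four PEAS components above can be captured in a small data structure. This is a minimal sketch — the `PEAS` class is our own illustration, not a standard API — populated with the soccer-player and part-picking examples from the text:

```python
from dataclasses import dataclass, field

@dataclass
class PEAS:
    """PEAS descriptor: Performance measure, Environment, Actuators, Sensors."""
    performance: list = field(default_factory=list)
    environment: list = field(default_factory=list)
    actuators: list = field(default_factory=list)
    sensors: list = field(default_factory=list)

soccer_robot = PEAS(
    performance=["number of goals", "speed", "legal game"],
    environment=["team players", "opponent players", "playing ground", "goal net"],
    actuators=["joint angles", "motors"],
    sensors=["camera", "proximity sensors", "infrared sensors"],
)

part_picking_robot = PEAS(
    performance=["number of parts in correct container"],
    environment=["conveyor belt", "containers", "parts"],
    actuators=["arm with tooltips"],
    sensors=["camera", "joint angle sensors"],
)

print(soccer_robot.performance)  # ['number of goals', 'speed', 'legal game']
```

Writing the descriptor down like this makes it easy to check, before designing the agent, that every performance measure can actually be observed through some sensor and influenced through some actuator.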

1.10 Types of Agents
Depending upon the degree of intelligence and the ability to achieve the goal, agents are categorized into five basic types. These five types of agents are depicted in Fig. 1.10.1.

Le NSS
simple reflex agents. any
T NR KlE ety PS Mtarenutesteyb

model-based reflex agents


yay, 88l based agents...
ccitlity-based agents -.. Az,

.., Jeaming agents...


MESS
ee eetPMMAEh Neer,

Fig. 1.10.1: Types of agents


|
Let us understand these agent types one by
one.
1.10.1 Simple Reflex Agents
Fig. 1.10.2 : Simple reflex agent (sensors take the input/percept from the environment; condition-action rules decide what action should be taken; effectors deliver the action as output to the environment)
• An agent which performs actions based on the current input only, ignoring all the previous inputs, is called a simple reflex agent.
• It is a totally uncomplicated type of agent. The simple reflex agent's function is based on the situation and its corresponding action (condition-action protocol). If the condition is true, then the matching action is taken without considering the percept history.
• You can understand simple reflexes with the help of a real-life example: say some object approaches your eye; then you will blink your eye. This type of simple reflex is called a natural/innate reflex.
• Consider the example of the vacuum cleaner agent. It is a simple reflex agent, as its decision is based only on whether the current location contains dirt. The agent function is tabulated in Table 1.10.1.


A few possible input sequences and outputs for the vacuum cleaner world with 2 locations are considered for simplicity.

Table 1.10.1

| Input sequence {location, content} | Output / action (right, left, suck, no-op) |
|---|---|
| {A, clean} | Right |
| {B, clean} | Left |
| {A, dirt} | Suck |
| {B, dirt} | Suck |
| {A, clean} {A, clean} | Right |
| {A, clean} {A, dirt} | Suck |
In case of the above-mentioned vacuum agent, only one sensor is used, and that is a dirt sensor. This dirt sensor can detect whether there is dirt or not. So the possible inputs are 'dirt' and 'clean'.


The agent also has to maintain a database of condition-action rules, which helps decide what output should be given by the agent. The database will contain conditions like: if there is dirt in the current location, the vacuum cleaner should suck that dirt; else, it should move to the next location (left or right) and check for dirt there, repeating these actions till the entire assigned area is cleaned. Once the assigned area is fully covered, no other action should be taken until further instruction.
If the vacuum cleaner agent keeps searching for dirt and clean areas, it can get trapped in an infinite loop. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments. By randomizing its actions, the simple reflex agent can escape these infinite loops. For example, on receiving {clean} as input, the vacuum cleaner agent could go either left or right at random.
If a randomized behaviour achieves the right kind of performance, it can be considered rational in a few multi-agent environments.
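The condition-action behaviour of Table 1.10.1 can be sketched as a small function. This is a minimal illustration (the function name and the random fallback for an unknown location are our own choices, not a standard API); note that no percept history is kept anywhere:

```python
import random

def simple_reflex_vacuum_agent(location, content):
    """Condition-action rules for the two-location vacuum world.

    The agent looks only at the current percept (location, content);
    it keeps no record of previous percepts.
    """
    if content == "dirt":
        return "suck"                  # rule: dirty square -> clean it
    if location == "A":
        return "right"                 # rule: A is clean -> move to B
    if location == "B":
        return "left"                  # rule: B is clean -> move to A
    # Unknown location: randomize, as the text suggests, to avoid loops.
    return random.choice(["left", "right"])

print(simple_reflex_vacuum_agent("A", "dirt"))   # suck
print(simple_reflex_vacuum_agent("B", "clean"))  # left
```

Because the function is stateless, it will happily bounce between two clean squares forever — which is exactly the infinite-loop weakness discussed above.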

1.10.2 Model-based Reflex Agents

University Question
Q. Explain the model-based reflex agent with a block diagram.
• A partially observable environment cannot be handled well by simple reflex agents, because they do not keep track of the previous state. So one more type of agent was created: the model-based reflex agent.
• An agent which performs actions based on the current input together with an internal state built from previous inputs is called a model-based agent. A partially observable environment can be handled well by a model-based agent.
• From Fig. 1.10.3 it can be seen that once the sensors take input from the environment, the agent checks the current state of the environment. After that, it checks the previous state, which shows how the world is evolving and how the environment was affected by the action taken by the agent at an earlier stage. This is termed the model of the world.
Fig. 1.10.3 : Model-based reflex agent (sensors capture the current percept; the agent combines it with the previous state — "how is the world evolving?" and "what is the effect of my action?" — to determine the current state of the environment; the condition-action protocol then decides what action should be taken, and the effectors deliver the output to the environment)
Once this is verified, an action is decided based on the condition-action protocol. This decision is given to the effectors, and the effectors give this output to the environment.

The knowledge about "how the world is changing" is called a model of the world. An agent which uses such a model while working is called a "model-based agent".

• Consider a simple example of an automated car driving system. Here, the world keeps changing all the time. You must have taken a wrong turn while driving on some day of your life. The same thing applies to an agent. Suppose some car "X" is overtaking our automated driver agent "A"; then the speed and the direction in which "X" and "A" are moving their steering wheels are important. Take a scenario where the agent missed a sign board while it was overtaking another car. The world around the agent will be different in that case.
• An internal model based on the input history should be maintained by the model-based reflex agent, which can reflect at least some of the unobserved aspects of the current state. Once this is done, it chooses an action in the same way as the simple reflex agent.
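The idea of keeping an internal model can be sketched for the two-square vacuum world. This is an illustrative sketch (the class name and the dictionary model are assumptions of ours), showing how remembered state lets the agent stop instead of looping forever:

```python
class ModelBasedVacuumAgent:
    """Keeps an internal model of the two-square world (A, B).

    Unlike the simple reflex agent, it remembers which squares it has
    already seen clean, so it can issue a no-op instead of bouncing
    between clean squares forever.
    """

    def __init__(self):
        self.model = {"A": None, "B": None}  # None = state unknown

    def act(self, location, content):
        self.model[location] = content       # update model with current percept
        if content == "dirt":
            return "suck"
        if self.model["A"] == "clean" and self.model["B"] == "clean":
            return "no-op"                   # whole world known clean: stop
        return "right" if location == "A" else "left"

agent = ModelBasedVacuumAgent()
print(agent.act("A", "dirt"))    # suck
print(agent.act("A", "clean"))   # right  (B still unknown)
print(agent.act("B", "clean"))   # no-op  (both squares known clean)
```

The only difference from the simple reflex version is the `self.model` dictionary, yet it is enough to handle the partial observability of never seeing both squares at once.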

1.10.3 Goal-based Agents

Fig. 1.10.4 : Goal-based agent (sensors capture the current percept; using its model of the world, the agent predicts "what will the state be if some action A is performed?"; its goals then determine what action should be taken next, and the effectors deliver the output to the environment)

• Model-based agents are further developed based on "goal" information. This new type of agent is called a goal-based agent. As the name suggests, the goal information illustrates the situations that are desired. These agents are provided with goals along with the model of the world. All the actions selected by the agent are taken with reference to the specified goals. Goal-based agents can only differentiate between goal states and non-goal states; hence, their performance can be 100% or zero.
• The limitation of the goal-based agent comes with its definition itself. Once the goal is fixed, all the actions are taken to fulfil it, and the agent loses the flexibility to change its actions according to the current situation.
• You can take the example of a vacuum cleaning robot agent whose goal is to keep the house clean all the time. This agent will keep searching for dirt in the house and will keep the house clean all the time. Remember M-O, the cleaning robot from the Wall-E movie, which keeps cleaning all the time no matter what the environment is, or the healthcare companion robot Baymax from Big Hero 6, which does not deactivate until the user says that he/she is satisfied with the care.
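The goal test with one-step lookahead can be sketched as follows. This is a toy illustration — the integer states and the `transition` and `is_goal` functions are invented for the example — of how a goal-based agent distinguishes only goal states from non-goal states:

```python
def goal_based_action(state, actions, transition, is_goal):
    """Pick an action whose predicted successor state satisfies the goal.

    transition(state, action) is the agent's model of the world;
    is_goal(state) is the binary goal test (goal / non-goal only).
    """
    for action in actions:
        if is_goal(transition(state, action)):
            return action
    return None  # no single action reaches the goal from here

# Toy world: states are integers, the goal is to reach 3.
actions = ["inc", "dec"]
transition = lambda s, a: s + 1 if a == "inc" else s - 1
is_goal = lambda s: s == 3

print(goal_based_action(2, actions, transition, is_goal))  # inc
print(goal_based_action(0, actions, transition, is_goal))  # None
```

The `None` result from state 0 illustrates the all-or-nothing limitation described above: the binary goal test gives the agent no way to prefer the action that gets it *closer* to the goal — that refinement is exactly what the utility-based agent adds.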

1.10.4 Utility-Based Agents

University Question

Q. _ Explain utility based agents with the help of neat diagram. MU - May 13, Dec. 19

• A utility function is used to map a state to a measure of the utility of that state. We can define a measure for determining how advantageous a particular state is for an agent. To obtain this measure, the utility function can be used.
• The term utility is used to depict how "happy" the agent is. To obtain a generalized performance measure, various world states are compared according to exactly how happy they would make the agent.
• Take one example: you might have used Google Maps to find a route which can take you from a source location to your destination in the least possible time. The same logic is followed by a utility-based automatic car driving agent.
• The goal of a utility-based automatic car driving agent can be to reach a given location safely, within the least possible time, while saving fuel. So this car driving agent will check the possible routes and the traffic conditions on those routes, and will select the route which can take the car to the destination in the least possible time, safely, and without consuming much fuel.
Fig. 1.10.5 : Utility-based agent (sensors capture the current percept; using its model of the world, the agent predicts "what will the state be if some action A is performed?" and asks "how happy will I be in such a state?"; the action leading to the highest utility is taken, and the effectors deliver the output to the environment)
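The route-selection behaviour above can be sketched as a weighted utility over time, fuel and risk. The route data, the weights, and the negative-weighted-sum utility are purely illustrative assumptions:

```python
def utility_based_route(routes, weights=(1.0, 1.0, 1.0)):
    """Choose the route with the highest utility.

    Each route is (name, time_min, fuel_litres, risk); lower is better
    for all three, so utility is the negative weighted sum of the costs.
    """
    wt, wf, wr = weights

    def utility(route):
        _, time, fuel, risk = route
        return -(wt * time + wf * fuel + wr * risk)

    return max(routes, key=utility)[0]

routes = [
    ("highway", 30, 4.0, 2.0),   # fast, but more fuel and risk
    ("city",    45, 3.0, 1.0),   # slow, less fuel, safer
]
print(utility_based_route(routes))  # highway
```

Unlike the binary goal test, the utility function ranks *every* state: changing the weights (say, heavily penalising fuel and risk) makes the same agent prefer the city route instead, without rewriting any rules.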

1.10.5 Learning Agents

Q. Explain the learning agent with the help of a suitable diagram.
Q. Explain the structure of the learning agent architecture. What is the role of the critic in learning?
Q. What are the basic building blocks of a learning agent? Explain each of them with a neat block diagram.

• Why do you take mock tests? When you get fewer marks for some question, you come to know that you have made some mistake in your answer. Then you learn the correct answer, and when you get that same question in further examinations, you write the correct answer and avoid the mistakes which were made in the mock test. This same concept is followed by the learning agent.
• A learning-based agent is advantageous in many cases, because with its basic knowledge it can initially operate in an unknown environment, and then it can gain knowledge from the environment based on a few parameters and perform actions to give better results.


Fig. 1.10.6 : Learning agent (the critic compares the sensors' input against the performance standard and generates feedback for the learning element; the learning element makes changes to the performance element and sets learning goals for the problem generator; the problem generator suggests exploratory actions; the performance element selects the action, which the effectors deliver to the environment)


Following are the components of a learning agent :
1. Critic
2. Learning element
3. Performance element
4. Problem generator

1. Critic : It is the one who compares the sensors' input, specifying the effect of the agent's action on the environment, with the performance standards, and generates feedback for the learning element.
2. Learning element : This component is responsible for learning from the difference between the performance standards and the feedback from the critic. According to the current percept, it is supposed to understand the expected behaviour and enhance its standards.
3. Performance element : Based on the current percept received from the sensors and the input obtained from the learning element, the performance element is responsible for choosing the action to act upon the external environment.
4. Problem generator : Based on the new goals learnt by the learning agent, the problem generator suggests new or alternative actions which will lead to new and instructive understanding.
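One cycle through the critic, learning element and performance element can be sketched as below. This is a toy illustration with invented names and a simple weight-update rule; the problem generator (which would propose exploratory actions) is omitted for brevity:

```python
def learning_agent_step(percept, performance_standard, rules):
    """One cycle through the learning agent's components (illustrative names).

    critic: scores the percept against the performance standard;
    learning element: updates the condition-action rule weights;
    performance element: picks the next action using the updated rules.
    """
    # Critic: positive feedback means performance was above the standard.
    feedback = percept["score"] - performance_standard

    # Learning element: adjust the weight of the action that produced the score.
    last = percept["last_action"]
    rules[last] = rules.get(last, 0.0) + 0.1 * feedback

    # Performance element: choose the currently best-weighted action.
    action = max(rules, key=rules.get)
    return action, rules

rules = {"left": 0.0, "right": 0.0}
# The last action ("right") scored 2.0 against a standard of 5.0, so its
# weight drops and the agent tries "left" next.
action, rules = learning_agent_step(
    {"score": 2.0, "last_action": "right"}, performance_standard=5.0, rules=rules)
print(action)  # left
```

Even in this tiny sketch the division of labour is visible: the critic only produces a feedback signal, the learning element only edits the rules, and the performance element only consults them.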

1.11 Self-Learning Topics : Identify application areas of AI


1.11.1 Application Areas of Artificial Intelligence
• You must have seen the use of Artificial Intelligence in many sci-fi movies. To name a few, we have I, Robot; Wall-E; The Matrix trilogy; Star Wars; etc. Many a time these movies show the positive potential of using AI, and sometimes they also emphasize the dangers of using AI. There are also games based on such movies, which show us many probable applications of AI.
• Artificial Intelligence is commonly used for problem solving, by analyzing and/or predicting the output of a system. AI can provide solutions for constraint satisfaction problems. It is used in a wide range of fields, for example in diagnosing diseases, in business, in education, in controlling robots, in the entertainment field, etc.

Fig. 1.11.1 shows a few fields in which we have applications of Artificial Intelligence. There can be many more fields in which artificially intelligent systems can be used.

Fig. 1.11.1 : Fields of AI application (Education, Entertainment, Medical, Military, Business, Automated planning and scheduling, Voice technology)

1. Education

Training simulators can be built using artificial intelligence techniques. Software for pre-school children is developed to enable learning with fun games. Automated grading, interactive tutoring and instructional theory are the current areas of application.
2. Entertainment
Many movies, games and robots are designed to play as a character. In games they can play as an opponent when a human player is not available or not desirable.
3. Medical
AI has applications in the fields of cardiology (ECG), neurology (MRI), embryology (sonography), complex operations on internal organs, etc. It can also be used for organizing bed schedules, managing staff rotations, and storing and retrieving patient information. Many expert systems can predict the disease and provide medical prescriptions.
4. Military
Training simulators can be used in military applications. Robots can also be very well used to do the required jobs in areas which humans cannot reach, or in life-threatening conditions. When decisions have to be made quickly, taking into account an enormous amount of information, and when lives are at stake, artificial intelligence can provide crucial assistance. From developing intricate flight plans to implementing complex supply systems or creating training simulation exercises, AI is a natural partner in the modern military.
5. Business and Manufacturing
The latest generation of robots is well equipped with performance advances, a growing integration of vision, and an enlarging capability to transform manufacturing.

6. Automated planning and scheduling
Intelligent planners are available with AI systems, which can process large datasets and can consider all the constraints to design plans satisfying all of them.
7. Voice Technology
Voice recognition has improved a lot with AI. Systems are designed to take voice inputs, which are very much applicable for users with disabilities. Scientists are also developing intelligent machines to emulate the activities of a skillful musician. Composition, performance, sound processing and music theory are some of the major areas of research.

8. Heavy Industry
Huge machines involve risk in operating and maintaining them. Humanoid robots are better at replacing human operators; these robots are safe and efficient. Robots have proven to be effective, compared to humans, in jobs of a repetitive nature, where humans may fail due to lack of continuous attention or laziness.

1.11.2 Sub Areas/ Domains of Artificial Intelligence


AI applications can be roughly classified based on the type of tools/approaches used for inculcating intelligence in the system, forming sub-areas of AI. The various sub-domains/areas in intelligent systems can be given as follows : Natural Language Processing, Robotics, Neural Networks and Fuzzy Logic. Fig. 1.11.2 shows these areas in intelligent systems.

Fig. 1.11.2 : Sub-areas in intelligent systems (Natural language processing, Robotics, Neural networks, Fuzzy logic)

• Natural language processing : One of the applications of AI is in the field of Natural Language Processing (NLP). NLP enables interaction between computers and human (natural) language. Practical applications of NLP are in machine translation (e.g. the Lunar system), information retrieval, text categorization, etc. A few more applications are extracting 3D information using vision, speech recognition, perception, and image formation.
• Robotics : One more major application of AI is in robotics. A robot is an active agent whose environment is the physical world. Robots can be used in manufacturing and material handling, in the medical field, in the military, etc., for automating manual work.
• Neural networks : Another application of AI uses neural networks. A neural network is a system that works like the human brain/nervous system. It can be useful for stock market analysis, character recognition, image compression, security, face recognition, handwriting recognition, Optical Character Recognition (OCR), etc.
• Fuzzy logic : Apart from these, AI systems are developed with the help of fuzzy logic. Fuzzy logic can be useful for making approximations rather than having fixed and exact reasoning for a problem. You must have seen systems like ACs, fridges and washing machines which are based on fuzzy logic (they call it "6th sense technology!").

1.11.3 Current Trends in Artificial Intelligence

Artificial Intelligence has touched each and every aspect of our life. From washing machines and air conditioners to smart phones, everywhere AI is serving to ease our life. In industry, AI is doing marvellous work as well: robots are doing sound work in factories, driverless cars have become a reality, and a WiFi-enabled Barbie uses speech recognition to talk and listen to children. Companies are using AI to improve their products and increase sales. AI has seen significant advances in machine learning. Following are the areas in which AI is showing significant advancements.

1. Deep Learning
Convolutional neural networks, enabling the concept of deep learning, are the topmost area of focus in Artificial Intelligence in today's era. Many problem and application areas of AI, like natural language and text processing, speech recognition, computer vision, information retrieval, and multimodal information processing, are empowered by multi-task deep learning.
2. Machine Learning
The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning include systems that analyse past sales data to predict customer behaviour, optimize robot behaviour so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data.
3. AI replacing workers
In industries where there are safety hazards, robots are doing a good job. Human resources are rapidly getting replaced by robots. People are worried to see that even white-collar jobs of data processing are being done exceedingly well by intelligent programs. A study from The National Academy of Sciences brought together technologists, economists and social scientists to figure out what is going to happen.
4. Internet of Things (IoT)
The concept of smarter homes, smarter cars and a smarter world is evolving rapidly with the invention of the Internet of Things. The future is not far when each and every object will be wirelessly connected to something in order to perform some smart actions without any human instructions or interference. The worry is how the mined data can potentially be exploited.
5. Emotional AI
Emotional AI, where AI can detect human emotions, is another upcoming and important area of research. Computers' ability to understand speech will lead to an almost seamless interaction between human and computer. With increasingly accurate cameras, voice and facial recognition, computers are better able to detect our emotional state. Researchers are exploring how this new knowledge can be used in education, to treat depression, to accurately predict medical diagnoses, and to improve customer service and shopping online.
6. AI in shopping and customer service
Using AI, customers' buying patterns and behavioural patterns can be studied, and systems can be built that predict the next purchase or help a customer figure out what is best for him.

7. Ethical AI
With all the evolution happening in technology in every walk of life, ethics must be considered at the forefront of research. For example, in case of a driverless car, if while driving a decision has to be made between whether to hit a cat or a lady, both at an uncontrollable distance in front of the car, it is an ethical decision. In such cases, how the programming should decide who is more valuable is a question. These are not problems to be solved by computer engineers or research scientists alone, but someone has to come up with an answer.


Review Questions
Q.1 Explain various techniques for solving problems by searching.
Q.2 What are the various AI techniques?
Q.3 What are the components of problem formulation?
Q.4 Define in your own words the following terms:
1. Agent  2. Agent function  3. Agent program  4. Autonomy
Q.5 What is an agent?
Q.6 What is an intelligent agent?
Q.7 Write a short note on : Structure of intelligent agents.
Q.8 What is a rational agent?
Q.9 What are various agent environments?
Q.10 Explain PEAS representation with an example.
Q.11 Give the classification of agents.
Q.12 Explain various types of intelligent agents, state the limitations of each, and explain how each limitation is overcome in another type of agent.
Q.13 Explain simple reflex agent architecture.
Q.14 Explain the structure of an agent which keeps track of the world.
Q.15 Explain the goal-based agent.
SEARCH TECHNIQUES

Syllabus

Uninformed Search Techniques : Uniform cost search, Depth Limited Search, Iterative Deepening, Bidirectional search.
Informed Search Methods : Heuristic functions, Best First Search, A*, Hill Climbing, Simulated Annealing, Comparing Different Techniques.
Constraint Satisfaction Problem Solving : Crypto-Arithmetic Problem, Water Jug, Graph Coloring.
Adversarial Search : Game Playing, Min-Max Search, Alpha Beta Pruning.
Self-Learning Topics : IDA*, SMA*.

Introduction

•  Search is an indivisible part of intelligence. An intelligent agent is the one who can search and select the most appropriate action in the given situation, among the available set of actions. When we play any game like chess, cards, tic-tac-toe, etc., we know that we have multiple options for the next move, but the intelligent one who searches for the correct move will definitely win the game.
•  In case of the travelling salesman problem, a medical diagnosis system or any expert system, all they are required to do is to carry out a search which will produce the optimal path, the shortest path with minimum cost and effort.
•  Hence, this chapter focuses on the searching techniques used in AI applications. Those are known as uninformed and informed search techniques.

2.1 Measuring Performance of Problem Solving Algorithm / Agent

There are a variety of problem solving methods and algorithms available in AI. Before studying any of these algorithms in detail, let's consider the criteria to judge the efficiency of those algorithms. The performance of all these algorithms can be evaluated on the basis of the following factors :

1. Completeness : If the algorithm is able to produce the solution if one exists, then it satisfies the completeness criteria.
2. Optimality : If the solution produced is the minimum cost solution, the algorithm is said to be optimal.
3. Time complexity : It depends on the time taken to generate the solution. It is the number of nodes generated during the search.
4. Space complexity : Memory required to store the generated nodes while performing the search.

Complexity of algorithms is expressed in terms of three quantities as follows :

1. b : Called the branching factor, representing the maximum number of successors a node can have in the search tree.
2. d : Stands for the depth of the shallowest goal node.
3. m : It is the maximum depth of any path in the search tree.
2.2 Node Representation in Search Tree

•  In order to carry out search, first we need to build the search tree. The nodes are the various possible states in the state space.
•  The connectors are the indicators of which all states are directly reachable from the current state, based on the successor function.
•  Thus the parent-child relation is built and the search tree can be generated. Fig. 2.2.1 shows the representation of a tree node as a data structure in the 8-puzzle problem.

[Figure : a node holding an 8-puzzle state, with a link to its parent node and links to its child nodes]

Fig. 2.2.1 : Node representation of state in searching


•  Node is the data structure from which the search tree is constructed. Each node has a parent, a state, and children nodes directly reachable from that node.
•  For each node of the tree, we can have the following structure components :
1. State / Value : The state in the state space to which the node corresponds, or the value assigned to the node.
2. Parent node : The node in the search tree that generated this node.
3. Number of children : Indicating the number of actions that can be taken to generate next states (children nodes).
4. Path cost : The cost of the path from the initial state to the node.
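The four components above can be sketched as a small Python class. The attribute and method names (`state`, `parent`, `path_cost`, `expand`, `path`) are illustrative choices made here, not a fixed API; a child's path cost is the parent's cost plus the step cost, and the solution path is recovered by following parent links back to the root.

```python
# A minimal sketch of the node structure described above.
class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state          # state in the state space
        self.parent = parent        # node that generated this node
        self.action = action        # action applied to the parent
        self.path_cost = path_cost  # cost from the initial state to this node

    def expand(self, successors):
        """Create child nodes from (action, state, step_cost) triples."""
        return [Node(s, self, a, self.path_cost + c) for a, s, c in successors]

    def path(self):
        """Recover the solution path by following parent links to the root."""
        node, states = self, []
        while node:
            states.append(node.state)
            node = node.parent
        return list(reversed(states))
```

For example, expanding a root node `Node('A')` with a successor `('go', 'B', 1)` yields a child whose `path()` is `['A', 'B']` and whose `path_cost` is 1.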

2.3 Uninformed Search

University Questions
Q. Compare different uninformed search strategies.
Q. Write short note on : Uninformed search. (MU - Dec. 14)

•  The term "uninformed" means they have only information about what is the start state and the end state, along with the problem definition.
•  These techniques can generate successor states and can distinguish a goal state from a non-goal state.
•  All these search techniques are distinguished by the order in which nodes are expanded.
•  The uninformed search techniques are also called "blind search".
2.4 Depth First Search (DFS)

Q. Explain Depth First Search Technique with an example.

2.4.1 Concept

In depth-first search, the search tree is expanded depth wise; i.e. the deepest node in the current branch of the search tree is expanded. As the leaf node is reached, the search backtracks to the previous node. The progress of the search is illustrated in Fig. 2.4.1.

The explored nodes are shown in light gray. Explored nodes with no descendants in the fringe are removed from memory. Nodes at depth three have no successors and M is the only goal node.

Process

[Figure : successive snapshots of depth-first search expanding a binary tree from the root down to depth three]

Fig. 2.4.1 : Working of Depth first search on a binary tree

2.4.2 Implementation

•  DFS uses a LIFO fringe, i.e. a stack. The most recently generated node, which is on the top of the fringe, is chosen first for expansion. As the node is expanded, it is dropped from the fringe and its successors are added.
•  So when there are no more successors to add to the fringe, the search "backtracks" to the next deepest node that is still unexplored. DFS can be implemented in two ways, recursive and non-recursive. Following is the algorithm for the same.

2.4.3 Algorithm

(a) Non recursive implementation of DFS
1. Push the root node on a stack
2. while (stack is not empty)
   (a) pop a node from the stack;
       (i)  if node is a goal node then return success;
       (ii) push all children of node onto the stack;
3. return failure

(b) Recursive implementation of DFS
DFS(node) :
1. If node is a goal, return success;
2. for each child c of node
   (a) if DFS(c) is successful, return success
3. return failure;
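The non-recursive algorithm above can be sketched in Python as follows. The graph is assumed, for illustration, to be an adjacency mapping from a node name to its children. Children are pushed in reverse sorted order so that the alphabetically smaller node is popped and expanded first (the tie-breaking convention used in the worked example below), and a visited set is added so the sketch also terminates on graphs with cycles, which the bare pseudo code would not.

```python
def dfs(graph, start, goal):
    """Non-recursive DFS; returns the path found, or None.
    `graph` is an adjacency mapping, e.g. {'A': ['B', 'C'], ...}."""
    stack = [[start]]          # LIFO fringe of partial paths
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # push children in reverse order so the smaller one is popped first
        for child in sorted(graph.get(node, []), reverse=True):
            if child not in visited:
                stack.append(path + [child])
    return None
```

On the graph `{'A': ['B', 'C'], 'B': ['D'], 'C': ['E', 'F']}`, `dfs(graph, 'A', 'F')` explores A, B, D, then backtracks and returns the path `['A', 'C', 'F']`.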

2.4.4 Performance Evaluation

•  Completeness : Complete, if m is finite.
•  Optimality : No, as it cannot guarantee the shallowest solution.
•  Time Complexity : A depth first search may generate all of the O(b^m) nodes in the search tree, where m is the maximum depth of any node; this can be much greater than the size of the state space.
•  Space Complexity : For a search tree with branching factor b and maximum depth m, depth first search requires storage of only O(bm) nodes, as at a time only the branch which is getting explored will reside in memory.

Ex. 2.4.1 : Consider the following graph. Starting from state A execute DFS. The goal node is G. Show the order in which the nodes are expanded. Assume that the alphabetically smaller node is expanded first to break ties.

[Figure : the example graph for Ex. 2.4.1 (not recoverable from the scanned text)]

Soln. :

[Figure : the DFS expansion order on the example graph]

Fig. P. 2.4.1
2.5 Breadth First Search (BFS)

University Question
Q. Explain breadth first algorithm.

2.5.1 Concept

•  As the name suggests, in breadth-first search technique, the tree is expanded breadth wise.
•  The root node is expanded first, then all the successors of the root node are expanded, then their successors, and so on.
•  In turn, all the nodes at a particular depth in the search tree are expanded first and then the search will proceed to the next level of node expansion.
•  Thus, the shallowest unexpanded node will be chosen for expansion. The search process of BFS is illustrated in Fig. 2.5.1.
2.5.2 Process

[Figure : successive snapshots of breadth-first search expanding a binary tree level by level]

Fig. 2.5.1 : Working of BFS on binary tree

2.5.3 Implementation

•  In BFS we use a FIFO queue for the fringe, because of which the newly inserted nodes in the fringe will automatically be placed after their parents.
•  Thus, the children nodes, which are deeper than their parents, go to the back of the queue, and old nodes, which are shallower, get expanded first. Following is the algorithm for the same.

2.5.4 Algorithm

1. Put the root node on a queue
2. while (queue is not empty)
   (a) remove a node from the queue
       (i)  if (node is a goal node) return success;
       (ii) put all children of node onto the queue;
3. return failure;
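The algorithm above can be sketched in Python using a FIFO queue, again assuming (for illustration) that the graph is an adjacency mapping. Because children always join the back of the queue, the shallowest unexpanded node is dequeued first.

```python
from collections import deque

def bfs(graph, start, goal):
    """BFS with a FIFO queue; returns the shallowest path, or None."""
    queue = deque([[start]])   # FIFO fringe of partial paths
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None
```

On the same graph used for DFS, `bfs(graph, 'A', 'F')` returns `['A', 'C', 'F']`, and this path is guaranteed to be the shallowest one.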

2.5.5 Performance Evaluation

•  Completeness : It is complete, provided the shallowest goal node is at some finite depth.
•  Optimality : It is optimal, as it always finds the shallowest solution.
•  Time complexity : O(b^d), the number of nodes generated.
•  Space complexity : O(b^d), the total number of nodes kept in memory.


2.6 Uniform Cost Search (UCS)

2.6.1 Concept

•  Breadth first search is optimal when all step costs are equal. To make it work in real conditions, where step costs may differ, we can have a simple extension of the basic implementation of BFS. This results in an algorithm, uniform cost search, that is optimal with any path cost.
•  In BFS we always expand the shallowest node first; but in uniform cost search, instead of expanding the shallowest node, the node with the lowest path cost will be expanded first. The implementation details are as follows.

2.6.2 Implementation

•  Uniform cost search can be achieved by implementing the fringe as a priority queue ordered by path cost. The algorithm shown below is almost the same as BFS, except for the use of a priority queue and the addition of an extra check in case a shorter path to any node is discovered.
•  The algorithm takes care of nodes which are inserted in the fringe for exploration by using a data structure having a priority queue and a hash table.
•  The priority queue used here contains the total cost from the root to the node. Uniform cost search gives the minimum path cost the maximum priority. The algorithm using this priority queue is the following.

2.6.3 Algorithm

•  Insert the root node into the queue.
•  While the queue is not empty :
   (i)  Dequeue the maximum priority node from the queue.
        (If priorities are the same, the alphabetically smaller node is chosen.)
   (ii) If the node is the goal node, print the path and exit.
        Else, insert all the children of the dequeued node, with their total costs as priority.

•  The algorithm returns the best cost path which is encountered first and will never go for other possible paths. The solution path is optimal in terms of cost.
•  As the priority queue is maintained on the basis of the total path cost of the node, the algorithm never expands a node which has a cost greater than the cost of the shortest path in the tree.
•  The nodes in the priority queue have almost the same costs at a given time, and thus the name "Uniform Cost Search".
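The steps above can be sketched in Python with `heapq` as the priority queue. Here the graph is assumed to map a node to `(neighbour, step_cost)` pairs, an illustrative representation. The heap holds `(path_cost, node, path)` tuples, so the smallest cost is popped first and ties fall back to the alphabetically smaller node name, matching step (i); the `best_cost` dictionary plays the role of the hash table that detects when a shorter path to a node is discovered.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """UCS with a priority queue ordered by total path cost.
    Returns (cost, path), or None if no path exists."""
    frontier = [(0, start, [start])]   # (path_cost, node, path)
    best_cost = {start: 0}             # cheapest known cost per node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float('inf')):
            continue                   # a cheaper path was already found
        for child, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(child, float('inf')):
                best_cost[child] = new_cost
                heapq.heappush(frontier, (new_cost, child, path + [child]))
    return None
```

For instance, on `{'A': [('B', 1), ('C', 5)], 'B': [('C', 1), ('D', 4)], 'C': [('D', 1)], 'D': []}`, the search reaches D along A-B-C-D with cost 3 even though a shallower path A-B-D exists, because the priority is cost, not depth.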
2.6.4 Performance Evaluation

•  Completeness : Completeness is guaranteed provided the cost of every step exceeds some small positive constant ε.
•  Optimality : It produces the optimal solution, as nodes are expanded in order of their path cost.
•  Time complexity : Uniform-cost search considers path costs rather than depths, so its complexity does not merely depend on b and d. Hence we let C* be the cost of the optimal solution, and assume that every action costs at least ε. Then the algorithm's worst-case time complexity is O(b^(1 + ⌊C*/ε⌋)), which can be much greater than O(b^d).
•  Space complexity : O(b^(1 + ⌊C*/ε⌋)), indicating the number of nodes in memory at execution time.
2.7 Depth Limited Search (DLS)

2.7.1 Concept

•  In order to avoid the infinite loop condition arising in DFS, in the depth limited search technique, depth-first search is carried out with a predetermined depth limit.
•  The nodes at the specified depth limit are treated as if they don't have any successors. The depth limit solves the infinite-path problem.
•  But as the search is carried out only till a certain depth in the search tree, it introduces the problem of incompleteness.
•  Depth-first search can be viewed as a special case of depth-limited search with depth limit equal to the depth of the tree. The process of DLS is depicted in Fig. 2.7.1.

2.7.2 Process

If the depth limit is fixed to 2, DLS carries out depth first search till the second level in the search tree.

[Figure : depth-first expansion of a tree, cut off below depth 2]

Fig. 2.7.1 : DLS working with depth limit


2.7.3 Implementation

•  As in the case of DFS, in DLS we can use the same fringe implemented as a stack.
•  Additionally, the level of each node needs to be calculated to check whether it is within the specified depth limit.

Depth-limited search can terminate with two conditions :
1. If the solution is found.
2. If there is no solution within the given depth limit.

2.7.4 Algorithm

1. Determine the start node and the search depth.
2. Check if the current node is the goal node.
   If not : Do nothing
   If yes : return
3. Check if the current node is within the specified search depth.
   If not : Do nothing
   If yes : Expand the node and save all of its successors in a stack.
4. Call DLS recursively for all nodes of the stack and go back to Step 2.

2.7.5 Pseudo Code

boolean DLS (Node node, int limit, int depth)
{
    if (depth > limit) return failure;
    if (node is a goal node) return success;
    for each child of node
    {
        if (DLS(child, limit, depth + 1))
            return success;
    }
    return failure;
}
2.7.6 Performance Evaluation

•  Completeness : It is incomplete if the shallowest goal is beyond the depth limit.
•  Optimality : Non optimal, as the depth limit chosen can be greater than d.
•  Time complexity : Same as DFS, O(b^l), where l is the specified depth limit.
•  Space complexity : O(bl), where l is the specified depth limit.

2.8 Iterative Deepening DFS (IDDFS)

2.8.1 Concept

•  Iterative deepening depth first search is a combination of BFS and DFS. In IDDFS search happens depth wise but, at a time, the depth limit will be incremented by one. Hence iteratively it deepens down in the search tree.
•  It eventually turns out to be the breadth-first search, as it explores a complete layer of new nodes at each iteration before going on to the next layer.
•  It does this by gradually increasing the depth limit, first 0, then 1, then 2, and so on, until a goal is found; and thus guarantees the optimal solution. Iterative deepening combines the benefits of depth-first and breadth-first search. The search process is depicted in Fig. 2.8.1.

[Figure : four iterations of iterative deepening search on a binary tree, with depth limits 0, 1, 2 and 3]

Fig. 2.8.1 : Search process in IDDFS



•  Fig. 2.8.1 shows four iterations of iterative deepening search on a binary search tree, where the solution is found on the fourth iteration.

2.8.2 Process

function ITERATIVE-DEEPENING-SEARCH (problem) returns a solution, or failure
    for depth = 0 to ∞ do
        result ← DEPTH-LIMITED-SEARCH (problem, depth)
        if result ≠ cutoff then return result

This is the iterative deepening search algorithm, which repeatedly applies depth limited search with increasing limits. It terminates when a solution is found or if the depth limited search returns failure, meaning that no solution exists.

2.8.3 Implementation

It has exactly the same implementation as that of DLS. Additionally, iterations are required to increment the depth limit by one in every call of DLS.

2.8.4 Algorithm

1. Initialize the depth limit to zero.
2. Repeat until the goal node is found :
   (a) Call depth limited search with the new depth limit.
   (b) Increment the depth limit to the next level.

2.8.5 Pseudo Code

IDDFS()
{
    limit = 0;
    found = false;
    while (not found)
    {
        found = DLS(root, limit, 0);
        limit = limit + 1;
    }
}
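The pseudo code above can be sketched in Python as a recursive depth limited search wrapped in an iterative deepening loop. The graph representation and the `max_depth` safety bound (for the case where no solution exists) are illustrative assumptions; the depth limit itself is what prevents the search from running down an infinite path.

```python
def dls(graph, node, goal, limit, path=None):
    """Depth-limited DFS; returns the path found, or None."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:          # depth limit reached: treat node as a leaf
        return None
    for child in graph.get(node, []):
        result = dls(graph, child, goal, limit - 1, path)
        if result:
            return result
    return None

def iddfs(graph, start, goal, max_depth=20):
    """Repeatedly deepen the DLS limit until the goal is found."""
    for limit in range(max_depth + 1):
        result = dls(graph, start, goal, limit)
        if result:
            return result
    return None
```

On the graph `{'A': ['B', 'C'], 'B': ['D'], 'C': ['E', 'F']}`, the limits 0 and 1 fail, and the limit-2 pass finds `['A', 'C', 'F']`, the shallowest path, just as BFS would.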
2.8.6 Performance Evaluation

•  Completeness : IDDFS is complete when the branching factor b is finite.
•  Optimality : It is optimal when the path cost is a non-decreasing function of the depth of the node.
•  Time complexity :
   o  Do you think in IDDFS there is a lot of wastage of time and memory in regenerating the same set of nodes again and again?
   o  It may appear to be a waste of memory and time, but it's not so. The reason is that, in a search tree with almost the same branching factor at each level, most of the nodes are in the bottom level, which are explored very few times as compared to those on the upper levels.
   o  The nodes on the bottom level, that is level d, are generated only once, those on the next to bottom level are generated twice, and so on, up to the children of the root, which are generated d times. Hence the time complexity is O(b^d).
•  Space complexity : Memory requirements of IDDFS are modest, i.e. O(bd).

Note : As the performance evaluation is quite satisfactory on all the four parameters, IDDFS is the preferred uninformed search method when the search space is large and the depth of the solution is not known.
2.9 Bidirectional Search

2.9.1 Concept

In bidirectional search, two simultaneous searches are run. One search starts from the initial state, called forward search, and the other starts from the goal state, called backward search. The search process terminates when the searches meet at a common node of the search tree. Fig. 2.9.1 shows the general search process in bidirectional search.

2.9.2 Process

[Figure : two search frontiers, one growing from the start state and one from the goal state, meeting in the middle]

Fig. 2.9.1 : Search process in bidirectional search

2.9.3 Implementation

•  In bidirectional search, instead of checking for the goal node, one needs to check whether the fringes of the two searches intersect; as soon as they do, a solution has been found.
•  The check can be done when each node is generated or selected for expansion. It can be implemented with a hash table, to guarantee constant time.
•  For example, consider a problem which has a solution at depth d = 6, with b = 10. If we run breadth first search in each direction, then in the worst case the two searches meet when each has generated all of the nodes at depth 3.
•  This requires a total of 2,220 node generations, as compared with 1,111,110 for a standard breadth-first search.

2.9.4 Performance Evaluation

•  Completeness : Yes, if the branching factor b is finite and both directions use breadth first search.
•  Optimality : Yes, if all step costs are identical and both directions use breadth first search.
•  Time complexity : Time complexity of bidirectional search using breadth-first searches in both directions is O(b^(d/2)).
•  Space complexity : As at least one of the two fringes needs to be kept in memory to check for the common node, the space complexity is O(b^(d/2)).

2.9.5 Pros of Bidirectional Search

•  It is much more efficient.
•  Reduces space and time requirements, as we perform two b^(d/2) searches instead of one b^d search.
•  Example :
   o  Suppose b = 10, d = 6. Breadth first search will examine 10^6 = 1,000,000 nodes.
   o  Bidirectional search will examine 2 × 10^3 = 2,000 nodes.
•  One can combine different search strategies in different directions to avail better performance.

2.9.6 Cons of Bidirectional Search

•  The search requires generating predecessors of states.
•  The overhead of checking whether each new node appears in the other search is involved.
•  For large d, it is still impractical!
•  For two bi-directional breadth-first searches, with branching factor b and depth of the solution d, we have a memory requirement of b^(d/2) for each search.
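The process can be sketched in Python with two BFS fringes, assuming an undirected graph given as an adjacency mapping (so predecessors are just neighbours, one of the assumptions this sketch makes). Each search keeps a parent table, which doubles as the hash table for the intersection check: when a node generated by one search is found in the other's table, the two half-paths are stitched together.

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Two simultaneous BFS fringes; stop when they intersect."""
    if start == goal:
        return [start]
    fwd_parent, bwd_parent = {start: None}, {goal: None}
    fwd, bwd = deque([start]), deque([goal])
    while fwd and bwd:
        # expand one node from each frontier per round
        for frontier, parent, other in ((fwd, fwd_parent, bwd_parent),
                                        (bwd, bwd_parent, fwd_parent)):
            node = frontier.popleft()
            for child in graph.get(node, []):
                if child not in parent:
                    parent[child] = node
                    frontier.append(child)
                if child in other:   # fringes intersect: build full path
                    half1, n = [], child
                    while n is not None:          # child back to start
                        half1.append(n)
                        n = fwd_parent[n]
                    half2, n = [], bwd_parent[child]
                    while n is not None:          # after child, on to goal
                        half2.append(n)
                        n = bwd_parent[n]
                    return list(reversed(half1)) + half2
    return None
```

On the chain graph `{'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C']}`, the forward search from A and the backward search from D meet at C and the full path `['A', 'B', 'C', 'D']` is returned.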

2.10 Comparing Different Techniques

Table 2.10.1 depicts the comparison of all uninformed search techniques on the basis of their performance evaluation. As stated in Chapter 1, the algorithms are evaluated on four criteria viz. completeness, optimality, time complexity and space complexity.

The notations used are as follows :
•  b : Branching factor
•  d : Depth of the shallowest solution
•  m : Maximum depth of the search tree
•  l : Depth limit

Table 2.10.1 : Comparison of tree-search strategies on the basis of performance evaluation

| Parameters       | BFS    | Uniform Cost        | DFS    | DLS    | IDDFS  | Bidirectional |
|------------------|--------|---------------------|--------|--------|--------|---------------|
| Completeness     | Yes    | Yes                 | No     | No     | Yes    | Yes           |
| Optimality       | Yes    | Yes                 | No     | No     | Yes    | Yes           |
| Time complexity  | O(b^d) | O(b^(1 + ⌊C*/ε⌋))  | O(b^m) | O(b^l) | O(b^d) | O(b^(d/2))    |
| Space complexity | O(b^d) | O(b^(1 + ⌊C*/ε⌋))  | O(bm)  | O(bl)  | O(bd)  | O(b^(d/2))    |

2.10.1 Difference between BFS and DFS

| Sr. No. | BFS | DFS |
|---------|-----|-----|
| 1. | BFS stands for "Breadth First Search". | DFS stands for "Depth First Search". |
| 2. | BFS traverses the tree level wise, i.e. each node near to the root will be visited first. The nodes are explored left to right. | DFS traverses the tree depth wise, i.e. nodes in a particular branch are visited till the leaf node and then the search continues branch by branch from left to right in the tree. |
| 3. | Breadth First Search is implemented using a Queue, which is a FIFO list. | Depth First Search is implemented using a Stack, which is a LIFO list. |
| 4. | This is a single step algorithm, wherein the visited vertices are removed from the queue and then displayed at once. | This is a two step algorithm. In the first stage, the visited vertices are pushed onto the stack, and later on, when there is no vertex further to visit, those are popped out. |
| 5. | BFS requires more memory compared to DFS. | DFS requires less memory compared to BFS. |
| 6. | Applications of BFS : to find the shortest path; single source and all pairs shortest paths; in spanning tree; in connectivity. | Applications of DFS : useful in cycle detection; in connectivity testing; finding a path between V and W in the graph; useful in finding spanning trees and forests. |
| 7. | BFS always provides the shallowest path solution. | DFS does not guarantee the shallowest path solution. |
| 8. | No backtracking is required in BFS. | Backtracking is implemented in DFS. |
| 9. | BFS is optimal and complete if the branching factor is finite. | DFS is neither complete nor optimal even in case of a finite branching factor. |
| 10. | BFS can never get trapped into infinite loops. | DFS generally gets trapped into infinite loops, as search trees are dense. |
| 11. | Example : for the tree below, BFS visits A, B, C, D, E, F. | For the same tree, DFS visits A, B, D, C, E, F. |

        A
       / \
      B   C
      |  / \
      D E   F

2.11 Informed Search Techniques

University Question
Q. Write short note on Informed search.

•  Informed searching techniques are a further extension of the basic uninformed search techniques. The main idea is to generate additional information about the search state space using knowledge of the problem domain, so that the search becomes more intelligent and efficient. An evaluation function is developed for each state, which quantifies the desirability of expanding that state in order to reach the goal.
•  All the strategies use this evaluation function in order to select the next state under consideration, hence the name "Informed Search". These techniques are very much efficient with respect to time and space requirements as compared to uninformed search techniques.
2.12 Heuristic Function

University Questions
Q. Explain heuristic function with example.
Q. What is a heuristic function ? How will you find a suitable heuristic function ? Give suitable example.
Q. Define heuristic function. Give an example heuristic function for the block world problem.
•  A heuristic function is an evaluation function, to which the search state is given as input and which generates a tangible representation of the state as output.
•  It maps the problem state description to measures of desirability, usually represented as number weights. The value of a heuristic function at a given node in the search process gives a good estimate of that node being on the desired path to the solution.
•  It evaluates an individual problem state and determines how promising the state is. Heuristic functions are the most common way of imparting additional knowledge of the problem states to the search algorithm.
•  Fig. 2.12.1 shows the general representation of a heuristic function.

[Figure : a state / node 'n' fed into the heuristic function h(n), which outputs the value of node n]

Fig. 2.12.1 : General representation of Heuristic function

•  The representation may be the approximate cost of the path to the goal node, or the number of hops required to reach the goal node, etc.
•  The heuristic function that we are considering in this syllabus, for a node n, is : h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
•  Example : For the Travelling Salesman Problem, the sum of the distances travelled so far can be a simple heuristic function.
•  A heuristic function can be of two types depending on the problem domain. It can be a maximization function or a minimization function of the path cost.
•  In a maximization type of heuristic, the greater the cost of the node, the better is the node; while in case of a minimization heuristic, the lower the cost, the better is the node. There are heuristics of general applicability as well as domain specific ones. The search strategies are general purpose heuristics.
•  It is believed that, in general, a heuristic will always lead to a faster and better solution, even though there is no guarantee that it will never lead in the wrong direction in the search tree.
•  The design of the heuristic plays a vital role in the performance of the search.
•  As the purpose of a heuristic function is to guide the search process along the most profitable path among all that are available, a well designed heuristic function can provide a fairly good estimate of whether a path is good or bad.
•  However, in many problems, the cost of computing the value of a heuristic function would be more than the effort saved in the search process. Hence generally there is a trade-off between the cost of evaluating a heuristic function and the savings in search that the function provides.
•  So, are you ready to think of your own heuristic function definitions? Here is a word of caution: see how the function definition impacts the search.
•  The following examples demonstrate how the design of the heuristic function completely alters the scenario of the searching process.
2.12.1 Example of 8-puzzle Problem

•  Remember the 8-puzzle problem? Can we estimate the number of steps required to solve an 8-puzzle from a given state? What about designing a heuristic function for it?

    Start state        Goal state
    7  2  4            _  1  2
    5  _  6            3  4  5
    8  3  1            6  7  8

Fig. 2.12.2 : A scenario of 8-puzzle problem

•  Two simple heuristic functions are :
   o  h1 = the number of misplaced tiles. This is also known as the Hamming Distance. In the Fig. 2.12.2 example, the start state has h1 = 8. Clearly, h1 is an acceptable heuristic because any tile that is out of place will have to be moved at least once. Quite logical, isn't it?
   o  h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot be moved diagonally, the distance counted is the sum of horizontal and vertical distances. This is also known as the Manhattan Distance. In Fig. 2.12.2, the start state has h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18. Clearly, h2 is also an admissible heuristic because any move can, at best, move one tile one step closer to the goal.
•  As expected, neither heuristic overestimates the true number of moves required to solve the puzzle, which is 26. Additionally, it is easy to see from the definitions of the heuristic functions that, for any given state, h2 will always be greater than or equal to h1. Thus, we can say that h2 dominates h1.
2.12.2 Example of Block World Problem

Q. Find the heuristic value for a particular state of the blocks world problem.

•  Fig. 2.12.3 depicts a blocks world problem, where the lettered bricks are piled up on one another and are required to be arranged as shown in the goal state, by moving one brick at a time.

[Figure : a start pile of lettered blocks on the left and the required goal pile on the right]

Fig. 2.12.3 : Block Problem

•  As shown, the goal state with the particular arrangement of blocks needs to be attained from the given start state. Now it's time to scratch your head and define a heuristic function that will distinguish the start state from the goal state. Confused?
•  Let's design a function which assigns +1 for a brick at the right position and -1 for one which is at a wrong position. Consider Fig. 2.12.4.

Local heuristic : +1 for each block that is resting on the thing it is supposed to be resting on; -1 for each block that is resting on a wrong thing.

Fig. 2.12.4 : Definition of Heuristic Function "h1"

•  Fig. 2.12.5 shows the heuristic values generated by heuristic function "h1" for various different states in the state space.

[Figure : several block configurations, each labelled with its h1 value]

Fig. 2.12.5 : State evaluations using Heuristic function "h1"

•  Please observe that this heuristic generates the same value for different states.
•  Due to this kind of heuristic, the search may end up in limitless iterations, as the state showing the most promising heuristic value may not hold true; or the search may end up finding an undesirable goal state, as the state evaluation may lead in the wrong direction in the search tree.
•  Let's have another heuristic design for the same problem. Fig. 2.12.6 depicts a new heuristic function "h2" definition, in which each brick with the correct support structure is given +1 for every brick in its support structure, and each brick with a wrong support structure is given -1 for every brick in its support structure.

[Figure : the h2 scoring rule applied to a block pile]

Fig. 2.12.6 : Definition of heuristic function "h2"

•  As we observe in Fig. 2.12.7, the same states are considered again as in Fig. 2.12.5, but this time, using h2, each one of the states is assigned a unique value generated according to heuristic function h2.

[Figure : the same block configurations, each now labelled with a distinct h2 value]

Fig. 2.12.7 : State evaluations using Heuristic function "h2"

•  Observing this example, one can easily understand that in the second part of the example the search will be carried out smoothly, as each unique state gets a unique value assigned to it.
•  This example makes it clear that the design of the heuristic plays a vital role in the search process, as the whole search is carried out by considering the heuristic values as the basis for selecting the next state to be explored.
•  The state having the most promising value to reach the goal state will be the first candidate for exploration; this continues till we find the goal state.

Global heuristic

•  For each block that has the correct support structure : +1 to every block in the support structure.
•  For each block that has the wrong support structure : -1 to every block in the support structure.

This leads to a discussion of a better heuristic function definition. Is there any particular way of defining a heuristic function that will guarantee a better performance in the search process?


2.12.3 Properties of Good Heuristic Function


It should generate a unique value for each unique state in search space.
WN >

The values should be a logical indicator of the profitability of the state in order to reach the goal state.
It may not guarantee to find the best solution, but almost always should find a very good solution.

It should reduce the search time; specifically for hard problems like travelling salesman problem where the time
>

required is exponential.
The main objective of a heuristic is to produce a solution in a reasonable time frame that is good enough for
solving the problem, as it’s an extra task added to the basic search process.
The solution produced by using a heuristic may not be the best of all the actual solutions to this problem, or it may simply approximate the exact solution. But it is still valuable because finding the solution does not require a prohibitively long time. So we are investing some amount of time in generating heuristic values for each state in the search space, but reducing the total time involved in the actual searching process.
Do we need to design a heuristic for every problem in the real world? There is a trade-off criterion for deciding whether to use a heuristic for solving a given problem. It is as follows :
o Optimality : Does the problem require finding the optimal solution, if there exist multiple solutions for the same?
o Completeness : In case of multiple existing solutions of a problem, is there a need to find all of them? Many heuristics are only meant to find one solution.
o Accuracy and precision : Can the heuristic guarantee to find the solution within the precision limits? Is the error bar on the solution unreasonably large?
o Execution time : Is it going to affect the time required to find the solution? Some heuristics converge faster than others, whereas some are only marginally quicker than classic methods.
In many AI problems, it is often hard to measure precisely the goodness of a particular solution. But it is still important to keep the performance question in mind while designing the algorithm.
For real world problems, it is often useful to introduce heuristics based on relatively unstructured knowledge. It is often impossible to define this knowledge in such a way that mathematical analysis can be performed.

2.13 Best First Search

2.13.1 Concept

University Question

Q. Explain search strategy to overcome drawbacks of BFS and DFS. (MU - Dec. 10)


In depth first search, all competing branches do not get expanded; in breadth first search, the search never gets trapped on dead-end paths. If we combine these desirable properties of both DFS and BFS, the strategy would be : "follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one". This is what Best First search is!
Best-first search is a search algorithm which explores the search tree by expanding the most promising node
chosen according to the heuristic value of nodes. Judea Pearl described best-first search as estimating the
promise of node n by a “heuristic evaluation function f(n) which, in general, may depend on the description of n,
the description of the goal, the information gathered by the search up to that point, and most important, on any
extra knowledge about the problem domain".
¥F Aland DS-1 (MU) 06 Search Techniques
e Efficient selecction of the current best candidate for extension is typically implemented using a priority queue.
_Efficien
‘ig. 2.12.13.1 de picts the search process of Best first search on an example search tree. The values noted below the
Fig.
nodes are the estimated heuristic values
of nodes

Fig. 2.13.1 : Best first search tree expansion scenario

2.13.2 Implementation

e Best first search uses two lists in order to record the path. These are namely OPEN list and CLOSED list for
implementation purpose.
OPEN list stores nodes that have been generated, but have not yet been examined. This is organized as a priority queue, in which nodes are ordered by their heuristic values so that the most promising candidate is always in front. It provides efficient selection of the current best candidate for extension.
CLOSED list stores nodes that have already been examined. This CLOSED list contains all nodes that have been evaluated and will not be looked at again. Whenever a new node is generated, check whether it has been generated before. If it has already been visited, check its recorded value and change the parent if this new value is better than the previous one. This avoids any node being evaluated twice, and the search will never get stuck in an infinite loop.

2.13.3 Algorithm : Best First Search

OPEN = [initial state]
CLOSED = []
while OPEN is not empty do
{
    1. Remove the best node from OPEN, call it n, and add it to CLOSED.
    2. If n is the goal state, backtrack the path to n through the recorded parents and return the path.
    3. Create n's successors.
    4. For each successor do :
        a. If it is not in CLOSED and it is not in OPEN : evaluate it, add it to OPEN, and record its parent.
        b. Otherwise, if it is already present in OPEN or CLOSED with a different parent and this new path is better than the previous one, change its recorded parent and adjust its priority in OPEN using the new evaluation.
}
done

This Best First Search algorithm simply terminates when no path is found. An actual implementation would of course require special handling of this case.
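The OPEN/CLOSED bookkeeping described above can be sketched in Python. This is a minimal illustration with a toy graph and made-up heuristic values; for brevity it omits the re-parenting of step 4(b), so it assumes the first recorded path to a node is kept.

```python
import heapq

def best_first_search(start, goal, successors, h):
    """Best-first search: always expand the OPEN node with the best
    (here, lowest) heuristic value."""
    open_heap = [(h(start), start)]     # OPEN as a priority queue
    parent = {start: None}              # records parents; doubles as CLOSED
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node == goal:                # backtrack through recorded parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in successors(node):
            if succ not in parent:      # seen neither in OPEN nor CLOSED
                parent[succ] = node
                heapq.heappush(open_heap, (h(succ), succ))
    return None                         # OPEN exhausted: no path found

# Toy graph with hypothetical heuristic values
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
hvals = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
print(best_first_search('A', 'D', lambda n: graph[n], lambda n: hvals[n]))
# -> ['A', 'C', 'D']
```

Here the `parent` dictionary plays the role of the CLOSED record: a node that already has a recorded parent is never evaluated twice.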
2.13.4 Performance Measures for Best First Search

1. Completeness : Not complete; may follow an infinite path if the heuristic rates each state on such a path as the best option. Most reasonable heuristics will not cause this problem, however.
2. Optimality : Not optimal; may not always produce the optimal solution.
3. Time Complexity : Worst case time complexity is still O(b^m), where m is the maximum depth.
4. Space Complexity : Since it must maintain a queue of all unexpanded states, space complexity is also O(b^m).

2.13.5 Greedy Best First Search


A greedy algorithm is an algorithm that follows the heuristic of making the locally optimal choice at each stage
with the hope of finding a global optimum.
When Best First Search uses a heuristic that leads toward the goal node, so that nodes which seem more promising are expanded first, this particular type of search is called greedy best-first search.
In the greedy best first search algorithm, the first successor of the parent is expanded. For the successor node, check the following :
1. If the successor node's heuristic is better than its parent's, the successor is set at the front of the queue, with the parent reinserted directly behind it, and the loop restarts.
2. Else, the successor is inserted into the queue, in a location determined by its heuristic value. The procedure then evaluates the remaining successors, if any, of the parent.
In many cases, greedy best first search may not produce an optimal solution, but the solution will be locally optimal and will be generated in comparatively less time. In mathematical optimization, greedy algorithms are used to solve combinatorial problems.
For example, consider the travelling salesman problem, which is of a high computational complexity; it works well with a greedy strategy as follows.
Refer to Fig. 2.13.2. The values written on the links are the straight line distances between the nodes. The aim is to visit all the cities A through F with the shortest distance travelled.
(Fig. 2.13.2 : Travelling Salesman Problem example)
Let us apply a greedy strategy to this problem with the heuristic : "At each stage visit an unvisited city nearest to the current city". Simple logic, isn't it? This heuristic need not find the best solution, but it terminates in a reasonable number of steps, whereas finding an optimal solution typically requires unreasonably many steps. Let's verify.
Being a greedy algorithm, it will always make a locally optimal choice. Hence it will select node C first, as it is found to be the non-visited node with the least distance from node A, and the path generated will be A → C → D → B → E → F with total cost = 10 + 18 + 5 + 25 + 15 = 73. By observing the graph, one can find the optimal path and the optimal distance the salesman needs to travel. It turns out to be A → B → D → E → F → C, where the cost comes out to be 18 + 5 + 15 + 15 + 18 = 71.
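The "visit the nearest unvisited city" heuristic above can be sketched as a nearest-neighbour routine in Python. The four-city distance table below is hypothetical (Fig. 2.13.2 is not reproduced here), so the numbers differ from the worked example.

```python
def greedy_tour(start, dist):
    """Nearest-neighbour heuristic: at each stage visit the unvisited
    city nearest to the current city (a locally optimal choice)."""
    unvisited = set(dist) - {start}
    tour, current, total = [start], start, 0
    while unvisited:
        # Locally optimal choice: nearest unvisited city
        nxt = min(unvisited, key=lambda c: dist[current][c])
        total += dist[current][nxt]
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour, total

# Hypothetical symmetric distance table (not the figure's values)
dist = {
    'A': {'B': 20, 'C': 10, 'D': 35},
    'B': {'A': 20, 'C': 30, 'D': 5},
    'C': {'A': 10, 'B': 30, 'D': 18},
    'D': {'A': 35, 'B': 5, 'C': 18},
}
print(greedy_tour('A', dist))   # -> (['A', 'C', 'D', 'B'], 33)
```

Note that the routine commits to each locally best edge and never reconsiders, which is exactly why a greedy tour can miss the global optimum.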

2.13.6 Properties of Greedy Best-first Search

1. Completeness : It is not complete, as it can get stuck in loops; it is also susceptible to a wrong start and to the quality of the heuristic function.
2. Optimality : It is not optimal, as it goes on selecting a single path and never checks for other possibilities.
3. Time Complexity : O(b^m), but a good heuristic can give dramatic improvement.
4. Space Complexity : O(b^m), as it needs to keep all nodes in memory.
2.14 A* Search

University Questions
Q. Explain A* Algorithm. What is the drawback of A*? Also show that A* is optimally efficient.
Q. Describe A* algorithm with merits and demerits. (MU - Dec. 13)
Q. Explain A* algorithm with example. (MU - May 14, Dec. 14)
Q. Explain A* search with example.
2.14.1 Concept

A*, pronounced as "A-star" (Hart, 1972), is a search method that combines branch and bound and best first search, together with the dynamic programming principle.
It is a variation of Best First search where the evaluation of a state or a node not only depends on the heuristic value of the node but also considers its distance from the start state. It is the most widely known form of best-first search. The A* algorithm is also called the OR graph / tree search algorithm.

In A* search, the value of a node n, represented as f(n), is a combination of g(n), which is the cost of the cheapest path to reach the node from the root node, and h(n), which is the cost of the cheapest path to reach the goal node from the node. Hence f(n) = g(n) + h(n).

As the heuristic can provide only the estimated cost from the node to the goal, we can represent this estimate as h*(n); similarly, g*(n) can represent the approximation of g(n), which is the distance from the root node observed by A*. The algorithm A* will then have
f*(n) = g*(n) + h*(n)

As we observe, the difference between A* and Best first search is that Best first search considers only the heuristic estimate h(n), while A* accounts for both the distance travelled till a particular node and the estimate of the distance still needed to reach the goal node; hence it always finds the cheapest solution.

* Areasonable thing to try first is the node with the lowest value of g*(n) + h*(n). It turns out that this strategy is
more than just reasonable, provided that the heuristic function h*(n) satisfies certain conditions which are
discussed further in the chapter. A* search is both complete and optimal.

2.14.2 Implementation

A* also uses both the OPEN and CLOSED lists.

2.14.3 Algorithm (A*)

1. Initialize the OPEN list with the initial node; CLOSED = { }; g = 0; f = h; Found = false.
2. While (OPEN ≠ { } and Found = false)
{
    i.   Remove the node with the lowest value of f from OPEN to CLOSED and call it Best_Node.
    ii.  If Best_Node = goal state, then Found = true
    iii. else
    {
        a. Generate the successors (Succ) of Best_Node.
        b. For each Succ do
        {
            i.  Compute g(Succ) = g(Best_Node) + cost of getting from Best_Node to Succ.
            ii. If Succ ∈ OPEN then  /* already generated but not yet processed */
            {
                a. Call the matched node OLD and add it to the list of Best_Node's successors.
                b. Ignore the Succ node and change the parent of OLD, if required :
                   - If g(Succ) < g(OLD), make the parent of OLD be Best_Node and change the values of g and f for OLD.
                   - If g(Succ) >= g(OLD), ignore.
            }
            iii. If Succ ∈ CLOSED then  /* already processed */
            {
                a. Call the matched node OLD and add it to the list of Best_Node's successors.
                b. Ignore the Succ node and change the parent of OLD, if required :
                   - If g(Succ) < g(OLD), make the parent of OLD be Best_Node, change the values of g and f for OLD, and propagate the change to OLD's children using depth first search.
                   - If g(Succ) >= g(OLD), do nothing.
            }
            iv. If Succ ∉ OPEN and Succ ∉ CLOSED then
            {
                a. Add it to the list of Best_Node's successors.
                b. Compute f(Succ) = g(Succ) + h(Succ).
                c. Put Succ on the OPEN list with its f value.
            }
        }  /* for loop */
    }  /* else */
}  /* end while */
3. If Found = true, report the best path; else report failure.
4. Stop.
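The algorithm above can be sketched compactly in Python. The graph and heuristic values are hypothetical; instead of editing entries inside OPEN, this version re-parents a node by pushing an improved entry onto the priority queue and skipping stale entries when they are popped.

```python
import heapq

def a_star(start, goal, successors, h):
    """A* search with f(n) = g(n) + h(n). successors(n) yields
    (neighbour, step_cost) pairs."""
    open_heap = [(h(start), 0, start)]          # OPEN as (f, g, state)
    g_best = {start: 0}                         # cheapest g found so far
    parent = {start: None}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:                        # backtrack recorded parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g
        if g > g_best.get(node, float('inf')):
            continue                            # stale OPEN entry; skip it
        for succ, cost in successors(node):
            g2 = g + cost
            if g2 < g_best.get(succ, float('inf')):
                g_best[succ] = g2               # better path found: re-parent
                parent[succ] = node
                heapq.heappush(open_heap, (g2 + h(succ), g2, succ))
    return None, float('inf')

# Hypothetical weighted graph with an admissible heuristic
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 12)],
         'B': [('G', 3)], 'G': []}
hvals = {'S': 5, 'A': 4, 'B': 2, 'G': 0}
print(a_star('S', 'G', lambda n: graph[n], lambda n: hvals[n]))
# -> (['S', 'A', 'B', 'G'], 6)
```

Note how node B is first reached via S with g = 4 and later re-parented through A with g = 3; the stale queue entry is simply discarded when popped.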

2.14.4 Behaviour of A* Algorithm

As stated already, the success of A* totally depends upon the design of the heuristic function and how well it is able to evaluate each node by estimating its distance from the goal node. Let us understand the effect of the heuristic function on the execution of the algorithm and how optimality gets affected by it.
A. Underestimation

If we can guarantee that the heuristic function 'h' never overestimates the actual cost from the current state to the goal, that is, the value generated by h is always less than the actual cost or the actual number of hops required to reach the goal state, then the A* algorithm is guaranteed to find an optimal path to a goal, if one exists.
Example :
f = g + h. Here h is underestimated.

(Fig. 2.14.1 : Underestimation example - A is expanded to B(1+3), C(1+4) and D(1+5); B is expanded to E(2+3), 3 moves away from the goal; E is expanded to F(3+3), 3 moves away from the goal.)
If we consider the cost of all arcs to be 1 : A is expanded to B, C and D, and the f value for each node is computed. B is chosen and expanded to E. We notice that f(E) = f(C) = 5. Suppose we resolve in favor of E, the path currently being expanded. E is expanded to F. Expansion of node F is stopped as f(F) = 6, so we will now expand node C.
Hence by underestimating h(B), we have wasted some effort but eventually discovered that B was farther away than we thought. Then we go back and try another path, and will find the optimal path.
B. Overestimation

Here h is overestimated; that is, the value generated for each node is greater than the actual number of steps required to reach the goal node.
Example :

(Fig. 2.14.2 : Overestimation example - A is expanded to B(1+3), C(1+4) and D(1+5); B is expanded to E(2+2), E to F(3+1), and F to G(4+0).)

As shown in the example, A is expanded to B, C and D. Now B is expanded to E, E to F and F to G, for a solution path of length 4. Consider a scenario where there is a direct path from D to G giving a solution path of length 2. This path will never be found because of overestimating h(D).
Thus, some other worse solution might be found without ever expanding D. So by overestimating h, one cannot guarantee to find the cheapest path solution.

2.14.5 Admissibility of A*

University Questions

Q. What do you mean by admissible heuristic function? Explain with example.
Q. Write a short note on admissibility of A*.
A search algorithm is admissible if, for any graph, it always terminates with an optimal path from the initial state to the goal state, if a path exists. A heuristic is admissible if it never overestimates the actual cost from the current state to the goal state. Alternatively, we can say that A* always terminates with the optimal path in case h(n) is an admissible heuristic function.
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal. Admissible heuristics are by nature optimistic, because they think the cost of solving the problem is less than it actually is.
An obvious example of an admissible heuristic is the straight line distance. Straight line distance is admissible because the shortest path between any two points is a straight line, so the straight line distance cannot overestimate the actual road distance.
o Theorem : If h(n) is admissible, tree search using A* is optimal.
o Proof : Optimality of A* with an admissible heuristic.
Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n be an unexpanded node in the fringe such that n is on a shortest path to an optimal goal G.

(Fig. 2.14.3 : Optimality of A* - from the start node, one branch leads through n to the optimal goal G, another leads to the suboptimal goal G2.)

f(G2) = g(G2)                  since h(G2) = 0
g(G2) > g(G)                   since G2 is suboptimal
f(G)  = g(G)                   since h(G) = 0
f(G2) > f(G)                   from the above
h(n) ≤ h*(n)                   since h is admissible
g(n) + h(n) ≤ g(n) + h*(n)
f(n) ≤ f(G)
Hence f(G2) > f(n), and A* will never select G2 for expansion.
University Question

Q. Prove that A* is admissible if it uses a monotone heuristic.
A heuristic function h is monotone or consistent if,
∀ states Xi and Xj such that Xj is a successor of Xi,
h(Xi) − h(Xj) ≤ cost(Xi, Xj)
where cost(Xi, Xj) is the actual cost of going from Xi to Xj, and h(goal) = 0.
In this case the heuristic is locally admissible, i.e., it consistently finds the minimal path to each state encountered in the search. The monotone property, in other words, means that the search space is everywhere locally consistent with the heuristic function employed, i.e., each state is reached along the shortest path from its ancestors. With a monotonic heuristic, if a state is rediscovered, it is not necessary to check whether the new path is shorter. Every monotonic heuristic is admissible.
A cost function f(n) is monotone if f(n) ≤ f(succ(n)), ∀n. For any admissible cost function f, we can construct a monotone admissible function.
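The consistency condition can be checked mechanically over the edges of a graph. A small sketch (the edge list and heuristic values below are hypothetical):

```python
def is_consistent(edges, h, goal):
    """Monotone / consistency check: h(Xi) - h(Xj) <= cost(Xi, Xj)
    for every edge (Xi, Xj), and h(goal) == 0."""
    if h[goal] != 0:
        return False
    return all(h[xi] - h[xj] <= cost for xi, xj, cost in edges)

# Hypothetical edge list: (from, to, actual step cost)
edges = [('S', 'A', 2), ('A', 'G', 3), ('S', 'G', 6)]
print(is_consistent(edges, {'S': 5, 'A': 3, 'G': 0}, 'G'))   # -> True
print(is_consistent(edges, {'S': 6, 'A': 3, 'G': 0}, 'G'))   # -> False (6 - 3 > 2)
```

In the second case the heuristic drops by 3 across an edge of cost 2, violating monotonicity even though each individual h value might still be admissible.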

2.14.7 Properties of A*

1. Completeness : It is complete, as it will always find a solution if one exists.
2. Optimality : Yes, it is optimal.
3. Time Complexity : O(b^m), as the number of nodes grows exponentially with solution cost.
4. Space Complexity : O(b^m), as it keeps all nodes in memory.
2.14.8 Example : 8 Puzzle Problem using A* Algorithm

Start state :      Goal state :
3 7 6              5 3 6
5 1 2              7 0 2
4 0 8              4 1 8

(Fig. 2.14.4 : Solution of the 8-puzzle using A* - starting from the initial state with f = 0 + 4, the blank is moved up, left and right to generate successors, and at each step the node with the lowest f value is expanded until the goal state is reached.)

The choice of evaluation function critically determines search results.
Consider the evaluation function
f(X) = g(X) + h(X)
where
h(X) = the number of tiles not in their goal position in a given state X
g(X) = depth of node X in the search tree
For the initial node, f(initial node) = 0 + 4 = 4.
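The "misplaced tiles" heuristic h(X) can be sketched directly. States are flattened row by row, with 0 for the blank; the values below are the start and goal states of the example, giving f(initial) = 0 + 4.

```python
def misplaced_tiles(state, goal):
    """h(X): count of tiles (excluding the blank, 0) not in their
    goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

# Start and goal states from the example, flattened row by row
start = (3, 7, 6,
         5, 1, 2,
         4, 0, 8)
goal = (5, 3, 6,
        7, 0, 2,
        4, 1, 8)
print(misplaced_tiles(start, goal))   # -> 4, so f(initial) = 0 + 4
```

The blank is excluded from the count because moving it costs nothing by itself; counting it would make the heuristic overestimate and break admissibility.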

Ex. 2.14.1 : Consider the graph given in Fig. 1 below. Assume that the initial state is S and the goal state is 7. Find a path from the initial state to the goal state using A* search. Also report the solution cost. The straight line distance heuristic estimates for the nodes are as follows : h(1) = 14, h(2) = 10, h(3) = 8, h(4) = 12, h(5) = 10, h(6) = 10, h(S) = 15.

Soln. : (each entry is written as node(h + g))
Step 1 : Expand S.
  Open : 4(12 + 4), 1(14 + 3)
  Closed : S(15)
Step 2 : Expand 4.
  Open : 5(10 + 6), 1(14 + 3)
  Closed : S(15), 4(12 + 4)
Step 3 : Expand 5.
  Open : 1(14 + 3), 6(10 + 10), 2(10 + 11)
  Closed : S(15), 4(12 + 4), 5(10 + 6)
Step 4 : Expand 1. The path to node 2 improves, so g(2) drops from 11 to 7.
  Open : 2(10 + 7), 6(10 + 10)
  Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3)
Step 5 : Expand 2.
  Open : 3(8 + 11), 6(10 + 10)
  Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7)
Step 6 : Expand 3.
  Open : 6(10 + 10)
  Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7), 3(8 + 11)
Step 7 : Expand 6.
  Open : 7(0 + 13)
  Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7), 3(8 + 11), 6(10 + 10)
Expanding 7 reaches the goal state; the solution cost is 13.
2.14.9 Comparison among Best First Search, A* Search and Greedy Best First Search

University Question
Q. Compare the following informed searching algorithms based on performance measures with justification : Complete, Optimal, Time complexity and Space complexity.
(a) Greedy best first, (b) A*, (c) Recursive best-first (RBFS)

Algorithm          | Greedy Best First Search | A* Search | Best First Search
-------------------|--------------------------|-----------|------------------
Completeness       | Not complete             | Complete  | Not complete
Optimality         | Not optimal              | Optimal   | Not optimal
Time Complexity    | O(b^m)                   | O(b^m)    | O(b^m)
Space Complexity   | O(b^m)                   | O(b^m)    | O(b^m)

2.15 Memory Bounded Heuristic Searches


2.15.1 Iterative Deepening A* (IDA*) {Self Study Topic}

University Question
Q. Describe the IDA* search algorithm giving a suitable example.

Concept
As we have learned, A* requires keeping all the nodes in memory; this becomes very difficult to manage in case of a large and complex state space. In spite of being complete and optimal, A* cannot be used for large state problems because of this limitation.
Breadth First search faces the same problem of huge memory requirement, but Iterative Deepening (IDDFS) solved it by limiting the search depth. Similarly, Iterative Deepening A* (IDA*) eliminates the memory constraints of A* while keeping the optimality of the solution intact.
In IDA*, each iteration is a depth-first search with a threshold assigned to it. It keeps track of the cost, f(n) = g(n) + h(n), of each node generated. As soon as the cost of a newly generated node exceeds the threshold for that iteration, the node is not explored further and the search backtracks.

The threshold is initialized to the heuristic estimate of the initial state. It is increased in each successive iteration to the total cost of the lowest-cost node that was pruned during the previous iteration. The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold.
Algorithm

node                current node
g                   the cost to reach the current node
f                   estimated cost of the cheapest path (root...node...goal)
h(node)             estimated cost of the cheapest path (node...goal)
cost(node, succ)    step cost function
is_goal(node)       goal test
successors(node)    node expanding function

procedure ida_star(root)
    bound := h(root)
    loop
        t := search(root, 0, bound)
        if t = FOUND then return bound
        if t = ∞ then return NOT_FOUND
        bound := t
    end loop
end procedure

function search(node, g, bound)
    f := g + h(node)
    if f > bound then return f
    if is_goal(node) then return FOUND
    min := ∞
    for succ in successors(node) do
        t := search(succ, g + cost(node, succ), bound)
        if t = FOUND then return FOUND
        if t < min then min := t
    end for
    return min
end function


Performance evaluation
1. Optimality : IDA* finds the optimal solution, if the heuristic function is admissible.
2. Completeness : IDA* is complete; it always finds a solution if one exists within the threshold limit.
3. Time Complexity : IDA* expands the same number of nodes as A*, so it has exponential time complexity.
4. Space Complexity : IDA* performs a series of depth-first searches, hence its memory requirement is linear with respect to the maximum search depth.
From the above discussion it is clear that IDA* is optimal in time and space as compared to all other heuristic search algorithms that find optimal solutions on a tree. Unlike A*, in IDA* we need not manage the OPEN and CLOSED lists; hence it often runs faster than A*, and the implementation is much simpler.
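The IDA* pseudocode above translates almost line for line into Python. The graph and heuristic values below are hypothetical; note that `minimum` starts at infinity, so the smallest pruned f-cost becomes the next threshold.

```python
import math

def ida_star(root, h, successors, is_goal):
    """IDA*: a series of depth-first searches, each bounded by an
    f-cost threshold that grows to the smallest pruned f-cost."""
    def search(node, g, bound):
        f = g + h(node)
        if f > bound:
            return f                  # pruned: candidate for the next bound
        if is_goal(node):
            return 'FOUND'
        minimum = math.inf
        for succ, cost in successors(node):
            t = search(succ, g + cost, bound)
            if t == 'FOUND':
                return 'FOUND'
            minimum = min(minimum, t)
        return minimum

    bound = h(root)                   # initial threshold = heuristic of root
    while True:
        t = search(root, 0, bound)
        if t == 'FOUND':
            return bound              # threshold at which the goal was reached
        if t == math.inf:
            return None               # no solution exists
        bound = t

# Hypothetical weighted DAG with an admissible heuristic
graph = {'S': [('A', 2), ('B', 1)], 'A': [('G', 3)], 'B': [('G', 5)], 'G': []}
hvals = {'S': 4, 'A': 3, 'B': 4, 'G': 0}
print(ida_star('S', lambda n: hvals[n], lambda n: graph[n], lambda n: n == 'G'))
# -> 5
```

The first iteration runs with bound 4 and prunes both branches at f = 5; the second iteration with bound 5 then reaches the goal along S → A → G at cost 5.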

2.15.2 Simplified Memory-Bounded A* (SMA*) {Self Study Topic)

As IDA* does not retain any path history (the only thing kept between iterations is the threshold cost value), it is bound to repeat the same expansions through various iterations. Hence memory is not used optimally. Let's see how SMA* handles this memory management issue.
It has following properties :
1. SMA* uses all the available memory efficiently.
2. SMA* doesn’t generate states repeatedly.

3. SMA* is complete, if the available memory is sufficient to store the shallowest solution path.
4. SMA* is optimal if the available memory is sufficient to store the shallowest optimal solution path.
Otherwise it generates the best solution possible with the available amount of memory.
5. SMA* is optimally efficient when enough memory is available for the entire tree search.
The basic procedure of SMA* is just like A*: it expands the best leaf until memory is full. When there is no memory left to add a newly generated node, it needs to drop one of the earlier expanded old nodes. SMA* always drops the leaf node with the highest f-cost. The dropped node is called a "forgotten node".
SMA* then backs up the value of the forgotten node to its parent. In this way, the quality of the best path in that
sub-tree is always known to the ancestor of a forgotten sub-tree. This information is used when all other paths
look worse than the path it has forgotten. In that case, SMA* regenerates the forgotten sub-tree.
What if all the leaf nodes have the same f-value? A rare case, but possible. In such a situation, to avoid selecting the same node for deletion and expansion, SMA* expands the "newest best" leaf and deletes the "oldest worst" leaf. If there is only one leaf, even these coincide, but in that case the current search tree must be a single path from root to leaf that fills all of memory. If the leaf is not a goal node, then even if it is on an optimal solution path, that solution is not reachable with the available memory.

When to choose SMA* given a choice?


As SMA* always finds optimal solution; so in following situations SMA* is a perfect choice.
Connected state space, i.e, when state space is graph.
Un-uniform step costs.
Node generation is expensive compared to the overhead of maintai
ning the OPEN and the CLOSED lists.

Fig. 2.15.1 depicts a typical example where memory can store only three nodes. The original search tree is shown at the top. The second half of the figure shows the iterations of SMA*. Each node is shown with its f-cost, i.e. f = g + h. The goal nodes I, F, J, K are shown in squares. The numbers in parentheses stand for the f-cost of the best forgotten sub-tree.
Fig. 2.15.1: Process of SMA* with memory size of 3 nodes

The algorithm proceeds as follows


1. At each step, one successor is added to the lowest f-cost node which still has some successors unexplored. The root node A is expanded to generate B.
2. As f(A) = 12, which is less than f(B) = 15, the algorithm proceeds with further expansion of A and generates node G with f(G) = 13. Now, as all the children of A are expanded, we can update its f-cost to the minimum of its children, that is 13. Notice that three nodes have already been generated, hence memory is FULL.

3. Now, to proceed further, we can explore node G, but we first need to make space for the new node. As per the algorithm, we drop the shallowest, highest f-cost leaf, that is B. In the parent A we record the f-cost of this forgotten descendant, 15, shown in parentheses. We then expand G to generate H with f(H) = 18. Unfortunately, H is not the goal node, and memory is again full; since there is no path to a solution through H, we set f(H) = ∞.
4. G is expanded again for its next child, generating I with f(I) = 24. Now, as G is fully explored with children H and I having f-costs ∞ and 24, we drop H and f(G) becomes 24. I is a goal node, but it might not be the optimal solution, as A's f-cost is still only 15.
5. As A is once again the most promising node, B is generated again. We have found that the path through G is not so great.
6. C is the first successor of B. Observe that it is a non-goal node at the maximum depth, so f(C) = ∞.
7. In order to expand node B further, we first need to drop C. We then generate node D with f(D) = 20, and this value is propagated up to B and A.
8. Now the deepest, lowest-cost node D is selected for further expansion; as this turns out to be a goal node, the search terminates.

2.15.3 Advantages of SMA* over A* and IDA*


1. SMA* can solve more difficult problems in less memory as compared to A* or IDA*.
2. SMA* generates the minimum number of nodes, thereby avoiding significant overhead.
3. SMA* outperforms the others in case of real-valued heuristics and highly connected state spaces.

2.15.4 Limitation of SMA*

As there is a limited amount of memory available, in case of very hard problems where the state space is complex, SMA* is forced to switch back and forth continually among many candidate solution paths, as only a small subset of all the expanded nodes can fit in memory. Remember the problem of thrashing in disk paging systems! There is also the extra time required for repeated regeneration of the same nodes.
This means that problems that would be practically solvable by A*, given unlimited memory, become intractable for SMA*. Hence, in case of SMA*, memory limitations can make a problem intractable with respect to computation time.

2.16 Local Search Algorithms and Optimization Problems

2.16.1 Hill Climbing


University Questions
Q.  Write a short note on Hill Climbing algorithms. (MU - Dec. 13)
Q.  Describe the Hill Climbing algorithm. What are its limitations? (MU - May 13, Dec. 14)
Q.  Explain the Hill Climbing algorithm with an example. (MU - May 16)
Hill climbing is simply a combination of depth-first search with generate-and-test, where feedback is used to decide on the direction of motion in the search space.
The hill climbing technique is used widely in artificial intelligence to solve computationally hard problems which have multiple possible solutions.

uy TechKnowledge
Publications
AI and DS-I (MU) 2-32 Search Techniques
In the depth-first search, the test function will merely accept or reject a solution. But in hill climbing the test function is provided with a heuristic function which gives an estimate of how close a given state is to the goal state.
In hill climbing, each state is provided with the additional information needed to find the solution, i.e. the heuristic value. The algorithm is memory efficient since it does not maintain the complete search tree. Rather, it looks only at the current state and the states at the immediate next level.
For example, suppose you want to find a mall from your current location, and there are n possible paths in different directions leading to the mall. The heuristic function gives you the distance to the mall along each path, so it becomes simple and time-efficient to choose a route.
Hill climbing attempts to iteratively improve the current state by means of an evaluation function. "Consider all the possible states laid out on the surface of a landscape. The height of any point on the landscape corresponds to the evaluation function of the state at that point" (Russell & Norvig, 2003). Fig. 2.16.1 depicts the typical hill climbing scenario, where multiple paths are available to reach the hill top from the start point at ground level.
Fig. 2.16.1 : Hill Climbing Scenario

Hill climbing always attempts to make changes that improve the current state. In other words, hill climbing can
only advance if there is a higher point in the adjacent landscape.
Hill climbing is a type of local search technique. It is relatively simple to implement. In many cases where state
space is of moderate size, hill climbing works even better than many advanced techniques.
For example, when hill climbing is applied to the travelling salesman problem, it initially produces random combinations of solutions having all the cities visited. Then it selects a better route by switching the order of cities, so that all the cities are visited at minimum cost.
There are two variations of hill climbing, as discussed below.

2.16.1(A) Simple Hill Climbing


It is the simplest way to implement hill climbing. Following is the algorithm for the simple hill climbing technique.
Overall the procedure looks similar to that of generate-and-test, but the main difference between the two is the use of a heuristic function for state evaluation in hill climbing. The goodness of any state is decided by the heuristic value of that state; the heuristic can be either incremental or decremental.
Algorithm
1.  Evaluate the initial state. If it is a goal state, then return and quit; otherwise make it the current state and go to Step 2.
2.  Loop until a solution is found or there are no new operators left to be applied (i.e. no new child nodes left to be explored):
    a.  Select and apply a new operator (i.e. generate a new child node).
    b.  Evaluate the new state:
        (i)   If it is a goal state, then return and quit.
        (ii)  If it is better than the current state, then make it the new current state.
        (iii) If it is not better than the current state, then continue the loop; go to Step 2.
As we study the algorithm, we observe that in every pass the first node / state that is better than the current state is considered for further exploration. This strategy may not guarantee the most optimal solution to the problem, but may save on execution time.
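The algorithm above can be sketched in Python. The integer landscape, the heuristic `h`, and the neighbor function are illustrative assumptions (not from the text); higher `h` means closer to the hill top.

```python
def simple_hill_climbing(start, neighbors, h):
    """Simple hill climbing: accept the FIRST successor that improves
    on the current state (h = heuristic, higher is better)."""
    current = start
    while True:
        improved = False
        for nxt in neighbors(current):
            if h(nxt) > h(current):   # first better child wins
                current = nxt
                improved = True
                break
        if not improved:              # no better neighbor: stop (possibly a local maximum)
            return current

# Toy landscape: h peaks at x = 7 on the integer line 0..10.
h = lambda x: -(x - 7) ** 2
neighbors = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 10]
print(simple_hill_climbing(0, neighbors, h))   # climbs 0 -> 1 -> ... -> 7
```

Note that the first improving successor is taken immediately, matching step 2(b)(ii) of the algorithm.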
2.16.1(B) Steepest Ascent Hill Climbing

Q.  Write the algorithm of steepest ascent hill climbing, and compare it with simple hill climbing.
As the name suggests, steepest ascent hill climbing always finds the steepest path to the hill top. It does so by selecting the best node among all children of the current node / state. All the states are evaluated using the heuristic function, so the time requirement of this strategy is more as compared to the previous one. The algorithm for steepest ascent hill climbing is as follows.
Algorithm
1.  Evaluate the initial state; if it is a goal state, return and quit; otherwise make it the current state.
2.  Loop until a solution is found or a complete iteration produces no change to the current state:
    a.  SUCC = a state such that any possible successor of the current state will be better than SUCC.
    b.  For each operator that applies to the current state, evaluate the new state:
        (i)  If it is the goal, then return and quit.
        (ii) If it is better than SUCC, then set SUCC to this state.
    c.  If SUCC is better than the current state, set the current state to SUCC.
•  As we compare simple hill climbing with steepest ascent, we find that there is a trade-off between the time requirement and the accuracy or optimality of the solution.
•  In case of the simple hill climbing technique, as we go for the first better successor, time is saved since not all the successors are evaluated; but more nodes and branches may get explored, and in turn the solution found may not be the optimal one.
•  In case of the steepest ascent hill climbing technique, as the best among all the successors is selected every time for further expansion, more time is spent in evaluating all the successors at the earlier stages, but the solution found is of better quality, as only the states leading towards the hill top are explored. This also makes it clear that the evaluation function, i.e. the definition of the heuristic function, plays a vital role in deciding the performance of the algorithm.
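A matching sketch for steepest ascent, reusing the same kind of toy landscape as before (the landscape and neighbor function are again illustrative assumptions). The only change from simple hill climbing is that all successors are scored before moving:

```python
def steepest_ascent(start, neighbors, h):
    """Steepest-ascent hill climbing: evaluate ALL successors and move
    to the best one (SUCC), but only if it beats the current state."""
    current = start
    while True:
        succs = neighbors(current)
        if not succs:
            return current
        best = max(succs, key=h)      # SUCC = best among all children
        if h(best) <= h(current):     # no uphill move possible: stop
            return current
        current = best

# Toy landscape peaking at x = 7; here every neighbor is scored before moving.
h = lambda x: -(x - 7) ** 2
neighbors = lambda x: [n for n in (x - 2, x - 1, x + 1, x + 2) if 0 <= n <= 10]
print(steepest_ascent(0, neighbors, h))
```

Compared with the simple variant, each iteration here costs one heuristic evaluation per successor, but the climb follows the steepest available slope.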
2.16.1(C) Limitations of Hill Climbing

Q.  Explain the limitations of Hill Climbing in brief.
Q.  What are the problems that occur in the hill climbing technique? Illustrate with an example. (MU - Dec. 15)
Q.  Write a short note on limitations of the Hill Climbing algorithm.

•  Now let's see what can be the impact of an incorrect design of the heuristic function on the hill climbing techniques.
•  Following are the problems that may arise in the hill climbing strategy. Sometimes the algorithm may reach a position which is not a solution, but from which there is no move that leads to a better place on the hill, i.e. no further state that goes closer to the solution. This happens if we have reached one of the following three states.
•  Local Maximum : A "local maximum" is a location on the hill which is higher than the surrounding parts of the hill but is not the actual hill top. In the search tree, it is a state better than all its neighbors, but with no next better state which can be chosen for further expansion. Local maxima sometimes occur within sight of a solution; in such cases they are called "foothills".


Fig. 2.16.2 : Local Maximum


•  In the search tree, a local maximum can be seen as follows :

Fig. 2.16.3 : Local maxima in Search Tree


•  Plateau : A "plateau" is a flat area at some height in a hilly region, i.e. a large area of the same height. In the search space, a plateau occurs when all the neighboring states have the same value. On a plateau it is not possible to determine the best direction in which to move by making local comparisons.


Fig. 2.16.4 : Plateau in hill climbing


•  In the search tree, a plateau can be identified as follows :

•  Ridge : A "ridge" is an area of the hill that is higher than the surrounding areas, but from which there is no further uphill path. In the search tree, the situation where all successors are of the same or lesser value is a ridge condition; a suitable successor cannot be found in a single move.

Fig. 2.16.6: Ridge in Hill Climbing

•  In the search tree, a ridge can be identified as follows :


Fig. 2.16.7


Fig. 2.16.8 depicts all the different situations together in hill climbing: the objective function is plotted over the state space, showing the global maximum, local maxima, and plateau.

Fig. 2.16.8 : Hill climbing problems scenario

2.16.1(D) Solutions on Problems in Hill Climbing


In order to overcome these problems we can try the following techniques; at times a combination of two techniques will provide a better solution.
1.  A good way to deal with a local maximum is to backtrack to some earlier node and try a different direction.
2.  In case of plateaus and ridges, make a big jump in some direction to a new area of the search space. This can be done by applying two or more rules, or the same rule several times, before testing. This is a good strategy for dealing with plateaus and ridges.
Hill climbing is a local method. It decides what to do next by looking only at the “immediate” consequences of its
choices. Global information might be encoded in heuristic functions. Hill climbing becomes inefficient in large
problem spaces, and when combinatorial explosion occurs. But it uses very little memory, usually a constant
amount, as it doesn’t retain the path.
It is useful when combined with other methods. The success of hill climbing depends very much on the shape of the state space. If there are few local maxima and plateaus, random-restart hill climbing will find a good solution very quickly.

2.16.2 Simulated Annealing


Simulated annealing is a variation of hill climbing. The simulated annealing technique can be explained by an analogy to annealing in solids: in the annealing process, a solid is heated past its melting point and then cooled.
With the changing rate of cooling, the solid changes its properties. If the liquid is cooled slowly, it transforms into a steady frozen state and forms crystals; if it is cooled quickly, crystal formation does not get enough time and imperfect crystals are produced.
The aim of the physical annealing process is to produce a minimal-energy final state after raising the substance to a high energy level. Hence in simulated annealing we are actually going downhill and the heuristic function is a minimizing one: the final state is the one with minimum value, and rather than climbing up, in this case we are descending into the valley.
The idea is to use simulated annealing to search for feasible solutions and converge to an optimal solution. In order to achieve that, at the beginning of the process some downhill moves may be made. These downhill moves are made purposely, to do enough exploration of the whole space early on, so that the final solution is relatively insensitive to the starting state. It reduces the chances of getting caught at a local maximum, a plateau, or a ridge.
Algorithm
1.  Evaluate the initial state.
2.  Loop until a solution is found or the annealing schedule runs out:
    a.  Set T according to an annealing schedule.
    b.  Select and apply a new operator.
    c.  Evaluate the new state:
        (i)   If it is the goal, return and quit.
        (ii)  ΔE = Val(new state) − Val(current state)
        (iii) If ΔE < 0, make the new state the current state.
        (iv)  Else make the new state the current state with probability e^(−ΔE / kT).
We observe in the algorithm that, if the next state is better than the current state, it readily accepts it as the new current state. But even when the next state does not have the desirable value, it accepts that state with some probability e^(−ΔE / kT), where ΔE is the positive change in the energy level, T is the temperature and k is Boltzmann's constant.

Thus, in simulated annealing, large uphill moves are much less likely than small ones. Also, the probability of uphill moves decreases as the temperature decreases.
Hence uphill moves are more likely at the beginning of the annealing process, when the temperature is high; as the cooling process starts and the temperature comes down, so does the probability of uphill moves. Downhill moves are allowed at any time in the whole process. In this way, comparatively small upward moves are allowed until finally the process converges to a local minimum configuration, i.e. the desired low point in the valley.
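The acceptance rule just described — always take improving moves, take worsening moves with probability e^(−ΔE/kT) — can be sketched as follows. The energy function, neighbor step, and geometric cooling schedule are illustrative assumptions, not from the text, and the constant k is folded into T:

```python
import math
import random

def simulated_annealing(start, neighbor, value, t0=10.0, cooling=0.95, steps=2000):
    """Simulated annealing sketch for a MINIMIZATION problem.
    value() is the energy; worse moves are accepted with probability e^(-dE/T)."""
    current, t = start, t0
    best = current
    for _ in range(steps):
        t *= cooling                        # annealing schedule: geometric cooling
        if t <= 1e-9:
            break
        nxt = neighbor(current)
        d_e = value(nxt) - value(current)   # positive d_e means an uphill (worse) move
        if d_e < 0 or random.random() < math.exp(-d_e / t):
            current = nxt
        if value(current) < value(best):
            best = current                  # remember the best state seen so far
    return best

# Toy energy with two valleys; the global minimum lies near x = -5.
random.seed(0)
f = lambda x: 0.01 * (x * x - 25) ** 2 + x   # assumed illustrative energy function
step = lambda x: x + random.uniform(-1, 1)
result = simulated_annealing(5.0, step, f)
print(result, f(result))
```

Early on, when T is high, the barrier between the two valleys is easily crossed; as T falls, the process settles into whichever valley it is in.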
2.16.2(A) Comparing Simulated Annealing with Hill Climbing

‘GQ. Compare and contrast simulated annealing with hill climbing.


A hill climbing algorithm never makes “downhill” moves toward states with lower value and it can be
incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk, i.e. moving to a successor chosen at random from the set of successors independent of whether it is better than the current state, is complete but extremely inefficient. Therefore, it is reasonable to try a combination of hill climbing with a random walk in some way that yields both efficiency and completeness. Simulated annealing is the answer!
As we know, hill climbing can get stuck at local minima or maxima, halting the algorithm abruptly without guaranteeing an optimal solution. A few attempts were made to solve this problem by trying hill climbing with multiple start points or by increasing the size of the neighborhood, but none produced satisfactory results. Simulated annealing solves the problem by performing some downhill moves at the beginning of the search, so that a local maximum can be avoided at a later stage.

Hill climbing procedure chooses the best state from those available or at least better than the current state for
further expansion. Unlike hill climbing, simulated annealing chooses a random move from the neighborhood. If
the successor state turned out to be better than its current state then simulated annealing will accept it for
further expansion. If the successor state is worse, then it will be accepted based on some probability.


2.16.3 Local Beam Search

Q. Write short note on local beam search.


Q.  What is the significance of local minima in neural network learning?
•  In all the variations of hill climbing till now, we have considered only one node getting selected at a time for the further search process. These algorithms are memory efficient in that sense. But when an unfruitful branch gets explored even for some amount of time, it is a complete waste of time and memory. Also, the solution produced may not be the optimal one.


•  The local beam search algorithm keeps track of the k best states by performing k parallel searches. At each step it generates successor nodes and selects the k best nodes for the next level of search. Thus, rather than focusing on only one branch, it concentrates on the k paths which seem most promising. If any of the successors is found to be the goal, the search process stops.
e In parallel local beam search, the parallel threads communicate to each other, hence useful information is passed
among the parallel search threads.
e — Inturn, the states that generate the best successors say to the others, “Come over here, the grass is greener!” The
algorithm quickly terminates unfruitful branches exploration and moves its resources to where the path seems
most promising. In stochastic beam search the maintained successor states are chosen with a probability based
on their goodness.
Algorithm : Local Beam Search

Step 1 :  Found = false;
Step 2 :  NODE = Root node;
Step 3 :  If NODE is the goal node, then Found = true; else find SUCCs of NODE, if any, with their estimated costs and store them in OPEN list;
Step 4 :  While (Found = false and it is possible to proceed further)
          {
              Sort OPEN list;
              Select top W elements from OPEN list, put them in W_OPEN list and empty OPEN list;
              While (W_OPEN ≠ φ and Found = false)
              {
                  Get NODE from W_OPEN;
                  If NODE = Goal state, then Found = true;
                  else
                  {
                      Find SUCCs of NODE, if any, with their estimated costs and store them in OPEN list;
                  }
              }  // end inner while
          }  // end outer while
Step 5 :  If Found = true then return Yes, otherwise return No, and Stop.
———+
Fig. 2.16.9 : Process of local Beam Search


•  As shown in Fig. 2.16.9, here k = 2, hence the two better successors are selected for expansion at the first level of search, and at each next level two better successors will be selected by both searches. They exchange information with each other throughout the search process. The search continues till the goal state is found or no further search is possible.
•  It may seem that local beam search is the same as running k searches in parallel, but it is not so. In case of parallel searches, all searches run independent of each other; while in case of local beam search, the parallel running threads continuously coordinate with one another to decide the fruitful region of the search tree.
e Local beam search can suffer from a lack of diversity among the k states by quickly concentrating to small region
of the state space.
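The beam idea above can be sketched on a toy numeric search problem; the successor function, the cost h, and the beam width are invented for illustration (lower h is better here, as in the Step 4 pseudocode that keeps the top W elements):

```python
import heapq

def local_beam_search(roots, successors, h, is_goal, k=2, max_depth=50):
    """Local beam search sketch: keep only the k best states at each level
    (h = estimated cost, lower is better)."""
    beam = list(roots)
    for _ in range(max_depth):
        if any(is_goal(s) for s in beam):
            return next(s for s in beam if is_goal(s))
        pool = [s for state in beam for s in successors(state)]
        if not pool:
            return None                         # not able to proceed further
        beam = heapq.nsmallest(k, pool, key=h)  # keep the top k of the pooled OPEN list
    return None

# Toy problem: reach 0 from 29 by halving (rounded down) or subtracting 3.
succ = lambda n: [n // 2, n - 3] if n > 0 else []
goal = local_beam_search([29], succ, h=abs, is_goal=lambda n: n == 0, k=2)
print(goal)   # reaches the goal state 0
```

Pooling the successors of all beam members before pruning is what lets a promising branch "steal" both beam slots, as described above.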
e While it is possible to get excellent fits to training data, the application of back propagation is fraught with
difficulties and pitfalls for the prediction of the performance on independent test data. Unlike most other
learning systems that have been previously discussed, there are far more choices to be made in applying the
gradient descent method.
•  The key variations of these choices are the learning rate and local minima : the selection of a learning rate is of critical importance in finding the true global minimum of the error distance.
* Back propagation training with too small a learning rate will make agonizingly slow progress. Too large a
learning rate will proceed much faster, but may simply produce oscillations between relatively poor solutions.
* Both of these conditions are generally detectable through experimentation and sampling of results after a fixed
number of training epochs.
•  Typical values for the learning rate parameter are numbers between 0 and 1 : 0.05 ≤ h ≤ 0.75. One would like to use the largest learning rate that still converges to the minimum solution.
•  Momentum : empirical evidence shows that the use of a term called momentum in the back propagation algorithm can be helpful in speeding the convergence and avoiding local minima.
a
•  The idea behind using momentum is to stabilize the weight change by making non-radical revisions, using a combination of the gradient-descent term with a fraction of the previous weight change:
   Δw(t) = −h ∂E/∂w(t) + α Δw(t − 1)
   where α is taken as 0 ≤ α ≤ 0.9, and t is the index of the current weight change.
•  This gives the system a certain amount of inertia, since the weight vector will tend to continue moving in the same direction unless opposed by the gradient term.
•  The momentum has the following effects :
   o  It smooths the weight changes and suppresses cross-stitching, that is, it cancels side-to-side oscillations across the error valley;
   o  When the weight changes are all in the same direction, the momentum amplifies the learning rate, causing a faster convergence;
   o  It enables escape from small local minima on the error surface.
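The momentum update rule can be sketched as plain gradient descent in one dimension. The error surface E(w) = (w − 3)², its gradient, and all parameter values here are illustrative assumptions, not from the text:

```python
def descend(grad, w0, lr=0.1, alpha=0.9, steps=200):
    """Gradient descent with momentum:
    dw(t) = -lr * dE/dw(t) + alpha * dw(t-1)."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -lr * grad(w) + alpha * dw   # momentum term carries the past direction
        w += dw
    return w

# Minimize E(w) = (w - 3)^2, whose gradient is 2(w - 3); the minimum is at w = 3.
grad = lambda w: 2.0 * (w - 3.0)
print(round(descend(grad, w0=0.0), 2))
```

With these settings the iterate oscillates around the minimum (the "side-to-side" behavior the momentum damps) before settling close to w = 3.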


•  The hope is that the momentum will allow a larger learning rate and that this will speed convergence and avoid local minima. On the other hand, a learning rate of 1 with no momentum will be much faster when no problem with local minima or non-convergence is encountered.
•  Sequential or random presentation : an epoch is the fundamental unit for training, and the length of training is often measured in terms of epochs. In an epoch with revision after each particular example, the examples can be presented in the same sequential order every time, or in a different random order for each epoch. The random presentation usually yields better results.
•  The randomness has advantages and disadvantages :
   o  Advantages : It gives the algorithm some stochastic search properties. The weight state tends to jitter around its equilibrium and may occasionally visit nearby points; thus it may escape trapping in suboptimal weight configurations. The on-line learning may have a better chance of finding a global minimum than the true gradient descent technique.
   o  Disadvantages : The weight vector never settles to a stable configuration. Having found a good minimum, it may then continue to wander around it.
state. The
Random initial state - unlike many other learning systems, the neural network begin in a random
network weights are initialized to some choice of random numbers with a range typically between -0.5 and 0.5
(The inputs are usually normalized to numbers between 0 and 1). Even with identical learning conditions, the
random initial weights can lead to results that differ from one training session to another.
The training sessions may be repeated till getting the best results.

2.17 Crypto-Arithmetic Problem


A crypto-arithmetic problem is an example of Constraint Satisfaction Problem (CSP), which we are studying in
the next section of this chapter.
It is a mathematical puzzle where the numbers are represented by letters or symbols, each letter representing a unique digit. The goal is to find the digits such that a given mathematical equation is verified. In general, there are a few variables in the form of letters which are to be assigned numeric values in the range 0 to 9, such that the given equation holds true.
To solve a crypto-arithmetic problem, we have to generate the constraints by observing the given equation. Then we may be able to assign a single final value to one of the variables, or reduce the range of allowed values for a particular variable; that helps lead towards the solution.
From the above discussion one thing is clear: there is no specific algorithm to solve crypto-arithmetic problems; it is a trial-and-error method based on a backtracking strategy. The following examples illustrate the procedure for solving crypto-arithmetic problems.
Example 1 :
      I S
  +   W T
  + S B M
  ———————
  B E I T
One of the possible solutions is I = 3, W = 8, T = 5, S = 9, M = 6, B = 1, E = 0:
      3 4
  +   8 5
  + 9 1 6
  ———————
  1 0 3 5
The above example has multiple solutions; ideally, a good crypto-arithmetic puzzle must have only one solution.
How to solve the problem?
We follow the classical three-stage method.

Stage 1 : Describe

In the describe stage we explain the problem and the goal in natural language.
For a complete problem description we need to answer the following three questions :
1.  What is the goal of the problem?
    In this example, the goal is to replace letters by digits such that the sum IS + WT + SBM = BEIT is verified.
2.  Are there any unknowns or decision variables?
    The digits that the letters represent. In other words, for each letter we have one decision variable that can take any digit as its value.
3.  What are the constraints?
    The obvious constraint is the sum that has to be verified, and all the variables must have different values in a feasible solution. But as we observe the given equation, we notice that there are other implicit constraints as well.

It is implicit that the letters I, W, S and B cannot represent digit 0, as they are the first digits of numbers.
There are 7 distinct letters, so we need at least 7 different digits in the answer.
By observation, we have B = 1. Considering auxiliary carry variables X and Y, we can write the constraint equations as :
    S + T + M = 10X + T
    I + W + B + X = 10Y + I
    S + Y = 10B + E
Solve : As we have the equations representing the problem, we can solve them simultaneously to get the solution.
Example 2 :

    T W O
  + T W O
  ———————
  F O U R
E.g., setting F = 1, O = 4, R = 8, T = 7, W = 3, U = 6 gives 734 + 734 = 1468.
Trick : introduce auxiliary carry variables X, Y :
    O + O = 10X + R
    W + W + X = 10Y + U
    T + T + Y = 10F + O

Also, we need pairwise constraints between the original variables, since they are supposed to take different values.
Ex. 2.17.1 : Solve the crypt-arithmetic problem :
      E A T
  + T H A T
  —————————
  A P P L E
Soln. :
From the units column, 2T = E + 10·C1, taking C1 as the carry; from the tens column, L = 2A + C1.
As A is the final carry, A = 1, so we have L = 2 + C1.
So we have :
      E 1 T
  + T H 1 T
  —————————
  1 P P L E
T is the only letter in the leftmost column of the addition and it generates the carry A; hence T = 9, and from 2T = 18 we get E = 8 with carry C1 = 1, so L = 3.
So we have :
      8 1 9
  + 9 H 1 9
  —————————
  1 P P 3 8
This makes P = 0 and hence H = 2, and the solution is :
      8 1 9
  + 9 2 1 9
  —————————
  1 0 0 3 8
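Since the text notes there is no specific algorithm beyond trial and error with backtracking, a brute-force sketch that simply tries digit assignments is a reasonable illustration. The function name and its use on EAT + THAT = APPLE are our choices, not from the text:

```python
from itertools import permutations

def solve_cryptarithm(words, result):
    """Brute-force crypto-arithmetic solver: assign distinct digits to letters
    so that sum(words) == result; leading letters may not be zero."""
    letters = sorted(set("".join(words) + result))
    assert len(letters) <= 10, "more than 10 distinct letters"
    leading = {w[0] for w in words + [result]}
    solutions = []
    for digits in permutations(range(10), len(letters)):
        env = dict(zip(letters, digits))
        if any(env[l] == 0 for l in leading):
            continue  # numbers may not start with 0
        to_num = lambda w: int("".join(str(env[c]) for c in w))
        if sum(to_num(w) for w in words) == to_num(result):
            solutions.append(env)
    return solutions

sols = solve_cryptarithm(["EAT", "THAT"], "APPLE")
# The assignment worked out by hand above should appear among the solutions.
print(any(s["E"] == 8 and s["A"] == 1 and s["T"] == 9 for s in sols))
```

This enumerates every injective letter-to-digit mapping, so it also reveals whether a puzzle has the single solution a good puzzle should have.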

2.18 Constraint Satisfaction Problem


Constraint :

A constraint is a logical relation among variables. Constraints arise in most areas of human endeavor; they are a natural medium for people to express problems in many fields.
Examples of constraints :
•  The sum of the three angles of a triangle is 180 degrees.

•  The sum of the currents flowing into a node must equal zero.

Constraint satisfaction :

Constraint satisfaction is the process of finding a solution to a set of constraints. Constraints articulate the allowed values for variables, and finding a solution is an evaluation of these variables that satisfies all the constraints.

Constraint Satisfaction Problems :

•  In standard search problems we are given a state space, a heuristic function and a goal state. In this case the state is represented in the form of any data structure that supports the successor function, heuristic function, and goal test.
•  Constraint Satisfaction Problems are a special type of search in which a state is defined by variables Xi with values from domains Di, and the goal test is a set of constraints specifying allowable combinations of values for subsets of variables. Many real problems in AI can be modeled as Constraint Satisfaction Problems. CSPs allow useful general-purpose algorithms with more power than standard search algorithms; CSPs are solved through search.
2.18.1 Graph Coloring

Some popular puzzles like the map coloring problem, Latin Square, Eight Queens, and Sudoku are stated below.
1. Map coloring problem:
In this example, the variables are WA, NT, Q, NSW, V, SA, T.
Domains : Di = {red, green, blue}
Constraints : adjacent regions must have different colors,
e.g. WA ≠ NT, or (WA, NT) ∈ {(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)}

(Map of Australia showing the regions Western Australia, Northern Territory, Queensland, South Australia, New South Wales, Victoria and Tasmania)
Fig. 2.18.1
Solutions are complete and consistent assignments, e.g., WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green.
2.  Latin Square Problem : How can one fill an n × n table with n different symbols such that each symbol occurs exactly once in each row and each column?
Solutions : The Latin squares for n = 1, 2, 3 and 4 are :

Fig. 2.18.2
3.  Eight Queens Puzzle Problem : How can one put 8 queens on an (8 × 8) chessboard such that no queen attacks any other queen?
Solutions : The puzzle has 92 distinct solutions. If rotations and reflections of the board are counted as one, the puzzle has 12 unique solutions.

Real-world CSPs :
1.  Assignment problems : e.g., who teaches what class?
2.  Timetabling problems : e.g., which class is offered when and where?
3.  Transportation scheduling : e.g., factory scheduling.

2.18.2 Varieties of CSPs


Based on various combinations of types of variables and domains, we have varieties of CSPs. They are as follows :
1.  Discrete variables with finite domains : n variables with domain size d give O(dⁿ) complete assignments; e.g., Boolean CSPs.
2.  Discrete variables with infinite domains : integers, strings, etc.; e.g., in job scheduling, the variables are the start/end times for each job. Such problems need a constraint language, e.g., StartJob1 + 5 ≤ StartJob3.
3.  Continuous variables : e.g., start/end times for Hubble Space Telescope observations.
4.  Linear constraints are solvable in polynomial time by linear programming.

2.18.3 Varieties of Constraints

1.  Unary constraints involve a single variable, e.g., SA ≠ green.
2.  Binary constraints involve pairs of variables, e.g., SA ≠ WA.
3.  Higher-order constraints involve 3 or more variables, e.g., crypt-arithmetic column constraints.

2.18.4 Backtracking in CSPs

How to solve CSPs? Is there any standard method? Let's start with the straightforward approach, and then fix it. States are defined by the values assigned so far.
1.  Initial state : the empty assignment { }.
2.  Successor function : assign a value to an unassigned variable that does not conflict with the current assignment; fail if no legal assignment exists.
3.  Goal test : the current assignment is complete.
Variable assignments are commutative, i.e., [WA = red then NT = green] is the same as [NT = green then WA = red]. Hence we only need to consider assignments to a single variable at each node. Depth-first search for CSPs with single-variable assignments is called backtracking search; it is the basic uninformed algorithm for CSPs. Using this technique one can solve n-queens for n ≈ 25.
The backtracking algorithm is as follows :
function BACKTRACKING-SEARCH(csp) returns a solution, or failure
    return RECURSIVE-BACKTRACKING({ }, csp)

function RECURSIVE-BACKTRACKING(assignment, csp) returns a solution, or failure
    if assignment is complete then return assignment
    var ← SELECT-UNASSIGNED-VARIABLE(Variables[csp], assignment, csp)
    for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
        if value is consistent with assignment according to Constraints[csp] then
            add {var = value} to assignment
            result ← RECURSIVE-BACKTRACKING(assignment, csp)
            if result ≠ failure then return result
            remove {var = value} from assignment
    return failure
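The recursive backtracking idea can be sketched in Python for the Australia map-coloring problem of Fig. 2.18.1. Variable and value ordering are left naive here (no MRV or least-constraining-value), and the function names are our own:

```python
def backtracking_search(variables, domains, neighbors):
    """Recursive backtracking for a map-coloring CSP: adjacent variables
    (given by `neighbors`) must receive different values."""
    def consistent(var, value, assignment):
        return all(assignment.get(n) != value for n in neighbors[var])

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                   # complete assignment found
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]             # undo and try the next value
        return None                             # failure

    return backtrack({})

# The Australia map-coloring problem.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
             "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": []}
solution = backtracking_search(variables, domains, neighbors)
print(solution)
```

Adding a value back-out (`del assignment[var]`) after a failed recursive call is exactly the "remove {var = value} from assignment" step of the pseudocode.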

2.18.5 Improving Backtracking Efficiency

Following are general-purpose methods that can give huge gains in speed by enforcing a minimum number of backtracks. While assigning values or choosing a variable for evaluation, the following questions are inevitable; by answering them we can design an efficient strategy to improve the efficiency of the backtracking process.
1. Which variable should be assigned next?
The answer to this question leads to two simple strategies to select the next variable for assignment, called
“Most Constrained Variable” and “Most Constraining Variable”.
2. In what order should its values be tried?
This question designs a strategy called “Least Constraining Value”, which helps us in choosing a value among all
possible values from the domain.
3.  Can we detect inevitable failure early?
This is the key question, which has a significant impact on the speed of generating the solution. The strategies here can foresee failure if the current path of assignment is followed, thereby guiding us on whether we are on the right track. There are three strategies under this heading, namely "Forward checking", "Constraint Propagation", and "Arc Consistency".
Let's study all these strategies one by one.
1. Most constrained variable :
It chooses the variable with the fewest legal values. By assigning this variable first, we can get a fair idea of the other variable assignments. Also, most of the constraints get satisfied by this assignment, hence the chances of getting on the wrong track are lowered. This is also called the minimum remaining values (MRV) or "fail-first" heuristic.

Fig. 2.18.3
2. Most constraining variable :
This is the tie-breaker among most constrained variables. In the most constraining variable strategy we first choose the variable with the most constraints on the remaining variables, for example, the region surrounded by the maximum number of regions. Once we assign a color to this part of the graph, it is as good as having decided the colors for all the surrounded parts.

Fig. 2.18.4
3. Least constraining value :
Given a variable, choose the least constraining value, that is, the one that rules out the fewest values for the remaining variables.

Allows 1 value for SA
Allows 0 values for SA

Fig. 2.18.5
4. Forward checking :
It keeps track of the remaining legal values for the unassigned variables, and terminates the search when any variable has no legal values left. As shown in the graph, as we go on assigning colors to different regions, we notice that SA is not left with any valid color to assign. Hence the assignment terminates and the process backtracks to the previous state.

oO

WA
SA Tw
EE Ae
me cs
Bang
[2re)
Fig. 2.18.6
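The forward checking step described above can be sketched as follows; this is a minimal illustration with hypothetical helper names, using the Q/NT/SA regions from the figures. After each assignment we prune that value from the unassigned neighbours' domains and report failure as soon as some domain becomes empty.

```python
# A minimal forward-checking sketch for map colouring.
def forward_check(var, value, domains, neighbors, assignment):
    """Prune `value` from unassigned neighbours of `var`; return the pruned
    entries and False if some neighbour has no legal value left."""
    pruned = []
    ok = True
    for n in neighbors[var]:
        if n not in assignment and value in domains[n]:
            domains[n].remove(value)
            pruned.append((n, value))
            if not domains[n]:          # dead end detected: backtrack early
                ok = False
    return pruned, ok

# SA has already been narrowed to {blue}; assigning blue to Q empties it.
domains = {"NT": ["green", "blue"], "SA": ["blue"]}
neighbors = {"Q": ["NT", "SA"], "NT": ["Q", "SA"], "SA": ["Q", "NT"]}
print(forward_check("Q", "blue", domains, neighbors, {"Q": "blue"}))
# ([('NT', 'blue'), ('SA', 'blue')], False)
```

Returning the pruned entries lets the caller restore the domains when it backtracks.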

5. Constraint propagation :
Forward checking propagates information from assigned to unassigned variables, but doesn't provide early detection for all failures. In the constraint propagation strategy, as we assign a color to one part of the graph, we re-evaluate the remaining variables for valid assignments. Hence we can detect conflicts early. As shown in the following example, as the region Q is getting assigned the green color, both NT and SA are left with only blue; but they both can't have the same color as they are adjacent regions. Hence the wrong assignment of green to the Q region is detected a step early.

NT and SA cannot both be blue!


Constraint propagation repeatedly enforces constraints locally.
6. Arc consistency :
In this technique, arcs are drawn from one variable to another if they have values satisfying the constraints. The aim is to make each arc consistent while propagating the constraints. The arc P → Q is consistent if and only if for every value p of P there is some allowed value q of Q. If P loses a value, the neighbors of P need to be verified for permitted values and all arcs are again checked for consistency. Hence failures can be detected even earlier as compared to forward checking. We check arc consistency after every assignment.

Fig. 2.18.8 : Arc Consistency

In Fig. 2.18.8, as WA and Q are assigned red and green colors respectively, only blue is left for NT, and if it is assigned then there is no consistent value left for SA.
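Arc consistency is commonly enforced with the AC-3 procedure. The following is a compact sketch for binary "not equal" constraints like those of map colouring; the function name and data layout are illustrative.

```python
from collections import deque

# A compact AC-3 sketch: make every arc P -> Q consistent by deleting values
# of P that have no supporting value q in Q's domain.
def ac3(domains, neighbors):
    queue = deque((p, q) for p in neighbors for q in neighbors[p])
    while queue:
        p, q = queue.popleft()
        # revise: keep only values of P supported by some value of Q
        revised = [v for v in domains[p] if any(v != w for w in domains[q])]
        if len(revised) < len(domains[p]):
            domains[p] = revised
            if not revised:
                return False            # a domain became empty: failure
            for r in neighbors[p]:      # re-check arcs pointing at P
                if r != q:
                    queue.append((r, p))
    return True

# NT and SA are adjacent and both reduced to {blue}: inconsistency detected
print(ac3({"NT": ["blue"], "SA": ["blue"]},
          {"NT": ["SA"], "SA": ["NT"]}))   # False
```

The `False` result corresponds to the "NT and SA cannot both be blue" situation detected in the figure.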

Soln. :

No. of regions = 6 (GUJ, MP, MAH, AP, KAR, GOA)

Constraints : GUJ ≠ MP ≠ MAH
              MAH ≠ GOA ≠ KAR
              KAR ≠ AP ≠ MAH

GUJ     MP      MAH     GOA     KAR     AP
Red     Red     Red     Red     Red     Red
Blue    Blue    Blue
Green   Green
Yellow

Hence, the optimal number of colours required to colour this map is '4'.

Fig. P. 2.18.1
2.18.6 Water Jug

Let us take an example of a water jug problem.

• We have two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
• The state space for this problem can be described as the set of ordered pairs of integers (x, y), such that x = 0, 1, 2, 3 or 4, representing the number of gallons of water in the 4-gallon jug, and y = 0, 1, 2 or 3, representing the quantity of water in the 3-gallon jug. The start state is (0, 0). The goal state is (2, n) for any value of n, since the problem does not specify how many gallons need to be in the 3-gallon jug.
• The operators used to solve the problem can be described as shown below. They are represented as rules whose left sides are matched against the current state and whose right sides describe the new state that results from applying the rule.
Rule set :
1.  (x, y) → (4, y)            if x < 4                  fill the 4-gallon jug
2.  (x, y) → (x, 3)            if y < 3                  fill the 3-gallon jug
3.  (x, y) → (x - d, y)        if x > 0                  pour some water out of the 4-gallon jug
4.  (x, y) → (x, y - d)        if y > 0                  pour some water out of the 3-gallon jug
5.  (x, y) → (0, y)            if x > 0                  empty the 4-gallon jug on the ground
6.  (x, y) → (x, 0)            if y > 0                  empty the 3-gallon jug on the ground
7.  (x, y) → (4, y - (4 - x))  if x + y >= 4 and y > 0   pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full
8.  (x, y) → (x - (3 - y), 3)  if x + y >= 3 and x > 0   pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full
9.  (x, y) → (x + y, 0)        if x + y <= 4 and y > 0   pour all the water from the 3-gallon jug into the 4-gallon jug
10. (x, y) → (0, x + y)        if x + y <= 3 and x > 0   pour all the water from the 4-gallon jug into the 3-gallon jug
11. (0, 2) → (2, 0)                                      pour the 2 gallons from the 3-gallon jug into the 4-gallon jug
12. (2, y) → (0, y)                                      empty the 2 gallons in the 4-gallon jug on the ground
Production for the water jug problem :

Gallons in the 4-gallon jug | Gallons in the 3-gallon jug | Rule applied
0                           | 0                           |
0                           | 3                           | 2
3                           | 0                           | 9
3                           | 3                           | 2
4                           | 2                           | 7
0                           | 2                           | 5 or 12
2                           | 0                           | 9 or 11

One solution to the water jug problem.


(x, y)

(0, 0)

(4, 0) (0, 3)

ak hs
43) 3) (0) %°
~~
W™ (4.0) (9 3) (0, 0) (0, 1)

(4, 3) (0,1) (4,0) (2, 3)

~i~™
Fig. 2.18.9
2.19 Adversarial Search

• An adversarial search problem is a competitive activity which involves 'n' players and is played according to a certain set of protocols.
• The game is called adversarial because there are agents with conflicting goals in the surrounding environment, and the game is competitive as there are 'n' players or agents participating.
• We say that the goals are conflicting and the environment is competitive because every participant wants to win the game.
• From the above explanation it is understandable that we are dealing with a competitive multi-agent environment.
• As the actions of every agent are unpredictable, there are many possible moves/actions.
• In order to play a game successfully, every agent in the environment has to first analyze the actions of the other agents and how the other agents contribute to its own winning or losing. After performing this analysis the agent executes its action.

2.20 Environment Types

Fig. 2.20.1 : Environment types

There can be two types of environments in the multi-agent case : competitive and cooperative.

1. Competitive environment :

• In this type of environment every agent makes an effort to win the game by defeating or creating superiority over the other agents, who are also trying to win the game.
• Chess is an example of a competitive environment.

2. Cooperative environment :

• In this type of environment all the agents jointly perform activities in order to achieve the same goal.
• A car driving agent is an example of a cooperative environment.

2.21 AI Game - Features

Under the artificial intelligence category, there are a few special features, as shown in the following table, which make the game more interesting.


Feature | Explanation | Example

2-ply game : When there are two opponents/agents playing the game it is called a 2-ply game. To increase the difficulty of the game, intelligence is added to the agents in AI games. Note that, in the case of AI games, we must have at least two players (i.e., single player games don't come under the AI games category). Example : Chess.

Multi-agent : When there are two or more opponents/agents playing the game it is called a multi-agent environment, where the action of every agent affects the actions of the other agents. Example : Monopoly.

Non-cooperative environment : When the surrounding environment is not helpful for winning the game it is called non-cooperative or competitive. Example : Card games.

Turn taking : In a multi-agent environment the agent/player performs a move and has to wait for the next player to make the next move. Example : Any board game, such as chess, carrom, etc.

Time limit : One more constraint can be incorporated in a game, i.e., keeping a limitation on time. Every player gets a finite amount of time to take an action. Example : Time-bound chess games.

Unpredictable opponent : In AI games the action of the opponent agent is fuzzy, which makes the game challenging and unpredictable. Players are called unpredictable when the next step depends upon an input set which is generated by the other player. Example : Card game with multiple players.

2.21.1 Zero Sum Game

• The "zero sum game" concept is associated with payoffs which are assigned to each player when the instance of the game is over. It is a mathematical representation of circumstances where the game is in a neutral state (i.e., an agent's winning or losing is even-handed with the winning and losing of the other agents).
• For example, if player 1 wins a chess game it is marked with, say, +1 point and at the same time the loss of player 2 is marked with -1 point; thus the sum is zero. Another condition is when the game is a draw; in that case players 1 and 2 are both marked with zero points. (Here +1, -1 and 0 are called payoffs.)
2.21.2 Non-Zero Sum Game

Non-zero sum games don't have an algebraic sum of payoffs equal to zero. In this type of game, one player winning does not necessarily mean that the other player has lost.

There are two types of non-zero sum games :
1. Positive sum game
2. Negative sum game

2.21.2(A) Positive Sum Game

It is also called a cooperative game. Here, all players have the same goal and they contribute together to play the game. For example, educational games.
—~—_ YF lechtnowtedge
Pudlicacions
(ea HL YE MT MOREL YS GES =
NG
EN,

Search Techniques
WF Aland ds.1(mu) 2-51 —
LOL

2.21.2(B) Negative Sum Game

It is also called a competitive game. Here, every player has a different goal, so no one really wins the game; everybody loses. The real-world example of a war fits best.

2.22 Relevant Aspects of AI Game

To understand game playing, we will first take a look at all relevant aspects of a game, which give a comprehensive overview of the stages in a game play. See Fig. 2.22.1.

• Accessible environments : Games with accessible environments have all the necessary information handy. For example : chess.
• Search : There are also games which require search functionality, which illustrates how players have to search through possible game positions to play a game. For example : minesweeper, battleships.
• Unpredictable opponent : In AI games opponents can be unpredictable; this introduces uncertainty into the game.

Fig. 2.22.1 : Relevant aspects of AI game

2.23 Game Playing

Fig. 2.23.1 shows examples of the two main varieties of problems faced in artificial intelligence games. The first type is "Toy Problems" and the other type is "Real World Problems" (for example, the Traveling Salesperson problem, which is NP-hard).

Fig. 2.23.1 : Example Problems

Game play follows some strategies in order to mathematically analyze the game and generate possible outcomes. A two player strategy table can be seen in Fig. 2.23.2.


                 Player 1
Player 2         Strategy 1      Strategy 2      Strategy 3
Strategy 1       Draw            Player 1 Wins   Player 2 Wins
Strategy 2       Player 1 Wins   Draw            Player 2 Wins
Strategy 3       Player 1 Wins   Player 2 Wins   Draw

Fig. 2.23.2 : Two player strategy table


2.23.1 Type of Games

Games can be classified under the deterministic or probabilistic category. Let's see what we mean by deterministic and probabilistic.

Deterministic :
• It is a fully observable environment. When there are two agents playing the game alternately and the final results of the game are equal and opposite, the game is called deterministic.
• Take the example of tic-tac-toe, where two players play alternately and when one player wins the game the other player loses it.

Probabilistic :
• Probabilistic is also called the non-deterministic type. It is the opposite of deterministic games; here you can have multiple players and you cannot determine the next action of a player.
• You can only predict the probability of the next action. To understand the probabilistic type you can take the example of card games.
• Another way of classifying games can be based on exact/perfect information or on inexact/approximate information. Now, let us understand these terms.

1. Exact/perfect information : A game in which all the actions are known to the other players is called a game of exact or perfect information. For example, tic-tac-toe or board games like chess and checkers.
2. Inexact/approximate information : A game in which all the actions are not known to the other players (or actions are unpredictable) is called a game of inexact or approximate information. In this type of game, a player's next action depends upon who played last, who won the last hand, etc. For example, card games like hearts.

Consider the following games and see how they are classified into various types based on the parameters which we have learnt in the above sections :
                  Exact/perfect information    Inexact/approximate information
Deterministic     • Chess                      • Battleships
                  • Checkers                   • Card Game (Hearts)
Probabilistic     • Ludo                       • Scrabble
                  • Monopoly                   • Poker

Fig. 2.23.2 : Types of game


Now, let us try to learn about a few of the games mentioned in Fig. 2.23.2.

2.23.1(A) Chess

• Chess comes under the deterministic and exact/perfect information category. This game is a two person, zero-sum game.
• In chess both players can see the chess board positions, so there is no secrecy, and the players don't play at the same time; they play one after the other in an order.
• Thus this game has a perfect environment to test artificial intelligence techniques. In the 1990's, the computer Deep Blue II defeated Garry Kasparov, who was the world champion at that time. This example is given to understand how artificial intelligence can be used in decision making.

(a) Chess board (b) Deep Blue II vs Garry Kasparov, final position, game 1
Fig. 2.23.3


Fig. 2.23.4 : Chess game tree


2.23.1(B) Checkers

• Checkers (also called draughts) comes under the deterministic and exact/perfect information category. This game is a two person game where both players can see the board positions, so there is no secrecy, and the players play one after the other in an order.
• In the 1990's a computer program named Chinook was developed which defeated the human world champion Marion Tinsley.

Fig. 2.23.5 : Checkers board


Fig. 2.23.6 : Checkers game tree

2.23.2 What is Game Tree ?

We saw the game trees of chess and checkers in the earlier section.

• A game tree is defined as a directed graph with nodes and edges. Here, nodes indicate positions in the game and edges indicate the next actions.
• Let us try to understand what a game tree is with the help of the Tic-Tac-Toe example. Tic-Tac-Toe is a 2-player game; it is deterministic and every player plays when his/her turn comes. A game tree has a root node which gives the starting position of the game board, which is a blank 3 x 3 grid.
• Say in the given example player 1 takes the 'X' sign. Then MAX(X) indicates the board for the best single next move. It also indicates that it is player 1's turn. (Remember that the initial move is always indicated with a MAX.)
• If a node is labeled MIN, it means that it is the opponent's turn (i.e., player 2's turn) and the board for the best single next move is shown.
• Possible moves are represented with the help of lines. The last level shows the terminal board positions, which illustrate the end of the game instance. Here, we get zero as the sum of all the payoffs. A terminal state gives a winning board position. Utility indicates the payoff points gained by the player (-1, 0, +1).
• In a similar way we can draw a game tree for any artificial intelligence based game.

MAX (X)
MIN (O)
MAX (X)
MIN (O)
TERMINAL
Utility : -1  0  +1

Fig. 2.23.7 : Tic-Tac-Toe Game Tree
We can formulate a game as a search problem as follows :

Initial state : It gives the starting position of the game board.
Operators : They define the next possible actions.
Terminal state : It indicates that all instances of the game are over.
Utility : It displays a number which indicates whether the game was won or lost or it was a draw.

• From the Tic-Tac-Toe example you can understand that even for a 3 x 3 grid, two player game, where the game tree is relatively small (it has 9! terminal nodes), we still cannot draw it completely on one single page. Imagine how difficult it would be to create a game tree for multi player games or for games with a bigger grid size.
• Many games have huge search space complexity. Games have limitations on the time and the amount of memory space they can consume. Finding an optimal solution is not feasible most of the time, so there is a need for approximation.
• Therefore there is a need for an algorithm which will reduce the tree size and eventually help in reducing the processing time and saving memory space of the machine.
• One method is pruning, where only the required parts of the tree, which improve the quality of the output, are kept and the remaining parts are removed.
• Another method is the heuristic method (it makes use of an evaluation function); it does not require exhaustive search. This method depends upon readily available information which can be used to control problem solving.
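The four components of the search-problem formulation can be sketched concretely for Tic-Tac-Toe. This is a minimal illustration (function names and the board encoding are invented for the example): a board is a tuple of 9 cells holding 'X', 'O' or None.

```python
# Tic-Tac-Toe as a search problem: initial state, operators, terminal
# state and utility, matching the table above.
INITIAL = (None,) * 9                       # initial state: blank 3 x 3 grid
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def moves(board, player):                   # operators: mark a free cell
    return [board[:i] + (player,) + board[i + 1:]
            for i, c in enumerate(board) if c is None]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal(board):                        # terminal state: win or full board
    return winner(board) is not None or None not in board

def utility(board):                         # payoff from MAX's ('X') viewpoint
    return {"X": 1, "O": -1, None: 0}[winner(board)]

print(len(moves(INITIAL, "X")))  # 9 possible first moves
```

Any game-tree algorithm (such as minimax, discussed in the next section) only needs these four pieces.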

2.24 MiniMax Algorithm

University Question
Q. What is Min-Max search ?


• The minimax algorithm evaluates a decision based on the present status of the game. This algorithm needs a deterministic environment with perfect/exact information.
• The minimax algorithm directly computes the minimax value, which is calculated with the help of a simple recursive computation.
• In the minimax computation, the selected action (the one with the highest minimax value) should be equal to the best possible payoff (outcome) against best play.
2.24.1 Minimax Algorithm

Take the example of the tic-tac-toe game.

Step 1 : Create an entire game tree, including all the terminal states, to understand the minimax algorithm. We will take a random stage.

Next action : 'X'

Fig. 2.24.2

Next action : 'O'

Fig. 2.24.3
Step 2 : For every terminal state find out the utility (the payoff points gained by that terminal position), where 1 means a win and 0 means a draw.

Fig. 2.24.4

Step 3 : Apply the MIN and MAX operators on the nodes of the present stage and propagate the utility values up the tree.

Fig. 2.24.5

Step 4 : With the max (of the min) utility value (payoff value), select the action at the root node using the minimax decision.

Fig. 2.24.6

Fig. 2.24.7
(In the case of Steps 2 and 3 we are assuming that the opponent will play perfectly, as per our expectation.)
2.24.2 Properties of Minimax Algorithm

• It is considered Complete if the game tree size is finite.
• It is considered Optimal when it is played against an optimal opponent.
• The time complexity of the minimax algorithm is O(b^m).
• The space complexity of the minimax algorithm is O(bm) (using the depth-first exploration approach).
• For chess, b ≈ 35 and m ≈ 100 for "reasonable" games.
• An exact solution is completely infeasible in most games.

2.25 Alpha Beta Pruning

Q. What is α-β pruning ?
Q. Show the use of α-β pruning for a two person game with an example.

• Pruning means cutting off. In game search it resembles clipping a branch of the search tree, probably one which is not so fruitful.
• At any choice point along the path for MAX, α is considered as the value of the best possible choice found so far, i.e., the highest value. For each "X", if "X" is worse for MAX, i.e., has a lesser value than α, then MAX will avoid it. Similarly we can define the β value for MIN.
• α-β pruning is an extension to the minimax algorithm where the decision making process need not consider each and every node of the game tree.
• Only the nodes important for a quality output are considered in decision making. Pruning helps in making the search more efficient.

• Pruning keeps only those parts of the tree which contribute to improving the quality of the result; the remaining parts of the tree are removed.
• Consider the following game tree, with a MAX root, three MIN nodes below it, and leaf values (4, 10, 6), (3, A, B) and (13, 7, 2) :

Fig. 2.25.1
• For this tree we have to calculate the minimax value of the root node :
Minimax value of root node = Max(min(4, 10, 6), min(3, A, B), min(13, 7, 2))
= Max(4, C, 2), where C <= 3
= 4
• Let us see how to check this step by step.

(The intermediate trees, with the [α, β] interval maintained at each node, are drawn step by step as the leaf values 4, 10, 6, then 3, and finally 13, 7, 2 are examined from left to right.)

So in this example we have pruned 2 β-branches and 0 α-branches. As the tree is very small, you may not appreciate the effect of branch pruning; but for any real game tree, pruning creates a significant impact on the search as far as time and space are concerned.
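The pruning described above can be sketched in code. In the run below, the two 99 values stand in for the unexplored leaves A and B of Fig. 2.25.1; they are cut off by the β test, so their actual values never affect the result.

```python
import math

# A minimal alpha-beta sketch over a nested-list game tree (interior nodes
# are lists of children, leaves are utilities; MAX moves at the root).
def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if not isinstance(node, list):          # leaf: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cut-off: MIN avoids this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:               # alpha cut-off: MAX avoids this branch
                break
        return value

tree = [[4, 10, 6], [3, 99, 99], [13, 7, 2]]
print(alphabeta(tree))   # 4, the same minimax value computed for Fig. 2.25.1
```

After the first MIN node returns 4, α becomes 4 at the root; the second MIN node sees the leaf 3 <= α and immediately breaks, pruning the two remaining leaves, which matches the two β-branches counted above.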

2.25.1 Example of α-β Pruning

Ex. 2.25.1 : Explain Min-Max on the following game tree.

Leaf values (left to right) : 2 2 1 1 3 4 4 3 7 -1 1 0 3 5 4 2 5 6

Fig. P. 2.25.1 : Game Tree

Soln. : Using the min-max algorithm :

2 2 1 1 3 4 4 3 7 -1 1 0 3 5 4 2 5 6

Fig. P. 2.25.1(a) : Min-max solution with the optimal path shown in bold

2 2 1 1 3 4 4 3 7 -1 1 0 3 5 4 2 5 6

Fig. P. 2.25.1(b) : Solution by α-β pruning

Total pruned branches :
α-cuts = 2
β-cuts = 3

Ex. 2.25.2 : Perform α-β cutoff on the following. (MU - Dec. 14, 12 Marks)

Fig. P. 2.25.2
Soln. :

Fig. P. 2.25.2(a)

No. of α-cuts = 1
No. of β-cuts = 2

2.25.2 Properties of α-β

• Final results are not affected by pruning.
• Good ordering of actions helps in improving the effectiveness of the pruning technique.
• If there is exact/perfect ordering then we get time complexity O(b^(m/2)).
• The depth of search is effectively doubled by pruning.


Review Questions
Q.1 Why is it called uninformed search? What is not been informe
d about the search?
Q.2 Write a note on BFS.
Q.3 Write a note on : Uniform cost search.

Q.4 How are the drawbacks of DFS overcome by DLS and IDDFS?
Q.5 Compare and contrast DFS, DLS and IDDFS.

Q.6 Write short note on bidirectional search.


Q.7 Differentiate between unidirectional and bidirectional search.
Q.8 Compare and contrast all the uninformed searching techniques.
Q.9 Write a note on comparative analysis of searching techniques.
Q.10 Write a short note on iterative deepening search.
Q.11 Compare DFS and BFS with example.
Q.12 What is a heuristic function? What are the qualities of a good heuristic?
Q.13 Write a short note on : Properties of heuristic function and its role in AI.
Q.14 Write the algorithm for best first search and specify its properties.
Q.15 What is the difference between best first and greedy best first search? Explain with an example.
Q.16 What is the difference between best first and greedy best first search? Explain with an example.
Q.17 Compare Best First Search and A* Search with an example.

Q.18 Write a short note on IDA*.
Q.19 Explain the SMA* algorithm with an example. When should we choose SMA* among the given options?
Q.20 Give α-β pruning algorithm properties.

Q.21 Write a short note on the behavior of A* in case of underestimating and overestimating heuristics.
Q.22 Discuss admissibility of A* in case of optimality.
Q.23 Compare and contrast A*, SMA* and IDA*.
Q.24 Compare and contrast A*, SMA* and IDA*.

Q.25 Explain in brief the solutions for problems in hill climbing.
Q.26 Give the α-β pruning algorithm with an example and its properties; also explain why it is called α-β pruning.
Q.27 Write a note on : Simulated Annealing.

Q.28 What is CSP?
Q.29 Give examples of real time CSPs.
Q.30 How can CSPs be classified? Give an example.
Q.31 How to solve a cryptarithmetic problem? Explain with an example.
Q.32 How to improve the efficiency of backtracking in CSP?
Q.33 What is adversarial search?
Q.34 Write a short note on : Features of AI game.
Q.35 Write a short note on : Zero-sum game.
Q.36 Write a short note on : Relevant aspects of AI games.
Q.37 Write a short note on : Game types.
Q.38 Explain the minimax algorithm with an example.
Q.39 Give minimax algorithm properties.

KNOWLEDGE REPRESENTATION
USING FIRST ORDER LOGIC

Knowledge and Reasoning : A Knowledge Based Agent, WUMPUS World Environment, Propositional Logic, First Order Predicate Logic, Forward and Backward Chaining, Resolution. Planning as an application of a knowledge based agent. Concepts of Partial Order Planning, Hierarchical Planning and Conditional Planning. Self-Learning Topics : Representing real world problems as planning problems.

Introduction

• Understanding the theoretical or practical aspects of a subject is called knowledge. We can gain knowledge through experience acquired based on facts, information, etc. about the subject.
• After gaining knowledge about some subject, we can apply that knowledge to derive conclusions about various problems related to that subject, based on some reasoning.
• We have studied various types of agents in chapter 2. In this chapter we are going to see what a "knowledge based agent" is, with a very interesting game example.
• We are also going to study how such agents store knowledge and how they infer the next level of knowledge from the existing set. In turn, we are studying various knowledge representation and inference methods in this chapter.
3.1 A Knowledge Based Agent

• As shown in Fig. 3.1.1, a knowledge based agent can be described at different levels : a Knowledge Base (KB) and an Inference Engine.

Inference Engine : domain-independent algorithms
Knowledge Base : domain-specific content

Fig. 3.1.1 : Levels of Knowledge Base

1. Knowledge level :
• The knowledge level is the base level of an agent, which consists of domain-specific content.
• At this level the agent has facts/information about the subject, but does not consider the actual implementation.
A ittaS

W Al and DS-I (MU)

plementation level: 3-2 Knowledge Representation using First Order Logic


2.
Implementation leve] consist
S of dom
data structures used in knowle dge “in Independent algorithms, At this level, agents can recognize the
and resolution. (We will be Jeg , base and algorithms which use them. For example, propositional logic
™ming about logic and
resolution in this chapter)
Knowledge based agents are
Crucial
action, knowledge based agents make mat partially observable environments. Before choosing any
environment in order to infer hidden .. of the existing knowledge along with the current inputs from the
Spects of the
cu
As we have learnt that knowledge arent state,
surrounding environment (rea base Is a set of representations of facts/information about the
l world
sentences are expresses ). Every single representation In the set is called as a sentence and
with the he]
statement which ts a set of
P of formal representation language. We can say that sentence is a
words tha
representation language, t express some truth about the real world with the help of knowledge

Declarative approach of building an


agent makes use of TELL and ASK mec
o TELL the agent, about SurTroundin
hanism.
8 environment (what it needs to know in order to perform some actio

tena
TELL mechanism is simi n).
lar to taki ng Input for
a system.

eee
o Then the agent can ASK itself What action should be carried out
to get desired output. ASK mechanism is

nena
similar to producing Output for a system. However
ASK mechanism makes use of the knowledge base
decide what it should do, to

re
TELL and ASK mechanism involve inference. When you run ASK function, the answer is generate
d with the
help of knowledge base, based on the knowledge which was added with TELL function previously.
o TELL(K): Isa function that adds knowledge
K to the knowledge base.
o ASK(K) : Is a function that queries the agent about

ey ee
the truth of K.

ee
An agent carries out following operations : First, it TELLs the knowledge
base about facts/information it
perceives with the help of sensors, Then, it ASKs the knowledge base what action should be carried
out
based on the input it has received. Lastly, it performs the selected action with the help of effectors.

3.1.1 Architecture of a KB Agent

Knowledge based agents can be implemented at three levels, namely the knowledge level, the logical level and the implementation level.
1. Knowledge level
2. Logical level
3. Implementation level

1. Knowledge Level :
• It is the most abstract level of agent implementation. The knowledge level describes the agent by saying what it knows, that is, what knowledge the agent has as its initial knowledge.
• Basic data structures and procedures to access that knowledge are defined at this level. The initial knowledge of the knowledge base is called background knowledge.
• Agents at the knowledge level can be viewed as agents for which one only needs to specify what the agent knows and what its goals are in order to specify its behavior, regardless of how it is to be implemented.
• For example : A taxi agent might know that the Golden Gate Bridge connects San Francisco with Marin County.
AI and DS-I (MU) : Knowledge Representation using First Order Logic
2. Logical Level :
•	At the logical level, the knowledge is encoded into sentences. This level uses some formal language to represent the knowledge the agent has. The two types of representations we have are propositional logic and first order (predicate) logic.
•	Both these representation techniques are discussed in detail in the further sections.
•	For example : Links(GoldenGateBridge, SanFrancisco, MarinCounty)
3. Implementation Level :
•	At the implementation level, the physical representation of the logical level sentences is done. This level also describes the data structures used in the knowledge base and the algorithms used for data manipulation.
•	For example : Links(GoldenGateBridge, SanFrancisco, MarinCounty)
function KB-AGENT(percept) returns an action
    static : KB, a knowledge base
             t, a counter, initially 0, indicating time

    TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
    action ← ASK(KB, MAKE-ACTION-QUERY(t))
    TELL(KB, MAKE-ACTION-SENTENCE(action, t))
    t ← t + 1
    return action

Fig. 3.1.2 : General function of knowledge based agent
Fig. 3.1.2 shows the general implementation of a knowledge based agent. TELL and ASK are the sub-procedures implemented to perform the respective actions.
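The loop of Fig. 3.1.2 can be sketched in Python. This is a minimal illustration, not the book's implementation: the knowledge base is just a set of tuples, and the "inference" inside step() is a hard-coded rule (grab when glitter is perceived) standing in for a real MAKE-ACTION-QUERY.

```python
class KBAgent:
    def __init__(self):
        self.kb = set()   # the knowledge base: a set of sentences
        self.t = 0        # time counter, initially 0

    def tell(self, sentence):
        """TELL: add knowledge to the knowledge base."""
        self.kb.add(sentence)

    def ask(self, query):
        """ASK: query the KB; here, trivially check membership."""
        return query in self.kb

    def step(self, percept):
        # TELL the KB what was perceived at time t
        self.tell(("percept", percept, self.t))
        # ASK what to do; a real agent would run logical inference here
        action = "grab" if percept == "glitter" else "move"
        # TELL the KB which action was taken
        self.tell(("action", action, self.t))
        self.t += 1
        return action

agent = KBAgent()
print(agent.step("none"))     # -> move
print(agent.step("glitter"))  # -> grab
```

Note how every percept and every chosen action is recorded back into the knowledge base, so later queries can depend on the agent's whole history.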
•	The knowledge base agent must be able to perform the following tasks :
o	Represent states, actions, etc.
o	Incorporate new percepts.
o	Update internal representations of the world.
o	Deduce hidden properties of the world.
o	Deduce appropriate actions.

3.2. The WUMPUS World Environment


•	You have learnt about the vacuum world problem and the block world problem so far. Similarly, we have the WUMPUS world problem. Fig. 3.2.1 shows the WUMPUS world.
•	WUMPUS is an early computer game, also known as "Hunt the Wumpus". It was developed by Gregory Yob in 1972/1973 and was originally written in BASIC (Beginner's All-purpose Symbolic Instruction Code).
•	WUMPUS is a map-based game. Let's understand the game :
o	The WUMPUS world is like a cave with a number of rooms, which are connected by passageways. We will take a 4 × 4 grid to understand the game.
o	The WUMPUS is a monster who lives in one of the rooms of the cave. The WUMPUS eats the player (agent) if the player comes into the same room. Fig. 3.2.1 shows that the WUMPUS is staying in room (3, 1).
o	The player (agent) starts from any random position in the cave and has to explore the cave. We are starting from position (1, 1).

•	There are various sprites in the game like pit, stench, breeze, gold, and arrow. Every sprite has some features. Let's understand them one by one :
o	A few rooms have bottomless pits which trap the player (agent) if he comes to that room. You can see in Fig. 3.2.1 that rooms (1,3), (3,3) and (4,4) have bottomless pits. Note that even the WUMPUS can fall into a pit.
o	A stench is experienced in a room which has the WUMPUS in a neighbouring room. See Fig. 3.2.1; here rooms (2,1), (3,2) and (4,1) have a stench.
o	A breeze is experienced in a room which has a pit in a neighbouring room. Fig. 3.2.1 shows that rooms (1,2), (1,4), (2,3), (3,2), (3,4) and (4,3) have a breeze.
o	The player (agent) has arrows and he can shoot these arrows in a straight line to kill the WUMPUS.
o	One of the rooms contains gold; this room glitters. Fig. 3.2.1 shows that room (3, 2) has the gold.
•	Apart from the above features, the player (agent) can receive two more types of percepts : bump and scream. A bump is generated if the player (agent) walks into a wall, while a sad scream is heard everywhere in the cave when the WUMPUS is killed.

[Figure : a 4 × 4 grid showing the WUMPUS at (3,1), pits at (1,3), (3,3) and (4,4), gold at (3,2), with stench and breeze markers in the neighbouring rooms; the agent starts at (1,1).]

Fig. 3.2.1 : The WUMPUS World

3.2.1 Description of the WUMPUS World

•	An agent receives percepts while exploring the rooms of the cave. Every percept can be represented with the help of a five-element list : [stench, breeze, glitter, bump, scream]. Note that the player (agent) cannot perceive its own location.
•	If the player (agent) gets the percept [Stench, Breeze, None, None, None], it means that there is a stench and a breeze, but no glitter, no bump, and no scream at that position in the WUMPUS world.
•	Let's take a look at the actions which can be performed by the player (agent) in the WUMPUS world :
o	Move : to move in the forward direction.
o	Turn : to turn right by 90 degrees or left by 90 degrees.
o	Grab : to pick up the gold if it is in the same room as the player (agent).
o	Shoot : to shoot an arrow in a straight line in the direction faced by the player (agent).
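The five-element percept list described above can be encoded directly in a program; this is a small sketch, where `FIELDS` and `describe` are illustrative names of our own, not part of any standard game API.

```python
# The percept order from the text: [stench, breeze, glitter, bump, scream]
FIELDS = ["stench", "breeze", "glitter", "bump", "scream"]

def describe(percept):
    """Return the names of the signals present in a percept list."""
    return [name for name, value in zip(FIELDS, percept) if value]

# The percept [Stench, Breeze, None, None, None] from the text:
percept = [True, True, False, False, False]
print(describe(percept))  # -> ['stench', 'breeze']
```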


•	These actions are repeated till the player (agent) kills the WUMPUS or the player (agent) dies. If the WUMPUS is killed then it is a winning condition; if the player (agent) is killed then it is a losing condition and the game is over.
•	The game developer can keep a restriction on the number of arrows which can be used by the player (agent). So if we allow the agent to have only one arrow, then only the first shoot action will have some effect. If this shoot action kills the WUMPUS then you win the game; otherwise it reduces the probability of winning the game.
•	Lastly there is a die action : it takes place automatically if the agent enters a room with a bottomless pit or a room with the WUMPUS. The die action is irreversible.
Goal of the game :
•	The main aim of the game is that the player (agent) should grab the gold and return to the starting room (here (1,1)) without being killed by the monster (WUMPUS).
•	Award and punishment points are assigned to the player (agent) based on the actions it performs. Points can be given as follows :
o	100 points are awarded if the player (agent) comes out of the cave with the gold.
o	1 point is taken away for every action taken.
o	10 points are taken away if the arrow is used.
o	200 points are taken away if the player (agent) gets killed.

3.2.2 PEAS Properties of WUMPUS World

Q.	Give PEAS descriptors for WUMPUS world.
1. Performance measure
•	+100 for grabbing the gold and coming back to the starting position.
•	–200 if the player (agent) is killed.
•	–1 per action.
•	–10 for using the arrow.
2. Environment
•	Empty rooms.
•	Room with the WUMPUS.
•	Rooms neighbouring the WUMPUS, which are smelly.
•	Rooms with bottomless pits.
•	Rooms neighbouring bottomless pits, which are breezy.
•	Room with gold, which is glittery.
•	Arrow to shoot the WUMPUS.

3. Sensors (assuming a robotic agent)
•	Camera to get the view.
•	Odour sensor to smell the stench.
•	Audio sensor to listen to the scream and bump.

4. Effectors (assuming a robotic agent)
•	Motor to move left, right.
•	Robot arm to grab the gold.
•	Robot mechanism to shoot the arrow.
•	The WUMPUS world environment has the following characteristics :

1. Partially observable	2. Deterministic	3. Sequential
4. Static	5. Discrete	6. Single agent
3.2.3 Exploring a WUMPUS World

Let's try to understand the WUMPUS world problem in a step-by-step manner. Keep Fig. 3.2.2 as a reference figure.

[Figure : 4 × 4 grid with the agent in room (1,1), which is marked OK.]

Legend : A - Agent, B - Breeze, G - Glitter/Gold, OK - Safe square, P - Pit, S - Stench, V - Visited, W - Wumpus

Fig. 3.2.2(a) : WUMPUS world with player in room (1,1)

•	The knowledge base initially contains only the rules (facts) of the WUMPUS world environment.
Step 1 :	Initially the player (agent) is in room (1,1). See Fig. 3.2.2(a).
•	The first percept received by the player is [none, none, none, none, none] (remember, a percept consists of [stench, breeze, glitter, bump, scream]).
•	The player can move to room (1,2) or (2,1) as they are safe cells.
Step 2 :	Let us move to room (1,2). See Fig. 3.2.2(b).

[Figure : the agent has moved to room (1,2); rooms (1,1) and (1,2) are marked V/OK, a breeze is perceived in (1,2), and rooms (1,3) and (2,2) are marked P?.]

Fig. 3.2.2(b) : WUMPUS world with player in room (1,2)

•	As room (1,1) is visited, you can see a "V" mark in that room. The player receives the following percept : [none, breeze, none, none, none].
•	As the breeze percept is received, room (1,2) is marked with "B" and it can be predicted that there is a bottomless pit in a neighbouring room. You can see that rooms (1,3) and (2,2) are marked with "P?", so rooms (1,3) and (2,2) are not safe to move to.
•	Thus the player should return to room (1,1) and try to find another safe path.
Step 3 :

[Figure : the agent moves back through room (1,1) to the other safe room (2,1); visited rooms are marked V/OK and room (1,3) remains marked P?.]

Fig. 3.2.2(c) : WUMPUS world with player moving back to room (1,1) and then to the other safe room (2,1)
•	As seen in Fig. 3.2.2(c), the player is now in room (2,1), where it receives the following percept : [stench, none, none, none, none], which means that there is a WUMPUS in a neighbouring room (i.e. either room (2,2) or (3,1) has the WUMPUS).
•	As we did not get a breeze percept in this room, we can understand that room (2,2) cannot have any pit, and from step 2 we can understand that room (2,2) cannot have the WUMPUS because room (1,2) did not show a stench percept.
•	Thus room (2,2) is safe to move into.
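The pit-elimination reasoning of steps 2 and 3 can be sketched in a few lines of Python. Rooms are (column, row) pairs on the 4 × 4 grid; `pit_free_rooms` and the sample `breeze_at` data are our own illustration, not a complete WUMPUS solver.

```python
def neighbours(room):
    """Rooms adjacent to `room` on the 4 x 4 grid."""
    x, y = room
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 1 <= a <= 4 and 1 <= b <= 4]

def pit_free_rooms(visited, breeze_at):
    """Rooms proven pit-free: neighbours of visited rooms with no breeze."""
    safe = set(visited)  # a visited room obviously has no pit
    for room in visited:
        if not breeze_at[room]:
            safe.update(neighbours(room))
    return safe

# Situation after step 3: breeze only in (1,2)
visited = {(1, 1), (1, 2), (2, 1)}
breeze_at = {(1, 1): False, (1, 2): True, (2, 1): False}
print(sorted(pit_free_rooms(visited, breeze_at)))
# (2,2) is proven pit-free, exactly as argued in the text
```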
Step 4 :	The player receives the percept [none, none, none, none, none] when it comes to room (2,2). From Fig. 3.2.2(d) you can understand that rooms (2,3) and (3,2) are safe to move into.

[Figure : the agent in room (2,2); rooms (1,1), (1,2) and (2,1) are marked visited and room (1,3) remains marked P?.]

Fig. 3.2.2(d) : WUMPUS world with player moving to room (2,2)
Step 5 :	The player moves to room (3,2), where it perceives a glitter and grabs the gold. Remember, the aim of this game is to grab the gold and go back to the starting position without being killed by the WUMPUS.

[Figure : the agent in room (3,2) with the gold; candidate pit and WUMPUS locations are marked P? and W?.]

Fig. 3.2.2(e) : WUMPUS world with player moving to room (3,2)
•	Now, we have to go back to the starting position, i.e. room (1,1), without getting killed by the WUMPUS. From steps 1, 2, 3 and 4 we know that rooms (1,1), (1,2), (2,1) and (2,2) are safe rooms. So, we can go back to room (1,1) by following either of the two paths : (2,2), (2,1), (1,1) or (2,2), (1,2), (1,1).

Step 6 :	As can be seen in Fig. 3.2.2(f), we will go from room (2,2) to room (2,1) and from room (2,1) to room (1,1). Thus we won the WUMPUS world game!

[Figure : the agent back in room (1,1) with the gold; visited rooms are marked V/OK and the remaining P? and W? marks are unchanged.]

Fig. 3.2.2(f) : WUMPUS world with player moving back to room (1,1) with gold

3.3 Logic
•	Logic can be called reasoning which is carried out to perform a specified task, or a review based on a strict rule of validity.
•	In the case of intelligent systems, logical representation and reasoning are not bound to any particular form of logic; they are independent of any particular form of logic.
•	Make a note that logic is beneficial only if the knowledge is represented to a small extent; when knowledge is represented in large quantity, logic is not considered valuable.
•	Fig. 3.3.1 depicts that sentences are physical configurations of an agent; it also shows that sentences lead to a new sentence. This means that reasoning is a process of forming new physical configurations from old ones.
[Figure : at the representation level, sentences entail a new sentence; at the real-world level, features of the real world follow from features of the real world.]

Fig. 3.3.1 : Correspondence between the real world and its representation

•	Logical reasoning should make sure that the new configurations represent features of the world that actually follow from the features of the world that the old configurations represented.

3.3.1 Role of Reasoning in AI

•	Fig. 3.3.2 shows how logic can be seen as a knowledge representation language. There are various levels of logic, and the most fundamental type of logic is propositional logic.
[Figure : a hierarchy of logics with propositional logic at the base, extended by fuzzy logic, probabilistic logic, multi-valued logic, non-monotonic logic, and modal/temporal logic.]

Fig. 3.3.2 : Logic as a knowledge representation language

•	Propositional logic can be extended at the fuzzy logic level, where truth values lie in the range between 0 and 1. The next level is also called the probabilistic logic level, using which first order predicate logic is implemented.
•	Fig. 3.3.2 also shows two more levels above higher order logic, namely the multi-valued and non-monotonic logic levels, which consist of modal logic and temporal logic respectively. All these types of logic are basic building blocks of intelligent systems and they all use reasoning in order to represent sentences. Hence reasoning plays a very important role in AI.

3.4 Representation of Knowledge using Rules

Q.	Explain various methods of knowledge representation techniques.
Q.	Explain different knowledge representation methods with example.
•	Knowledge can be considered to be represented at generally two levels :
(i)	Knowledge level : This level describes the facts.
(ii)	Symbol level : This level deals with using symbols for representing the objects, which can be manipulated in programs.

•	Knowledge can be represented using the following rules :
(a)	Logical representations
(b)	Production rule representations
(c)	Semantic networks
(d)	Frame representations

(a) Logical representation

•	Logical representations are mostly concerned with the truth of statements regarding the world. These statements are most generally represented using truth values like TRUE or FALSE.
•	Logic is successfully used to define ways to infer new sentences from the existing ones. There are certain logics that are used for the representation of information, and they range in terms of their expressiveness. Logics that are more expressive are more useful in translating sentences from natural languages into logical ones. Several logics are widely used :
1.	Propositional logic : These are restricted kinds of logic that make use of propositions (sentences that are either true or false but not both). Propositional logic is also known as propositional calculus, sentential calculus or boolean algebra.
	All propositions are either true or false. For example :
	(i) Leaves are green.	(ii) Violets are blue.

Statement	Truth value	Is it a proposition?
Sky is blue	true	yes
Roses are red	true	yes
2 + 2 = 5	false	yes

2.	First Order Predicate Logic : These are much more expressive and make use of variables, constants, predicates, functions and quantifiers along with the connectives explained in the previous section.
3.	Higher Order Predicate Logic : Higher order predicate logic is distinguished from first order predicate logic by additional quantifiers and stronger semantics.
4.	Fuzzy Logic : These indicate the existence of values in between TRUE and FALSE, i.e. fuzziness, in logic.
5.	Other Logics : These include multiple valued logic, modal logics and temporal logics.

(b) Production Rule Representation

•	One of the most widely used methods to represent knowledge is to use production rules, also known as IF-THEN rules.
•	Each rule has two parts : the IF part (the condition, or antecedent) and the THEN part (the conclusion or action, also called the consequent).
•	Example :
o	IF pressure is high, THEN volume is small.
o	IF the road is slippery, THEN driving is dangerous.
•	Some of the benefits of IF-THEN rules are that they are modular, each defining a relatively small and, at least in principle, independent piece of knowledge. New rules may be added and old ones deleted, usually independently of other rules.
•	Production rules are simple but powerful forms of representing knowledge; they provide flexibility for combining procedural and declarative representations in a unified manner.
(c) Semantic networks

•	These represent knowledge in the form of graphical networks, since graphs are easy to store inside programs as they are concisely represented by nodes and edges.
•	A semantic network basically comprises named nodes that represent concepts, and labelled links representing relations between concepts. Nodes represent both types and tokens.
•	For example, the semantic network in Fig. 3.4.1 expresses the knowledge needed to represent the following data :

o	Tom is a cat.
o	Tom caught a fish.
o	Tom is grey in colour.
o	Tom is owned by Sam.
o	Tom is a mammal.
o	Fish is an animal.
o	Cats love milk.
o	All mammals are animals.

Fig. 3.4.1
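One possible way to hold such a semantic network in a program is as a list of labelled (node, relation, node) edges. The relation names below are our own paraphrase of the links in Fig. 3.4.1, not a standard vocabulary.

```python
# Each edge is (source node, relation label, target node)
edges = [
    ("Tom", "is_a", "Cat"),
    ("Tom", "caught", "Fish"),
    ("Tom", "color", "Grey"),
    ("Tom", "owned_by", "Sam"),
    ("Cat", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
    ("Fish", "is_a", "Animal"),
    ("Cat", "loves", "Milk"),
]

def related(node, relation):
    """All targets reachable from `node` via one `relation` edge."""
    return [t for s, r, t in edges if s == node and r == relation]

print(related("Tom", "is_a"))    # -> ['Cat']
print(related("Cat", "loves"))   # -> ['Milk']
```

Following `is_a` edges transitively (Tom is a Cat, a Cat is a Mammal, a Mammal is an Animal) is exactly the kind of inheritance reasoning semantic networks support.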
•	Conceptual Graph : It is a recent scheme used for semantic networks, introduced by John Sowa; it is a finite, connected, bipartite graph. The nodes represent either concepts or conceptual relations. It differs from the previous method in that it does not use labelled arcs. For example : "Ram, Laxman and Bharat are brothers" or "cat colour is grey" can be represented as shown in Fig. 3.4.2.

Fig. 3.4.2
(d) Frame Representation

•	This concept was introduced by Marvin Minsky in 1975. Frames are mostly used when the task becomes quite complex and needs a more structured representation. The more structured the system becomes, the more beneficial the use of frames would prove.
•	Generally, frames are record-like structures that consist of a collection of slots or attributes and the corresponding slot values.
•	Slots can be of any size and type. Slots have names and values (subfields) called facets. Facets can have names or numbers too. A simple frame for a person Ram can be given as :
o	(PROFESSION (VALUE Professor))
o	(AGE (VALUE 50))
o	(WIFE (VALUE Sita))
o	(CHILDREN (VALUE Luv Kush))
o	(ADDRESS (STREET (VALUE 4c bb road))
		(CITY (VALUE Banaras))
		(STATE (VALUE MH))
		(ZIP (VALUE 400615)))
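Since a frame is essentially a record of slots and facets, nested dictionaries give a direct (if simplified) rendering of the Ram frame above; this is our own sketch, not a standard frame library.

```python
# Each slot maps to its facets; here only the VALUE facet is used,
# and ADDRESS is itself a frame with sub-slots.
ram = {
    "PROFESSION": {"VALUE": "Professor"},
    "AGE": {"VALUE": 50},
    "WIFE": {"VALUE": "Sita"},
    "CHILDREN": {"VALUE": ["Luv", "Kush"]},
    "ADDRESS": {
        "STREET": {"VALUE": "4c bb road"},
        "CITY": {"VALUE": "Banaras"},
        "STATE": {"VALUE": "MH"},
        "ZIP": {"VALUE": "400615"},
    },
}

print(ram["AGE"]["VALUE"])              # -> 50
print(ram["ADDRESS"]["CITY"]["VALUE"])  # -> Banaras
```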

3.5 Propositional Logic (PL)

•	Propositional Logic (PL) is simple but powerful for some artificial intelligence problems. You have learnt simple mathematical logic in which atomic formulas are used. (An atomic formula is a formula that has no strict sub-formulas.) Atomic formulas are called propositions.
•	In the case of artificial intelligence, propositional logic is not categorized as the study of truth values; rather, it is based on the relativity of truth values (i.e. the relationship between the truth value of one statement and the truth value of another statement).

3.5.1 Syntax

The basic syntax followed by propositional logic can be given as follows :
•	Propositional symbols are denoted with capital letters like A, B, C, etc.
•	Propositional logic constants have a truth value; generally truth values have a crisp nature (i.e. 0 (false) and 1 (true)). But for fuzzy logic, truth values can vary in the range of 0 to 1.
•	Propositional logic makes use of wrapping parentheses while writing an atomic sentence, denoted as "(...)".
•	A literal is an atomic sentence or the negation of an atomic sentence. (A, ¬A)
•	If A is a sentence, then ¬A is a sentence.
•	Propositional logic makes use of relationships between propositions, denoted by connectives. If A and B are propositions, the connectives used in propositional logic can be seen in Table 3.5.1.
Table 3.5.1 : Connectives used in propositional logic

Connective symbol	Name of the connective	Example	Name of the relationship
∧	And	A ∧ B	Conjunction
∨	Or	A ∨ B	Disjunction
¬	Not	¬A	Negation
⇒	Implies	A ⇒ B	Implication / conditional
⇔	Is equivalent / if and only if	A ⇔ B	Biconditional


•	Truth tables are used to define the logical connectives. Table 3.5.2 shows the five logical connectives.

Table 3.5.2

A	B	A ∧ B	A ∨ B	¬A	A ⇒ B	A ⇔ B
false	false	false	false	true	true	true
false	true	false	true	true	true	false
true	false	false	true	false	false	false
true	true	true	true	false	true	true
•	Take an example : find the value of A ∧ B where A is true and B is false. The third row of Table 3.5.2 shows this condition; see the third column of that row, where A ∧ B gives the result false. Similarly, the other logical connectives can be read off the truth table.
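The connectives of Table 3.5.2 can also be checked mechanically; the sketch below recomputes each row of the truth table using small boolean functions (`implies` and `iff` are our own helper names).

```python
from itertools import product

def implies(a, b):
    """A => B is false only when A is true and B is false."""
    return (not a) or b

def iff(a, b):
    """A <=> B is true when both sides have the same truth value."""
    return a == b

# Regenerate Table 3.5.2 row by row:
for a, b in product([False, True], repeat=2):
    print(a, b, a and b, a or b, not a, implies(a, b), iff(a, b))
```

Running this reproduces exactly the four rows of Table 3.5.2, which is a handy way to verify a hand-written truth table.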

3.5.2 Semantics

•	The world is a set of facts which we want to represent in propositional logic. In order to represent these facts, propositional symbols can be used, where each propositional symbol's interpretation can be mapped to a real world feature.
•	The semantics of a sentence is the meaning of the sentence. Semantics determine the interpretation of a sentence. For example, you can define the semantics of each propositional symbol in the following manner :
1.	A means "It is hot"
2.	B means "It is humid", etc.
•	A sentence is considered true when its interpretation in the real world is true. Every sentence results from a finite number of applications of the rules. For example, if A and B are sentences then (A ∧ B), (A ∨ B), (B ⇒ A) and (A ⇔ B) are sentences. The knowledge base is a set of sentences, as we have seen in the previous section.
•	Thus we can say that the real world is a model of the knowledge base when the knowledge base is true for that world. In other words, a model can be thought of as a truth assignment to the symbols.
•	If the truth values of all symbols in a sentence are given, then the sentence can be evaluated to determine its truth value (i.e. we can say if it is true or false).
3.5.3 What is Propositional Logic ?
•	A ∧ B and B ∧ A should have the same meaning, but in natural language words and sentences may have different meanings. Say for example :
1.	Radha started feeling feverish and Radha went to the doctor.
2.	Radha went to the doctor and Radha started feeling feverish.
•	Here, sentence 1 and sentence 2 have different meanings.
•	In artificial intelligence, propositional logic captures the relationship between the truth value of one statement and the truth value of another statement.
3.5.4 PL Sentence - Example
•	Take the example of a weather problem. The semantics of each propositional symbol can be defined as follows :
o	Symbol A is a sentence which means "It is hot".

o	Symbol B is a sentence which means "It is humid".
o	Symbol C is a sentence which means "It is raining".
•	We can also choose symbols which are easy to understand, like :
o	HT for "It is hot",
o	HM for "It is humid",
o	RN for "It is raining".

•	If you have HM ⇒ HT, then that means "If it is humid, then it is hot".
•	If you have (HT ∧ HM) ⇒ RN, then it means "If it is hot and humid, then it is raining", and so on.
•	First we have to create the possible models for a knowledge base. To do this we need to consider all the possible assignments of true or false values for HT, HM and RN, and then verify the truth table for validity. There are in total 8 possibilities, as shown below :

HT	HM	RN	Valid?
False	False	False	Valid
False	False	True	Valid
False	True	False	Not valid
False	True	True	Not valid
True	False	False	Valid
True	False	True	Valid
True	True	False	Not valid
True	True	True	Valid
•	Now, if the knowledge base is [HM, HM ⇒ HT, (HT ∧ HM) ⇒ RN] (i.e. ["It is humid", "If it is humid, then it is hot", "If it is hot and humid, then it is raining"]), then "True - True - True" is the only possible valid model.
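The eight-row enumeration above can be reproduced mechanically; the sketch below checks every truth assignment of (HT, HM, RN) against the knowledge base [HM, HM ⇒ HT, (HT ∧ HM) ⇒ RN].

```python
from itertools import product

def implies(a, b):
    return (not a) or b

models = []
for ht, hm, rn in product([False, True], repeat=3):
    # the KB holds iff all three sentences are true in this assignment
    kb_holds = hm and implies(hm, ht) and implies(ht and hm, rn)
    if kb_holds:
        models.append((ht, hm, rn))

print(models)  # -> [(True, True, True)]
```

Exactly one of the eight assignments satisfies the whole knowledge base, confirming that "True - True - True" is its only model.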
Tautology and Contradiction

•	Tautology means a valid sentence. It is a sentence which is true for all interpretations. For example :
	(A ∨ ¬A) ("A or not A") : "It is hot or it is not hot."
•	Contradiction means an inconsistent sentence. It is a sentence which is false for all interpretations.
	For example : A ∧ ¬A ("A and not A") : "It is hot and it is not hot."
•	X entails Y is written as X |= Y. It means that whenever sentence X is true, sentence Y will be true.
	For example : if X = "Priya is Pooja's mother's sister" and Y = "Priya is Pooja's aunty", then X |= Y (X entails Y).
3.5.5 Inference Rules

•	New sentences are formed with logical inference. For example : if A ⇒ B and B ⇒ C then A ⇒ C. You must have come across this example many times; it implies that if the knowledge base has "A ⇒ B" and "B ⇒ C" then we can infer "A ⇒ C".
that “A = ¢”,


•	In short, an inference rule says that a new sentence can be created by logically following from the set of sentences in the knowledge base.

Table 3.5.3 : Inference Rules

Inference Rule	Premise (KB)	Conclusion
Modus Ponens	X, X ⇒ Y	Y
Substitution	X ⇔ Z, Y ⇔ Z	X ⇔ Y
Chain rule	X ⇒ Y, Y ⇒ Z	X ⇒ Z
AND introduction	X, Y	X ∧ Y
Transposition	X ⇒ Y	¬Y ⇒ ¬X
•	Entailment is represented as KB |= Q, and derivation is represented as KB |- Q.
•	There are two properties of inference :
1.	Sound inference	2. Complete inference
1. Sound inference

•	The soundness property of inference says that if "X is derived from the knowledge base" using a given set of inference protocols, then "X is entailed by the knowledge base". The soundness property can be represented as : "If KB |- X then KB |= X".
•	For the Modus Ponens (MP) rule we assume that the knowledge base has [A, A ⇒ B]; from this we can conclude that the knowledge base can have B. See the following truth table :
A	B	A ⇒ B	Sound?
TRUE	TRUE	TRUE	Yes
TRUE	FALSE	FALSE	Yes
FALSE	TRUE	TRUE	Yes
FALSE	FALSE	TRUE	Yes
In general,
•	For atomic sentences pi, pi′ and q, where there is a substitution θ such that SUBST(θ, pi) = SUBST(θ, pi′) for all i :

	p1′, p2′, ..., pn′, (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
	--------------------------------------------
	              SUBST(θ, q)

•	N + 1 premises = N atomic sentences + one implication.

Example :
A	: It is rainy.
B	: I will stay at home.
A ⇒ B	: If it is rainy, I will stay at home.
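The soundness of Modus Ponens can itself be verified by brute force: in every model where both premises A and A ⇒ B hold, the conclusion B holds as well. A small sketch:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Over all four models of (A, B), whenever the premises hold, so must B.
sound = all(
    b                        # conclusion B ...
    for a, b in product([False, True], repeat=2)
    if a and implies(a, b)   # ... in every model where A and A => B hold
)
print(sound)  # -> True
```

Only the model A = True, B = True satisfies both premises, and B is indeed true there, which is what the "Yes" column of the table above records.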
Modus Tollens

•	When B is known to be false, and if there is a rule "IF A, then B", it is valid to conclude that A is also false.
2. Complete inference

•	Completeness is the converse of soundness. The completeness property of inference says that if "X is entailed by the knowledge base", then "X can be derived from the knowledge base" using the inference protocols.
•	The completeness property can be represented as : "If KB |= Q then KB |- Q".
3.5.6 Horn Clause

•	Sentences can be written as sets of literals. A horn clause is also called a horn sentence. In a horn clause, a conjunction of 0 or more symbols is to the left of "→" and 0 or 1 symbols are to the right, as in the following formula :
	A1 ∧ A2 ∧ A3 ∧ ... ∧ An → B, where n ≥ 0 and m, the number of symbols on the right, is in the range {0, 1}
•	There can be the following special cases of a horn clause for the above formula :
o	For n = 0 and m = 1 : A (this condition asserts that A is true)
o	For n > 0 and m = 0 : A ∧ B → (this constraint shows that both A and B cannot be true)
o	For n = 0 and m = 0 : (this condition shows the empty clause)
•	Conjunctive normal form is a conjunction of clauses, and it is determined up to equivalence by its set of clauses. For a horn clause, conjunctive normal form can be used where each sentence is a disjunction of literals with at most one non-negative literal, as shown in the following formula : ¬A1 ∨ ¬A2 ∨ ¬A3 ∨ ... ∨ ¬An ∨ B
•	This can also be represented as : (A → B) ≡ (¬A ∨ B)
Significance of horn logic

•	Horn sentences can be used in first order logic. The reasoning process is simpler with horn clauses. Satisfiability of a general propositional knowledge base is NP-complete. (Satisfiability means the process of finding values for the symbols which make the formula true.)
•	On restricting the knowledge base to horn sentences, satisfiability is in P. For this reason, first order logic horn sentences are the basis for the Prolog and Datalog languages.
•	Let us take one example which gives entailment for horn formulas. Find out whether the following horn formula is satisfiable :
	(true → X) ∧ (X ∧ Y → Z) ∧ (Z → W) ∧ (Z ∧ W → false) ∧ (true → Y)
•	From the above formula, we check whether the query atom false is entailed. The formula contains clauses which state that true → X and true → Y, so we can assign X and Y the value true.
•	Then all premises of X ∧ Y → Z are true; based on this information we can assign Z the value true. After that, all premises of Z → W are true, so we can assign W the value true.
•	As all premises of Z ∧ W → false are now true, we can entail the query atom false. Therefore, the horn formula is not satisfiable.
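The entailment steps above are exactly forward chaining over horn clauses. A minimal sketch, using our own encoding of each clause as a (body, head) pair, where an empty body stands for "true →" and deriving the atom "false" means the formula is unsatisfiable:

```python
# The horn formula from the example, one (body, head) pair per clause
clauses = [
    (set(), "X"),            # true -> X
    ({"X", "Y"}, "Z"),       # X and Y -> Z
    ({"Z"}, "W"),            # Z -> W
    ({"Z", "W"}, "false"),   # Z and W -> false
    (set(), "Y"),            # true -> Y
]

def forward_chain(clauses):
    """Repeatedly fire every clause whose whole body is already known."""
    known = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

derived = forward_chain(clauses)
print("false" in derived)  # -> True, so the formula is not satisfiable
```

The derivation order matches the text: X and Y first, then Z, then W, and finally the atom false.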

3.5.7 Propositional Theorem Proving


•	A sequence of sentences forms a "proof". A sentence can be a premise, or it can be a sentence derived from earlier sentences in the proof based on an inference rule. Whatever we want to prove is called a query or a goal. The query/goal is the last sentence of the proof.
•	Take the example of the "weather problem" which we have seen above.
o	HT for "It is hot",
o	HM for "It is humid",
o	RN for "It is raining".

1.	HM	Premise (initial sentence)	"It's humid"
2.	HM ⇒ HT	Premise (initial sentence)	"If it's humid, it's hot"
3.	HT	Modus ponens (1, 2) (sentence derived from 1 and 2)	"It's hot"
4.	(HT ∧ HM) ⇒ RN	Premise (initial sentence)	"If it's hot and humid, it's raining"
5.	HT ∧ HM	And introduction (1, 3)	"It's hot and humid"
6.	RN	Modus ponens (4, 5) (sentence derived from 4 and 5)	"It's raining"

3.5.8 Advantages of Propositional Logic


•	Propositional logic is a simple knowledge representation language.
•	It is a sufficient and efficient technique for solving some artificial intelligence based problems.
•	Propositional logic forms the foundation for higher logics like First Order Logic (FOL), etc.
•	Reasoning in propositional logic is decidable; satisfiability of PL is NP-complete.
•	The process of inference can be illustrated by PL.

3.5.9 Disadvantages of Propositional Logic


•	Propositional logic cannot express complex artificial intelligence problems.
•	Propositional logic can be impractical for even small worlds; think about the WUMPUS hunter problem.
•	Even if we try to use propositional logic to express complex artificial intelligence problems, it can be very wordy and lengthy.
•	PL is a weak knowledge representation language because :
o	With PL it is hard to identify whether the entity used is an "individual". For example, if there are entities like Priya, Mumbai, 123, etc.
o	PL cannot directly represent properties of individual entities or relations between individual entities. For example : Pooja is tall.
o	PL cannot express specializations, generalizations, or patterns. For example : All rectangles have 4 sides.

3.6 First Order Predicate Logic

Q.	Write a short note on predicate logic.

•	Because of the inadequacy of PL discussed above, there was a need for a more expressive type of logic. Thus First-Order Logic (FOL) was developed. FOL is more expressive than PL; it can represent information using relations, variables and quantifiers, which was not possible with propositional logic.
o	"Gorilla is black" can be represented as : Gorilla(x) ⇒ Black(x)
o	"It is Sunday today" can be represented as :
today(Sunday)
•	First Order Logic (FOL) is also called First Order Predicate Logic (FOPL). Since FOPL is much more expressive as a knowledge representation language than PL, it is more commonly used in artificial intelligence.
3.6.1 Syntactic Elements, Semant
ic and Syntax
•	Assuming that "x" is a domain of values, we can define a term with the following rules :
1.	Constant term : It is a term with a fixed value which belongs to the domain.
2.	Variable term : It is a term which can be assigned values in the domain.
3.	Function : Say "f" is a function of "n" arguments. If we assume that t₁, t₂, ..., tₙ are terms then f(t₁, t₂, ..., tₙ) is also called a term.

•	All the terms are generated by applying the above three rules.
•	First order predicate logic makes use of propositional logic as a base logic, so the connectives used in PL and FOPL are common. Hence, it also supports ∧ conjunction, ∨ disjunction, ¬ negation, ⇒ implication and ⇔ double implication.
•	Ground term : If a term does not have any variables it is called a ground term. A sentence in which all the variables are quantified is called a "well-formed formula".
o	Every ground term is mapped to an object.
o	Every condition (predicate) is mapped to a relation.
o	A ground atom is considered true if the predicate's relation holds between the terms' objects.
•	Rules in FOL : In predicate logic a rule has two parts, predecessor and successor. If the predecessor evaluates to TRUE, the successor will be true. It uses the implication ⇒ symbol. A rule represents if-then types of sentences.
•	Example : The sentence "If the bag is of blue colour, I will buy it." will be represented as colour(bag, blue) ⇒ buy(bag)
Quantifiers
Apart from these connectives FOPL makes use of quantifiers. As the name suggests they quantify the number of
variables taking part in the relation or obeying the rule.
1.	Universal Quantifier '∀'
•	Pronounced as "for all"; it applies to all the variables in the predicate.
•	"∀x A" means A is true for every replacement of x.
•	Example : "Every Gorilla is Black" can be represented as :
∀x (Gorilla(x) ⇒ Black(x))

2.	Existential Quantifier '∃'
•	Pronounced as "there exists".
•	"∃x A" means A is true for at least one replacement of x.
•	Example : "There is a white dog" can be represented as :
∃x (Dog(x) ∧ White(x))


Note :
1.	Typically, ⇒ is the main connective with ∀.
Example : "Everyone at MU is smart" is represented as
∀x At(x, MU) ⇒ Smart(x)
2.	Typically, ∧ is the main connective with ∃.
Example : "Someone killed the cat and is guilty."
∃x Killed(x, cat) ∧ Guilty(x)
•	Equality : term₁ = term₂ is true under a given interpretation if and only if term₁ and term₂ refer to the same object.
•	Example : "Richard has at least two brothers"
∃x, ∃y Brother(x, Richard) ∧ Brother(y, Richard) ∧ ¬(x = y)
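On a small finite domain, the semantics of these quantifiers can be checked mechanically : ∀ reduces to Python's all() and ∃ to any(). A minimal sketch (the domain and the predicate extensions below are invented for illustration) :

```python
# Checking quantified sentences over a small finite domain (illustration):
# FOL quantifier semantics reduce to all()/any() when the domain is finite.
domain = ['Ram', 'Sita', 'Ravan']
gorilla = {'Ram'}            # hypothetical extension of Gorilla(x)
black = {'Ram', 'Ravan'}     # hypothetical extension of Black(x)

# "for all x : Gorilla(x) => Black(x)"
print(all((x not in gorilla) or (x in black) for x in domain))   # True

# "there exists x : Black(x) and not Gorilla(x)"
print(any((x in black) and (x not in gorilla) for x in domain))  # True
```

Note how the universal sentence pairs with implication and the existential sentence pairs with conjunction, exactly as stated in the note above.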
3.7 Comparison
°
between Propositional
e,e
Logice and First
7 Order Log ic a

Sr. | Propositional logic (PL) | First Order Predicate logic (FOL)
1. | PL cannot represent small worlds like the vacuum cleaner world. | FOL can very well represent small worlds' problems.
2. | PL is a weak knowledge representation language. | FOL is a strong knowledge representation language.
3. | PL uses propositions in which the complete sentence is denoted by a symbol. | FOL uses predicates which involve constants, variables, functions and relations.
4. | PL cannot directly represent properties of individual entities or relations between individual entities, e.g. Meera is short. | FOL can directly represent properties of individual entities or relations between individual entities using predicates and functions, e.g. Short(Meera)
5. | PL cannot express specializations, generalizations, or patterns, e.g. All rectangles have 4 sides. | FOL can express specializations, generalizations, or patterns using relations, e.g. no_of_sides(rectangle, 4)
6. | PL is a foundation level logic. | FOL is a higher level logic.
7. | PL is not sufficiently expressive to represent complex statements. | FOL can represent complex statements.
8. | PL assumes the world contains facts. | FOL assumes the world contains objects, relations and functions, like natural language.
9. | In PL the meaning of facts is context-independent, unlike natural language. | In FOL the meaning of sentences is context dependent, like natural language.
10. | PL is declarative in nature. | FOL is derivative in nature.


oat

Wate Knowledge
publicat
W Aland DS-I (MU)
3-20 Knowledge Representation using First Order Logic
—————

3.8 Inference in FOL

3.8.1 Forward Chaining

Q.	Explain forward-chaining algorithm with the help of an example.


•	For any type of inference there should be a path from start to goal. When a decision is taken based on the available data, the process is called forward chaining. Forward chaining or data-driven inference works from an initial state, and by looking at the premises of the rules (IF-part), performs the actions (THEN-part), possibly updating the knowledge base or working memory. This continues until no more rules can be applied or some cycle limit is met.
•	For example, "If it is raining then we will take an umbrella". Here, "it is raining" is the data and "we will take an umbrella" is a decision. This means it was already known that it is raining; that is why it was decided to take the umbrella. This process is forward chaining.

Fig. 3.8.1 : Forward Chaining


•	"Forward chaining" is called a data-driven inference technique.
•	Example
Given :
o	Rule : human(A) ⇒ mortal(A)
o	Data : human(Mandela)
•	To prove : mortal(Mandela)
Forward Chaining Solution
o	human(Mandela) matches the left hand side of the rule, so we get A = Mandela.
o	Based on the rule we can then derive : mortal(Mandela)
•	Forward chaining is used by "design expert systems", as it performs operations in a forward direction (i.e. from start to end).
Example
•	Consider the following example. Let us understand how the same example can be solved using both forward and backward chaining.
•	The given facts are as follows :
1.	It is a crime for an American to sell weapons to the enemy of America.
2.	Country Nono is an enemy of America.
3.	Nono has some missiles.
4.	All the missiles were sold to Nono by Colonel West.
5.	Missile is a weapon.
6.	Colonel West is American.
•	We have to prove that West is a criminal.
•	Let us see how to represent these facts in FOL.
to the enemy nations,
1.	It is a crime for an American to sell weapons to enemy nations.
American(x) ∧ Weapon(y) ∧ Sell(x, y, z) ∧ Enemy(z, America) ⇒ Criminal(x)
2.	Country Nono is an enemy of America.
Enemy(Nono, America)
3.	Nono has some missiles.
o	Owns(Nono, x)
o	Missile(x)
4.	All the missiles were sold to Nono by Colonel West.
Missile(x) ∧ Owns(Nono, x) ⇒ Sell(West, x, Nono)
5.	Missile is a weapon.
Missile(x) ⇒ Weapon(x)
6.	Colonel West is American.
American(West)

Proof by forward chaining
•	The proof will start from the given facts. As we can derive other predicates from those facts, they will lead us to the solution. Please refer to Fig. 3.8.2. As we observe, from the given facts we can derive Criminal(West).

Fig. 3.8.2 : Proof by forward chaining
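The data-driven loop described above can be sketched in a few lines of Python. This is an illustrative, ground (propositional) version, not the book's exact algorithm : each rule is a pair (premises, conclusion), and rules keep firing until no new fact is derived :

```python
# Ground forward chaining sketch: each rule is (premises, conclusion).
# Rules fire (THEN-part) whenever all their IF-part premises are known,
# until no new fact can be added.
def forward_chain(facts, rules):
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)   # fire the rule
                changed = True
    return known

# The mortal(Mandela) example from above, written as ground strings:
rules = [(["human(Mandela)"], "mortal(Mandela)")]
derived = forward_chain(["human(Mandela)"], rules)
print(sorted(derived))   # ['human(Mandela)', 'mortal(Mandela)']
```

A full first-order version would additionally unify rule variables against the known facts instead of using pre-instantiated ground strings.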

3.8.2 Backward Chaining

University Question

Q.	Describe backward chaining algorithm with an example. (May 14, Dec. 19)

•	If based on the decision the initial data is fetched, then it is called backward chaining. Backward chaining or goal-driven inference works backwards from a final state, by looking at the working memory to see if the goal is already there. If not, it looks at the actions (THEN-parts) of rules that will establish the goal, and sets up sub-goals for achieving the premises of those rules (IF-part). This continues until some rule can be applied to achieve the goal state.
•	For example, if while going out one has taken an umbrella, then based on this decision it can be guessed that it is raining. Here, "taking umbrella" is a decision based on which the data "it's raining" is generated. This process is backward chaining. "Backward chaining" is called a decision-driven or goal-driven inference technique.

Fig. 3.8.3 : Backward Chaining

Given :
o	Rule : human(A) ⇒ mortal(A)
o	Data : human(Mandela)
•	To prove : mortal(Mandela)
Backward Chaining Solution
•	mortal(Mandela) will be matched with mortal(A), which gives human(A) i.e. human(Mandela), which is also a given fact. Hence proved.
•	It makes use of right hand side matching. Backward chaining is used by "diagnostic expert systems", because it performs operations in a backward direction (i.e. from end to start).
Example

Let us understand how the same example used in forward chaining can be solved using backward chaining.
Proof by backward chaining
The proof will start from the fact to be proved. As we can map it to the given facts, it will lead us to the solution. Please refer to Fig. 3.8.4. As we observe, all leaf nodes of the proof are given facts, which means "West is Criminal" is proved.
[Figure : goal tree rooted at Criminal(West), with branches American(West), Weapon(y), Sell(West, x, z) and Enemy(Nono, America), and leaves Missile(x) and Owns(Nono, x), all true]
Fig. 3.8.4 : Proof by backward chaining

Ex. 3.8.1 :	Using predicate logic, find which course Anish likes, given the following :
(i)	Anish only likes easy courses.	(ii)	Computer courses are hard.
(iii)	All electronics courses are easy.	(iv)	DSP is an electronics course.
Soln. :

Step 1 : Converting given facts to FOL
(i)	∀x : course(x) ∧ easy(x) ⇒ likes(Anish, x)
(ii)	∀x : course(x) ∧ computer(x) ⇒ hard(x)
(iii)	∀x : course(x) ∧ electronics(x) ⇒ easy(x)
(iv)	electronics(DSP)
(v)	course(DSP)
Step 2 : Proof by backward chaining
As we have to find out which course Anish likes, we will start the proof from that fact.


[Figure : goal tree : likes(Anish, x) needs course(x) and easy(x); easy(x) needs course(x) and electronics(x); substituting x/DSP, the leaves course(DSP) and electronics(DSP) are given facts, hence true]
Fig. P. 3.8.1

Hence proved that Anish likes DSP course.
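The goal-driven search used in this proof can be sketched as a recursive function over ground Horn rules. A minimal Python sketch, with the rules of Ex. 3.8.1 pre-instantiated with x = DSP for simplicity (a full version would unify variables and track visited goals to avoid cycles) :

```python
# Ground backward chaining sketch: a goal holds if it is a known fact,
# or if some rule concludes it and all of that rule's premises
# recursively hold as sub-goals.
def backward_chain(goal, facts, rules):
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(backward_chain(p, facts, rules)
                                      for p in premises):
            return True
    return False

facts = {"course(DSP)", "electronics(DSP)"}
rules = [
    # all electronics courses are easy (instantiated with x = DSP)
    (["course(DSP)", "electronics(DSP)"], "easy(DSP)"),
    # Anish likes easy courses (instantiated with x = DSP)
    (["course(DSP)", "easy(DSP)"], "likes(Anish, DSP)"),
]
print(backward_chain("likes(Anish, DSP)", facts, rules))   # True
```

The recursion mirrors the goal tree of Fig. P. 3.8.1 : each rule firing creates sub-goals until only given facts remain at the leaves.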

3.9 Difference Between Forward Chaining and Backward Chaining

Attribute | Backward Chaining | Forward Chaining
Also known as | Goal-driven | Data-driven
Starts from | Possible conclusion | New data
Processing | Efficient | Somewhat wasteful
Aims for | Necessary data | Any conclusion(s)
Approach | Conservative/cautious | Opportunistic
Practical if | Number of possible final answers is reasonable or a set of known alternatives is available. | Combinatorial explosion creates an infinite number of possible right answers.
Appropriate for | Diagnostic, prescription and debugging applications | Planning, monitoring, control and interpretation applications
Reasoning | Top-down reasoning | Bottom-up reasoning
Type of search | Depth-first search | Breadth-first search
Who determines search | Consequents determine search | Antecedents determine search
Flow | Consequent to antecedent | Antecedent to consequent

3.10 Unification and Lifting

3.10.1 Unification

Unification is the process of finding legal substitutions that make different logical expressions look identical. The unification algorithm is recursive; the problem of unification is : given two atoms, to find whether they unify, and, if they do, return an MGU (Most General Unifier) of them.


Procedure Unify(t₁, t₂)
Inputs : t₁, t₂ : atoms
Output : most general unifier of t₁ and t₂ if it exists, ⊥ otherwise
Local :
E : a set of equality statements
S : substitution
E ← {t₁ = t₂}; S ← {}
while E ≠ {} do
    select and remove x = y from E
    if y is not identical to x then
        if x is a variable then
            replace x with y everywhere in E and S
            S ← {x/y} ∪ S
        else if y is a variable then
            replace y with x everywhere in E and S
            S ← {y/x} ∪ S
        else if x is f(x₁, ..., xₙ) and y is f(y₁, ..., yₙ) then
            E ← E ∪ {x₁ = y₁, ..., xₙ = yₙ}
        else
            return ⊥
return S

•	Unification algorithm for Datalog
Example : ∀x King(x) ∧ Brave(x) ⇒ Noble(x)
King(Ram)
Brave(Ram)
•	We get an 'x' where 'x' is a king and 'x' is brave (then x is noble); what we want is θ = {substitution set}
i.e. θ = {x / Ram}
Hence, Ram unifies with x.
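A simplified recursive unifier can be written in Python. This sketch represents atoms as nested tuples and marks variables with a leading '?'; it omits the occurs-check and full substitution composition that a production MGU implementation needs :

```python
# Simplified recursive unification sketch. Terms are nested tuples,
# e.g. ('King', '?x'); strings starting with '?' are variables.
def unify(t1, t2, subst=None):
    if subst is None:
        subst = {}
    if subst is False:
        return False
    # dereference bound variables one level
    t1 = subst.get(t1, t1) if isinstance(t1, str) else t1
    t2 = subst.get(t2, t2) if isinstance(t2, str) else t2
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith('?'):
        return {**subst, t1: t2}           # bind variable t1
    if isinstance(t2, str) and t2.startswith('?'):
        return {**subst, t2: t1}           # bind variable t2
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):           # unify argument by argument
            subst = unify(a, b, subst)
            if subst is False:
                return False
        return subst
    return False                           # clash of distinct constants

print(unify(('King', '?x'), ('King', 'Ram')))   # {'?x': 'Ram'}
```

Running it on the example above produces θ = {x/Ram}, i.e. Ram unifies with x.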

3.10.2 Lifting

•	The lifted (generalized) version of the Modus Ponens inference rule is called Generalized Modus Ponens.


Generalized Modus Ponens

•	For atomic sentences pᵢ, pᵢ′, and q, where there is a substitution θ such that SUBST(θ, pᵢ′) = SUBST(θ, pᵢ) for all i :

p₁′, p₂′, ..., pₙ′, (p₁ ∧ p₂ ∧ ... ∧ pₙ ⇒ q)
-----------------------------------------------
SUBST(θ, q)

•	N + 1 premises = N atomic sentences + one implication.
•	Applying SUBST(θ, q) produces the conclusion we seek.
p₁′ = King(Ram)	p₂′ = Brave(y)
p₁ = King(x)	p₂ = Brave(x)
θ = {x / Ram, y / Ram}	q = Noble(x)
SUBST(θ, q) is Noble(Ram)

•	Generalized Modus Ponens is a lifted version of Modus Ponens. It raises Modus Ponens from ground (variable-free) propositional logic to first-order logic. Hence it is called lifting.
•	Here "lifted" indicates transformed from the ground propositional level to the first-order level.
•	The major advantage of lifted inference rules over propositional logic is that only those substitutions are made that are required to allow particular inferences to proceed.
Ex. 3.10.1 :	Represent the following sentences in FOL using a consistent vocabulary.
(i)	Every person who buys a policy is smart.
(ii)	No person buys an expensive policy.
(iii)	There is an agent who sells policies only to people who are not insured.
(iv)	There is a barber who shaves all men in town who do not shave themselves.
Soln. :

(i)	∀x, ∀y : person(x) ∧ policy(y) ∧ buys(x, y) ⇒ smart(x)
(ii)	∀x, ∀y : person(x) ∧ policy(y) ∧ expensive(y) ⇒ ¬buys(x, y)
(iii)	∃x, ∀y, ∀z : agent(x) ∧ policy(y) ∧ sells(x, y, z) ⇒ person(z) ∧ ¬insured(z)
(iv)	∃x, ∀y : barber(x) ∧ person(y) ∧ ¬shaves(y, y) ⇒ shaves(x, y)

Ex. 3.10.2 :	Represent the following sentences in FOL.
Soln. :
(i)	"Anyone who kills an animal is loved by no one" :
∀x, ∀y : kills(x, animal) ⇒ ¬loves(y, x)
(ii)	"A pit causes a breeze in all adjacent squares" :
∀x, ∀y : pit(x, y) ⇒ breez(x, y − 1) ∧ breez(x, y + 1) ∧ breez(x − 1, y) ∧ breez(x + 1, y)

Ex. 3.10.3 :	Write first order logic statements for the following statements :
(i)	If a perfect square is divisible by a prime p then it is also divisible by the square of p.
(ii)	Every perfect square is divisible by some prime.
(iii)	Alice does not like chemistry and history.
(iv)	If it is Saturday and warm, then Sam is in the park.
(v)	Anything anyone eats and is not killed by is food.
Soln. :
(i)	∀x : square(x) ∧ prime(p) ∧ divides(p, x) ⇒ [∃z : square_of(z, p) ∧ divides(z, x)]
(ii)	∀x, ∃y : square(x) ⇒ prime(y) ∧ divides(y, x)
(iii)	¬likes(Alice, History) ∧ ¬likes(Alice, Chemistry)
(iv)	day(Saturday) ∧ weather(warm) ⇒ in_park(Sam)
(v)	∀x, ∀y : person(x) ∧ eats(x, y) ∧ ¬killed(x) ⇒ food(y)
3.11 Resolution

University Question

Q.	Write a short note on : Resolution.


•	Resolution is an inference rule. Resolution produces a new clause which is implied by two clauses containing complementary literals. This resolution rule was discovered by Alan Robinson in the mid 1960's.

•	We have seen that a literal is an atomic symbol or a negation of the atomic symbol (i.e. A, ¬A).
•	Resolution is the only inference rule you need in order to build a sound (soundness means that every sentence produced by a procedure will be "true") and complete (completeness means every "true" sentence can be produced by a procedure) theorem prover.
•	Take an example where we are given that :
o	A clause X containing the literal : Z
o	A clause Y containing the literal : ¬Z
•	Based on resolution and the information given above we can conclude :
(X − {Z}) ∪ (Y − {¬Z})
•	Take a generalized version of the above problem :
Given :
o	A clause X containing the literal : Z
o	A clause Y containing the literal : ¬W
o	A most general unifier G of Z and W
We can conclude that : ((X − {Z}) ∪ (Y − {¬W})) | G

3.11.1 The Resolution Procedure

•	Let the knowledge base be a set of true sentences which do not have any contradictions, and Z be a sentence that we want to prove.
•	The idea is based on proof by negation. So, we assume ¬Z and then try to find a contradiction (you must have followed such methods while solving geometry proofs). The intuition is that if all the knowledge base sentences are true and assuming ¬Z creates a contradiction, then Z must be inferred from the knowledge base. We then need to convert knowledge base ∪ {¬Z} to clause form.
•	If there is a contradiction in the knowledge base, that means Z is proved. Terminate the process after that. Otherwise, select two clauses and add their resolvents to the current knowledge base. If we do not find any resolvable clauses then the procedure fails and we terminate. Else, we again start looking for a contradiction in the knowledge base, and so on.
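The procedure above can be sketched for propositional clauses in Python : clauses are frozensets of string literals ('~' marks negation), the negated query is added, and resolvents are generated until the empty clause appears or no new clause can be derived. This is a minimal refutation sketch, not an efficient prover :

```python
# Propositional resolution by refutation (sketch).
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(kb_clauses, query):
    # proof by negation: add ~query and search for the empty clause
    clauses = set(kb_clauses) | {frozenset([negate(query)])}
    while True:
        new = set()
        for ci in clauses:
            for cj in clauses:
                if ci == cj:
                    continue
                for lit in ci:
                    if negate(lit) in cj:    # complementary literals
                        resolvent = (ci - {lit}) | (cj - {negate(lit)})
                        if not resolvent:    # empty clause: contradiction
                            return True
                        new.add(frozenset(resolvent))
        if new <= clauses:                   # saturated: not entailed
            return False
        clauses |= new

# "Heads X wins, Tails Y loses" clauses from Section 3.11.3:
kb = [frozenset(['~H', 'WinX']), frozenset(['~T', 'LooseY']),
      frozenset(['H', 'T']), frozenset(['~LooseY', 'WinX'])]
print(resolve(kb, 'WinX'))   # True
```

Because the set of possible clauses over finitely many literals is finite, the loop always terminates, mirroring the terminate/fail cases described above.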


3.11.2 Conversion from FOL to Clausal Normal Form (CNF)

University Question

Q.	Explain the steps involved in converting the propositional logic statement into CNF with a suitable example.

1.	Elimination of implication, i.e. eliminate all '⇒' : Replace P ⇒ Q with ¬P ∨ Q.
2.	Distribute negations : Replace ¬¬P with P, ¬(P ∨ Q) with ¬P ∧ ¬Q, and so on.
3.	Eliminate existential quantifiers by replacing them with Skolem constants or Skolem functions,
e.g. ∀x ∃y (P₁(x, y) ∨ P₂(x, y)) = ∀x (P₁(x, f(x)) ∨ P₂(x, f(x)))
4.	Rename variables to avoid duplicate quantifiers.
5.	Drop all universal quantifiers.
6.	Place the expression into Conjunctive Normal Form.
7.	Convert to clauses, i.e. separate all conjuncts as separate clauses.
8.	Rename variables to avoid duplicate clauses.


Ex. 3.11.1 :	Convert A → (B ↔ C) to CNF.
Soln. :
FOL : A → (B ↔ C)
Normalizing the given statement :
(i)	A → ((B → C) ∧ (C → B))
(ii)	(A → (B → C)) ∧ (A → (C → B))
Converting to CNF, applying the rule α → β ≡ ¬α ∨ β :
(¬A ∨ (¬B ∨ C)) ∧ (¬A ∨ (¬C ∨ B))
i.e. ¬A ∨ ((¬B ∨ C) ∧ (¬C ∨ B))
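The equivalence claimed by this conversion can be verified by brute force over all truth assignments, which is a quick sanity check on the derived CNF :

```python
# Truth-table check that the derived CNF is equivalent to A -> (B <-> C).
from itertools import product

def original(a, b, c):
    return (not a) or (b == c)                  # A -> (B <-> C)

def cnf(a, b, c):                               # (~A v ~B v C) ^ (~A v ~C v B)
    return ((not a) or (not b) or c) and ((not a) or (not c) or b)

# check all 8 assignments of (A, B, C)
assert all(original(a, b, c) == cnf(a, b, c)
           for a, b, c in product([False, True], repeat=3))
print("equivalent")
```

Such exhaustive checking works only for propositional formulas with few symbols, but it is a handy way to validate hand-derived CNF conversions.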

3.11.3 Facts Representation

•	To show how facts can be represented, let us take a simple problem :
o	"Heads X wins, Tails Y loses."
o	Our goal is to show that X always wins, with the help of resolution.

•	The solution can be given as follows :
1.	H ⇒ Win(X)
2.	T ⇒ Loose(Y)
3.	¬H ⇒ T
4.	Loose(Y) ⇒ Win(X)
Thus we have : Win(X)


We can write a proof for this problem as follows :
1.	{¬H, Win(X)}
2.	{¬T, Loose(Y)}
3.	{H, T}
4.	{¬Loose(Y), Win(X)}
5.	{¬Win(X)}
6.	{¬T, Win(X)}	... (from 2 and 4)
7.	{T, Win(X)}	... (from 1 and 3)
8.	{Win(X)}	... (from 6 and 7)
9.	□	... (from 5 and 8)

3.11.4 Example

Let’s take the same example of forward and backward chaining to learn how to write proofs for resolution.
Step 1:

The given facts are :

1.	It is a crime for an American to sell weapons to enemy nations.
American(x) ∧ Weapon(y) ∧ Sell(x, y, z) ∧ Enemy(z, America) ⇒ Criminal(x)
2.	Country Nono is an enemy of America.
Enemy(Nono, America)
3.	Nono has some missiles.
o	Owns(Nono, x)
o	Missile(x)
4.	All the missiles were sold to Nono by Colonel West.
Missile(x) ∧ Owns(Nono, x) ⇒ Sell(West, x, Nono)
5.	Missile is a weapon.
Missile(x) ⇒ Weapon(x)
6.	Colonel West is American.
American(West)
Step 2:
Let us convert them to CNF :
1.	¬American(x) ∨ ¬Weapon(y) ∨ ¬Sell(x, y, z) ∨ ¬Enemy(z, America) ∨ Criminal(x)
2.	Enemy(Nono, America)
3.	Owns(Nono, x)


4.	Missile(x)
5.	¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sell(West, x, Nono)
6.	¬Missile(x) ∨ Weapon(x)
7.	American(West)

Step 3 :
To prove that West is criminal using resolution.

¬Criminal(West) with clause 1 {x/West} → ¬American(West) ∨ ¬Weapon(y) ∨ ¬Sell(West, y, z) ∨ ¬Enemy(z, America)
with Enemy(Nono, America) {z/Nono} → ¬American(West) ∨ ¬Weapon(y) ∨ ¬Sell(West, y, Nono)
with American(West) → ¬Weapon(y) ∨ ¬Sell(West, y, Nono)
with ¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sell(West, x, Nono) {x/y} → ¬Weapon(y) ∨ ¬Missile(y) ∨ ¬Owns(Nono, y)
with ¬Missile(x) ∨ Weapon(x) {x/y} → ¬Missile(y) ∨ ¬Owns(Nono, y)
with Missile(x) {x/y} → ¬Owns(Nono, y)
with Owns(Nono, x) {x/y} → NIL
Hence proved that West is criminal.

Coe aan

Tepe
Tork Knowledge
Publicatiens
ay sland DS. (U)
3-30 Knowledge Representation using First Order Logic
ee
Ex. 3.11.2 :	Consider the following facts : "Ravi likes all kinds of food", "Apple and chicken are food", "Anything anyone eats and is not killed by is food", "Ajay eats peanuts and is still alive", "Rita eats everything Ajay eats". Prove by resolution that "Ravi likes peanuts", and answer "What food does Rita eat?" using backward chaining.
Soln. :
(A) Proof by Resolution
Step 1 : Negate the statement to be proved.
¬likes(Ravi, Peanuts)
Step 2 : Convert given facts to FOL

(a)	∀x : food(x) ⇒ likes(Ravi, x)
(b)	food(Apple)
(c)	food(Chicken)
(d)	∀x, ∀y : eats(x, y) ∧ ¬killed(x) ⇒ food(y)
(e)	eats(Ajay, Peanuts) ∧ alive(Ajay)
(f)	∀x : eats(Ajay, x) ⇒ eats(Rita, x)
In this case we have to add a few common sense predicates which are always true :
(g)	∀x : ¬killed(x) ⇒ alive(x)
(h)	∀x : alive(x) ⇒ ¬killed(x)

Step 3 : Converting FOLs to CNF
(a)	¬food(x) ∨ likes(Ravi, x)	(b)	food(Apple)
(c)	food(Chicken)	(d)	¬eats(x, y) ∨ killed(x) ∨ food(y)
(e)	eats(Ajay, Peanuts)	(f)	alive(Ajay)
(g)	¬eats(Ajay, x) ∨ eats(Rita, x)	(h)	killed(x) ∨ alive(x)
(i)	¬alive(x) ∨ ¬killed(x)
Step 4 : Proof by Resolution
¬likes(Ravi, Peanuts) with ¬food(x) ∨ likes(Ravi, x) {x/Peanuts} → ¬food(Peanuts)
with ¬eats(x, y) ∨ killed(x) ∨ food(y) {y/Peanuts} → ¬eats(x, Peanuts) ∨ killed(x)
with eats(Ajay, Peanuts) {x/Ajay} → killed(Ajay)
with ¬alive(x) ∨ ¬killed(x) {x/Ajay} → ¬alive(Ajay)
with alive(Ajay) → NIL

As the result of this resolution is NIL, it means our assumption is wrong. Hence it is proved that "Ravi likes Peanuts".
To answer : What food does Rita eat ?
(B) Proof by backward chaining : (Referring to FOLs of Step 2)
eats(Rita, x) is matched with the conclusion of (f), giving the sub-goal eats(Ajay, x); with {x/Peanuts}, eats(Ajay, Peanuts) and alive(Ajay) are given facts, both true.
Hence the answer is : Rita eats peanuts.


Ex. 3.11.3 :	Using predicate logic, convert the following sentences to predicates and prove that the statement "Ram did not jump" is false.
(a)	Ram went to the temple.
(b)	The way to the temple is : walk till the post box and take a left or right road.
(c)	The left road has a ditch.
(d)	The way to cross the ditch is to jump.
(e)	A log is across the right road.
(f)	One needs to jump across the log to go ahead.
Soln. :
Step 1 : Negate the statement to be proved.
¬jump(Ram)
Step 2 : Converting given statements to FOL
(a)	At(Ram, temple)
(b₁)	∀x : At(x, temple) ⇒ At(x, PostBox) ∧ take_left(x)
(b₂)	∀x : At(x, temple) ⇒ At(x, PostBox) ∧ take_right(x)
(c)	∀x : take_left(x) ⇒ cross(x, ditch)
(d)	∀x : cross(x, ditch) ⇒ jump(x)
(e)	∀x : take_right(x) ⇒ at(x, log)
(f)	∀x : at(x, log) ⇒ jump(x)
Step 3 : Converting FOLs to CNF
(a)	At(Ram, temple)
(b₁₁)	¬At(x, temple) ∨ At(x, PostBox)

(b₁₂)	¬At(x, temple) ∨ take_left(x)
(b₂₁)	¬At(x, temple) ∨ At(x, PostBox)
(b₂₂)	¬At(x, temple) ∨ take_right(x)
(c)	¬take_left(x) ∨ cross(x, ditch)
(d)	¬cross(x, ditch) ∨ jump(x)
(e)	¬take_right(x) ∨ at(x, log)
(f)	¬at(x, log) ∨ jump(x)


Step 4 : Proof by Resolution
¬jump(Ram) with ¬at(x, log) ∨ jump(x) {x/Ram} → ¬at(Ram, log)
with ¬take_right(x) ∨ at(x, log) {x/Ram} → ¬take_right(Ram)
with ¬At(x, temple) ∨ take_right(x) {x/Ram} → ¬At(Ram, temple)
with At(Ram, temple) → NIL
Hence proved.

Ex. 3.11.4 :	Consider the following statements.
1.	Rimi is hungry.
2.	If Rimi is hungry, she barks.
3.	If Rimi barks, then Raja is angry.
Prove that Raja is angry using resolution.
Soln. :
Step 1 : Converting given facts to FOL.


1.	hungry(Rimi)
2.	hungry(Rimi) ⇒ barks(Rimi)
3.	barks(Rimi) ⇒ angry(Raja)
Step 2 : Converting FOL statements to CNF.
1.	hungry(Rimi)
2.	¬hungry(Rimi) ∨ barks(Rimi)
3.	¬barks(Rimi) ∨ angry(Raja)
—_—.

W TechKnowledge
Publications
A I TT a

3.33 Knowledge Representation using First Order Logi,


W Al and DS-I (MU)

Step 3 : Negate the statement to be proved.
To prove that : angry(Raja)
Negation : ¬angry(Raja)
Step 4 : Proof by resolution
¬angry(Raja) with ¬barks(Rimi) ∨ angry(Raja) → ¬barks(Rimi)
¬barks(Rimi) with ¬hungry(Rimi) ∨ barks(Rimi) → ¬hungry(Rimi)
¬hungry(Rimi) with hungry(Rimi) → NIL
This shows that our assumption is wrong. Hence it is proved that Raja is angry.

Ex. 3.11.5 :	Consider the following facts.
1.	If the maid stole the jewellery then the butler was not guilty.
2.	Either the maid stole the jewellery or she milked the cow.
3.	If the maid milked the cow then the butler got the cream.
4.	Therefore, if the butler was guilty then he got the cream.
Prove that the conclusion (step 4) is valid using resolution.
Soln. :
Step 1 : Converting given facts to FOL.
1.	steal(maid, jewellery) ⇒ ¬guilty(butler)
2.	steal(maid, jewellery) ∨ milk(maid, cow)
3.	milk(maid, cow) ⇒ got_cream(butler)
To prove that :
4.	guilty(butler) ⇒ got_cream(butler)
Step 2 : Converting FOL to CNF.
1.	¬steal(maid, jewellery) ∨ ¬guilty(butler)
2.	steal(maid, jewellery) ∨ milk(maid, cow)
3.	¬milk(maid, cow) ∨ got_cream(butler)
4.	¬guilty(butler) ∨ got_cream(butler)
Step 3 : Negate the sentence to be proved.
As sentence 4 is the one to be proved :
guilty(butler) ∧ ¬got_cream(butler)

Step 4 : Proof by resolution
The negation gives two clauses : guilty(butler) and ¬got_cream(butler).
¬got_cream(butler) with ¬milk(maid, cow) ∨ got_cream(butler) → ¬milk(maid, cow)
¬milk(maid, cow) with steal(maid, jewellery) ∨ milk(maid, cow) → steal(maid, jewellery)
steal(maid, jewellery) with ¬steal(maid, jewellery) ∨ ¬guilty(butler) → ¬guilty(butler)
¬guilty(butler) with guilty(butler) → NIL
Hence proved.

Ex. 3.11.6 :	Consider the following axioms : "Anyone who is graduating is happy", "All happy people smile", "Someone is graduating". Prove that someone is smiling, using resolution.
Soln. :

Step 1: Converting axioms to FOL.

(i)	∀x : graduating(x) ⇒ happy(x)
(ii)	∀x : happy(x) ⇒ smile(x)
(iii)	∃x : graduating(x)

Step 2 : Converting FOL to CNF.

(i)	¬graduating(x) ∨ happy(x)
(ii)	¬happy(x₁) ∨ smile(x₁)
(iii)	graduating(x₂)

Step 3 : To prove that : ∃x₃ smile(x₃)

Negating the statement :
¬smile(x₃)

Proof by resolution


¬smile(x₃) with ¬happy(x₁) ∨ smile(x₁) {x₃/x₁} → ¬happy(x₁)
¬happy(x₁) with ¬graduating(x) ∨ happy(x) {x₁/x} → ¬graduating(x)
¬graduating(x) with graduating(x₂) {x/x₂} → NIL

Hence proved.

3.12 Planning as an Application of Knowledge Based Agent

•	Planning in Artificial Intelligence can be defined as a problem that needs decision making by intelligent systems (a robot or a computer program) to accomplish a given target.
•	Sometimes even a human being cannot perform two tasks at the same time if the tasks have the same importance level. In that case, the tasks have to be put in a sequence to accomplish the target.
•	Take the example of a driver who has to pick up and drop people from one place to another. Say he has to pick up two people from two different places; then he has to follow some sequence, as he cannot pick up both passengers at the same time.
•	Based on these facts there is one more definition of planning, which says that planning is an activity where an agent has to come up with a sequence of actions to accomplish the target.
•	Now, let us see what information is available while formulating a planning problem and what results are expected.
•	We have information about the initial state of the agent, the goal conditions of the agent and the set of actions of the agent.
•	The aim of the agent is to find the proper sequence of actions which will lead from the starting state to the goal state and produce an efficient solution.

3.12.1 Simple Planning Agent



•	Say we have an agent which can be a coffee maker, a printer and a mailing system; also assume that there are 3 people who have access to this agent.
•	Suppose at the same time all 3 users of the agent give a command to execute 3 different tasks : coffee making, printing and sending a mail.
•	Then as per the definition of planning, the agent has to decide the sequence of these actions.


Fig. 3.12.1 Example of a Planning Problem

•	Fig. 3.12.2 depicts a general diagrammatic representation of a planning agent that interacts with the environment with its sensors and effectors/actuators. When a task comes to this agent, it has to decide the sequence of actions to be taken and then accordingly execute these actions.

[Figure : the agent connected to the environment through sensors and effectors]
Fig. 3.12.2 : Planning agent

3.13 Planning Problem


•	We have seen in the above section what information is available while formulating a planning problem and what results are expected. It is also understandable here that states of an agent correspond to the probable surrounding environments, while the actions and goals of an agent are specified based on a logical formalization.
•	We have also learnt about various types of intelligent agents in Chapter 2. To achieve any goal, an agent has to answer a few questions like "what will be the effect of its actions", "how will it affect the upcoming actions", etc. This illustrates that an agent must be able to provide proper reasoning about its future actions, the states of the surrounding environment, etc.
•	Think about the simple Tic-Tac-Toe game. A player cannot win the game in one step; he/she has to follow a sequence of actions to win. While taking every next step, he/she has to consider the old steps, imagine the probable future actions of the opponent, accordingly make the next move, and at the same time consider the consequences of his/her actions.
•	A classical planning problem has the following assumptions about the task environment :
o	Fully observable : The agent can observe the current state of the environment.
o	Deterministic : The agent can determine the consequences of its actions.
o	Finite : There is a finite set of actions which can be carried out by the agent at every state in order to achieve the goal.
o	Static : Events are steady. External events which cannot be handled by the agent are not considered.
o	Discrete : Events of the agent are distinct, from the starting step to the ending (goal) state, in terms of time.
•	So, basically a planning problem finds the sequence of actions to accomplish the goal based on the above assumptions.
•	Also note that a goal can be specified as a union of sub-goals.
•	Take the example of the ping pong game, where points are assigned to the opponent player when a player fails to return the ball within the rules of the game. There can be a best-3-of-5 match where, to win the match, you have to win 3 games, and in every game you have to win with a minimum margin of 2 points.

3.13.1 Why Planning?


3.13.1(A) Problem Solving and Planning

Q.	How does a planning problem differ from a search problem?


•	Generally, problem solving and planning methodologies can solve similar types of problems. The main difference between problem solving and planning is that planning is a more open process and agents follow a logic-based representation. Planning is supposed to be more powerful than problem solving because of these two reasons. A planning agent has situations (i.e. states), goals (i.e. target end conditions) and operations (i.e. actions performed). All these parameters are decomposed into sets of sentences, and further into sets of words, depending on the need of the system.
e Planning agents can deal with situations/states more efficiently because of its explicit reasoning capability also
it can communicate with the world. Agents can reflect on their targets and we can minimize the complexity of
the planning problem by independently planning for sub-goals of an agent. Agents have information about past
actions, presents actions and the important point is that it can predict the effect of actions by inspecting the
operations.
e Planning ts a logical representation, based on situation, goals and operations, of problem solving.

Planning = Problem solving + Logical representation

e Example : You must have solved word problems in school. We can create an agent which generates a solution
for word problems, It splits a sentence and follows logical representation,

3.14 Goal of Planning

• We plan activities in order to achieve some goals, and to achieve a goal we should select appropriate actions; also we can divide the main goal into sub-goals to make planning more efficient.
• Take the example of grocery shopping : suppose you want to buy milk, bread and eggs from the supermarket; then your initial state will be "at home" and the goal state will be "have milk, bread and eggs".

• Now if you look at Fig. 3.14.1 you will understand that the branching factor can be enormous, depending upon the set of actions available at that point of time (e.g. watch TV, read a book, go to school, go to the supermarket, etc.).

Fig. 3.14.1 : Supermarket example to understand the need of planning


• Thus the branching factor can be defined as the set of all probable actions at any state; this set can be very large, as in the supermarket example or the blocks problem. If the number of probable actions increases, the branching factor will also increase, as they are directly proportional to each other; this results in an increase of the search space.
• To reach the goal state you have to follow many steps. If you consider using heuristic functions, then you have to remember that they will not be able to eliminate states; these functions will be helpful only for guiding the search of states.
• So it becomes difficult to choose the best actions (i.e. even if we go to the supermarket we need to make sure that all three listed items are picked; only then can the goal state be achieved).
• As there are many possible actions, it is difficult to describe every state, and there can be combined goals (as seen in the supermarket example); searching is inadequate to achieve goals efficiently. In order to be more efficient, planning is required.
• In the above sections we discussed that planning requires explicit knowledge; that means in the case of planning we need to know the exact sequence of actions which will be useful in order to achieve the goal.
• An advantage of planning is that the order of planning and the order of execution need not be the same. For example, you can plan how to pay bills for grocery before planning to go to the supermarket.
• Another advantage of planning is that you can make use of a divide and conquer policy by dividing/decomposing the goal into sub-goals.
~ .
3.14.1 Major Approaches

There are many approaches to solve planning problems. Following are a few major approaches used for planning :
o Planning with state space search.
o Partial ordered planning.
o Hierarchical planning / hierarchical decomposition (HTN planning).
o Planning with situation calculus / planning with operators.
o Conditional planning.
o Planning with graphs.
o Planning with propositional logic.
o Reactive planning.
• Out of these major approaches we will be learning about the following approaches in detail :
o Planning with state space search
o Partial ordered planning
o Hierarchical planning / hierarchical decomposition (HTN planning)
o Conditional planning
o Planning with operators

3.15 Planning Graphs


• A planning graph is a special data structure: it is a directed graph which is used to obtain improved heuristic estimates. Any search technique can make use of planning graphs. Also, GRAPHPLAN can be used to extract a solution directly.
• Planning graphs work only for propositional problems, ones without variables. You have learnt Gantt charts; similarly, in the case of planning graphs there is a series of levels which match the time ladder in the plan. Every level has a set of literals (the literals which are true at that point of time) and a set of actions (the actions which are applicable at that point of time). Level 0 is the initial state of the planning graph.

Fig. 3.15.1 : Planning graph level
• Example :
o Init(Have(Apple))
o Goal(Have(Apple) ∧ Ate(Apple))
o Action(Eat(Apple), PRECOND : Have(Apple), EFFECT : ¬Have(Apple) ∧ Ate(Apple))
o Action(Cut(Apple), PRECOND : ¬Have(Apple), EFFECT : Have(Apple))

Fig. 3.15.2 : Planning graph for the apple example : levels L0, A0, L1, A1, L2, with the literals Have(Apple), ¬Have(Apple), Ate(Apple) and ¬Ate(Apple) appearing across the levels

• Start at level L0 and determine action level A0 and the next level L1 :
o A0 contains all actions whose prerequisites are satisfied at the previous level.
o Connect the preconditions and effects of actions from L0 to L1.
o Inaction is represented by persistence actions.
o Level A0 contains the possible actions.
o Conflicts between actions are shown by mutual exclusion links.
o Level L1 contains all literals that could result from picking any subset of actions in level A0.
o Conflicts between literals which cannot occur together (as an effect of the selected actions) are represented by mutual exclusion links.
o L1 defines multiple states, and the mutual exclusion links are the constraints that define this set of states.
o Continue until two consecutive levels are the same, i.e. contain the same set of literals.
• A mutual exclusion relation holds between two actions when :
o One action cancels out the effect of another action.
o One of the effects of an action is the negation of a precondition of the other action.
o One of the preconditions of one action is mutually exclusive with a precondition of the other action.
• A mutual exclusion relation holds between two literals when :
o One literal is the negation of the other literal, OR
o Each possible action pair that could achieve the two literals is mutually exclusive.
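These mutex tests can be sketched as a small predicate. The following is an illustrative sketch (not the book's notation): an action is modelled as a (preconditions, add-effects, delete-effects) triple of literal strings, and `mutex_literals` would hold literal pairs already known to be mutually exclusive.

```python
# A hedged sketch of the action-mutex tests listed above; actions are
# (preconditions, add-effects, delete-effects) triples of literal strings.

def actions_mutex(a1, a2, mutex_literals=frozenset()):
    pre1, add1, del1 = a1
    pre2, add2, del2 = a2
    if add1 & del2 or add2 & del1:        # one action cancels the other's effect
        return True
    if del1 & pre2 or del2 & pre1:        # an effect negates a precondition
        return True
    # competing needs: some pair of preconditions is itself mutually exclusive
    return any(frozenset((p, q)) in mutex_literals for p in pre1 for q in pre2)

# Eat(Apple) deletes Have(Apple); the persistence (no-op) action for
# Have(Apple) re-asserts it, so the two actions are mutually exclusive.
eat = ({"Have(Apple)"}, {"Ate(Apple)"}, {"Have(Apple)"})
persist_have = ({"Have(Apple)"}, {"Have(Apple)"}, set())
print(actions_mutex(eat, persist_have))   # True
```

This is exactly why, in Fig. 3.15.2, Eat(Apple) conflicts with the no-op that carries Have(Apple) forward.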

GRAPHPLAN algorithm :

GRAPHPLAN can directly extract a solution from the planning graph with the help of the following algorithm (the standard GRAPHPLAN skeleton) :

function GRAPHPLAN(problem) returns solution or failure
    graph ← INITIAL-PLANNING-GRAPH(problem)
    goals ← GOALS(problem)
    loop do
        if goals are all present and non-mutex in the last level of graph then
            solution ← EXTRACT-SOLUTION(graph, goals)
            if solution ≠ failure then return solution
        if graph has levelled off then return failure
        graph ← EXPAND-GRAPH(graph, problem)

Properties of a planning graph :
o If the goal is absent from the last level, then the goal cannot be achieved.
o If there exists a path to the goal, then the goal is present in the last level.
o If the goal is present in the last level, there may still not exist any path to it.
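The level-expansion described above can be illustrated with a minimal sketch for the apple example. The literal strings and the `ACTIONS` table below are assumptions made for illustration (a leading "-" marks a negated literal, and mutex links are omitted for brevity):

```python
# A minimal sketch of one planning-graph expansion step for the apple example.

ACTIONS = {
    # name: (preconditions, effects)
    "Eat(Apple)": ({"Have(Apple)"}, {"-Have(Apple)", "Ate(Apple)"}),
    "Cut(Apple)": ({"-Have(Apple)"}, {"Have(Apple)"}),
}

def expand_level(literals):
    """Given literal level Li, return (action level Ai, literal level Li+1)."""
    applicable = {name for name, (pre, _) in ACTIONS.items() if pre <= literals}
    # Persistence (no-op) actions carry every literal of Li forward unchanged.
    next_literals = set(literals)
    for name in applicable:
        next_literals |= ACTIONS[name][1]
    return applicable, next_literals

L0 = {"Have(Apple)", "-Ate(Apple)"}
A0, L1 = expand_level(L0)
print(A0)          # {'Eat(Apple)'}
print(sorted(L1))  # ['-Ate(Apple)', '-Have(Apple)', 'Ate(Apple)', 'Have(Apple)']
```

Note how L1 matches Fig. 3.15.2: only Eat(Apple) is applicable at L0, and its effects plus the persisted literals make up L1.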


3.16 Planning as State-Space Search

• We have seen the example of an agent that can perform three tasks, namely printing, sending a mail and making coffee; let us call this agent the office agent.
• When this office agent gets orders from three people at the same time to perform these three different tasks, let us see how planning with state space search will look if we have a finite space.

• You can understand from Fig. 3.16.1 that the office agent is at location 250 on the state space grid. When it gets a task it has to decide which task can be performed more efficiently, in lesser time.
• If it finds some input and output locations nearer on the state space grid, for example in the case of the printing task, then the probability of performing that task will increase.
• But to do this it should be aware of its own current location, the locations of the people who are assigning tasks, and the locations of the required devices.
• State space search is unfavourable for solving real-world problems because it requires a complete description of every searched state; also, the search should be carried out locally.
• There can be two ways of representing a state :
1. Complete world description
2. Path from an initial state

Fig. 3.16.1 : Office agent example with finite state space (the agent's location is 250; Shraddha wants to print, Raj wants to send a mail to John through the mailing server, Gerry wants coffee)

1. Complete world description :
• The description is available in terms of an assignment of a value to each proposition, or we can say that the description is available as a set of propositions that defines the state.
• The drawback of this type is that it requires a large amount of space.

2. Path from an initial state :
• As the name suggests, the path from an initial state gives the sequence of actions which are used to reach a state from the initial state.
• In this case, what holds in a state can be deduced from the axioms which specify the effects of the actions.
• The drawback of this type is that it does not explicitly specify "what holds in every state". Because of this it can be difficult to determine whether two states are the same.

Fig. 3.16.2 : Representation of states
3.16.1 Example of State Space Search

University Question
Q. Explain Water Jug problem with State Space Search Method.

• Let us take an example of the water jug problem. We have two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
• The state space for this problem can be described as a set of ordered pairs of integers (x, y), such that x = 0, 1, 2, 3 or 4, representing the number of gallons of water in the 4-gallon jug, and y = 0, 1, 2 or 3, representing the quantity of water in the 3-gallon jug. The start state is (0, 0). The goal state is (2, n) for any value of n, since the problem does not specify how many gallons need to be in the 3-gallon jug.
• The operators to be used to solve the problem can be described as shown below. They are represented as rules whose left sides are matched against the current state and whose right sides describe the new state that results from applying the rule.
Rule set :

1. (x, y) → (4, y)          if x < 4                 fill the 4-gallon jug
2. (x, y) → (x, 3)          if y < 3                 fill the 3-gallon jug
3. (x, y) → (x-d, y)        if x > 0                 pour some water out of the 4-gallon jug
4. (x, y) → (x, y-d)        if y > 0                 pour some water out of the 3-gallon jug
5. (x, y) → (0, y)          if x > 0                 empty the 4-gallon jug on the ground
6. (x, y) → (x, 0)          if y > 0                 empty the 3-gallon jug on the ground
7. (x, y) → (4, y-(4-x))    if x+y >= 4 and y > 0    pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full
8. (x, y) → (x-(3-y), 3)    if x+y >= 3 and x > 0    pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full
9. (x, y) → (x+y, 0)        if x+y <= 4 and y > 0    pour all the water from the 3-gallon jug into the 4-gallon jug
10. (x, y) → (0, x+y)       if x+y <= 3 and x > 0    pour all the water from the 4-gallon jug into the 3-gallon jug
11. (0, 2) → (2, 0)                                  pour the 2 gallons from the 3-gallon jug into the 4-gallon jug
12. (2, y) → (0, y)                                  empty the 2 gallons in the 4-gallon jug on the ground
Production for the water jug problem :

Gallons in the 4-gallon jug | Gallons in the 3-gallon jug | Rule applied
            0               |              0              |
            0               |              3              |      2
            3               |              0              |      9
            3               |              3              |      2
            4               |              2              |      7
            0               |              2              |   5 or 12
            2               |              0              |   9 or 11

One solution to the water jug problem.
Fig. 3.16.3 : State space search tree for the water jug problem, starting from (0, 0)
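As a sketch, the search above can be automated with a breadth-first search over the (x, y) state space. The `successors` helper below is an assumption that mirrors the fill, empty and pour rules (the "pour some water out" rules 3 and 4 are omitted for brevity):

```python
from collections import deque

# Breadth-first search over the water jug state space (x, y).

def successors(state):
    x, y = state
    return {
        (4, y), (x, 3),                              # fill a jug (rules 1, 2)
        (0, y), (x, 0),                              # empty a jug (rules 5, 6)
        (min(4, x + y), y - (min(4, x + y) - x)),    # pour 3-gallon into 4-gallon
        (x - (min(3, x + y) - y), min(3, x + y)),    # pour 4-gallon into 3-gallon
    }

def solve(start=(0, 0)):
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[0] == 2:                    # goal: 2 gallons in the 4-gallon jug
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

print(solve())  # a shortest solution: 6 rule applications from (0, 0) to x = 2
```

Because BFS expands states level by level, the returned path has 7 states (6 rule applications), matching the production table above.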
3.17 Classification of Planning with State Space Search

• As the name suggests, state space search planning techniques are based on spatial searching.
• Planning with state space search can be done by both forward and backward state-space search techniques.

Fig. 3.17.1 : Classification of planning with state space search : forward state space search (progression planners) and backward state space search (regression planners)

3.18 Progression Planners

• "Forward state-space search" is also called a "progression planner". It is a deterministic planning technique, as we plan a sequence of actions starting from the initial state in order to attain the goal.
• With the forward state space searching method we start with the initial state and proceed to the final goal state. While doing this we need to consider the probable effects of the actions taken at every state.
• Thus the prerequisites for this type of planning are the initial world state information, details of the available actions of the agent, and a description of the goal state.
• Remember that the details of the available actions include the preconditions and effects of each action.
• See Fig. 3.18.1; it gives the state-space graph of a progression planner for a simple example where flight 1 is at location A and flight 2 is also at location A. These flights are moving from location A to location B. In the 1st case only flight 1 moves from location A to location B, so the resulting state shows that after performing that action flight 1 is at location B whereas flight 2 is at its original location A. Similarly, in the 2nd case only flight 2 moves from location A to location B, and the resulting state shows that after performing that action flight 2 is at location B while flight 1 is at its original location A.
• It can be observed from Fig. 3.18.1 that rectangles show the states of the flights (i.e. their current locations), and lines give the corresponding actions from one state to another (i.e. move from one location to the other).
• Note that the lines coming out of every state correspond to all of the permissible actions which can be taken when the agent is in that state.

Fig. 3.18.1 : State-space graph of a progression planner (Flight 1 and Flight 2 both start at location A; taking Flight 1 from A to B, or Flight 2 from A to B, leads to the corresponding successor state)

Progression planner algorithm :

1. Formulate the state space search problem :
• The initial state is the first state of the planning problem; it is a set of positive literals, and the literals which don't appear are considered false.
• An action is applicable if its preconditions are satisfied; when it is applied, its positive effect literals are added to the state and its negative effect literals are deleted.
• Perform goal testing by checking whether the state satisfies the goal.
• Lastly, keep the step cost for each action as 1.
2. Any complete graph search algorithm, such as A*, is then a complete planning algorithm. Function symbols are not used.
3. The progression planner algorithm is considered inefficient because of the irrelevant-action problem and the requirement of good heuristics for an efficient search.
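The formulation above can be sketched as a toy progression planner: states are frozensets of positive literals and each action is a (preconditions, add, delete) triple. The flight actions are illustrative assumptions based on the example of Fig. 3.18.1:

```python
from collections import deque

# A toy progression (forward state-space) planner over STRIPS-style actions.

ACTIONS = {
    "Fly(Flight1, A, B)": ({"At(Flight1, A)"}, {"At(Flight1, B)"}, {"At(Flight1, A)"}),
    "Fly(Flight2, A, B)": ({"At(Flight2, A)"}, {"At(Flight2, B)"}, {"At(Flight2, A)"}),
}

def progression_plan(initial, goal):
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # goal test: all goal literals hold
            return plan
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:                   # action applicable in this state
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(progression_plan({"At(Flight1, A)", "At(Flight2, A)"},
                       {"At(Flight1, B)", "At(Flight2, B)"}))
# ['Fly(Flight1, A, B)', 'Fly(Flight2, A, B)']
```

Since every step costs 1, the breadth-first frontier returns a shortest plan.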

3.19 Regression Planners

• "Backward state-space search" is also called a "regression planner"; from the name you can make out that the processing starts from the finishing state and then goes backwards to the initial state.
• So basically we try to backtrack the scenario and find out the best possibility in order to achieve the goal; to do this we have to see what might have been the correct action at the previous state.
• In forward state space search we needed information about the successors of the current state; for backward state-space search we need information about the predecessors of the current state.
• Here the problem is that there can be many possible goal states which are equally acceptable. That is why this approach is not considered practical when a large number of states satisfy the goal.
• Let us see the flight example. Here the goal state is: flight 1 is at location B and flight 2 is also at location B. We can see in Fig. 3.19.1 that if this state is checked backwards we have two acceptable states: in one state only flight 2 is at location B while flight 1 is at location A, and in the 2nd possible state flight 1 is already at location B while flight 2 is at location A.
• As we search backwards from the goal state to the initial state, we have to deal with partial information about the state, since we do not yet know which actions will get us to the goal. This method is complex because we have to achieve a conjunction of goals.
• In Fig. 3.19.1 rectangles are goals that must be achieved and lines show the corresponding actions.

Fig. 3.19.1 : State-space graph of a regression planner (the goal "Flight 1 at B and Flight 2 at B" has two predecessors: "Flight 1 at A, Flight 2 at B" via taking Flight 1 from A to B, and "Flight 1 at B, Flight 2 at A" via taking Flight 2 from A to B)

Regression algorithm :

1. Firstly, the predecessors should be determined :
• To do this we need to find out which states will lead to the goal state after applying some action to them.
• We take the conjunction of all such states and choose one action to achieve the goal state.
• If we say that action "X" is a relevant action for the first conjunct, then it works only if its preconditions are satisfied.
• The previous state is checked to see if the sub-goals are achieved.
2. The chosen actions must be consistent, i.e. they should not undo desired literals. The positive effects of the action which appear in the goal are deleted from the regressed goal, and each precondition literal of the action is added, unless it already appears.
3. The main advantage of this method is that only relevant actions are taken into consideration. Compared to forward search, the backward search method has a much lower branching factor.
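A single regression step can be sketched as follows; the relevance and consistency checks mirror points 1 and 2 above, and the flight literals are illustrative assumptions:

```python
# One regression step: regressing a goal set through a STRIPS action
# (preconditions, add, delete) yields the sub-goal that must hold beforehand.

def regress(goal, pre, add, delete):
    if not add & goal:        # relevant: the action must achieve part of the goal
        return None
    if delete & goal:         # consistent: it must not undo a goal literal
        return None
    return (goal - add) | pre

goal = {"At(Flight1, B)", "At(Flight2, B)"}
sub_goal = regress(goal,
                   pre={"At(Flight2, A)"},
                   add={"At(Flight2, B)"},
                   delete={"At(Flight2, A)"})
print(sub_goal)  # {'At(Flight1, B)', 'At(Flight2, A)'} (set order may vary)
```

This reproduces one of the two predecessor goals of Fig. 3.19.1: achieving "Flight 2 at B" by flying Flight 2 from A to B leaves "Flight 1 at B, Flight 2 at A" still to be achieved.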

3.19.1 Heuristics for State-Space Search

• Neither progression nor regression is very efficient for complex problems. They need good heuristics to achieve better efficiency. Finding the best solution is NP-hard (NP stands for Non-deterministic Polynomial-time).
• There are two ways to make state space search efficient :
o Use a linear method : add the steps which build on their immediate successors or predecessors.
o Use a partial planning method : ordering constraints are imposed on the agent as per the requirement at execution time.

3.20 Total Order Planning (TOP)

• We have seen in the above sections that progression and regression planners impose a total ordering on actions at all stages of the planning process.
• In the case of Total Order Planning (TOP), we have to follow a sequence of actions for the entire task at once, and to do this we can have multiple combinations of the required actions. Here we need to remember one most important thing: TOP should take care of preconditions while creating the sequence of actions.
• For example, we cannot wear the left shoe without wearing the left sock, and we cannot wear the right shoe without wearing the right sock. So while creating the sequence of actions in total ordered planning, the wearing-left-sock action should be executed before wearing the left shoe, and the wearing-right-sock action should be executed before wearing the right shoe, as you can see in Fig. 3.20.1.

Fig. 3.20.1 : Total order planning of wearing shoes
• If there is a cycle of constraints then total ordered planning cannot give good results. TOP can fail in non-cooperative environments. So we have the Partial Ordered Planning method.
3.21 Partial Order Planning

University Questions
Q. Explain various types of planning methods for handling indeterminacy. (MU - May 13, May 14, Dec. 14)
Q. Discuss partial order planning giving suitable example.

• In the case of Partial Ordered Planning (POP), the ordering of the actions is partial. Also, partial ordered planning does not specify which of two actions placed in the plan will come first.
• With partial ordered planning the problem can be decomposed, so it can work well in case the environment is non-cooperative.
• Take the same example of wearing shoes to understand partial ordered planning.
• A partial order planning combines two action sequences :
o The first branch covers left-sock and left-shoe. In this case, to wear the left shoe, wearing the left sock is the precondition.
o The second branch covers right-sock and right-shoe. Here, wearing the right sock is the precondition for wearing the right shoe.
• Once these actions are taken we achieve our goal and reach the finish state.

Fig. 3.21.1 : Partial order planning of wearing shoes
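As a sketch of what "partial order" buys us, the following counts the total-order sequences consistent with the two sock-before-shoe constraints (action names follow the figure):

```python
from itertools import permutations

# Count the total orders consistent with the shoe example's partial order.
actions = ["Leftsock", "Leftshoe", "Rightsock", "Rightshoe"]
constraints = [("Leftsock", "Leftshoe"), ("Rightsock", "Rightshoe")]

valid = [p for p in permutations(actions)
         if all(p.index(a) < p.index(b) for a, b in constraints)]
print(len(valid))  # 6
```

One partial plan therefore stands for six different total-order plans, which is why POP can delay ordering decisions that TOP would be forced to make.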
3.21.1 POP as a Search Problem

• If we consider POP as a search problem, then we say that the states are small plans.
• States are generally unfinished plans. If we take an empty plan, then it will consist of only the start and finish actions.
Every plan has four main components, which can be given as follows :
1. Set of actions :
• These are the steps of the plan. Actions which can be performed in order to achieve the goal are stored in the set-of-actions component.
• For example : Set of Actions = {Start, Rightsock, Rightshoe, Leftsock, Leftshoe, Finish}
• Here, wearing the left sock, wearing the left shoe, wearing the right sock and wearing the right shoe are the set of actions.
2. Set of ordering constraints / preconditions :
• Preconditions are considered as ordering constraints (i.e. without performing action "x" we cannot perform action "y").
• For example : Set of Orderings = {Right-sock < Right-shoe; Left-sock < Left-shoe}, that is, in order to wear a shoe, first we should wear the sock.
• So the ordering constraints can be: wear Left-sock < wear Left-shoe (the wearing-left-sock action should be taken before wearing the left shoe) and wear Right-sock < wear Right-shoe (the wearing-right-sock action should be taken before wearing the right shoe).
• If the constraints are cyclic then the plan is inconsistent. If we want to have a consistent plan then there should not be any cycle of preconditions.
3. Set of causal links :
• Action A achieves effect "E" for action B; this can be written as A -E-> B.

Fig. 3.21.2(a) : Causal link in partial order planning ; (b) Causal link example

• From Fig. 3.21.2(b) you can understand that if you buy an apple, its effect can be eating the apple, and the precondition of eating the apple is cutting the apple.
• There can be a conflict if there is an action C that has an effect ¬E and, according to the ordering constraints, it comes after action A and before action B.
• Say we don't want to eat an apple; instead we want to make a decorative apple swan. This action can be between A and B and it does not have effect "E".
o For example : Set of Causal Links = {Rightsock -Rightsockon-> Rightshoe, Leftsock -Leftsockon-> Leftshoe, Rightshoe -Rightshoeon-> Finish, Leftshoe -Leftshoeon-> Finish}.
o To have a consistent plan there should not be any conflicts with the causal links.
4. Set of open preconditions :
• A precondition is called open if it is not yet achieved by some action in the plan. A least commitment strategy can be used by delaying the choice during search.
• To have a consistent plan there should not be any open precondition.
3.21.2 Consistent Plan is a Solution for the POP Problem

• A consistent plan does not have a cycle of constraints, does not have conflicts in the causal links and does not have open preconditions, so it provides a solution for the POP problem.
• While solving a POP problem, operators can add links and steps from existing plans to open preconditions in order to fulfil them, and the steps can then be ordered with respect to the other steps to remove the potential conflicts. If an open precondition is unattainable, then backtrack the steps and try solving the problem with POP again.
• Partial ordered planning is a more efficient method, because with the help of POP we can progress from a vague plan to a complete and correct solution in a faster way. Also, we can solve a huge state space plan in a smaller number of steps, because search takes place only when sub-plans interact.
steps, this is because search takes place only

3.22 Hierarchical Planning

• Hierarchical planning is also called plan decomposition. Generally, plans are organized in a hierarchical format.
• Complex actions can be decomposed into more primitive actions, denoted with the help of links between various states at different levels of the hierarchy. This is called operator expansion.

For example :

Fig. 3.22.1 : Operator expansion (move(x, y, z) decomposes into pickup(x, y) and putdown(x, z))


• Fig. 3.22.2 shows how to create a hierarchical plan to travel from some source to a destination. You can also observe how, at every level, we follow some sequence of actions.

Fig. 3.22.2 : Hierarchical planning example (Travel(source, dest.) decomposes into Take-flight, Take-train or Take-bus; Take-train decomposes into Goto(train, source), Buy-Ticket(train), Catch(train) and Leave(train, dest.); Buy-Ticket decomposes into Goto(counter), Request(ticket) and Pay(ticket))
3.22.1 POP One Level Planner

• If you are planning to take a trip, then first you have to decide the location. To decide the location we can search for various good locations on the internet based on weather conditions, travelling expenses, etc.
• Say we select Rajasthan. With a one level planner, first we switch on the PC, then we open a browser, after that we open the Indian Railways ticket booking website, then we enter the date, time, etc. to book the railway ticket. After that we will have to do the hotel booking and so on.
• This type of planning is called one level planning. If the problem is simple then we can make use of a one level planner. For complex problems, a one level planner cannot provide a good solution.
ievel planner. For complex problems, one level planner cannot provide good solution.
C= Switch
cy onrete
~ .
|
Aver computer...
hsbal ae (Kies:

_ Start web
¢4: browser:

“Open Indian ]
Ralvays webate
~

ag,Select, date —
ees

Fig. 3.22.3 : One Level planning example


3.22.2 Hierarchy of Actions

• In terms of major and minor actions, a hierarchy of actions can be decided. Minor activities cover the more precise actions needed to accomplish the major activities. In the case of the above example, Railway ticket booking, Hotel booking, Reaching Rajasthan, Staying and enjoying there, and Coming back are the major activities.
• Getting a taxi to reach the railway station, having a candle-light dinner in a palace, taking photos, etc. are the minor activities.
• In the real world there can be complex problems. For example : a captain of a cricket team plans the order of 4 bowlers over 2 days of a test match (180 overs). The number of possibilities is 4^180 (of the order of 10^108).
• The motivation behind this style of planning is to reduce the size of the search space. For plan ordering we would otherwise have to try out a large number of possible plans; with plan hierarchies we have limited ways in which we can select and order primitive operators.
• In hierarchical planning, major steps are given more importance. Once the major steps are decided, we attempt to solve the minor detailed actions.
• It is possible that major steps of a plan may run into difficulties at a minor step of the plan. In such a case we need to return to the major step again to produce an appropriately ordered sequence to devise the plan.

3.22.3 Planner

1. First identify a hierarchy of major conditions.
2. Construct a plan in levels (major steps then minor steps), so we postpone the details to the next level.
3. Patch major levels as detailed actions become visible.
4. Finally demonstrate.

Example :

Actions required for "Travelling to Rajasthan" can be given as follows (the number in brackets indicates the criticality of the action) :
• Opening yatra.com (1)
• Finding train (2)
• Buy ticket (3)
• Get taxi (2)
• Reach railway station (3)
• Pay driver (1)
• Check in (1)
• Boarding train (2)
• Reach Rajasthan (3)

Fig. 3.22.4 : Planner (flowchart : the external interface sets the "criticality" to maximum and builds a skeleton plan from the preconditions of the goal wff; the planning executive sets the state to the initial world model and, for each step of the skeleton plan, determines a plan to achieve the state in which the preconditions of the operator applied in that step are true; the successful steps are collected into a new skeleton plan and the criticality is lowered; on failure the process resumes in the higher abstraction space, forbidding the choice of that step; finally the successful plan is generated and a MACROP is built)

1st level plan :

Buy ticket (3), Reach railway station (3), Reach Rajasthan (3)

2nd level plan :

Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Boarding train (2), Reach Rajasthan (3).

3rd level plan (final) :

Opening yatra.com (1), Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Pay driver (1), Check in (1), Boarding train (2), Reach Rajasthan (3).
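The three level plans above can be reproduced by filtering the action list on a criticality threshold, in the style of hierarchical planning; the (action, criticality) pairs below mirror the example:

```python
# Generate level plans by filtering on action criticality.

actions = [("Opening yatra.com", 1), ("Finding train", 2), ("Buy ticket", 3),
           ("Get taxi", 2), ("Reach railway station", 3), ("Pay driver", 1),
           ("Check in", 1), ("Boarding train", 2), ("Reach Rajasthan", 3)]

def level_plan(threshold):
    """Keep only the actions whose criticality is at least `threshold`."""
    return [name for name, criticality in actions if criticality >= threshold]

print(level_plan(3))  # ['Buy ticket', 'Reach railway station', 'Reach Rajasthan']
print(level_plan(2))  # 2nd level plan adds the criticality-2 actions
print(level_plan(1))  # 3rd level (final) plan is the complete sequence
```

Each level refines the previous one: the criticality-3 skeleton is planned first, and lower-criticality details are patched in afterwards.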

3.23 Planning Languages

University Question
Q. Explain STRIPS representation of planning problem. (MU - May 13, Dec.)

• A planning language should be expressive enough to describe a wide variety of problems and restrictive enough to allow efficient algorithms to operate on it.
• Planning languages are known as action languages.

Stanford Research Institute Problem Solver (STRIPS) :

• Richard Fikes and Nils Nilsson developed an automated planner called STRIPS (Stanford Research Institute Problem Solver) in 1971.
• Later on this name was given to the formal planning language used to express automated planning problem instances. STRIPS is the foundation for most of the languages in current use.

Action Description Language (ADL) :

• ADL is an advancement of STRIPS. Pednault proposed ADL in 1987.

Comparison between STRIPS and ADL :

Sr. No. | STRIPS language | ADL
1. | Only allows positive literals in the states. For example, a valid sentence in STRIPS is expressed as : Intelligent ∧ Beautiful. | Can support both positive and negative literals. For example, the same sentence is expressed as : ¬Stupid ∧ ¬Ugly.
2. | Makes use of the closed-world assumption (i.e. unmentioned literals are false). | Makes use of the open-world assumption (i.e. unmentioned literals are unknown).
3. | We can only find ground literals in goals. For example : Intelligent ∧ Beautiful. | We can find quantified variables in goals. For example : ∃x At(P1, x) ∧ At(P2, x) is the goal of having P1 and P2 in the same place in the blocks example.
4. | Goals are conjunctions. For example : (Intelligent ∧ Beautiful). | Goals may involve conjunctions and disjunctions. For example : (Intelligent ∧ (Beautiful ∨ Rich)).
5. | Effects are conjunctions. | Conditional effects are allowed : when P:E means E is an effect only if P is satisfied.
6. | Does not support equality. | Equality predicate (x == y) is built in.
7. | Does not have support for types. | Support for types. For example : the variable p : Person.
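The closed-world vs. open-world distinction in row 2 can be illustrated with a few lines of Python (an illustrative sketch, not from the book; the predicate names are made up): a STRIPS state stores only the positive literals known to be true, so anything unmentioned is false, while an ADL-style state distinguishes false from unknown.

```python
# Closed-world assumption (STRIPS): a state is a set of positive literals,
# and any literal not in the set is taken to be false ("negation as failure").
strips_state = {"Intelligent", "Beautiful"}

def strips_holds(literal, state):
    return literal in state

# Open-world assumption (ADL): literals are explicitly True or False,
# and anything unmentioned is unknown rather than false.
adl_state = {"Intelligent": True, "Stupid": False}

def adl_holds(literal, state):
    return state.get(literal, "unknown")

print(strips_holds("Rich", strips_state))  # False - unmentioned means false
print(adl_holds("Rich", adl_state))        # unknown - unmentioned means unknown
```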

3.23.1 Example of Block World Puzzle


Fig. 3.23.1 : Block world puzzle - Start and Goal states
Standard sequence of actions is :
1. Grab Z and Pickup Z
2. Then Place Z on the table
3. Grab Y and Pickup Y
4. Then Stack Y on Z
5. Grab X and Pickup X
6. Stack X on Y

• The elementary problem here is the frame problem in AI, which is concerned with the question of what piece of knowledge or information is pertinent to the situation.
• To solve this problem we have to make an elementary assumption, namely the closed-world assumption (i.e. if something is not asserted in the knowledge base then it is assumed to be false; this is also called "negation as failure").
• The start and goal states for the block world problem can be given as :

Start : on(X, table), on(Z, X), on(Y, table), hand empty, clear(Z), clear(Y)
Goal : on(Z, table), on(Y, Z), on(X, Y), hand empty, clear(X)
Fig. 3.23.2 : Block world - Start and Goal states


We can write 4 main rules for the block world as follows :

Rule | Operator | Precondition and Deletion List | Add List
Rule 1 | pickup(X) | hand empty, on(X, table), clear(X) | holding(X)
Rule 2 | putdown(X) | holding(X) | hand empty, on(X, table), clear(X)
Rule 3 | stack(X, Y) | holding(X), clear(Y) | on(X, Y), clear(X)
Rule 4 | unstack(X, Y) | on(X, Y), clear(X) | holding(X), clear(Y)
Based on the above rules, the plan for the block world problem Start → Goal can be specified as follows :

1. unstack(Z, X)
2. putdown(Z)
3. pickup(Y)
4. stack(Y, Z)
5. pickup(X)
6. stack(X, Y)
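This plan can be executed directly as STRIPS operators. The sketch below is illustrative Python, not from the book; following the standard STRIPS formulation, hand empty is also added by stack and required by unstack, which the rule table above leaves implicit.

```python
# STRIPS-style execution: a state is a set of literals; applying a rule
# deletes the precondition/deletion list and inserts the add list.

def apply(state, precond_del, add):
    assert precond_del <= state, "precondition not satisfied"
    return (state - precond_del) | add

def pickup(s, x):
    return apply(s, {"handempty", f"on({x},table)", f"clear({x})"},
                 {f"holding({x})"})

def putdown(s, x):
    return apply(s, {f"holding({x})"},
                 {"handempty", f"on({x},table)", f"clear({x})"})

def stack(s, x, y):
    return apply(s, {f"holding({x})", f"clear({y})"},
                 {f"on({x},{y})", f"clear({x})", "handempty"})

def unstack(s, x, y):
    return apply(s, {f"on({x},{y})", f"clear({x})", "handempty"},
                 {f"holding({x})", f"clear({y})"})

start = {"on(X,table)", "on(Z,X)", "on(Y,table)",
         "handempty", "clear(Z)", "clear(Y)"}

s = unstack(start, "Z", "X")   # 1. unstack(Z, X)
s = putdown(s, "Z")            # 2. putdown(Z)
s = pickup(s, "Y")             # 3. pickup(Y)
s = stack(s, "Y", "Z")         # 4. stack(Y, Z)
s = pickup(s, "X")             # 5. pickup(X)
s = stack(s, "X", "Y")         # 6. stack(X, Y)

goal = {"on(Z,table)", "on(Y,Z)", "on(X,Y)", "handempty", "clear(X)"}
print(goal <= s)  # True - every goal literal holds after the plan
```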
• Execution of this plan can be done by making use of a data structure called "Triangular Table".

The triangular table (here illustrated with blocks A, B and C) is :

Row 1 : on(C, A), clear(C), hand empty → unstack(C, A)
Row 2 : holding(C) → putdown(C)
Row 3 : on(B, table), hand empty → pickup(B)
Row 4 : clear(C), holding(B) → stack(B, C)
Row 5 : on(A, table), clear(A), hand empty → pickup(A)
Row 6 : clear(B), holding(A) → stack(A, B)
Row 7 : on(C, table), on(B, C), on(A, B), clear(A)
(columns 0 to 6)

Fig. 3.23.3 : Triangular table

• In a triangular table there are N + 1 rows and N + 1 columns. As can be seen from Fig. 3.23.3, the rows are numbered 1 to N + 1 and the columns 0 to N. The first column of the triangular table indicates the starting state and the last row of the triangular table indicates the goal state.

• With the help of the triangular table a tree is formed, as shown in Fig. 3.23.4, to achieve the goal state.

Fig. 3.23.4
• An agent (in this case a robotic arm) can have some amount of fault tolerance. Fig. 3.23.5 shows one such example, where a wrong move is not allowed.

Fig. 3.23.5 : Fault tolerance - a wrong move is not allowed
3.23.2 Example of the Spare Tire Problem

Q. Explain planning problem for spare tyre problem.
© Consider the problem of changing a flat tire. More precisely, the goal is to have a good spare tire properly
mounted onto the car’s axle, where the initial state has a flat tire on the axle and a good spare tire in the trunk.
To keep it simple, our version of the problem is a very abstract one, with no sticky lug nuts or other
complications.
e There are just four actions: removing the spare from the trunk, removing the flat tire from the axle, putting the
spare on the axle, and leaving the car unattended overnight. We assume that the car is in a particularly bad
neighborhood, so that the effect of leaving it overnight is that the tires disappear.
e The ADL description of the problem is shown. Notice that it is purely propositional. It goes beyond STRIPS in
that it uses a negated precondition, ~At(Flat, Axle), for the PutOn(Spare, Axle) action. This could be avoided by
using Clear (Axle) instead, as we will see in the next example.
Solution (ADL description) :

Init(At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))

Action(Remove(Spare, Trunk),
PRECOND : At(Spare, Trunk)
EFFECT : ¬At(Spare, Trunk) ∧ At(Spare, Ground))

Action(Remove(Flat, Axle),
PRECOND : At(Flat, Axle)
EFFECT : ¬At(Flat, Axle) ∧ At(Flat, Ground))

Action(PutOn(Spare, Axle),
PRECOND : At(Spare, Ground) ∧ ¬At(Flat, Axle)
EFFECT : ¬At(Spare, Ground) ∧ At(Spare, Axle))

Action(LeaveOvernight,
PRECOND :
EFFECT : ¬At(Spare, Ground) ∧ ¬At(Spare, Axle) ∧ ¬At(Spare, Trunk) ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Axle))
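These action schemas can be fed to a tiny forward state-space search. The Python below is an illustrative sketch, not the book's method: effects are encoded as add/delete sets, the negated precondition as a set of literals that must be absent, and LeaveOvernight is omitted since it destroys the goal.

```python
from collections import deque

# Each action: (name, positive preconds, negated preconds, add list, delete list).
ACTIONS = [
    ("Remove(Spare,Trunk)", {"At(Spare,Trunk)"}, set(),
     {"At(Spare,Ground)"}, {"At(Spare,Trunk)"}),
    ("Remove(Flat,Axle)", {"At(Flat,Axle)"}, set(),
     {"At(Flat,Ground)"}, {"At(Flat,Axle)"}),
    ("PutOn(Spare,Axle)", {"At(Spare,Ground)"}, {"At(Flat,Axle)"},
     {"At(Spare,Axle)"}, {"At(Spare,Ground)"}),
]

def plan(init, goal):
    """Breadth-first forward search over frozenset states."""
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, pos, neg, add, dele in ACTIONS:
            if pos <= state and not (neg & state):
                nxt = frozenset((state - dele) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

init = {"At(Flat,Axle)", "At(Spare,Trunk)"}
print(plan(init, {"At(Spare,Axle)"}))
# ['Remove(Spare,Trunk)', 'Remove(Flat,Axle)', 'PutOn(Spare,Axle)']
```

Note how the negated precondition of PutOn(Spare, Axle) forces Remove(Flat, Axle) into the plan before the spare can be mounted.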

3.24 Representing Real World Problems as Planning

Q. Explain various types of planning methods for handling indeterminacy.


• In earlier sections we have discussed planning that is observable and deterministic. Real world problems, however, are neither fully predictable nor completely observable. Planning and acting on real world problems require a more sophisticated approach.
• One of the most important characteristics of the real world is uncertainty. In an uncertain environment, it is very important for an agent to rely upon its percepts (series of past experience).
• Whenever some unexpected condition is encountered, the agent should refer to its percepts and change its course of action accordingly. In other words, the agent should be able to cancel or replace the currently executing plan with some other more suitable and reliable plan if something unexpected happens.
• It should be noted that the real world itself is not uncertain, but human perception of the world is uncertain. In artificial intelligence, we try to give human perception ability to a machine, and hence the machine also receives an incomplete and incorrect perception of the real world. So the machine has to deal with incomplete and incorrect information like a human does.
• Determining the condition of a state depends on available knowledge. In the real world, knowledge availability is always limited, so most of the time conditions are non-deterministic.
• The amount or degree of indeterminacy depends upon the knowledge available. The indeterminacy is called "bounded indeterminacy" when actions can have unpredictable effects.
• Four planning strategies are there for handling indeterminacy :
(i) Sensorless planning
(ii) Conditional planning
(iii) Execution monitoring and replanning
(iv) Continuous planning

(i) Sensorless planning :

Sensorless planning is also known as conformant planning. These kinds of planning are not based on any perception. The algorithm ensures that the plan reaches its goal at any cost.

(ii) Conditional planning :

Conditional planning is sometimes termed contingency planning and deals with the bounded indeterminacy discussed earlier. The agent makes a plan, evaluates the plan and then executes it fully or partly depending on the condition.

(iii) Execution monitoring and replanning :

In this kind of planning the agent can employ any of the strategies of planning discussed earlier. Additionally it observes the plan execution and, if needed, replans and again executes and observes.

(iv) Continuous planning :

Continuous planning does not stop after performing an action. It persists over time and keeps on planning on some predefined events. These events include any type of unexpected circumstance in the environment.

3.25 Multi-Agent Planning

• Whatever planning we have discussed so far belongs to a single agent environment, where the agent acts alone.
• When the environment consists of multiple agents, the way a single agent plans its actions changes. We have a glimpse of an environment where multiple agents have to take actions based on the current state. The environment could be co-operative or competitive. In both cases the agents' actions influence each other.
• A few of the multi-agent planning strategies are listed below :
(i) Co-operation
(ii) Multibody planning
(iii) Co-ordination mechanisms
(iv) Competition
(i) Co-operation :

In the co-operation strategy agents have joint goals and plans. Goals can be divided into sub goals, which are ultimately combined to achieve the final goal.

(ii) Multibody planning :

Multibody planning is the strategy of implementing a correct joint plan.

(iii) Co-ordination mechanisms :

These strategies specify the co-ordination between co-operating agents. A co-ordination mechanism is used in several co-operative plannings.

(iv) Competition :

Competition strategies are used when agents are not co-operating but competing with each other. Every agent wants to achieve the goal first.

3.26 Conditional Planning

• Conditional planning has to work regardless of the outcome of an action.
• Conditional planning can take place in Fully Observable Environments (FOE), where the current state of the agent is known because the environment is fully observable. The outcome of actions cannot be determined, so the environment is said to be nondeterministic.
• In conditional planning we can check what is happening in the environment at predetermined points of the plan to deal with ambiguous actions.
• It can be observed from the vacuum world example that conditional planning needs to take some action at every state and must be able to handle every outcome of the action it takes. A state node is represented with a square and a chance node is represented with a circle.
• For a state node we have an option of choosing some action. For a chance node the agent has to handle every outcome.
• Conditional planning can also take place in Partially Observable Environments (POE), where we cannot keep track of every state. Actions can be uncertain because of imperfect sensors.
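A conditional plan branches on the percept at each chance node. The sketch below is illustrative Python, not from the book; the two-square world (L, R) and the plan shape [Suck if dirty, Right, Suck if dirty] are assumptions made for the example.

```python
def execute(dirt):
    """Execute a conditional plan starting at square L.

    dirt maps each square to True (dirty) or False (clean);
    the plan branches on this percept at each chance node.
    """
    steps = []
    for square in ("L", "R"):
        if dirt[square]:           # chance node: the percept decides the branch
            steps.append(f"Suck({square})")
            dirt[square] = False
        if square == "L":
            steps.append("Right")  # state node: unconditional action
    return steps

dirt = {"L": True, "R": False}
print(execute(dirt))  # ['Suck(L)', 'Right']
print(dirt)           # {'L': False, 'R': False} - goal reached either way
```

Whatever the initial dirt configuration, the same conditional plan leaves both squares clean, which is exactly the "handle every outcome" requirement described above.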


Fig. 3.26.1 : Conditional Planning - vacuum world example
• In the vacuum agent example, suppose the dirt is at Right and the agent knows about Right but not about Left. In such cases dirt might be left behind when the agent leaves a clean square. The initial state is also called a state set or a belief state.
• Sensors play an important role in conditional planning for partially observable environments. Automatic sensing can be useful; with automatic sensing an agent gets all the available percepts at every step. Another method is active sensing, with which percepts are obtained only by executing specific sensory actions.
Fig. 3.26.2 : Conditional Planning - vacuum world example (condition 2)

Review Questions

Q.1 Explain in detail the knowledge based agent.

Q.2 Describe WUMPUS WORLD environment.

Q.3 Specify PEAS properties and type of environment for the same.

Q.4 What is reasoning ? What is its role in artificial intelligence ?

Q.5 Explain the role of probabilistic reasoning in medical diagnostic.

Q.6 Explain conditional planning with example.

Q.7 Write a short note on multi-agent planning.

Q.8 What is propositional logic ?

Q.9 What is propositional logic ? Explain with example.

Q.10 Write syntax and semantics and example sentences for propositional logic.

Q.11 Explain the inference process in case of propositional logic with suitable examples.

Q.12 Explain Horn Clause with example.

Q.13 Write a short note on : Drawbacks of propositional logic.

Q.14 What is first order logic ?

Q.15 Write syntax and semantics of FOL with example.

Q.16 Differentiate between propositional logic and predicate logic.

Q.17 Explain inference process in FOL using Forward Chaining and Backward Chaining.

Q.18 Compare forward and backward reasoning with suitable example.

Q.19 What is unification ?

Q.20 Explain steps to convert logical statements to clausal form.

Q.21 What is planning in AI ?

Q.22 Explain planning problem.

Q.23 Explain goal of planning with supermarket example.

Q.24 Write a short note on planning graphs.

Q.25 What are the major approaches of planning ?

Q.26 What are different languages used for implementation of planning ?

Q.27 What is one level planner ?

Q.28 Write a short note on hierarchical planning.

Q.29 Explain process of generating solution to partial order planning problem.

Q.30 Explain total order planning with example.

Q.31 Explain regression planners with example.

Q.32 Explain regression planning.

Q.33 Explain progression planners with example.

Q.34 How are planning strategies classified ?


000

INTRODUCTION TO DS
Syllabus : Introduction and Evolution of Data Science, Data Science Vs. Business Analytics Vs. Big Data, Data Analytics Lifecycle, Roles in Data Science Projects.

Self-Learning Topics : Applications and Case Studies of Data Science in various Industries.

4.1 Introduction to Data Science

• Data science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning data. This umbrella term includes various techniques that are used when extracting insights and information from data.
• Data science is the practice of mining large data sets of raw data, both structured and unstructured, to identify patterns and extract actionable insight from them. This is an interdisciplinary field, and the foundations of data science include statistics, inference, computer science, predictive analytics, machine learning algorithm development, and new technologies to gain insights from big data.
• To define data science and improve data science project management, start with its life cycle. The first stage in the data science pipeline workflow involves capture : acquiring data, sometimes extracting it, and entering it into the system. The next stage is maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.
• Data processing follows, and constitutes one of the data science fundamentals. It is during data exploration and processing that data scientists stand apart from data engineers. This stage involves data mining, data classification and clustering, data modelling, and summarizing insights gleaned from the data, which are the processes that create effective data.
• The next step is data analysis, an equally critical stage. Here data scientists conduct exploratory and confirmatory work, regression, predictive analysis, qualitative analysis, and text mining.
• During the final stage, the data scientist communicates insights. This involves data visualization, data reporting, the use of various business intelligence tools, and assisting businesses, policymakers, and others in smarter decision making.

4.2 Evolution of Data Science

The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to
make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by
scientists, statisticians, librarians, computer scientists and others for years.
The following timeline traces the evolution of the term "Data Science", its use, attempts to define it, and related terms :

• In 1947, Tukey coined the term "bit", which Claude Shannon used in his 1948 paper "A Mathematical Theory of Communication." In 1977, Tukey published Exploratory Data Analysis, arguing that more emphasis needed to be placed on using data to suggest hypotheses to test, and that Exploratory Data Analysis and Confirmatory Data Analysis "can - and should - proceed side by side."
• In 1974, Peter Naur published Concise Survey of Computer Methods in Sweden and the United States. The book surveyed contemporary data processing methods used in a wide range of applications. It is organized around the concept of data as defined in the IFIP Guide to Concepts and Terms in Data Processing. Naur offered the following definition of data science : "The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences."
• In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe, Japan, for their biennial conference. For the first time, the term "data science" was included in the title of the conference ("Data science, classification, and related methods"). The classification societies have variously used the terms data analysis, data mining, and data science in their publications.

• In 2001, William S. Cleveland published "Data Science : An Action Plan for Expanding the Technical Areas of the Field of Statistics." It was a plan "to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field was called 'data science.'"
• In May 2005, Thomas H. Davenport, Don Cohen, and Al Jacobson published "Competing on Analytics," a Babson College Working Knowledge Research Centre report, describing "the emergence of a new form of competition based on the extensive use of analytics, data, and fact-based decision making." Instead of competing on traditional factors, companies began to employ statistical and quantitative analysis and predictive modelling as primary elements of competition. The research was later published by Davenport in the Harvard Business Review (January 2006) and was expanded (with Jeanne G. Harris) into the book Competing on Analytics : The New Science of Winning (March 2007).
of the
• In January 2009, Harnessing the Power of Digital Data for Science and Society was published. This report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council stated that "The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data. Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science. These individuals are key to the current and future success of the scientific enterprise."
• In September 2011, Harlan Harris wrote in "Data Science, Moore's Law, and Moneyball" : "'Data Science' is defined as what 'Data Scientists' do. What Data Scientists do has been very well covered, and it runs the gamut from data collection and munging, through application of statistics and machine learning and related techniques, to interpretation, communication, and visualization of the results."


4.3 Data Science Vs. Business Analytics Vs. Big Data

1. Business Analytics is the statistical study of business data to gain insights. Data science is the study of data using statistics, algorithms and technology. Big data is the raw material used in the field of data science, characterized by its velocity, variety, and volume (the 3Vs).
2. Business analytics uses mostly structured data. Data science uses both structured and unstructured data. Big data might be unstructured, semi-structured, or structured.
3. Business analytics does not involve much coding; it is more statistics oriented. In data science, coding is widely used; this field is a combination of traditional analytics practice with good computer science knowledge. Big data comes from various sources, such as online purchases, multimedia forms, instruments, financial logs, sensors, text files, and others.
4. In business analytics the whole analysis is based on statistical concepts. In data science, statistics is used at the end of analysis following coding. Big data is the raw material for data science, which affords the techniques for analyzing the data.
5. Business analytics studies trends and patterns specific to business. Data science studies almost every trend and pattern. Big data contains numerous trends and patterns.
6. Top industries where business analytics is used : finance, healthcare, marketing, retail, supply chain, telecommunications. Top industries/applications where data science is used to produce insights : e-commerce, finance, machine learning, manufacturing. Big data is used in all the industries.

4.3.1 Data Mining vs Data Science

• Data mining is a technique used in both business and data science, while data science is an actual field of scientific study or discipline. Data mining's goal is to render data more usable for a specific business purpose. Data science, in contrast, aims to create data driven products and outcomes, usually in a business context.
• Data mining deals mostly with structured data, while exploring huge amounts of raw, unprocessed data is within the bounds of data science. However, data mining is part of what a data scientist might do, and it's a skill that's part of the science.
• A data scientist is more likely to tackle larger masses of both structured and unstructured data. They will also formulate, test, and assess the performance of data questions in the context of an overall strategy. A data scientist is more likely to look ahead, predicting or forecasting as they look at data.
4.4 Data Analytics
• Data analytics is the science of examining raw data to reach certain conclusions.
• Data analytics involves applying an algorithmic or mechanical process to derive insights. It is used in several industries, which enables organizations and data analytics companies to make more informed decisions, as well as verify and disprove existing theories or models. The focus of data analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows.
ured or numerical data using a given question


» The aia analyst is likely to be analyzing a specific dataset of struct
with predictive
ics has more to do with placing historical data in context and less to do
or questions. Data analyt
minded search for the right question; it relies upon
modelling and machine learning. Data analysis isn't an open- do
ermore, unlike data scientists, data analysts typically
having the right questions in place from the start. Furth
not create statistical models or train machine learning tools.
organizational
anal ysts focus on strat egy for busin esses, comparing data assets to various
» Instead, data has already been
or plans . Data analy sts are also more likely to work with localized data that
hypotheses essing raw data
technical data science skills are essential to proc
processed. In contrast, both technical and non-
mathematical, analytical, and statistical skills.
as well as analyzing it. Of course, both roles demand
work. Instead, they tend
business culture approach in their everyday
e Data analysts have less need for a broader scope and purpose will
focus as they analyze pieces of data. Their
to adopt a more measured, nailed-down
data scientist.
almost certainly be more limited than those ofa
more likely to focus on
ship betw een the data anal yst and data is retrospective. A data analyst is
« The relation ady been processed for insights.
into existing data sets that have alre
specific questions to answer digging

4.5 Lifecycle

• Data analytics architecture maps out such steps for data science professionals. It is a cyclic structure that encompasses all the data life cycle phases, where each stage has its significance and characteristics.
• The lifecycle's circular form guides data professionals to proceed with data analytics in one direction, either forward or backward. Based on the newly received information, professionals can scrap the entire research and move back to the initial step to redo the complete analysis as per the lifecycle diagram.
• However, while there are talks of the data analytics lifecycle among the experts, there is still no defined structure of the mentioned stages. You're unlikely to find a concrete data analytics architecture that is uniformly followed by every data analysis expert. Such ambiguity gives rise to the probability of adding extra phases (when necessary) and removing the basic steps. There is also the possibility of working on different stages at once, or skipping a phase entirely.

4.5.1 Phases of Data Analytics Lifecycle

• A scientific method that helps give the data analysis process a structured framework is divided into the six phases of data analytics architecture.

Phase 1 : Data Discovery and Formation

• Everything begins with a defined goal. In this phase, you'll define your data's purpose and how to achieve it by the time you reach the end of the data analytics lifecycle.
• The initial stage consists of mapping out the potential use and requirement of data, such as where the information is coming from, what story you want your data to convey, and how your organization benefits from the incoming data.
• Basically, as a data analysis expert, you'll need to focus on enterprise requirements related to data, rather than the data itself. Additionally, your work also includes assessing the tools and systems that are necessary to read, organize, and process all the incoming data.
• Essential activities in this phase include structuring the business problem in the form of an analytics challenge and formulating the initial hypotheses (IHs) to test and start learning the data. The subsequent phases are then based on achieving the goal that is drawn in this stage.
Phase 2 : Data Preparation and Processing

• This stage consists of everything that has anything to do with data. In phase 2, the attention of experts moves from business requirements to information requirements.
• The data preparation and processing step involves collecting, processing, and cleansing the accumulated data. One of the essential parts of this phase is to make sure that the data you need is actually available to you for processing. The earliest step of the data preparation phase is to collect valuable information and proceed with the data analytics lifecycle in a business ecosystem. Data is collected using the below methods :
• Data Acquisition : Accumulating information from external sources.
• Data Entry : Formulating recent data points using digital systems or manual data entry techniques within the enterprise.
• Signal Reception : Capturing information from digital devices, such as control systems and the Internet of Things.
Phase 3 : Design a Model

• After mapping out your business goals and collecting a glut of data (structured, unstructured, or semi-structured), it is time to build a model that utilizes the data to achieve the goal. There are several techniques available to load data into the system and start studying it :
• ETL (Extract, Transform, and Load) transforms the data first using a set of business rules, before loading it into a sandbox.
• ELT (Extract, Load, and Transform) first loads raw data into the sandbox and then transforms it.
• ETLT (Extract, Transform, Load, Transform) is a mixture; it has two transformation levels.
• This step also includes the teamwork to determine the methods, techniques, and workflow to build the model in the subsequent phase. The model building initiates with identifying the relation between data points to select the key variables and eventually find a suitable model.
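The difference between ETL and ELT is only where the transform step runs. A minimal sketch of the ETL variant in plain Python (illustrative only; the records and the cleaning rule are made up):

```python
# ETL: Extract -> Transform (apply business rules) -> Load into the sandbox.
raw_records = [
    {"name": " Alice ", "amount": "120"},
    {"name": "BOB", "amount": "80"},
]

def transform(record):
    """Business rules: normalize names, cast amounts to numbers."""
    return {"name": record["name"].strip().title(),
            "amount": float(record["amount"])}

sandbox = []                        # stands in for the analytics sandbox
for rec in raw_records:             # Extract
    sandbox.append(transform(rec))  # Transform, then Load

print(sandbox)
# [{'name': 'Alice', 'amount': 120.0}, {'name': 'Bob', 'amount': 80.0}]
```

In the ELT variant the loop would load `raw_records` into the sandbox untouched, and `transform()` would run later, inside the sandbox.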
Phase 4 : Model Building

• This step of data analytics architecture comprises developing data sets for testing, training, and production purposes. The data analytics experts meticulously build and operate the model that they had designed in the previous step.
• They rely on tools and several techniques like decision trees, regression techniques and neural networks for building and executing the model. The experts also perform a trial run of the model to observe if the model corresponds to the datasets.
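The build-and-trial-run idea can be sketched with a toy model in plain Python (illustrative only; the dataset is synthetic and the "model" is a one-variable least-squares line): fit on a training split, then trial-run on held-out points to see whether the model corresponds to the data.

```python
# Toy model building: fit y = a*x + b on a training set, trial-run on a test set.
data = [(x, 2.0 * x + 1.0) for x in range(10)]   # synthetic dataset
train, test = data[:8], data[8:]                  # training vs. held-out split

n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in train) / \
    sum((x - mean_x) ** 2 for x, _ in train)      # least-squares slope
b = mean_y - a * mean_x                           # intercept

# Trial run: check the fitted model against the unseen test points.
errors = [abs((a * x + b) - y) for x, y in test]
print(round(a, 2), round(b, 2))   # 2.0 1.0
print(max(errors) < 1e-9)         # True - model corresponds to the dataset
```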


Phase 5 : Result Communication and Publication

• Remember the goal you had set for your business in phase 1 ? Now is the time to check if those criteria are met by the tests you have run in the previous phase.
• The communication step starts with collaboration with major stakeholders to determine if the project results are a success or failure. The project team is required to identify the key findings of the analysis, measure the business value associated with the result, and produce a narrative to summarise and convey the results to the stakeholders.

Phase 6 : Measuring of Effectiveness

• As your data analytics lifecycle draws to a conclusion, the final step is to provide a detailed report with key findings, coding, briefings, and technical papers/documents to the stakeholders.
• Additionally, to measure the analysis's effectiveness, the data is moved to a live environment from the sandbox and monitored to observe if the results match the expected business goal. If the findings are as per the objective, the reports and the results are finalized. However, suppose the outcome deviates from the intent set out in phase 1. Then you can move backward in the data analytics lifecycle to any of the previous phases to change your input and get a different output.
• The data analytics lifecycle is a circular process that consists of six basic stages that define how information is created, gathered, processed, used, and analyzed for business goals. However, the ambiguity in having a standard set of phases for data analytics architecture does plague data experts in working with the information. But the first step of mapping out a business objective and working toward achieving it helps in drawing out the rest of the stages.

4.6 Roles in Data Science Projects

Domain - Data Translator : domain expertise, business analysis, solutioning.
Analytics - Data Scientist : stats and ML, interpreting insights, scripting skills.
Design - Information Designer : information design, user centered design, interface/visual design (parts).
Development - ML Engineer : software engineering, front/back-end coding, data pipelining.
Project Management : project management, business analysis/solutioning, team handling.

Fig. 4.6.1 : Roles in data science projects

Data Scientist
• Data Scientists find and interpret rich data sources, merge data sources, create visualizations, and use machine learning to build models that aid in creating actionable insight from the data. They know the end-to-end process of data exploration and can present and communicate data insights and findings to a range of team members.
TechKnowledge Publications

AI and DS-1 (MU) 4-7 Introduction to DS
• In short, they apply the scientific discovery process, including hypothesis testing, to obtain actionable knowledge related to a scientific or business problem.

Data Engineer

• Data engineers make the appropriate data accessible and available for data science efforts. They design, develop, and code data-focused applications that capture data, as well as clean the data.
• This role also helps to ensure consistency of datasets (e.g., meaning of attributes across datasets).

Data Science Architect

• Data science architects design and maintain the architecture of data science applications and facilities. In other words, this role creates and manages relevant data models, data storage systems, processes and workflows.
• In conjunction with the Data Engineer, they manage and merge large amounts of data and their related sources.
Data Science Developer

• Data Science Developers design, develop, and code large data (science) analytics applications to support scientific or enterprise/business processes. This role enables models to be deployed (i.e., use a model in production) and requires some expertise in data science, as well as knowledge of how to effectively develop software applications.
• Sometimes this role is known as a machine learning engineer. Regardless, they help bridge the worlds of data science and software development.

Data Science Product Owner

e The product owner is responsible for prioritizing what work gets done, ensuring that each work item is clearly
defined from a business context, and that the upcoming work and priorities of the team are visible and
transparent.

• In addition, the product owner must agree that the tasks in the done column are actually done. In short, the product owner represents all the stakeholders for the project. While a product owner is often the product manager, it is possible to have these as separate roles, in that the product manager has a more strategic focus on the product's vision, company objectives, and the market, whereas product owners are more tactical and directly involved with the day-to-day data science team, translating a product manager's strategy into actionable tasks.

Data/Business Analyst

• Data/Business Analysts analyze a large variety of data to extract information about system, service, or organization performance and present it in usable/actionable form.
• They help shape a problem for the data scientist to explore. Note the difference between a data analyst and a data scientist.

Subject Matter Expert

Subject matter experts are people with extensive knowledge of how to apply the analytics within a specific
organizational context. This role is accountable to ensure the desired insights are actionable.

4.7 Applications of Data Science

1. Finance

The financial industry is one of the most numbers-driven in the world, and one of the first industries that adopted data science into the field. As is fairly well known, financial companies are information-driven, and data science is the perfect helper to get actionable insights and obtain sustainable development for financial institutions such as banks. Data science helps in risk assessment and monitoring, analysis of potential fraudulent behavior, payments, and customer experience, among many other utilizations. The ability to make data-driven decisions creates a more stable financial environment, and data scientists make up the backbone of the industry.
2. Healthcare

By connecting pattern recognition, analytics, statistics, and deep learning algorithms, data science makes healthcare more efficient. The demand for data scientists in the healthcare area grows rapidly, according to research published by the Journal of the American Medical Informatics Association. With the ability to quickly process large volumes of data from clinical and laboratory reports, data scientists enable a more precise diagnosis process by utilizing deep learning techniques. There are also many companies that market smart wearables, used to track and detect health conditions, and data science is at the heart of the process. This allows data scientists to reduce the risk of health issues and directly impact the state of human wellbeing, not just in the US, but in the entire world.
3. Travel industry

Travel personalization has become an increasingly deeper process than it used to be. The possibility to create customer profiles based on segmentation, offering personalized experiences according to their needs and preferences, has its foundations in data science. Forecasting the behavior of travelers by knowing where they want to go next, what kind of prices they are ready to pay, and when to launch special promotions hugely depends on the level of applying data scientists' skills and abilities.
4. Energy

• The energy industry experiences major fluctuations in prices and higher costs of projects - obtaining high-quality information has never been so important. Data scientists help in cutting costs, reducing risks, optimizing investments and improving equipment maintenance. They use predictive models to monitor compressors, which, in turn, can reduce the number of downtime days.
• "A day's production at a small site - 1,000 barrels of oil - represents $30,000 of revenue," stated Francisco Sanchez, president of Houston Energy Data Science. The (data science) tools used in extracting and evaluating data can range from Oracle, Hadoop, NoSQL and Python to various other software and solutions that can manipulate and analyze large datasets.
5. Manufacturing

• Often referred to as Industry 4.0 (with the introduction of robotization and automation as the 4th industrial revolution), the manufacturing industry keeps growing in its need for data scientists, who can apply their knowledge of broad data management solutions through quality assurance, tracking defects, and increasing the quality of supplier relations.
• Similar to the energy industry, utilizing preventive maintenance to troubleshoot potential future equipment issues is another focus where data scientists can find good use for their skills. To avoid delays in the production process, implementing artificial intelligence and predictive analytics offers the possibility to manage frequent manufacturing issues : overproduction of products, logistics or inventory. In short, data scientists help in identifying inefficiencies and tuning the production process.
6. Gaming

There are 2.5 billion gamers across the world, and the industry is becoming the heart of entertainment. Data science is used in the industry to build models, analyze optimization points, make predictions or identify patterns to ultimately improve gaming models. Data scientists also work in monetization, where they need to identify the most valuable players and analyze general consumer behavior to increase the profitability of the company (the more the players spend, the higher the profitability).

Another area where data scientists can put their skills to use is fraud detection; security levels in the gaming industry must be of the highest standards, thus machine learning algorithms allow faster identification of suspicious account activities.
7. Pharmaceuticals

• Connected to human health, the pharma industry has also emerged as an industry where data science is increasing its application. For example, a pharmaceutical company can utilize data science to ensure a more stable approach for planning clinical trials. The patent exclusivity "starts roughly at the same time as its first clinical trial," therefore companies need to resort to data science in order to build precision into their calculations of the potential success or failure of the clinical trials.
• Another application can be seen before the trial even starts, by identifying suitable candidates based on their body structure, such as chemical structure, medical history or other important characteristics. Data scientists read, evaluate, monitor and perform these analyses.
These are just some of the industries where we see active applications of data science and its benefits. The future will certainly bring even more usage of this exciting field, and, whether you are an aspiring data scientist or already in the field for years, the wealth of career choice is beneficial to all the inquisitive data explorers out there.

4.8 Case Studies of Data Science in various Industries

1. Data Science in Education


• Data Science has also changed the way in which students interact with teachers and evaluate their performance. Instructors can use data science to analyze the feedback received from the students and use it to improve their teaching.
• Data Science can be used to create predictive models that can predict the drop-out rate of students based on their performance and inform the instructors to take necessary precautions.
• IBM Analytics has created a project for schools to evaluate students' performance. Universities are using data to improve retention and supplement the performance of their students. For example, the University of Florida makes use of IBM Cognos Analytics to keep track of student performance and make necessary predictions.
• Also, MOOCs and online education platforms are using data science to keep track of the students, to automate assignment evaluation and to improve the course based on student feedback.
2. Data Science in BioTech

• The human gene is composed of four building blocks - A, T, C and G. Our looks and characteristics are determined by the three billion permutations of these four building blocks. While there are genetic defects and defects acquired through lifestyle, the consequences can lead to chronic diseases.

• Identifying such defects at an early stage can help the doctors and diagnostic teams to take preventive steps.
• Helix is one of the genome analysis companies that provide customers with their genomic details. Also, several medicines tailored for specific genetic designs have become increasingly popular due to the advent of new computational methodologies.
• Due to the explosion in data, we can understand complex genomic sequences and analyze them on a large scale.
• Data scientists can use contemporary computing power to handle large datasets and understand patterns of genomic sequences to identify defects and provide insights to physicians and researchers.
• Furthermore, with the usage of wearable devices, data scientists can use the relationship between the genetic characteristics and the medical visits to develop a predictive modeling system.
3. Predictive Modeling for Maintaining Oil and Gas Supply

• Crude oil and gas industries face a major problem of equipment failures, which usually occur due to the inefficiency of oil wells and their performance at a subpar level.
• With the adoption of a successful strategy that advocates for predictive maintenance, the well operators can be alerted of crucial stages for shutdown as well as be notified of maintenance periods. This will lead to a boost in oil production and prevent further loss.
• Data Scientists can apply a Predictive Maintenance (PdM) strategy to use data in order to optimize high-value machinery for manufacturing and refining oil products. With the telemetry data extracted through sensors, a steady stream of historical data can be used to train our machine learning model.
• This machine learning model will predict the failing of machine parts and will notify the operators of timely maintenance in order to avert oil losses.
• A Data Scientist assigned with the development of a PdM strategy will help to avoid hazards and will predict machine failures, prompting the operators to take precautionary steps.
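The alerting idea behind such a PdM strategy can be illustrated with a deliberately simple sketch (not the trained machine learning model the text describes): flag a telemetry reading that drifts far from a rolling baseline. All telemetry values, the window size and the threshold below are invented for illustration.

```python
from collections import deque

def flag_anomalies(readings, window=5, threshold=3.0):
    """Flag a reading when it deviates from the rolling mean of the previous
    `window` readings by more than `threshold` rolling standard deviations."""
    recent = deque(maxlen=window)
    flags = []
    for x in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            sd = var ** 0.5
            flags.append(sd > 0 and abs(x - mean) > threshold * sd)
        else:
            flags.append(False)  # not enough history yet
        recent.append(x)
    return flags

# Invented pump-pressure telemetry with one sudden spike at index 6
telemetry = [50.1, 50.3, 49.9, 50.2, 50.0, 50.1, 72.5, 50.2]
print(flag_anomalies(telemetry))  # only the 72.5 spike is flagged
```

A real system would replace the rolling-baseline rule with a model trained on historical sensor data, as the section describes.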

Review Questions
Q.1 Explain Difference between Data Science and Business Analytics.

Q.2 Write short note on Phases of Data Analytics Lifecycle.


Q.3 Explain Applications of Data Science.

Q.4 Write short note on Applications of Data Science.

Q.5 What is Data Science?


Q.6 Explain Difference between Data Mining and Data Science.

Q.7 Write short note on Data Science in BioTech.

EXPLORATORY DATA ANALYSIS

Introduction to exploratory data analysis, Typical data formats, Types of EDA, Graphical/Non-graphical methods, Univariate/multivariate methods, Correlation and covariance, Degree of freedom, Statistical Methods for Evaluation including ANOVA.
Self-Learning Topics : Implementation of graphical EDA methods.

5.1 Introduction to Exploratory Data Analysis

• Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, to spot anomalies, to test hypotheses and to check assumptions with the help of summary statistics and graphical representations.
e In Exploratory data analysis (EDA) data scientists use visualisation and transformation to explore the data in a
systematic way. EDA is an iterative cycle.
e Following is the general process of EDA:
o Generate questions about your data.
o Search for answers by visualising, transforming, and modelling your data.
o Use what you learn to refine your questions and/or generate new questions.

Import → Tidy → Explore (Transform ⇄ Visualise) → Communicate, with Program underlying the whole workflow

Fig. 5.1.1 : EDA Process
EDA is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA the user should feel free to investigate every idea that occurs to them. Some of these ideas will pan out, and some will be dead ends. As the data exploration continues, the user will home in on a few particularly productive areas that they will eventually write up and communicate to others.
EDA is an important part of any data analysis. Data cleaning is just one application of EDA. To do data cleaning,
we need to deploy all the tools of EDA: visualisation, transformation, and modelling.
• The goals of the EDA process

A proper EDA hopes to accomplish several goals :
o To question the data and determine if there are problems inherent in the dataset;
o To determine if the data on hand is sufficient to answer a particular research question or whether additional feature engineering is required;
o To develop a framework for answering the research question;
o To refine the questions and/or research problem based on what you have learned about the data.
5.1.1 Typical Data Formats

• EDA is fundamentally a creative process. And like most creative processes, the key to asking quality questions is to generate a large quantity of questions. It is difficult to ask revealing questions at the start of your analysis because you do not know what insights are contained in your dataset.
On the other hand, each new question that you ask will expose you to a new aspect of your data and increase
your chance of making a discovery. You can quickly drill down into the most interesting parts of your data—and
develop a set of thought-provoking questions—if you follow up each question with a new question based on
what you find.

There is no rule about which questions you should ask to guide your research. However, two types of questions
will always be useful for making discoveries within your data.
• You can loosely word these questions as :
1. What type of variation occurs within my variables?
2. What type of co-variation occurs between my variables?
• Let us define a few of the important terms before we proceed.
o Avariable is a quantity, quality, or property that you can measure.
° A value is the state of a variable when you measure it. The value of a variable may change from
measurement to measurement.
An observation is a set of measurements made under similar conditions (you usually make all of the
measurements in an observation at the same time and on the same object). An observation will contain
several values, each associated with a different variable. I'll sometimes refer to an observation as a data
point.
Tabular data is a set of values, each associated with a variable and an observation. Tabular data is tidy if
each value is placed in its own “cell”, each variable in its own column, and each observation in its own row.
Variation is the tendency of the values of a variable to change from measurement to measurement. You can
see variation easily in real life; if you measure any continuous variable twice, you will get two different
results. This is true even if you measure quantities that are constant, like the speed of light. Each of your
measurements will include a small amount of error that varies from measurement to measurement.
Categorical variables can also vary if you measure across different subjects (e.g. the eye colors of different
people), or different times (e.g. the energy levels of an electron at different moments). Every variable has its
own pattern of variation, which can reveal interesting information. The best way to understand that pattern
is to visualise the distribution of the variable’s values.

5.1.1(A) Visualising Distributions

• How you visualise the distribution of a variable will depend on whether the variable is categorical or continuous. A variable is categorical if it can only take one of a small set of values.
• In R, categorical variables are usually saved as factors or character vectors. To examine the distribution of a categorical variable, use a bar chart :

library(ggplot2)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

Fig. 5.1.2 : Bar chart of diamond counts by cut (Fair, Good, Very Good, Premium, Ideal)

© The height of the bars displays how many observations occurred with each x value.
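The counts behind such a bar chart are straightforward to compute in any language. As a minimal sketch in Python (listed later in this chapter among common EDA tools), collections.Counter tallies an invented sample of cut values:

```python
from collections import Counter

# Invented 'cut' observations for a handful of diamonds
cuts = ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Fair", "Premium"]

counts = Counter(cuts)       # frequency (count) of each category
print(counts.most_common())  # [('Ideal', 3), ('Premium', 2), ('Good', 1), ('Fair', 1)]

# Proportions (count / total count), the other scale a bar chart can show
total = sum(counts.values())
proportions = {cut: n / total for cut, n in counts.items()}
```

The bar heights in Fig. 5.1.2 are exactly such per-category counts, computed over the full diamonds dataset.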
e A variable is continuous if it can take any of an infinite set of ordered values. Numbers and date-times are two
examples of continuous variables. To examine the distribution of a continuous variable, use a histogram:
library(ggplot2)
ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)


Fig. 5.1.3 : Histogram of carat (binwidth = 0.5)
• A histogram divides the x-axis into equally spaced bins and then uses the height of a bar to display the number of observations that fall in each bin. In the graph above, the tallest bar shows that almost 30,000 observations have a carat value between 0.25 and 0.75, which are the left and right edges of the bar.
• If you wish to overlay multiple histograms in the same plot, I recommend using geom_freqpoly() instead of geom_histogram(). geom_freqpoly() performs the same calculation as geom_histogram(), but instead of displaying the counts with bars, uses lines instead. It's much easier to understand overlapping lines than bars.
ggplot(data = smaller, mapping = aes(x = carat, colour = cut)) +
  geom_freqpoly(binwidth = 0.1)

Fig. 5.1.4 : Frequency polygons of carat, coloured by cut (Fair, Good, Very Good, Premium, Ideal)
5.1.2 Typical Values

In both bar charts and histograms, tall bars show the common values of a variable, and shorter bars show less-
common values. Places that do not have bars reveal values that were not seen in your data. To turn this information
into useful questions, look for anything unexpected :
Which values are the most common? Why?
Which values are rare? Why? Does that match your expectations?
Can you see any unusual patterns? What might explain them?
As an example, the histogram below suggests several interesting questions:
Why are there more diamonds at whole carats and common fractions of carats?
Why are there more diamonds slightly to the right of each peak than there are slightly to the left of each peak?
Why are there no diamonds bigger than 3 carats?
ggplot(data = smaller, mapping = aes(x = carat)) +
  geom_histogram(binwidth = 0.01)


Fig. 5.1.5 : Histogram of carat (binwidth = 0.01)
Clusters of similar values suggest that subgroups exist in your data. To understand the subgroups, ask:
e How are the observations within each cluster similar to each other?
e® Howare the observations in separate clusters different from each other?
e Howcan you explain or describe the clusters?
e Why might the appearance of clusters be misleading?
The histogram below shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone
National Park. Eruption times appear to be clustered into two groups: there are short eruptions (of around 2
minutes) and long eruptions (4-5 minutes), but little in between.
ggplot(data = faithful, mapping = aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25)
Fig. 5.1.6 : Histogram of Old Faithful eruption times (binwidth = 0.25)

Many of the questions above will prompt you to explore a relationship between variables, for example, to see if the values of one variable can explain the behavior of another variable. We'll get to that shortly.
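Covariance and correlation, which this chapter's syllabus lists as the standard numeric summaries of such co-variation, can be sketched in a few lines of Python; the carat/price pairs below are invented:

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

carat = [0.3, 0.5, 0.7, 1.0, 1.2]       # invented diamond sizes
price = [350, 700, 1200, 3000, 4200]    # invented prices
print(round(correlation(carat, price), 3))  # close to +1: the two variables co-vary strongly
```

Note that covariance(xs, xs) is just the sample variance of xs, which is why correlation is bounded between -1 and +1.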
5.1.3 Unusual Values

e Outliers are observations that are unusual; data points that don’t seem to fit the pattern. Sometimes outliers are
data entry errors; other times outliers suggest important new science.

When you have a lot of data, outliers are sometimes difficult to see in a histogram. For example, take the
distribution of the y variable from the diamonds dataset. The only evidence of outliers is the unusually wide
limits on the x-axis.
ggplot(diamonds) +
geom_histogram(mapping = aes(x = y), binwidth = 0.5)

Fig. 5.1.7 : Histogram of y from the diamonds dataset; the unusually wide x-axis limits hint at outliers

e There are so many observations in the common bins that the rare bins are so short that you can’t see them
(although maybe if you stare intently at 0 you'll spot something).
e Tomake it easy to see the unusual values, we need to zoom to small values of the y-axis with coord_cartesian() :
ggplot(diamonds) +
  geom_histogram(mapping = aes(x = y), binwidth = 0.5) +
  coord_cartesian(ylim = c(0, 50))

Fig. 5.1.8 : The same histogram zoomed with coord_cartesian(ylim = c(0, 50)), revealing the rare bins

(coord_cartesian() also has an xlim() argument for when you need to zoom into the x-axis. ggplot2 also has xlim() and ylim() functions that work slightly differently: they throw away the data outside the limits.)

5.1.4 Missing Values

If you've encountered unusual values in your dataset, and simply want to move on to the rest of your analysis, you have two options.
1. Drop the entire row with the strange values :
diamonds2 <- diamonds %>%
  filter(between(y, 3, 20))
I don't recommend this option, because just because one measurement is invalid doesn't mean all the measurements are. Additionally, if you have low quality data, by the time that you've applied this approach to every variable you might find that you don't have any data left!
2. Instead, I recommend replacing the unusual values with missing values. The easiest way to do this is to use mutate() to replace the variable with a modified copy. You can use the ifelse() function to replace unusual values with NA :
diamonds2 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))
ifelse() has three arguments. The first argument, test, should be a logical vector. The result will contain the value of the second argument, yes, when test is TRUE, and the value of the third argument, no, when it is false.
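The same replace-with-missing idea is easy to mirror outside R. A minimal Python sketch of the ifelse() pattern, using None as the missing-value marker and the 3-20 limits from the example above (the sample values are invented):

```python
def mask_unusual(values, low=3, high=20):
    """Keep every observation, but replace values outside [low, high] with None."""
    return [v if low <= v <= high else None for v in values]

y = [5.7, 5.9, 0.0, 31.8, 6.1]   # two impossible diamond widths
print(mask_unusual(y))            # [5.7, 5.9, None, None, 6.1]
```

As in the R version, the row survives; only the implausible measurement is marked missing, so the other variables in that observation remain usable.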

5.2 Types of EDA


5.2.1 Types of Exploratory Data Analysis
1. Univariate Non-graphical 2. Multivariate Non-graphical
3. Univariate graphical 4. Multivariate graphical


1. Univariate Non-graphical : This is the simplest form of data analysis, as during this we use just one variable to research the info. The standard goal of univariate non-graphical EDA is to know the underlying sample distribution/data and make observations about the population. Outlier detection is additionally part of the analysis. The characteristics of population distribution include :
• Central tendency : The central tendency or location of a distribution has to do with typical or middle values. The commonly useful measures of central tendency are statistics called mean, median, and sometimes mode, of which the most common is the mean. For a skewed distribution, or when there is concern about outliers, the median may be preferred.
• Spread : Spread is an indicator of how far away from the middle we are likely to find the data values. The standard deviation and variance are two useful measures of spread. The variance is the mean of the squares of the individual deviations, and the standard deviation is the square root of the variance.
• Skewness and kurtosis : Two more useful univariate descriptors are the skewness and kurtosis of the distribution. Skewness is the measure of asymmetry, and kurtosis is a more subtle measure of peakedness compared to a normal distribution.
2. Multivariate Non-graphical : Multivariate non-graphical EDA techniques are usually used to show the connection between two or more variables in the form of either cross-tabulation or statistics.
• For categorical data, an extension of tabulation called cross-tabulation is extremely useful. For 2 variables, cross-tabulation is performed by making a two-way table with column headings that match the levels of one variable and row headings that match the levels of the other variable, then filling in the counts of all subjects that share an equivalent pair of levels.
• For each categorical variable and one quantitative variable, we create statistics for the quantitative variable separately for every level of the categorical variable, then compare the statistics across the levels of the categorical variable.
• Comparing the means is an informal version of ANOVA, and comparing medians is a robust version of one-way ANOVA.
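The central tendency, spread, skewness and kurtosis measures discussed above are simple to compute directly. A minimal Python sketch over a small invented right-skewed sample, using the common moment-based estimates m3/m2^1.5 for skewness and m4/m2^2 - 3 for (excess) kurtosis:

```python
from statistics import mean, median, pstdev

sample = [2, 3, 3, 4, 4, 4, 5, 12]   # small invented right-skewed sample

m = mean(sample)                      # central tendency: the mean
med = median(sample)                  # robust central tendency: the median
sd = pstdev(sample)                   # spread: population standard deviation

# Moment-based shape measures
m2 = sum((x - m) ** 2 for x in sample) / len(sample)
m3 = sum((x - m) ** 3 for x in sample) / len(sample)
m4 = sum((x - m) ** 4 for x in sample) / len(sample)
skewness = m3 / m2 ** 1.5             # positive: the outlier 12 drags the tail right
kurtosis = m4 / m2 ** 2 - 3.0         # excess kurtosis; 0 for a normal distribution
print(m, med, round(sd, 3), round(skewness, 2), round(kurtosis, 2))
```

Note how the median (4.0) sits below the mean (4.625), the usual signature of a right-skewed sample, which is why the text suggests preferring the median when outliers are a concern.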
3. Univariate graphical : Non-graphical methods are quantitative and objective, but they do not give the complete picture of the data; therefore, graphical methods, which involve a degree of subjective analysis, are also required. Common sorts of univariate graphics are :
• Histogram : The foremost basic graph is a histogram, which is a barplot in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values. Histograms are one of the simplest ways to quickly learn a lot about your data, including central tendency, spread, modality, shape and outliers.
• Stem-and-leaf plots : An easy substitute for a histogram is the stem-and-leaf plot. It shows all data values and the shape of the distribution.
• Boxplots : Another very useful univariate graphical technique is the boxplot. Boxplots are excellent at presenting information about central tendency and show robust measures of location and spread, as well as providing information about symmetry and outliers, although they can be misleading about aspects like multimodality. One of the best uses of boxplots is in the form of side-by-side boxplots.
• Quantile-normal plots : The final univariate graphical EDA technique is the most intricate. It is called the quantile-normal or QN plot, or more generally the quantile-quantile or QQ plot. It is used to see how well a specific sample follows a specific theoretical distribution. It allows detection of non-normality and diagnosis of skewness and kurtosis.

4. Multivariate graphical : Multivariate graphical data uses graphics to display relationships between two or more sets of data. The one used most commonly is a grouped barplot, with each group representing one level of one of the variables and every bar within a group representing the levels of the other variable.
Other common sorts of multivariate graphics are :
• Scatterplot : For 2 quantitative variables, the essential graphical EDA technique is the scatterplot, which has one variable on the x-axis and one on the y-axis, and a point for every case in your dataset.
• Run chart : It's a line graph of data plotted over time.
• Heat map : It's a graphical representation of data where values are depicted by color.
• Multivariate chart : It's a graphical representation of the relationships between factors and response.
• Bubble chart : It's a data visualization that displays multiple circles (bubbles) in a two-dimensional plot.
In a nutshell : You ought to always perform appropriate EDA before further analysis of your data. Perform whatever steps are necessary to become more conversant with your data, check for obvious mistakes, learn about variable distributions, and study the relationships between variables. EDA is not an exact science, but it is very important!

5.2.2 Tools Required for Exploratory Data Analysis

Some of the most common tools used to create an EDA are:


R : An open-source programming language and free software environment for statistical computing and
graphics supported by the R foundation for statistical computing. The R language is widely used among
Statisticians in developing statistical observations and data analysis.
Python : An interpreted, object-oriented programming language with dynamic semantics. Its high level, built-in
data structures, combined with dynamic binding, make it very attractive for rapid application development, also
as to be used as a scripting or glue language to attach existing components together. Python and EDA are often
used together to spot missing values in the data set, which is vital so you'll decide the way to handle missing
values for machine learning.
Apart from these functions described above, EDA can also:
• Perform k-means clustering : it's an unsupervised learning algorithm where the data points are assigned to clusters, also referred to as k-groups. k-means clustering is usually utilized in market segmentation, image compression, and pattern recognition.
e EDA is often utilized in predictive models like linear regression, where it’s wont to predict outcomes.
e It is also utilized in univariate, bivariate, and multivariate visualization for summary statistics, establishing
relationships between each variable, and understanding how different fields within the data interact with
one another.
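The k-means idea mentioned above fits in a few lines for one-dimensional data. This toy sketch (invented data, deliberately crude initialisation) only illustrates the assign-then-recompute loop, not a production implementation:

```python
def kmeans_1d(points, k=2, iters=20):
    """Tiny 1-D k-means sketch: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    # Crude init: pick k spread-out seeds from the sorted data
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 2 and around 10
print(kmeans_1d([1.0, 2.0, 3.0, 9.0, 10.0, 11.0], k=2))  # [2.0, 10.0]
```

Real EDA work would typically use a library implementation, but the loop above is the whole idea behind the k-groups assignment.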

5.3 Graphical/Non-Graphical Methods

If we are focusing on data from observation of a single variable on n subjects, i.e., a sample of size n, then in addition to looking at the various sample statistics, we also need to look graphically at the distribution of the sample. Non-graphical and graphical methods complement each other.
While the non-graphical methods are quantitative and objective, they do not give a full picture of the data;
therefore, graphical methods, which are more qualitative and involve a degree of subjective analysis, are also
required.

5.3.1 Multivariate Non-Graphical EDA

Multivariate non-graphical EDA techniques generally show the relationship between two or more variables in
the form of either cross-tabulation or statistics.
Cross-tabulation :

« For categorical data (and quantitative data with only a few different values) an extension of tabulation called
cross-tabulation is very useful. For two variables, cross-tabulation is performed by making a two-way table with
column headings that match the levels of one variable and row headings that match the levels of the other
variable, then filling in the counts of all subjects that share a pair of levels.
e The two variables might be both explanatory, both outcome, or one of each. Depending on the goals, row
percentages (which add to 100% for each row), column percentages (which add to 100% for each column)
and/or cell percentages (which add to 100% over all cells) are also useful.
e Cross-tabulation is the basic bivariate non-graphical EDA technique.
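The procedure described above (one variable on the rows, one on the columns, counts in the cells, then row percentages) can be sketched in a few lines of Python; the subjects and level names below are made up for illustration:

```python
from collections import Counter

# Hypothetical subjects: (group, outcome) pairs of categorical levels.
subjects = [("drug", "improved"), ("drug", "improved"), ("drug", "same"),
            ("placebo", "improved"), ("placebo", "same"), ("placebo", "same")]

# Fill in the counts of all subjects that share a pair of levels.
table = Counter(subjects)
rows = sorted({r for r, _ in subjects})
cols = sorted({c for _, c in subjects})

for r in rows:
    counts = [table[(r, c)] for c in cols]
    total = sum(counts)
    # Row percentages add to 100% for each row.
    pcts = [round(100 * n / total, 1) for n in counts]
    print(r, dict(zip(cols, counts)), "row %:", pcts)
```

The same Counter can also yield column or cell percentages by dividing by column totals or the grand total instead.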

5.3.2 Univariate Graphical EDA

Histogram :
¢ The most basic graph is the histogram, which is a barplot in which each bar represents the frequency (count) or
proportion (count/total count) of cases for a range of values. Typically the bars run vertically with the count (or
proportion) axis running vertically. To manually construct a histogram, define the range of data for each bar
(called a bin), count how many cases fall in each bin, and draw the bars high enough to indicate the count.

e It is often worthwhile to try a few different bin sizes/numbers because, especially with small samples, there may
sometimes be a different shape to the histogram when the bin size changes. But usually the difference is small.
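The manual construction just described (define a bin width, count how many cases fall in each bin) can be sketched as follows; the data values and the bin width are invented:

```python
import math

def histogram_counts(data, bin_width):
    """Count how many cases fall in each bin of the given width."""
    lo = min(data)
    n_bins = math.floor((max(data) - lo) / bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - lo) / bin_width)] += 1
    return counts

data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.4, 4.8]
print(histogram_counts(data, bin_width=1.0))  # -> [2, 3, 2, 1]
```

Re-running with a different bin_width shows how the apparent shape can change with the bin size, which is the point made above.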
Stem-and-leaf plots :
e Asimple substitute for a histogram is a stem and leaf plot. A stem and leaf plot is sometimes easier to make by
hand than a histogram, and it tends not to hide any information.
e Nevertheless, a histogram is generally considered better for appreciating the shape of a sample distribution than
is the stem and leaf plot. A stem and leaf plot shows all data values and the shape of the distribution.

Boxplots :
e Another very useful univariate graphical technique is the boxplot. The boxplot will be described here in its
vertical format, which is the most common, but a horizontal format also is possible. Boxplots are very good at
presenting information about the central tendency, symmetry and skew, as well as outliers, although they can be
misleading about aspects such as multimodality. One of the best uses of boxplots is in the form of side-by-side
boxplots.
e The term fat tails is used to describe the situation where a histogram has a lot of values far from the mean
relative to a Gaussian distribution. This corresponds to positive kurtosis. In a boxplot, many outliers (more than
the 1/150 expected for a Normal distribution) suggests fat tails (positive kurtosis), or possibly many data entry
errors. Also, short whiskers suggest negative kurtosis, at least if the sample size is large.
* — Boxplots are excellent EDA plots because they rely on robust statistics like median and IQR rather than more
sensitive ones such as mean and standard deviation. With boxplots it is easy to compare distributions usually,
for one variable at different levels of another, with a high degree of reliability because of the use of these robust
statistics.
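The robust quantities a boxplot is built from are easy to compute directly. The sketch below uses invented data, with quartiles taken as medians of the lower and upper halves (one of several common conventions); it finds the quartiles, the IQR, and the 1.5 × IQR fences beyond which points would be drawn as outliers:

```python
import statistics

def five_numbers(data):
    """Quartiles (medians of the lower/upper halves), median, IQR, and fences."""
    xs = sorted(data)
    mid = len(xs) // 2
    lower = xs[:mid]
    upper = xs[mid + 1:] if len(xs) % 2 else xs[mid:]
    q1, q3 = statistics.median(lower), statistics.median(upper)
    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # points outside are outliers
    return q1, statistics.median(xs), q3, iqr, fences

data = [10, 12, 13, 14, 15, 16, 18, 21, 45]   # 45 is a suspicious point
q1, med, q3, iqr, (lo, hi) = five_numbers(data)
print(q1, med, q3, iqr)                        # -> 12.5 15 19.5 7.0
print([x for x in data if x < lo or x > hi])   # -> [45]
```

Because the quartiles and median ignore how extreme the outlying values are, the 45 barely changes the box; that robustness is why boxplots are reliable for comparison.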

Quantile-normal plots :
The final univariate graphical EDA technique is the most complicated. It is called the quantile-normal or QN plot,
or with more generality the quantile-quantile or QQ plot. It is used to see how well a particular sample follows a
particular theoretical distribution.
Although it can be used for any theoretical distribution, we will limit our attention to seeing how well a sample
of data of size n matches a Gaussian distribution with mean and variance equal to the sample mean and variance.
By examining the quantile-normal plot we can detect left or right skew, positive or negative kurtosis, and
bimodality.
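Numerically, a quantile-normal plot pairs each sorted observation with the quantile of a Gaussian (using the sample mean and standard deviation) at plotting position (i + 0.5)/n; if the sample is roughly normal, the pairs fall near a straight line. A small sketch with invented data (requires Python 3.8+ for statistics.NormalDist):

```python
import statistics

def qn_pairs(data):
    """(theoretical Gaussian quantile, observed value) pairs for a QN plot."""
    xs = sorted(data)
    n = len(xs)
    dist = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    return [(dist.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

data = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]
for theo, obs in qn_pairs(data):
    print(round(theo, 3), obs)
```

Systematic bending of the points away from the diagonal would indicate skew or kurtosis, as described above.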

5.3.3 Correlation and Covariance

For two quantitative variables, the basic statistics of interest are the sample covariance and/or sample
correlation, which correspond to and are estimates of the corresponding population parameters. The sample
covariance is a measure of how much two variables “co-vary”, i.e., how much (and in what direction) should we
expect one variable to change when the other changes. Sample covariance is calculated by computing (signed)
deviations of each measurement from the average of all measurements for that variable.
Then the deviations for the two measurements are multiplied together separately for each subject. Finally these
values are averaged (actually summed and divided by n-1, to keep the statistic unbiased). Note that the units on
sample covariance are the products of the units of the two variables.
Positive covariance values suggest that when one measurement is above the mean the other will probably also
be above the mean, and vice versa. Negative covariances suggest that when one variable is above its mean, the
other is below its mean. And covariances near zero suggest that the two variables vary independently of each
other. Technically, independence implies zero correlation, but the reverse is not necessarily true.
Covariances tend to be hard to interpret, so we often use correlation instead. The correlation has the nice
property that it is always between -1 and +1, with -1 being a “perfect” negative linear correlation, +1 being a
perfect positive linear correlation and 0 indicating that X and Y are uncorrelated. The symbol r or rxy is often
used for sample correlations.
The general formula for sample covariance is,

Cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

It is worth noting that Cov(X, X) = Var(X).
The formula for the sample correlation is,

Cor(X, Y) = Cov(X, Y) / (sx sy)

where sx is the standard deviation of X and sy is the standard deviation of Y.
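These two formulas translate directly into code. The sketch below (toy data) computes the sample covariance and correlation, and confirms that Cov(X, X) equals Var(X):

```python
import statistics

def cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def cor(x, y):
    """Sample correlation: covariance scaled by both standard deviations."""
    return cov(x, y) / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(cov(x, y))                              # -> 2.0 (positive: they co-vary)
print(cov(x, x) == statistics.variance(x))    # -> True, i.e. Cov(X, X) = Var(X)
print(round(cor(x, y), 3))                    # -> 0.853, between -1 and +1
```

Note the correlation is unit-free, which is why it is easier to interpret than the covariance.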


If variation describes the behavior within a variable, covariation describes the
behavior between variables. Covariation is the tendency for the values of two or more variables to vary together in
a related way.
The best way to spot covariation is to visualise the relationship between two or more variables. How you do that
should again depend on the type of variables involved.

5.3.4 A Categorical and Continuous Variable

It's common to want to explore the distribution of a continuous variable broken down by a categorical variable,
as in the previous frequency polygon. The default appearance of geom_freqpoly() is not that useful for that sort
of comparison because the height is given by the count. That means if one of the groups is much smaller than the
others, it's hard to see the differences in shape. For example, let's explore how the price of a diamond varies
with its quality:
ggplot(data = diamonds, mapping = aes(x = price)) +
  geom_freqpoly(mapping = aes(colour = cut), binwidth = 500)
Fig. 5.3.1 : Frequency polygons of diamond price (binwidth = 500), coloured by cut (Fair, Good, Very Good, Premium, Ideal)

It's hard to see the difference in distribution because the overall counts differ so much:

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
Fig. 5.3.2 : Bar chart of diamond counts by cut

To make the comparison easier we need to swap what is displayed on the y-axis. Instead of displaying count,
We'll display density, which is the count standardised so that the area under each frequency polygon is one.
ggplot(data = diamonds, mapping = aes(x = price, y = ..density..)) +
  geom_freqpoly(mapping = aes(colour = cut), binwidth = 500)

Fig. 5.3.3 : Frequency polygons of the density of price, coloured by cut

There’s something rather surprising about this plot - it appears that fair diamonds (the lowest quality) have the
highest average price! But maybe that’s because frequency polygons are a little hard to interpret - there’s a lot going
on in this plot.

Another alternative to display the distribution of a continuous variable broken down by a categorical variable is
the boxplot. A boxplot is a type of visual shorthand for a distribution of values that is popular among statisticians.
Each boxplot consists of:

e A box that stretches from the 25th percentile of the distribution to the 75th percentile, a distance known as the
interquartile range (IQR). In the middle of the box is a line that displays the median, i.e. 50th percentile, of the
distribution. These three lines give you a sense of the spread of the distribution and whether or not the
distribution is symmetric about the median or skewed to one side.
¢ Visual points that display observations that fall more than 1.5 times the IQR from either edge of the box. These
outlying points are unusual so are plotted individually.

e A line (or whisker) that extends from each end of the box and goes to the farthest non-outlier point in the
distribution.


Fig. 5.3.4 : How a rotated histogram and a boxplot display the same actual values in a distribution : the box runs from the 25th to the 75th percentile (the IQR), a line marks the 50th percentile, whiskers extend to the farthest non-outlier points, and points beyond 1.5 × IQR are drawn individually as outliers

e Let's take a look at the distribution of price by cut using geom_boxplot() :

ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
  geom_boxplot()

Fig. 5.3.5 : Boxplots of price by cut
e We see much less information about the distribution, but the boxplots are much more compact so we can more
easily compare them (and fit more on one plot). It supports the counterintuitive finding that better quality
diamonds are cheaper on average! In the exercises, you'll be challenged to figure out why.
e Cut is an ordered factor : fair is worse than good, which is worse than very good and so on. Many categorical
variables don't have such an intrinsic order, so you might want to reorder them to make a more informative
display. One way to do that is with the reorder() function.


For example, take the class variable in the mpg dataset. You might be interested to know how highway mileage
varies across classes:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

Fig. 5.3.6 : Boxplots of highway mileage (hwy) by class
To make the trend easier to see, we can reorder class based on the median value of hwy:
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy))
Fig. 5.3.7 : Boxplots of hwy by class, reordered by the median value of hwy

If you have long variable names, geom_boxplot() will work better if you flip it 90°. You can do that
with coord_flip().

ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  coord_flip()

midsize +— |, _ | | [_ sf
al | }

|
=§ compact +— ; | |
5e :
| |
e :
x subcompact +—— ————— i oe
5 4
a | |

: 2seater : i |
E aRe|
w
oe ts oh e I
& minivan ® We
> BES ;
2
8@ SUV &
Peea -2-¢-t-8
1 |

pickup +-- als |


Sys °

20 30 40
hwy
Fig. 5.3.8

5.3.5 Two Categorical Variables


To visualise the covariation between categorical variables, you'll need to count the number of observations for
each combination. One way to do that is to rely on the built-in geom_count() :

ggplot(data = diamonds) +
  geom_count(mapping = aes(x = cut, y = color))
Fig. 5.3.9 : geom_count plot of cut vs color; point size shows the number of observations at each combination

The size of each circle in the plot displays how many observations occurred at each combination of values,
Covariation will appear as a strong correlation between specific x values and specific y values.
Then visualise with geom_tile() and the fill aesthetic:
diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = n))

Fig. 5.3.10 : geom_tile plot of cut vs color; the fill shows the count n
If the categorical variables are unordered, you might want to use the seriation package to simultaneously
reorder the rows and columns in order to more clearly reveal interesting patterns. For larger plots, you might want to
try the d3heatmap or heatmaply packages, which create interactive plots.

5.3.6 Two Continuous Variables

e You've already seen one great way to visualise the covariation between two continuous variables: draw a
scatterplot with geom_point().

* You can see covariation as a pattern in the points. For example, you can see an exponential relationship between
the carat size and price of a diamond.

ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price))


Fig. 5.3.11 : Scatterplot of price vs carat
e Scatterplots become less useful as the size of your dataset grows, because points begin to overplot, and pile up
into areas of uniform black (as above). You've already seen one way to fix the problem: using the alpha aesthetic
to add transparency.
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price), alpha = 1 / 100)

Fig. 5.3.12 : Scatterplot of price vs carat with alpha = 1/100
But using transparency can be challenging for very large datasets. Another solution is to use bin. Previously you
used geom_histogram() and geom_freqpoly()to bin in one dimension. Now you'll learn how to
use geom_bin2d() and geom_hex() to bin in two dimensions.
e geom_bin2d() and geom_hex() divide the coordinate plane into 2d bins and then use a fill color to display how
many points fall into each bin. geom_bin2d() creates rectangular bins. geom_hex() creates hexagonal
bins. You
will need to install the hexbin package to use geom_hex().


ggplot(data = smaller) +
  geom_bin2d(mapping = aes(x = carat, y = price))

# install.packages("hexbin")
ggplot(data = smaller) +
  geom_hex(mapping = aes(x = carat, y = price))
Fig. 5.3.13 : geom_bin2d() (left) and geom_hex() (right) plots of price vs carat
e Another option is to bin one continuous variable so it acts like a categorical variable. Then you can use one of the
techniques for visualising the combination of a categorical and a continuous variable that you learned about. For
example, you could bin carat and then for each group, display a boxplot:
ggplot(data = smaller, mapping = aes(x = carat, y = price)) +
geom_boxplot(mapping = aes(group = cut_width(carat, 0.1)))
Fig. 5.3.14 : Boxplots of price by carat binned with cut_width(carat, 0.1)

cut_width(x, width), as used above, divides x into bins of width width. By default, boxplots look roughly the
same (apart from number of outliers) regardless of how many observations there are, so it's difficult to tell that
each boxplot summarises a different number of points. One way to show that is to make the width of the boxplot
proportional to the number of points with varwidth = TRUE.

5.3.7 Covariance and Correlation Matrices

When we have many quantitative variables the most common non-graphical EDA technique is to calculate all of
the pairwise covariances and/or correlations and assemble them into a matrix. Note that the covariance of X
with X is the variance of X and the correlation of X with X is 1.0.
For example, the covariance matrix in Table 5.3.2 tells us that the variances of X, Y, and Z are 5, 7, and 4
respectively, the covariance of X and Y is 1.77, the covariance of X and Z is -2.24, and the covariance of Y and Z
is 3.17.
Similarly, the corresponding correlation matrix tells us that the correlation of X and Y is 0.3, the correlation of X
and Z is -0.5, and the correlation of Y and Z is 0.6.
Table 5.3.1: Covariance Calculation

Subject ID    Age    Strength    Age-50    Str-19    Product

JW             38      20          -12        +1        -12
JA             62      15          +12        -4        -48
TJ             22      30          -28       +11       -308
JMA            38      21          -12        +2        -24
JMO            45      18           -5        -1         +5
JQA            69      12          +19        -7       -133
AJ             75      14          +25        -5       -125
MVB            38      28          -12        +9       -108
WHH            80       9          +30       -10       -300
JV             32      22          -18        +3        -54
JKP            51      20           +1        +1         +1

Total                                0         0      -1106
Table 5.3.2: A Covariance Matrix

        X        Y        Z

X     5.00     1.77    -2.24
Y     1.77     7.00     3.17
Z    -2.24     3.17     4.00


The correlation between two random variables is a number that runs from -1 through 0 to +1 and indicates a
strong inverse relationship, no relationship, and a strong direct relationship, respectively.
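The arithmetic of Table 5.3.1 can be reproduced directly: subtract each variable's mean (50 for age, 19 for strength), multiply the paired deviations, sum the products, and divide by n − 1 for the sample covariance:

```python
# Age and strength values from Table 5.3.1 (11 subjects).
age      = [38, 62, 22, 38, 45, 69, 75, 38, 80, 32, 51]
strength = [20, 15, 30, 21, 18, 12, 14, 28, 9, 22, 20]

mean_age = sum(age) / len(age)            # 50, so "Age-50" is the deviation
mean_str = sum(strength) / len(strength)  # 19, so "Str-19" is the deviation

products = [(a - mean_age) * (s - mean_str) for a, s in zip(age, strength)]
print(sum(products))                   # -> -1106.0, the "Total" row
print(sum(products) / (len(age) - 1))  # sample covariance: -110.6
```

The negative covariance matches the table's pattern: subjects with above-average age tend to have below-average strength.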

5.4 Degree of Freedom

Q. What Are Degrees of Freedom ?

e Degrees of freedom refers to the maximum number of logically independent values, which are values that have
the freedom to vary, in the data sample.
e Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such
as a chi-square test.
¢ — Calculating degrees of freedom is key when trying to understand the importance of a chi-square statistic and the
validity of the null hypothesis.
¢ Consider a data sample consisting of, for the sake of simplicity, five positive integers. The values could be any
number with no known relationship between them. This data sample would, theoretically, have five degrees of
freedom.
¢ Four of the numbers in the sample are {3, 8, 5, and 4} and the average of the entire data sample is revealed to be
6.
e This must mean that the fifth number has to be 10. It can be nothing else. It does not have the freedom to vary. So
the degree of freedom for this data sample is 4.
The formula for degrees of freedom equals the size of the data sample minus one :

Df = N - 1

where Df = degrees of freedom and N = sample size.

Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such
as a chi-square. It is essential to calculate degrees of freedom when trying to understand the importance of a chi-
square statistic and the validity of the null hypothesis.
The F-test in ANOVA also tests group means. It uses the F-distribution, which is defined by the DF. However, you
calculate the ANOVA degrees of freedom differently because you need to find the numerator and denominator
DF.
e Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or
more groups.

5.5 Statistical Methods for Evaluation Including ANOVA

Analysis of variance (ANOVA) assesses the differences between group means. It is a statistical hypothesis test
that determines whether the means of at least two populations are different. At a minimum, you need a continuous
dependent variable and a categorical independent variable that divides your data into comparison groups to perform
ANOVA.

5.5.1 Simplest Form and Basic Terms of ANOVA Tests

The simplest type of ANOVA test is one-way ANOVA. This method is a generalization of t-tests that can assess the
difference between more than two group means.

Fig. 5.5.1 : Boxplot of Score by Teaching Method (Method 1 to Method 4)
ANOVA tells you whether the differences between group means are statistically significant.

Statisticians consider ANOVA to be a special case of least squares regression, which is a specialization of the
general linear model. All these models minimize the sum of the squared errors.

5.5.2 F Test
® The term F-test is based on the fact that these tests use the F-values to test the hypotheses. An F-statistic is the
ratio of two variances and it was named after Sir Ronald Fisher. Variances measure the dispersal of the data
points around the mean. Higher variances occur when the individual data points tend to fall further from the
mean.
e An F-value is the ratio of two variances, or technically, two mean squares. Mean squares are simply variances
that account for the degrees of freedom (DF) used to estimate the variance. F-values are the test statistic for F-
tests.

e Consider : variances are the sum of the squared deviations from the mean. If you have a bigger sample, there are
more squared deviations to add up. The result is that the sum becomes larger and larger as you add in more
observations. By incorporating the DF, mean squares account for the differing numbers of measurements for
each estimate of the variance. Otherwise, the variances are not comparable, and the ratio for the F-statistic is
meaningless.
The F-test in One-Way ANOVA

We want to determine whether a set of means are all equal. To evaluate this with an F-test, we need to use the
proper variances in the ratio. Here's the F-statistic ratio for one-way ANOVA :

F = between-groups variance / within-group variance

Let's consider the following example.

Analysis of Variance

Source    DF    Adj SS    Adj MS    F-Value    p-Value
Factor     3     43.62    14.540       3.30      0.031
Error     36    158.47     4.402
Total     39    202.09

Model Summary

S          R-sq      R-sq(adj)    R-sq(pred)
2.09805    21.58%    15.05%       3.19%

Means

Factor    N     Mean      StDev    95% CI
1         10    11.203    1.995    (9.857, 12.548)
2         10     8.938    1.980    (7.592, 10.283)
3         10    10.683    1.102    (9.337, 12.028)
4         10     8.838    1.879    (7.492, 10.184)

To be able to conclude that not all group means are equal, we need a large F-value to reject the null hypothesis.
Is ours large enough?
A tricky thing about F-values is that they are a unitless statistic, which makes them hard to interpret. Our F-value
of 3.30 indicates that the between-groups variance is 3.3 times the size of the within-group variance. The null
hypothesis value is that the variances are equal, which produces an F-value of 1. Is our F-value of 3.3 large enough to
reject the null hypothesis?
We don’t know exactly how uncommon our F-value is if the null hypothesis is correct. To interpret individual F-
values, we need to place them in a larger context. F-distributions provide this broader context and allow us to
calculate probabilities.
First, let's assume that the null hypothesis is true for the population. At the population level, all four group
means are equal. Now, we repeat our study many times by drawing many random samples from this population
using the same one-way ANOVA design (four groups with 10 samples per group). Next, we perform one-way
ANOVA on all of the samples and plot the distribution of the F-values. This distribution is known as a sampling
distribution, which is a type of probability distribution.
If we follow this procedure, we produce a graph that displays the distribution of F-values for a population where
the null hypothesis is true. We use sampling distributions to calculate probabilities for how unlikely our sample
statistic is if the null hypothesis is true. F-tests use the F-distribution.
Fortunately, we don’t need to go to the trouble of collecting numerous random samples to create this graph!
Statisticians understand the properties of F-distributions so we can estimate the sampling distribution using the
F-distribution and the details of our one-way ANOVA design.
Our goal is to evaluate whether our sample F-value is so rare that it justifies rejecting the null hypothesis for the
entire population. We'll calculate the probability of obtaining an F-value that is at least as high as our study's
value (3.30).
This probability has a name—the P value! A low probability indicates that our sample data are unlikely when
the null hypothesis is true.
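The F-ratio described above can be computed by hand from raw group data. The sketch below uses three invented groups and forms the between-groups and within-groups mean squares from their sums of squares and degrees of freedom (k − 1 and n − k):

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F-statistic: between-groups MS over within-groups MS."""
    all_obs = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_obs)
    k, n = len(groups), len(all_obs)

    # Between-groups: squared distances of group means from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups: squared distances of observations from their group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # numerator DF = k - 1
    ms_within = ss_within / (n - k)     # denominator DF = n - k
    return ms_between / ms_within

groups = [[11, 12, 13], [14, 15, 16], [17, 18, 19]]
print(round(one_way_f(groups), 2))  # -> 27.0
```

Placing the resulting F-value on the appropriate F-distribution (numerator DF 2, denominator DF 6 here) would then give the p-value, as discussed above.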

5.6 Self-Learning Topics : Implementation of Graphical EDA Methods


EDA is based heavily on graphical techniques. You can use graphical techniques to identify the most important
properties of a dataset. Here are some of the more widely used graphical techniques:
o Boxplots
o Histograms
o Normal probability plots
o Scatter plots

5.6.1 Box plots

You use box plots to show some of the most important features of a dataset, such as the following:
o Minimum value
o Maximum value
o Quartiles
Quartiles separate a dataset into four equal sections.
The first quartile (Q1) is a value such that : 25 percent of the observations in a dataset are less than the first
quartile, and 75 percent of the observations are greater than the first quartile.
The second quartile (Q2) is a value such that : 50 percent of the observations in a dataset are less than the second
quartile, and 50 percent of the observations are greater than the second quartile. The second quartile is also known
as the median.
The third quartile (Q3) is a value such that : 75 percent of the observations in a dataset are less than the third
quartile, and 25 percent of the observations are greater than the third quartile.
You can also use box plots to identify outliers. These are values that are substantially different from the rest of
the dataset. Outliers can cause problems for traditional statistical tests, so it's important to identify them before
performing any type of statistical analysis.

5.6.2 Histograms
e You use histograms to gain insight into the probability distribution that a dataset follows. With a histogram, the
dataset is organized into a series of individual values or ranges of values, each represented by a vertical bar.
e The height of the bar shows how frequently a value or range of values occurs. With a histogram, it's easy to see
how the data is distributed.

5.6.3 Scatter Plots


e A scatter plot is a series of points that show how two variables are related to each other. A random scatter of
points indicates that the two variables are unrelated, or that the relationship between them is very weak.
» If the points closely resemble a straight line, this indicates that the relationship between the two variables is
approximately linear.
e Two variables are linearly related if they can be described with the equation Y = mX + b.


X is the independent variable, and Y is the dependent variable. m is the slope, which represents the change
in Y due to a given change in X. b is the intercept, which shows the value of Y when X equals zero. The fig. 5.6.1 shows a
scatter plot between two variables in which the relationship appears to be linear.
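When a scatter plot looks linear, m and b can be estimated by least squares. In this sketch the toy points lie exactly on Y = 2X + 1, so the fitted slope and intercept recover those values:

```python
def least_squares(xs, ys):
    """Slope m and intercept b minimizing squared errors for Y = mX + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]         # exactly Y = 2X + 1
m, b = least_squares(xs, ys)
print(m, b)                   # -> 2.0 1.0
```

For real, noisy data the fitted line will not pass through every point, and the residuals indicate how far the relationship is from linear.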
Fig. 5.6.1 : Scatter plot of a linear relationship.

e The points on the scatter plot very nearly form a straight line. It bends a little to the left and bends a little to the
right, but it's roughly straight. This shows that the relationship is linear, with a positive slope.
e The following fig. 5.6.2 shows a scatter plot between two variables in which Y appears to be rising more rapidly
than X.

Fig. 5.6.2 : Scatter plot of a nonlinear relationship.

e See the curve? This relationship is clearly not linear. It is in fact a quadratic relationship. A quadratic relationship
takes the form Y = aX² + bX + c.
e The following fig. 5.6.3 shows a scatter plot in which there doesn’t appear to be any relationship
between X and Y.

Fig. 5.6.3 : Scatter plot with no relationship between the variables X and Y.

The variables in the scatter plot shown are unrelated or independent; you can see this by the lack of any pattern
in the data.

e In addition to showing the relationship between two variables, a scatter plot can also show the presence of
outliers. The following fig. 5.6.4 shows a dataset with one observation that is substantially different from the
other observations.


Fig.5.6.4 : Scatter plot with an outlier.


° The outlier point needs to be investigated further to determine
whether it's the result of an error or other
problems. It's possible that the outlier will
need to be removed from the data.
5.6.4 Normal probability plots
e Normal probability plots are used to see how closely the elements of a dataset follow the normal distribution.
The assumption of normality is common in many disciplines. For example, it's often assumed in finance and
economics that the returns to stocks are normally distributed. The assumption of normality is very convenient,
and many statistical tests are based on this assumption.
Applying statistical tests that assume normality to a non-normal dataset would give extremely
questionable
results. Therefore, it's important to determine whether or not the data is normally distributed
before conducting
any of these statistical tests.

Review Questions

Q. 1  Write short note on Exploratory Data Analysis.

Q. 2  Explain the goals of the EDA process.

Q. 3  Define the following terms : 1. Variable 2. Value 3. Observation

Q. 4  Write short note on Visualising distributions.

Q. 5  Explain typical value.

Q. 6  Write short note on unusual value.

Q. 7  Write short note on missing value.

Q. 8  Explain types of exploratory data analysis.

Q. 9  What are the tools required for exploratory data analysis?

Q. 10 Write short note on Graphical/Non-graphical Methods.


INTRODUCTION TO
MACHINE LEARNING

Introduction to Machine Learning, Types of Machine Learning : Supervised (Logistic Regression, Decision Tree, Support
Vector Machine) and Unsupervised (K Means Clustering, Hierarchical Clustering, Association Rules), Issues in Machine
Learning, Applications of Machine Learning, Steps in developing a Machine Learning Application.
Self-Learning Topics : Real world case studies on machine learning

6.1 Introduction to Machine Learning

Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning generally is to
understand the structure of data and fit that data into models that can be understood and utilized for deriving
business decisions.

Although machine learning is a field within computer science, it differs from traditional computational
approaches. In traditional computing, algorithms are sets of explicitly programmed instructions used by
computers to calculate or problem solve.
Machine learning algorithms instead allow for computers to train on data inputs and use statistical analysis in
order to output values that fall within a specific range. Because of this, machine learning facilitates computers in
building models from sample data in order to automate decision-making processes based on data inputs.
Machine learning is a continuously developing field.
Any technology in use today has benefitted from machine learning. Facial recognition technology allows social
media platforms to help users tag and share photos of friends. Optical character recognition (OCR) technology
converts images of text into movable type.
Recommendation engines, powered by machine learning, suggest what movies or television shows to watch next
based on user preferences. Self-driving cars that rely on machine learning to navigate may soon be available to
consumers.

6.2 Types of Machine Learning


In machine learning, tasks are generally classified into broad categories. These categories are based on how
learning is received or how feedback on the learning is given to the system developed.
Two of the most widely adopted machine learning methods are supervised learning which trains algorithms
based on example input and output data that is labelled by humans, and unsupervised learning which provides
the algorithm with no labelled data in order to allow it to find structure within its input data.
AI and DS-I (MU) 6-2 Introduction to Machine Learning

6.2.1 Supervised Learning

• In supervised learning, the computer is provided with example inputs that are labeled with their desired outputs. The purpose of this method is for the algorithm to be able to "learn" by comparing its actual output with the "taught" outputs to find errors, and modify the model accordingly. Supervised learning therefore uses patterns to predict label values on additional unlabeled data.
• For example, with supervised learning, an algorithm may be fed data with images of sharks labeled as fish and images of oceans labeled as water. By being trained on this data, the supervised learning algorithm should be able to later identify unlabeled shark images as fish and unlabeled ocean images as water.
• A common use case of supervised learning is to use historical data to predict statistically likely future events. It may use historical stock market information to anticipate upcoming fluctuations, or be employed to filter out junk emails. In supervised learning, tagged photos of dogs can be used as input data to classify untagged photos of dogs.
Correlation and regression are commonly used techniques for investigating the relationship among quantitative
variables. Correlation is a measure of association between two variables that are not designated as either
dependent or independent. Regression at a basic level is used to examine the relationship between one
dependent and one independent variable. Because regression statistics can be used to anticipate the dependent
variable when the independent variable is known, regression enables prediction capabilities.

6.2.1(A) Logistic Regression

• Logistic regression is another technique borrowed by machine learning from the field of statistics.
• It is the go-to method for binary classification problems (problems with two class values).
• Logistic regression is named for the function used at the core of the method, the logistic function.
• The logistic function, also called the sigmoid function, was developed by statisticians to describe properties of population growth in ecology, rising quickly and maxing out at the carrying capacity of the environment. It is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.
1 / (1 + e^(-value))
where e is the base of the natural logarithms and value is the actual numerical value that you want to transform.
Below is a plot of the numbers between -5 and 5 transformed into the range 0 and 1 using the logistic function.

Fig. 6.2.1 : Logistic Regression (the logistic function)
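The mapping described above can be sketched in a few lines; this is a minimal illustration of the logistic function itself, not of a trained model:

```python
import math

def sigmoid(value):
    """Logistic (sigmoid) function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

# Transform the integers -5..5 into the range 0 to 1, as in Fig. 6.2.1.
for v in range(-5, 6):
    print(f"{v:3d} -> {sigmoid(v):.4f}")
```

Note that sigmoid(0) is exactly 0.5, and the curve approaches, but never reaches, 0 and 1 at the extremes.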

TechKnowledge Publications

• Logistic regression uses an equation as the representation, very much like linear regression.
• Input values (x) are combined linearly using weights or coefficient values (referred to as the Greek capital letter Beta) to predict an output value (y).
• A key difference from linear regression is that the output value being modelled is a binary value (0 or 1) rather than a numeric value.
• Below is an example logistic regression equation :
  y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))
  where y is the predicted output, b0 is the bias or intercept term and b1 is the coefficient for the single input value (x).
• Each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.
• The actual representation of the model that you would store in memory or in a file is the coefficients in the equation (the beta values or b's).
e Logistic regression models the probability of the default class (e.g. the first class).
e For example, if we are modeling people’s sex as male or female from their height, then the first class could be
male and the logistic regression model could be written as the probability of male given a person’s height, or
more formally:
P(sex = male|height)
e Written another way, we are modeling the probability that an input (X) belongs to the default class (Y=1), we can
write this formally as:

P(X) = P(Y = 1|X)


Note that the probability prediction must be transformed into a binary value (0 or 1) in order to actually make a crisp class prediction.

• Logistic regression is a linear method, but the predictions are transformed using the logistic function. The impact of this is that we can no longer understand the predictions as a linear combination of the inputs as we can with linear regression. For example, continuing on from above, the model can be stated as:
  p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
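Continuing the height example, the probability model above can be sketched with made-up coefficients; in a real model b0 and b1 would be learned from training data, so the values below are assumptions for illustration only:

```python
import math

# Hypothetical coefficients (illustrative, not learned from real data):
# b0 = intercept, b1 = weight for the single input "height in cm".
b0, b1 = -100.0, 0.6

def predict_probability(x):
    """p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)): probability of the default class."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def predict_class(x, threshold=0.5):
    """Transform the probability into a crisp 0/1 class prediction."""
    return 1 if predict_probability(x) >= threshold else 0

# P(male | height = 180 cm) under these made-up weights, and the binary decision.
print(predict_probability(180), predict_class(180))
```

The threshold of 0.5 is the usual default; moving it trades one kind of classification error for the other.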

6.2.1(B) Decision Tree

¢ For general use, decision trees are employed to visually represent decisions and show or inform decision
making. When working with machine learning and data mining, decision trees are used as a predictive model.
These models map observations about data to conclusions about the data’s target value.
¢ The goal of decision tree learning is to create a model that will predict the value of a target based on input
variables.
• In the predictive model, the data's attributes that are determined through observation are represented by the branches, while the conclusions about the data's target value are represented in the leaves.
• When "learning" a tree, the source data is divided into subsets based on an attribute value test, and this is repeated on each of the derived subsets recursively. Once all records in the subset at a node have the same target value, the recursion process is complete.
e Let's look at an example of various conditions that can determine whether or not someone should go fishing.
This includes weather conditions as well as barometric pressure conditions.

• In the simple decision tree depicted in Fig. 6.2.2, an example is classified by sorting it through the tree to the appropriate leaf node. This then returns the classification associated with the particular leaf, which in this case is either a Yes or a No. The tree classifies a day's conditions based on whether or not it is suitable for going fishing.
• A true classification tree data set would have a lot more features than what is outlined above, but the relationships should be straightforward to determine.
e When working with decision tree learning, several determinations need to be made, including what features to
choose, what conditions to use for splitting, and understanding when the decision tree has reached a clear
ending.

Fig. 6.2.2 : Decision Tree Example (conditions at the root include Barometric, Overcast and Rain branches; Barometric splits into Rising/Falling, Rain splits into Light/Heavy, and each path ends in a Yes/No leaf)
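Reading off the tree in Fig. 6.2.2, classification is just a cascade of attribute tests. The split outcomes below are assumed for illustration, since the original figure is only partially legible:

```python
def should_go_fishing(outlook, barometric=None, rain=None):
    """Classify a day's conditions by sorting them through the tree (Fig. 6.2.2).

    The branch outcomes are assumptions reconstructed from the figure.
    """
    if outlook == "overcast":
        return "Yes"
    if outlook == "sunny":            # follow the barometric-pressure branch
        return "Yes" if barometric == "rising" else "No"
    if outlook == "rain":             # follow the rain-intensity branch
        return "Yes" if rain == "light" else "No"
    raise ValueError("unknown outlook")

print(should_go_fishing("sunny", barometric="rising"))  # reaches a Yes leaf
print(should_go_fishing("rain", rain="heavy"))          # reaches a No leaf
```

Each call traverses exactly one root-to-leaf path, which is what "sorting an example through the tree" means.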

6.2.1(C) Support Vector Machine


• A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. SVM is a type of machine learning algorithm that can be used for classification and regression tasks. SVMs build upon basic ML algorithms and add features that make them more efficient at various tasks.
© Support vector machines can be used in a variety of tasks, including anomaly detection, handwriting recognition,
and text classification. Because of their flexibility, high performance, and compute efficiency, SVMs have become
a mainstay of machine learning and an important addition to the ML engineer’s toolbox.
e Compared to newer algorithms like neural networks, they have two main advantages : higher speed and better
performance with a limited number of samples (in the thousands). This makes the algorithm very suitable for
text classification problems, where it's common to have access to a dataset of at most a couple of thousands of
tagged samples.
• The basics of Support Vector Machines and how they work are best understood with a simple example. Let's imagine we have two tags, red and blue, and our data has two features, x and y. We want a classifier that, given a pair of (x, y) coordinates, outputs whether it is red or blue. We plot our already labelled training data on a plane.
• A support vector machine takes these data points and outputs the hyperplane (which in two dimensions is simply a line) that best separates the tags. This line is the decision boundary : anything that falls to one side of it we will classify as blue, and anything that falls to the other as red.


Fig. 6.2.3 : In 2D, the best hyperplane is simply a line

However, you can create many different ML models to separate the two classes, and not all of them are of equal
value.
e In some datasets, the classes are not “linearly separable,” which means they can’t be separated with a straight
line. In such cases, ML engineers use “kernel tricks,” mathematical transformations that can make the data
linearly separable.
e — Similarly, the polynomial kernel can be applied to other datasets where the boundaries between classes are
more complicated. For example, the dataset below was generated with the make_moons() function of Python’s
Scikit-Learn machine learning library. Evidently, the two classes are not linearly separable. But an SVM with a
third-degree polynomial kernel can clearly separate the two classes with a smooth boundary (right figure).

Fig. 6.2.4 : The kernel trick can help SVMs classify classes that are not linearly separable (left : the raw make_moons() data; right : the smooth boundary found with a third-degree polynomial kernel)
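Assuming scikit-learn is installed, the make_moons() experiment described above can be reproduced along these lines; the sample size and noise level are arbitrary choices for the sketch:

```python
# Requires scikit-learn. Compares a linear kernel with a degree-3 polynomial
# kernel on the two interleaving half-moons, which are not linearly separable.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
poly_svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1)).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("poly-3 kernel accuracy:", poly_svm.score(X, y))
```

The polynomial kernel implicitly maps the points into a higher-dimensional space where a separating hyperplane exists, which is exactly the "kernel trick" the text refers to.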

e Despite the advent of more advanced machine learning algorithms such as deep neural networks, support vector
machines remain popular because of their fast training time, low compute requirements, and ability to learn with
fewer training examples.

6.2.2. Unsupervised Learning


e In unsupervised learning, data is unlabeled, so the learning algorithm is left to find commonalities among its
input data. As unlabeled data are more abundant than labeled data, machine learning methods that facilitate
unsupervised learning are particularly valuable.


The goal of unsupervised learning may be as straightforward as discovering hidden patterns within a dataset,
but it may also have a goal of feature learning, which allows the computational machine to automatically
discover the representations that are needed to classify raw data.
Unsupervised learning is commonly used for transactional data. You may have a large dataset of customers and
their purchases, but as a human you will likely not be able to make sense of what similar attributes can be drawn
from customer profiles and their types of purchases.
With this data fed into an unsupervised learning algorithm, it may be determined that women of a certain age

range who buy unscented soaps are likely to be pregnant, and therefore a marketing campaign related to
pregnancy and baby products can be targeted to this audience in order to increase their number of purchases.
Without being told a “correct” answer, unsupervised learning methods can look at complex data that is more
expansive and seemingly unrelated in order to organize it in potentially meaningful ways.
Unsupervised learning is often used for anomaly detection including for fraudulent credit card purchases, and
recommender systems that recommend what products to buy next. In unsupervised learning, untagged photos of dogs can be used as input data for the algorithm to find likenesses and classify dog photos together.

6.2.2(A) K Means Clustering

The k-means algorithm searches for a preplanned number of clusters in an unlabelled multidimensional dataset; it concludes this via a simple interpretation of how an optimized cluster can be expressed.

Primarily the concept would be in two steps;

o Firstly, the cluster centre is the arithmetic mean (AM) of all the data points associated with the cluster.
o Secondly, each point is closer to its own cluster centre than to other cluster centres.
These two interpretations are the foundation of the k-means clustering model.
In simple terms, k-means clustering enables us to cluster the data into several groups by detecting the distinct
categories of groups in the unlabelled datasets by itself, even without the necessity of training of data.
This is the centroid-based algorithm such that each cluster is connected to a centroid while following the
objective to minimize the sum of distances between the data points and their corresponding clusters.
As an input, the algorithm consumes an unlabelled dataset, splits the complete dataset into k-number of clusters,
and iterates the process to meet the right clusters, and the value of k should be predetermined.
Specifically performing two tasks, the k-means algorithm

o Calculates the correct value of K-centre points or centroids by an iterative method

o Assigns every data point to its nearest k-centre, and the data points, closer to a particular k-centre, make a
cluster. Therefore, data points, in each cluster, have some similarities and far apart from other clusters.
By specifying the value of k, you are informing the algorithm of how many means or centres you are looking for. To repeat, if k is equal to 3, the algorithm will form 3 clusters.

Following are the steps for working of the k-means algorithm;


o K centres are modelled randomly in accordance with the present value of K.
o K-means assigns each data point in the dataset to the nearest centre and attempts to minimize the Euclidean distance between data points and centres. A data point is considered to belong to a particular cluster if it is nearer to that cluster's centre than to any other cluster centre.


o After that, k-means determines the centre by taking the mean of all data points assigned to that cluster centre. This reduces the total intra-cluster variance with respect to the prior step. Here, the "means" refers to averaging the data points to identify a new centre in the method of k-means clustering.
o The algorithm repeats steps 2 and 3 until some criterion is achieved, such as : the sum of distances between data points and their respective centres is minimized, an appropriate number of iterations is attained, or there is no variation in the value of the cluster centres and no change in the assignment of data points to clusters.
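The steps above can be sketched in a few lines of NumPy; this is a minimal illustration of the iterative procedure, not a production implementation:

```python
import numpy as np

def k_means(points, k, max_iterations=200, seed=0):
    """Minimal k-means sketch following the steps described above."""
    rng = np.random.default_rng(seed)
    # Step 1: model K centres randomly (here: K distinct data points).
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iterations):
        # Step 2: assign each point to the nearest centre (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centre to the mean of its assigned points.
        new_centres = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # Stopping criterion: the centroids are no longer changing.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Two obvious blobs, one around (0, 0) and one around (10, 10):
data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centres, labels = k_means(data, k=2)
print(centres)
print(labels)
```

On this toy data the algorithm recovers the two blobs regardless of which points are drawn as initial centres.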

Stopping Criteria for K-Means Clustering

On a core note, three criteria are considered to stop the k-means clustering algorithm :

If the centroids of the newly built clusters are not changing

An algorithm can be brought to an end if the centroids of the newly constructed clusters are not altering. Even after multiple iterations, if the obtained centroids are the same for all the clusters, it can be concluded that the algorithm is not learning any new pattern, and this gives a sign to stop its execution/training on the dataset.

If data points remain in the same cluster

The training process can also be halted if the data points stay in the same cluster even after training the algorithm for multiple iterations.

If the maximum number of iterations has been reached

At last, the training on a dataset can also be stopped if the maximum number of iterations is attained. For example, if the number of iterations has been set to 200, then the process will be repeated 200 times (200 iterations) before coming to an end.

Key features of k-means clustering :
1. It is very smooth in terms of interpretation and resolution.
2. For a large number of variables present in the dataset, K-means operates quicker than hierarchical clustering.
3. While redetermining the cluster centre, an instance can change its cluster.
4. K-means forms compact clusters.
5. It can work on unlabeled numerical data.
6. Moreover, it is fast, robust and uncomplicated to understand, and yields the best outcomes when datasets are distinctive (thoroughly separated) from each other.


Disadvantages of K-means Clustering

The following are a few limitations of K-means clustering;
1. The algorithm demands the specification of the number of clusters/centres in advance.
2. The algorithm breaks down for non-linear sets of data and is unable to deal with noisy data and outliers.
3. It is not directly applicable to categorical data, since it is only operable when a mean can be computed.
4. Also, Euclidean distance can weight the underlying factors unequally.
5. The algorithm is not invariant to non-linear transformations, i.e. it provides different results with different representations of the data.

6.2.2(B) Hierarchical Clustering

• Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in a data set. In contrast to k-means, hierarchical clustering will create a hierarchy of clusters and therefore does not require us to pre-specify the number of clusters.
• Furthermore, hierarchical clustering has an added advantage over k-means clustering in that its results can be easily visualized using an attractive tree-based representation called a dendrogram.

Fig. 6.2.5 : AGNES (bottom-up) versus DIANA (top-down) clustering

1. Agglomerative clustering : Commonly referred to as AGNES (AGglomerative NESting), it works in a bottom-up manner. That is, each observation is initially considered as a single-element cluster (leaf). At each step of the algorithm, the two clusters that are the most similar are combined into a new bigger cluster (node). This procedure is iterated until all points are members of just one single big cluster (root). The result is a tree which can be displayed using a dendrogram.
2. Divisive hierarchical clustering : Commonly referred to as DIANA (DIvisive ANAlysis), it works in a top-down manner. DIANA is like the reverse of AGNES. It begins with the root, in which all observations are included in a single cluster. At each step of the algorithm, the current cluster is split into the two clusters that are considered most heterogeneous. The process is iterated until all observations are in their own cluster.

K-means vs Hierarchical Clustering

1. K-means clustering produces a specific number of clusters for a disarranged, flat dataset, whereas hierarchical clustering builds a hierarchy of clusters, not just a single partition of the objects.
2. K-means can be used for categorical data after it is first converted into numeric form by assigning ranks; hierarchical clustering has also been applied to categorical data, but due to its complexity a separate technique is needed to assign rank values to categorical features.
3. K-means is highly sensitive to noise in the dataset, whereas hierarchical clustering is less sensitive to noise in a dataset.
4. The performance of the K-means algorithm increases as the RMSE decreases, and the RMSE decreases as the number of clusters increases, so the time of execution increases; in contrast, the performance of hierarchical clustering is lower.
5. K-means is good for a large dataset and hierarchical clustering is good for small datasets.
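The bottom-up AGNES procedure described above can be sketched in plain Python. Single-linkage distance is an assumed choice here; real implementations (e.g. in SciPy) offer several linkage criteria:

```python
import math

def agnes(points, k):
    """AGNES sketch: bottom-up agglomerative clustering with single linkage.

    Starts with each point as its own cluster (leaf) and repeatedly merges
    the two closest clusters until only k clusters remain.
    """
    clusters = [[p] for p in points]           # every observation is a leaf

    def linkage(a, b):                          # single linkage: closest-pair distance
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the two most similar clusters ...
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # ... and combine them into a new, bigger cluster (node).
        clusters[i] += clusters.pop(j)
    return clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(agnes(pts, k=2))
```

Stopping at k = 1 instead would produce the single root cluster; recording each merge along the way is what a dendrogram visualizes.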

6.2.2(C) Association Rules

• Association rule learning is a rule-based machine learning method for discovering relations between variables in large databases. The goal is to identify strong relations discovered in datasets using some measures such as confidence or lift.

• An association rule is an implication expression of the form X → Y, where X and Y are disjoint item sets. A more concrete example based on consumer behaviour would be {Diapers} → {Beer}, suggesting that people who buy diapers are also likely to buy beer. To evaluate the "interest" of such an association rule, different metrics have been developed; the most common are the confidence and lift metrics just mentioned.
• For example : if a customer buys bread, he is 70% likely to also buy milk.
e In the above association rule, bread is the antecedent and milk is the consequent. Simply put, it can be
understood as a retail store’s association rule to target their customers better. If the above rule is a result of a
thorough analysis of some data sets, it can be used to not only improve customer service but also improve the
company’s revenue.
• In addition to the above, association rule based learning techniques are also used in many applications such as recommendation systems, medical diagnosis, protein sequences, census data or even crime prevention.

• The Apriori algorithm is one of the main techniques of association rule mining, which can be basically described as finding the most frequent itemsets in a dataset.
Main Concepts of Association Rules / Apriori Algorithm
1. Support

Support is an indication of how frequently the item set appears in the dataset. In other words, this is an indication of how popular an itemset is in a dataset.

Support ({X} → {Y}) = (Transactions containing both X and Y) / (Total number of transactions)

Ww Tech Knowledge
Publicacians
SF Aland Ds-1(MU) Introduction to Machine Learning
6-10
2. Confidence

Confidence is an indication of how often the rule has been found to be true. In other words, confidence says how likely item Y is purchased when item X is purchased.

Confidence ({X} → {Y}) = (Transactions containing both X and Y) / (Transactions containing X)
3. Lift

Lift is a metric that measures the ratio of how often X and Y occur together to how often they would occur together if they were statistically independent. In other words, lift illustrates how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is.
A Lift score that is close to 1 indicates that the antecedent and the consequent are independent and occurrence
of antecedent has no impact on occurrence of consequent.
A Lift score that is bigger than 1 indicates that the antecedent and consequent are dependent on each other, and the occurrence of the antecedent has a positive impact on the occurrence of the consequent.
A Lift score that is smaller than 1 indicates that the antecedent and the consequent substitute each other, meaning that the existence of the antecedent has a negative impact on the consequent, or vice versa.

Therefore, having a lift bigger than 1 is critical for proving associations.


Lift (X → Y) = Lift (Y → X) = P(X and Y) / (P(X) × P(Y)) = confidence (X → Y) / support (Y) = confidence (Y → X) / support (X)

4. Conviction

Conviction measures the implication strength of the rule relative to statistical independence. The conviction score is the ratio between the probability that X occurs without Y if they were independent and the actual observed probability of X occurring without Y. For instance, if the (French fries) → (beer) association has a conviction score of 1.8, the rule would be incorrect 1.8 times as often (80% more often) if the association between the two were purely random.

Conviction (A → C) = (1 − support (C)) / (1 − confidence (A → C)), range : [0, ∞]

5. Antecedents and Consequents

The IF component of an association rule is known as the antecedent. The THEN component is known as the consequent. The antecedent and the consequent are disjoint; they have no items in common.

Fig. 6.2.6 : Antecedents and consequents of an association rule

6.3 Issues In Machine Learning

1. Poor Quality of Data


e Data plays a significant role in the machine learning process. One of the significant issues that machine
jearning professionals face is the absence of good quality data. Unclean and noisy data can make the whole
process extremely exhausting.

• We don't want our algorithm to make inaccurate or faulty predictions. Hence, the quality of data is essential to enhance the output. Therefore, we need to ensure that the process of data preprocessing, which includes removing outliers, filtering missing values, and removing unwanted features, is done with the utmost level of perfection.
2. Under fitting of Training Data
• This occurs when the model is unable to establish an accurate relationship between the input and output variables. It simply means trying to fit in undersized jeans.
• It signifies that the model is too simple to establish a precise relationship. To overcome this issue:
o Increase the training time of the model
o Enhance the complexity of the model
o Add more features to the data
o Reduce the regularization parameters

3. Over fitting of Training Data
• Over fitting refers to a machine learning model trained with a massive amount of data that negatively affects its performance. It is like trying to fit in oversized jeans. Unfortunately, this is one of the significant issues faced by machine learning professionals.
• This means that the algorithm is trained with noisy and biased data, which will affect its overall performance. Let's understand this with the help of an example. Consider a model trained to differentiate between a cat, a rabbit, a dog, and a tiger.
• The training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits. Then there is a considerable probability that it will identify a cat as a rabbit. In this example, we had a vast amount of data, but it was biased; hence the prediction was negatively affected.
• We can tackle this issue by:
o Analyzing the data with the utmost level of perfection
o Using data augmentation techniques
o Removing outliers in the training set
o Selecting a model with fewer features
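Over fitting shows up numerically as a model that fits its training data far better than held-out data. A small sketch, under assumed synthetic data, where a degree-15 polynomial stands in for an over-complex model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a simple linear trend plus noise, split into train and test sets.
x_train = rng.uniform(-1, 1, 30)
x_test = rng.uniform(-1, 1, 30)
y_train = 2 * x_train + rng.normal(0, 0.3, 30)
y_test = 2 * x_test + rng.normal(0, 0.3, 30)

def train_and_mse(degree, x_eval, y_eval):
    """Fit a polynomial of the given degree on the training set only,
    then report mean squared error on (x_eval, y_eval)."""
    coeffs = np.polyfit(x_train, y_train, degree)  # may warn for high degrees
    return float(np.mean((np.polyval(coeffs, x_eval) - y_eval) ** 2))

for degree in (1, 15):
    print(f"degree {degree:2d}: "
          f"train MSE = {train_and_mse(degree, x_train, y_train):.3f}, "
          f"test MSE = {train_and_mse(degree, x_test, y_test):.3f}")
```

The high-degree fit always achieves a lower training error, because it can bend toward the noise; comparing it against the test error is the standard way to detect that it has memorized rather than learned.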
4. Machine Learning is a Complex Process
• The machine learning industry is young and is continuously changing. Rapid hit-and-trial experiments are being carried out. The process is transforming, and hence there are high chances of error, which makes the learning complex.
• It includes analyzing the data, removing data bias, training data, applying complex mathematical calculations, and a lot more. Hence it is a really complicated process, which is another big challenge for machine learning professionals.
5. Lack of Training Data
• The most important task you need to do in the machine learning process is to train the data to achieve an accurate output. A lesser amount of training data will produce inaccurate or biased predictions. Let us understand this with the help of an example. Consider a machine learning algorithm being trained the way a child is taught.


One day you decided to explain to a child how to distinguish between an apple and a watermelon. You will
take an apple and a watermelon and show him the difference between both based on their color, shape, and
taste. In this way, soon, he will attain perfection in differentiating between the two.
But on the other hand, a machine-learning algorithm needs a lot of data to distinguish. For complex
problems, it may even require millions of data to be trained. Therefore we need to ensure that Machine
learning algorithms are trained with sufficient amounts of data.
6. Slow Implementation
This is one of the common issues faced by machine learning professionals. The machine learning models
are highly efficient in providing accurate results, but it takes a tremendous amount of time. Slow programs,
data overload, and excessive requirements usually take a lot of time to provide accurate results.
Further, it requires constant monitoring and maintenance to deliver the best output.
7. Imperfections in the Algorithm When Data Grows
• So you have found quality data, trained it amazingly, and the predictions are really concise and accurate. Yet the model may become useless in the future as data grows. The best model of the present may become inaccurate in the coming future and require further retraining.
• So regular monitoring and maintenance are needed to keep the algorithm working. This is one of the most exhausting issues faced by machine learning professionals.

6.3.1 Application of Machine Learning

Fig. 6.3.1 : Applications of machine learning (automatic language translation, image recognition, speech recognition, traffic prediction, product recommendations, self-driving cars, email spam and malware filtering, virtual personal assistants, online fraud detection)


1. Image Recognition
• Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. The popular use case of image recognition and face detection is automatic friend tagging suggestion.
• Facebook provides us a feature of auto friend tagging suggestion. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithm.
• It is based on the Facebook project named "Deep Face," which is responsible for face recognition and person identification in the picture.
2. Speech Recognition
• While using Google, we get an option of "Search by voice"; it comes under speech recognition, and it's a popular application of machine learning.
• Speech recognition is a process of converting voice instructions into text, and it is also known as "Speech to text" or "Computer speech recognition." At present, machine learning algorithms are widely used by various applications of speech recognition. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction

* _ If we want to visit a new place, we take help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.

* — It predicts the traffic conditions such as whether traffic is cleared, slow-moving, or heavily congested with
the help of two ways :
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time has taken on past days at the same time.

• Everyone who is using Google Maps is helping this app to get better. It takes information from the user and sends it back to its database to improve the performance.
4. Product recommendations

• Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some product on Amazon, we start getting advertisements for the same product while surfing the internet on the same browser, and this is because of machine learning.
* Google understands the user interest using various machine learning algorithms and suggests the product
as per customer interest.
e As similar, when we use Netflix, we find some recommendations for entertainment series, movies, etc.,
and
this is also done with the help of machine learning.
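One common recommendation technique — which real systems combine with many other signals — is to find the user whose behaviour is most similar to ours and suggest what they liked. A toy sketch using cosine similarity (the user names and the tiny view-count matrix are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two interest vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Each vector: how often a user viewed products A, B, C.
target = [5, 0, 2]                                  # the current user
others = {"user2": [4, 1, 2], "user3": [0, 5, 0]}
best = max(others, key=lambda name: cosine(target, others[name]))
print(best)  # → user2: closest interests, so recommend what user2 bought
```

This is the nearest-neighbour flavour of collaborative filtering; production recommenders add ratings, purchase history, and learned embeddings on top of the same idea.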
5. Self-driving cars
•  One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, a popular car manufacturing company, is working on self-driving cars.
•  It uses an unsupervised learning method to train the car models to detect people and objects while driving.

TechKnowledge
Publications
AI and DS-1 (MU) 6-14 Introduction to Machine Learning
6. Email Spam and Malware Filtering
•  Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox with the important symbol and spam emails in our spam box, and the technology behind this is machine learning. Below are some spam filters used by Gmail :
   o  Content filter
   o  Header filter
   o  General blacklists filter
   o  Rules-based filters
   o  Permission filters
•  Some machine learning algorithms such as Multi-Layer Perceptron, Decision Tree, and Naive Bayes classifier are used for email spam filtering and malware detection.
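To make the Naive Bayes approach concrete, here is a tiny from-scratch spam classifier with add-one (Laplace) smoothing. The four sample emails and the function names are invented for illustration; a production filter such as Gmail's is far more sophisticated:

```python
import math
from collections import Counter

def train_filter(emails):
    """emails: list of (text, label) pairs; returns per-class word counts."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in emails:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the class with the highest log posterior probability."""
    vocab = len(set(counts["spam"]) | set(counts["ham"]))
    words = text.lower().split()
    best, best_score = None, -math.inf
    for label in counts:
        n = sum(counts[label].values())
        # log prior + log likelihoods with add-one smoothing
        score = math.log(totals[label] / sum(totals.values()))
        for w in words:
            score += math.log((counts[label][w] + 1) / (n + vocab))
        if score > best_score:
            best, best_score = label, score
    return best

mails = [("win free money now", "spam"),
         ("free lottery prize inside", "spam"),
         ("meeting agenda for monday", "ham"),
         ("project status report attached", "ham")]
counts, totals = train_filter(mails)
print(classify("claim your free prize money", counts, totals))  # → spam
```

The classifier multiplies (in log space) per-word probabilities estimated from the training counts — exactly the "naive" independence assumption that gives the algorithm its name.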

7. Virtual Personal Assistant

•  We have various virtual personal assistants, such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
•  These virtual assistants use machine learning algorithms as an important part.
•  These assistants record our voice instructions, send them over to a server on the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection

•  Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
•  For each genuine transaction, the output is converted into some hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for a fraudulent transaction; hence the network detects it and makes our online transactions more secure.
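The hash-chaining idea described above can be illustrated with a short sketch: each transaction's hash depends on the previous one, so tampering with any record changes the entire downstream pattern. This is a conceptual toy, not an actual fraud-detection system:

```python
import hashlib

def chain(transactions, seed="genesis"):
    """Hash each transaction together with the previous hash, so altering
    any earlier record changes every later value in the chain."""
    h = seed
    hashes = []
    for tx in transactions:
        h = hashlib.sha256((h + tx).encode()).hexdigest()
        hashes.append(h)
    return hashes

genuine = chain(["alice->bob:10", "bob->carol:5"])
tampered = chain(["alice->bob:99", "bob->carol:5"])  # amount altered
print(genuine[1] != tampered[1])  # → True: the pattern breaks downstream
```

A detection model can then flag transactions whose hash pattern does not match the expected chain.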

9. Stock Market trading

Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in shares, so machine learning's Long Short-Term Memory (LSTM) neural network is used for the prediction of stock market trends.

10. Medical Diagnosis

In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain. It helps in finding brain tumours and other brain-related diseases easily.

11. Automatic Language Translation
•  Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all, as machine learning helps us by converting the text into our known languages. Google's GNMT (Google Neural Machine Translation), which is a neural machine learning model, provides this feature: it translates the text into our familiar language, and this is called automatic translation.
•  The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used with image recognition and translates the text from one language to another.
6.3.2 Steps in developing a Machine Learning Application
Building a machine learning application is an iterative process that follows a set sequence of steps. Below are the steps involved in developing machine learning applications:
1. Problem framing
The first step is to frame a machine learning problem in terms of what we want to predict and what kind of observation data we have to make those predictions. Predictions are generally a label or a target value; it may be a yes/no label (binary classification), a category (multiclass classification), or a real number (regression).
2. Collect and clean the data
•  Once we frame the problem and identify what kind of historical data we have for prediction modeling, the next step is to collect the data from a historical database, from open datasets, or from any other data sources.
•  Not all the collected data is useful for a machine learning application. We may need to clean out irrelevant data, which may affect the accuracy of prediction or may add computation without aiding the result.
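As a small illustration of cleaning, the sketch below drops records that are missing required fields or hold impossible values (the field names and rules are assumptions for the example):

```python
def clean(rows, required=("age", "income")):
    """Drop records missing required fields or holding impossible values."""
    cleaned = []
    for row in rows:
        if any(row.get(k) is None for k in required):
            continue          # incomplete record
        if row["age"] < 0:
            continue          # physically impossible value
        cleaned.append(row)
    return cleaned

raw = [{"age": 34, "income": 52000},
       {"age": None, "income": 41000},   # missing age
       {"age": -5, "income": 30000}]     # corrupted entry
print(len(clean(raw)))  # → 1
```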
3. Prepare data for ML application
Once the data is ready for the machine learning algorithm, we need to transform it into a form that the ML system can understand. Machines cannot understand an image or text; we need to convert it into numbers. This step may also require building a data pipeline, depending on the needs of the machine learning application.
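A minimal illustration of turning text into numbers is integer word encoding; real pipelines typically build richer representations (one-hot vectors, embeddings) on top of exactly this kind of mapping:

```python
def build_vocab(texts):
    """Map every distinct word to an integer id, in order of appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Replace each known word with its integer id."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

docs = ["the cat sat", "the dog ran"]
vocab = build_vocab(docs)
print(encode("the cat ran", vocab))  # → [0, 1, 4]
```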
4. Feature engineering
•  Sometimes raw data may not reveal all the facts about the targeted label. Feature engineering is a technique to create additional, more relevant and sensible features by combining two or more existing features with an arithmetic operation.
•  For example, in a compute engine it is common for RAM and CPU usage to both reach 95%, but something is wrong when RAM usage is at 5% while CPU usage is at 93%. We can use the ratio of RAM to CPU usage as a new feature, which may provide a better prediction. If we are using deep learning, it will automatically build features itself; we do not need explicit feature engineering.
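The RAM-to-CPU example above can be sketched directly (the field names are assumptions for the illustration):

```python
def add_ratio_feature(samples):
    """Add a RAM-to-CPU usage ratio as a derived feature; a healthy machine
    keeps the two roughly balanced, so an extreme ratio flags anomalies."""
    for s in samples:
        s["ram_cpu_ratio"] = s["ram"] / s["cpu"]
    return samples

machines = [{"ram": 95, "cpu": 93},   # balanced: ratio near 1
            {"ram": 5, "cpu": 93}]    # suspicious: ratio far below 1
enriched = add_ratio_feature(machines)
print(round(enriched[1]["ram_cpu_ratio"], 3))  # → 0.054
```

A single derived column like this can separate the two cases even when the raw RAM and CPU columns, taken individually, look unremarkable.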
5. Training a model
•  Before we train the model, we need to split the data into training and evaluation sets, as we need to monitor how well a model generalizes to unseen data. Now, the algorithm will learn the pattern and the mapping between the features and the label.
•  The learning can be linear or non-linear depending upon the activation function and algorithm. There are a few hyperparameters that affect the learning as well as the training time, such as the learning rate, regularization, batch size, number of passes (epochs), optimization algorithm, and more.
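The two points above — holding out an evaluation set, then iterating with a learning rate over several passes — can be sketched with a tiny gradient-descent linear model. This is a toy on noiseless data, not a production trainer:

```python
import random

def train_test_split(data, test_ratio=0.25, seed=0):
    """Shuffle, then hold out a fraction of the data for evaluation."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def fit_line(samples, lr=0.01, epochs=2000):
    """Learn y = w*x + b by stochastic gradient descent on squared error."""
    w = b = 0.0
    for _ in range(epochs):            # number of passes over the data
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x          # gradient step for the weight
            b -= lr * err              # gradient step for the bias
    return w, b

data = [(x, 2 * x + 1) for x in range(8)]      # noiseless y = 2x + 1
train_set, eval_set = train_test_split(data)
w, b = fit_line(train_set)
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

Here the learning rate (`lr`) and number of passes (`epochs`) are exactly the hyperparameters mentioned above; changing either alters how quickly, and whether, training converges.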
6. Evaluating and improving model accuracy
•  Accuracy is a measure of how well or poorly a model is doing on an unseen validation set. Based on the current learnings, we need to evaluate how a model is doing on a validation set. Depending on the application, we can use different accuracy metrics: for classification we may use precision, recall, or F1 score; for object detection, we may use IoU (intersection over union).
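Precision, recall, and F1 for binary classification follow directly from the counts of true positives, false positives, and false negatives; a small sketch:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.667 0.667 0.667
```

Precision asks "of what we flagged, how much was right?", recall asks "of what was really positive, how much did we find?", and F1 is their harmonic mean.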
•  If a model is not doing well, we may classify the problem as either 1) over-fitting or 2) under-fitting.
•  When a model does well on the training data but not on the validation data, it is an over-fitting scenario: somehow the model is not generalizing well. Solutions include regularizing the algorithm, decreasing the input features, eliminating redundant features, and using resampling techniques like k-fold cross-validation.
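k-fold cross-validation, mentioned above, partitions the data into k folds and validates on each fold in turn while training on the rest; a minimal split generator:

```python
def k_fold(indices, k=3):
    """Yield (train, validation) index splits for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]       # k interleaved folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val

for train, val in k_fold(list(range(6)), k=3):
    print(train, val)
# Every sample appears in exactly one validation fold,
# so the whole dataset contributes to the accuracy estimate.
```

Averaging the validation score over the k rounds gives a more reliable estimate of generalization than a single train/validation split.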
•  In the under-fitting scenario, a model performs poorly on both the training and validation datasets. Solutions may include training with more data, evaluating different algorithms or architectures, using a greater number of passes, and experimenting with the learning rate or optimization algorithm.
•  After iterative training, the algorithm will have learned a model that maps input data to the labels, and this model can be used to predict on unseen data.

6.4 Self-Learning Topics : Real world case studies on machine learning
•  Master machine learning by getting your hands dirty on real-life case studies. You might know the theory of machine learning and how to create algorithms, but you must also get your hands dirty on real-world case studies: knowing machine learning and applying it in the real world are totally different things.
•  These case studies will help you tackle big, complex data sets and apply machine learning techniques to achieve good results. They will also enhance your resume, as you can add them to your portfolio. Below are the case studies we shall cover in this course:

1. REGRESSION Case Studies
•  Retail Store Sales Prediction
•  Restaurant Sales Prediction
•  Inventory Prediction for Optimum Inventory Management
•  Tube Assembly Pricing for Optimizing the Manufacturing Facility
•  Coal Production Estimation
•  Sport Player Salary Prediction

2. CLASSIFICATION Case Studies
•  Diabetes Prediction for Preventive Care
•  Telecom Network Disruptions Prediction for Planning Preventive Maintenance
•  Breast Cancer Prediction for Preventive Care
•  Credit Card Fraud Detection
•  Heart Diseases Prediction for Preventive Care
•  Predict whether a Customer Shall Sign a Loan or Not


Review Questions
Q.1   What are the types of machine learning?
Q.2   Write a short note on linear regression.
Q.3   Explain decision tree with a suitable diagram.
Q.4   Write a short note on Support Vector Machine.
Q.5   What are the steps in the working of the k-means algorithm?
Q.6   Explain the stopping criteria for k-means clustering.
Q.7   What are the key features of k-means clustering?
Q.8   Explain the limitations of k-means clustering.
Q.9   What are the disadvantages of k-means clustering?
Q.10  Write a short note on hierarchical clustering.

