371CPE Lectures Part2
Intelligent Systems, 2024 Chapter 1.1 Dr. Mohammad Alshamri

Intelligent Systems: An Introduction


Intelligence
It is the ability to learn, understand, think, and solve problems.

Human and Artificial Intelligence


Human Intelligence: It is no more than TAKING the right decision at the right time.
Artificial Intelligence (John McCarthy, 1956): It is no more than CHOOSING the right decision at the right time.

Artificial Intelligence
AI is a computer program that mimics some level of human intelligence.
AI algorithms can tackle:
• Knowledge,
• Learning,
• Perception,
• Problem-solving,
• Language understanding and/or
• Logical reasoning.


Turing Test

Imitation Game
It is an operational test for intelligent behavior. The following illustrates the game:

1. A computer and a human are placed behind a screen.
2. Common questions are posed to both of them so that an interrogator can try to identify which is the machine and which is the human.
3. If the computer's responses fool the interrogator for some percentage of the time, it is defined as an intelligent system.
4. Turing predicted that by 2000, a machine might have a 30% chance of fooling a layperson for 5 minutes.


Eugene Goostman (9/6/2014)


This program convinced 33% of the interrogators that it was a 13-year-old boy.

Possible Definitions of Intelligent Systems


Systems that act like humans          Systems that act rationally
Systems that think like humans        Systems that think rationally

Rational behavior: doing the right thing


Rational: Maximally achieving pre-defined goals
The right thing: that which maximizes goal achievement, given the available information.

General Definition of Intelligent System


• An intelligent system is a computerized solution that exhibits the ability to learn
from experience, adapt to changing environments, and perform tasks that typically
require human intelligence.
• These systems often utilize artificial intelligence (AI) techniques to process
information, reason, and make decisions.

Robots
• Robots are programmable machines that carry out a series of actions
autonomously, or semi-autonomously (initially, they perform tedious tasks with high
precision)


• Three important factors constitute a robot:


1. Robots interact with the physical world via sensors and actuators.
2. Robots are programmable.
3. Robots are usually autonomous or semi-autonomous.

Robotics and Artificial Intelligence


• Robotics and artificial intelligence are not the same thing at
all.
• In fact, the two fields are almost entirely separate.

Industrial Robots
An industrial robot is a programmed robot that carries out a repetitive series of movements on two or more axes. Repetitive movements do not require artificial intelligence.

Telerobotics
Telerobotics is the area of robotics concerned with the control of semi-autonomous robots from a distance.


Cobots (Collaborative Robots)


A cobot is a programmed robot that interacts directly and safely with humans in a shared workspace.

Virtual Assistant (Chatbot)


A virtual assistant is a software agent that can perform tasks or services for an individual.

Examples: Google Assistant; the Apple TV remote control, with which users can ask the Siri virtual assistant to find content to watch.

Intelligent System Types


Perceptive system: A system that approximates the way a human sees, hears, and feels objects.
Vision system: A system that captures, stores, and manipulates visual images and pictures.
Expert system: A system that stores knowledge and makes inferences.


Learning system: A computer system that learns how to function or how to react to situations based on some feedback.
Natural language processing (NLP): A computer system that understands and reacts to statements and commands made in a natural language, such as English; the automatic processing of human language for communication between people and computers.

Intelligent Systems in Your Everyday Life


Post Office: Automatic address recognition and sorting of mail
Banks: Automatic check readers, signature verification systems, automated loan application classification
Customer Service: Automatic voice recognition
Digital Cameras: Automated face detection and focusing
Computer Games: Chess
Car: Smart driving
Airplane

Automatic Number Plate Recognition


History of Artificial Intelligence


• Artificial Intelligence technology is much older than you might imagine.
• There are even myths of mechanical men in ancient Greek and Egyptian mythology.
• The following milestones in the history of AI trace the journey from the birth of AI to the present day.

Intelligent Systems, 2024 Chapter 1.2 Dr. Mohammad Alshamri

Machine Learning: An Introduction


• ML studies computer algorithms for learning to behave intelligently (complete a task or make accurate predictions) without human intervention or assistance, based on what was experienced in the past.
• The learning is always done based on some sort of previous observations or data, such as examples, direct experience, or instruction.
• The ML paradigm can be viewed as "programming by example" or "learning by example".
• ML is not only a question of remembering but also of generalizing to unseen cases.
• In fact, it is not possible to build any kind of intelligent system without using learning to get there (the precious resource for that is the amount of data).
• ML intersects broadly with other fields, especially statistics, but also mathematics, physics, theoretical computer science and more.

Datasets and Features

• A dataset is a collection of data points or observations related to a specific


problem, task, or domain.
• An example (instance, data point) is the object that is being classified. For cancer tumor classification, the patients are the examples.
• Features (variables, attributes): They are properties of each data point. They
represent the input variables used to make predictions or derive insights.
• Label is the category that we are trying to predict. In the cancer classification, the
labels can be “avascular”, “vascular” and “angiogenesis”.


• Types of Features:
1. Numerical Features: They represent numerical values (e.g., age, income).
2. Categorical Features: represent discrete categories or labels (e.g., gender, city).
3. Text Features: They represent textual information (product descriptions, tweets).
4. Temporal Features: They represent timestamps or time-related information.
• Types of Datasets:
1. Structured Datasets: Organized into rows and columns, often represented as
tables or spreadsheets.
2. Unstructured Datasets: Lack a predefined structure, such as text data, images,
audio, or video.
• Dataset dimensionality: the number of features in the dataset.

Example 1
Dataset Features
Housing prices Square footage, number of bedrooms, location, and proximity to amenities
Cancer classification Gender, age, weight, tumor size, tumor shape, blood pressure, etc.
Customer behavior Purchase history, time spent on a website, and demographic information
Diabetes dataset As below

Important Terminologies
Feature Selection:
• Feature selection is a process of selecting a subset of relevant features from the
original set of features to reduce the dimensionality of the feature space, simplify
the model, and improve its generalization performance.


• Feature selection aims to retain the most informative features while discarding less
important ones.
Feature Extraction:
• Feature extraction is a process of transforming the original features into a new set
of features that are more informative and compact.
• The new features still capture the essential information from the original data but
represent it in a lower-dimensional feature space.
• Feature extraction is usually used when the original data is in a form that is very different from what the model needs (when you could not use the raw data directly).

• If the original data were images, then you can extract the redness value, or a
description of the shape of an object in the image.

Feature Scaling
• Feature scaling is a preprocessing technique in ML that involves standardizing or
normalizing the range of independent variables or features of a dataset.
• The goal is to ensure that all features contribute equally to the modeling process,
preventing certain features from dominating due to their scale.
Min-Max Scaling (Normalization): $x_{\text{normalized}} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, which gives values between 0 and 1.
Z-score normalization: $x_{\text{normalized}} = \dfrac{x - \mu}{\sigma}$, which gives values roughly between -3 and +3 standard deviations.
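A minimal Python sketch of the two scaling formulas above; the sample values are illustrative.

```python
# Min-Max scaling and Z-score normalization on a small illustrative sample.
import numpy as np

x = np.array([2400.0, 3200.0, 2500.0, 2100.0, 2500.0])   # e.g. house areas

x_minmax = (x - x.min()) / (x.max() - x.min())   # Min-Max scaling -> values in [0, 1]
x_zscore = (x - x.mean()) / x.std()              # Z-score normalization

print(x_minmax)   # every value lies between 0 and 1
print(x_zscore)   # values centred on 0, roughly within +/-3 standard deviations
```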

Feature Engineering
• Feature engineering is the careful preprocessing of the data into more meaningful features, even if you could have used the original ones.
• Feature engineering involves selecting, transforming, normalizing, one-hot encoding, and creating new features based on existing ones.


• For example: instead of using the dataset variables x, y, z, you decide to use log(x) − z × sqrt(y), because the derived quantity is more meaningful for solving your problem. You get better results than without it.

Example 2 (OneHot Encoding)
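A minimal one-hot encoding sketch with pandas; the column name and city values are hypothetical, not taken from the lecture's figure.

```python
# One-hot encoding turns a categorical column into one 0/1 column per category.
import pandas as pd

df = pd.DataFrame({"city": ["Abha", "Riyadh", "Abha", "Jeddah"],   # hypothetical data
                   "price": [9, 15, 10, 8]})
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)   # columns: price, city_Abha, city_Jeddah, city_Riyadh
```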

Example 3
Show how feature engineering can help to identify errors in the dataset if an expert tells you that the price per square foot cannot be less than $3,400.

     Area (square feet)   Price (million dollars)
1    2400                 9
2    3200                 15
3    2500                 10
4    2100                 1.5
5    2500                 8.9

Answer 3
We add a new column to display the cost per square foot.

     Area (square feet)   Price (million dollars)   Cost per square foot
1    2400                 9                         4150
2    3200                 15                        4944
3    2500                 10                        3950
4    2100                 1.5                       510
5    2500                 8.9                       3600

The results show that the data of house 4 has a problem.
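A minimal pandas sketch of this check. The ratio is computed directly as price/area, so the exact values may differ slightly from the rounded figures in the table, but house 4 is flagged either way.

```python
# Feature engineering for error detection: derive cost per square foot and
# flag houses that violate the expert's $3,400 lower bound.
import pandas as pd

houses = pd.DataFrame({"house": [1, 2, 3, 4, 5],
                       "area_sqft": [2400, 3200, 2500, 2100, 2500],
                       "price_musd": [9, 15, 10, 1.5, 8.9]})
houses["cost_per_sqft"] = houses["price_musd"] * 1_000_000 / houses["area_sqft"]
suspect = houses[houses["cost_per_sqft"] < 3400]
print(suspect["house"].tolist())   # [4] -- house 4 looks erroneous
```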

Learning System
A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.

A learning system is characterized by the following elements:


1. Task (or tasks) T. 2. Experience E. 3. Performance measure P.


Example 4
Assume a learning system for playing the tic-tac-toe game (or noughts and crosses). Describe the elements of this learning system.
Answer 4
T: Play tic-tac-toe;

E: Playing against itself (can also be playing against others).


P: Percentage of games won (and eventually drawn).

Types of Learning Systems


Prediction System: It predicts the desired output for a given input based on previous
input/output pairs.
Example: prediction of a stock value given input parameters like market index, interest
rates, and currency conversion.
Regression System: It estimates a function of many variables (multivariate) or a single variable (univariate) from scattered data.
Example: A simple univariate regression problem is $f(x) = x^4 + x^3 + x^2 + x + 1$.
Classification (categorization) System: It classifies an object into one of several
categories (or classes) based on features of the object.
Example:
1. A diagnosis system to classify a patient’s cancer into one of the three categories:
avascular, vascular, angiogenesis.
2. An automated vehicle where a set of vision inputs and the corresponding steering
actions are available to the learner.


Clustering System: It organizes a group of objects into homogeneous segments.


Example:
1. A satellite image analysis system which groups land areas into
forest, urban and water body, for better utilization of natural resources.
2. Finding out malicious network attacks from a sequence of anomalous data packets.
Planning System: It generates an optimal sequence of actions to solve a particular
problem.
Example: A robot path planning to perform a certain task or to move from one place to
another.

Machine Learning Types


1. Supervised Learning: It needs correct answers (ground truth) for each example. This is
called supervision. (It learns to produce the correct output given a new input).
2. Unsupervised Learning: It does not need correct answers for its examples (No
supervision).
3. Reinforcement Learning: The machine produces actions which affect the state of the world
and receives rewards (or punishments). The goal is to learn to act in a way that maximizes
rewards in the long term.
4. Semi-supervised learning: It is an approach to ML that combines a small amount of labeled data with a large amount of unlabeled data during training.

Supervised Learning
There are two main types:
• Classification: It maps an
input data point to an output
label.
• Regression: It maps an
input data point to a
continuous output value.


Supervised Learning Workflow


Supervised learning
goes through two
stages.
1. Training stage
2. Testing stage

However, the supervised learning workflow comprises the following steps:

1. Ingestion of the raw data.
2. Leveraging data processing techniques to wrangle, process, and engineer meaningful features and attributes from this data.
3. Leveraging ML models to model these features.
4. Deploying the model for future usage, based on the problem to be solved at hand.

Example 5
Assume an instructor is training an agent to become a taxi driver. Then:
• Every time the instructor shouts "Brake!", the agent can learn a condition-action rule for when to brake.
After learning the "Brake" rules, the agent applies them and finds the following case:
• Braking hard on a wet road causes something bad.
Then the agent will learn the effects of its actions.


Supervised Classification
• The goal is to learn a functional mapping between the input data (patterns or
examples) 𝑋, to a class label 𝑌, i.e., 𝑌 = 𝑓(𝑋).

• The function approximates the relationship between input data and output label.
• There are three phases in supervised classification:
1. Training stage: The input is 𝑋, and 𝑌. The output is the mapping, 𝑌 = 𝑓(𝑋).
2. Classification stage: The input is a new example 𝑋𝑡 . The output is the predicted
value, 𝑦𝑡 = 𝑓(𝑋𝑡 ).
3. Output stage: Define the level of the classification. The input is the predicted value
𝑦𝑡 . The output is the predicted label, 𝑌𝑡 .
• Noisy, or incorrect, data labels will clearly reduce the effectiveness of the model.
• The error for any supervised ML algorithm comprises three parts:
1. Bias error.
2. Variance error.
3. The noise.
• The main considerations for supervised learning are:
1. Model complexity: It refers to the complexity of the function you are attempting
to learn — like the degree of a polynomial.
2. Bias-Variance tradeoff.

Bias in Machine Learning


• Bias is the difference between the expected prediction of the model and the correct
value which we are trying to predict. (Bias is calculated using mean squared error (MSE)
or mean absolute error (MAE)).


• Bias occurs due to incorrect model assumptions that lead to a misrepresentation of the data distribution (the model misses the relevant relations between features and target outputs, i.e., underfitting).

Variance in Machine Learning


• Variance is the variability of predictions for a given data point once the model is
trained multiple times on different datasets (different realizations of the model).
• High variance model:
1. It is highly sensitive to the training data and may capture noise as a real pattern.
2. It has a very complex fit to the training data and thus is not able to fit accurately
on the data which it has not seen before (overfitting, the model learns too much
from the training data).

Bias-Variance Tradeoff Scenarios


• High Bias, Low Variance (Underfitting):
1. The model is too simplistic, ignoring important patterns in the data.
2. Performance is poor on both the training set and new data.
• Low Bias, High Variance (Overfitting):
1. The model is overly complex, overfits the data and learns too much from it by
capturing noise in the training data (It tries to memorize the training data instead of
generalizing the relationship between inputs and output variables)

2. Performance is excellent on the training set but degrades on new data.


• Optimal Tradeoff:
1. The sweet spot lies in balancing bias and variance to achieve good performance on
both training and test data.

The tradeoff may be illustrated using a bulls-eye diagram.


• The center circle is a model that perfectly predicts the correct values.


• As we move away from the bullseye, predictions get worse and worse.
• Different individual realizations result in a scatter of hits on the target due to
repeating the model on different training data.

[Figure: model fits for a simple relationship (underfitting), a complex relationship (overfitting), and the optimal tradeoff.]

Among many parameters, a simple model decides to rely on only a few of them and treats the others as unimportant; for example, it may consider only the glucose level and the blood pressure to decide whether the patient has diabetes.
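A minimal numpy sketch of the tradeoff: polynomials of increasing degree are fitted to noisy samples of an assumed target function. The target function, degrees, sample size, and noise level are illustrative choices.

```python
# Underfitting vs. overfitting: compare train and test error as model
# complexity (polynomial degree) grows.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)               # assumed target function
x_train = np.sort(rng.uniform(-1, 1, 20))
x_test = np.sort(rng.uniform(-1, 1, 200))
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_f(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 4, 12):                              # underfit, balanced, overfit
    coeffs = np.polyfit(x_train, y_train, degree)      # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE = {mse(x_train, y_train):.3f}, "
          f"test MSE = {mse(x_test, y_test):.3f}")
# High bias (degree 1): poor on both sets. High variance (degree 12): very low
# training error but a clearly larger test error.
```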

K-Nearest Neighbor Classification


• K-nearest neighbors are measured by a distance function over the $K$ features of each example:

Euclidean distance: $D_e(X_1, X_2) = \sqrt{\sum_{i=1}^{K} (x_{1,i} - x_{2,i})^2}$

Manhattan distance: $D_m(X_1, X_2) = \sum_{i=1}^{K} |x_{1,i} - x_{2,i}|$

• The distance functions are valid only with continuous variables.


Example 6
Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors), and Default is the target. Use 1-nearest neighbors and 3-nearest neighbors with Euclidean distance to classify the unknown case (Age = 48, Loan = $142,000).

Age   Loan (SR)   Default
25    40,000      N
35    60,000      N
45    80,000      N
20    20,000      N
35    120,000     N
52    18,000      N
23    95,000      Y
40    62,000      Y
60    100,000     Y
48    220,000     Y
33    150,000     Y

Answer 6
Calculate the Euclidean distance between the test example and all training examples:

$D_e(X_{TR}, X_t) = \sqrt{\sum_{i=1}^{2} (x_{TR,i} - x_{t,i})^2} = \sqrt{(x_{TR,1} - x_{t,1})^2 + (x_{TR,2} - x_{t,2})^2}$

Age   Loan (SR)   Default   Distance    Rank
25    40,000      N         102,000
35    60,000      N         82,000
45    80,000      N         62,000
20    20,000      N         122,000
35    120,000     N         22,000      2
52    18,000      N         124,000
23    95,000      Y         47,000
40    62,000      Y         80,000
60    100,000     Y         42,000      3
48    220,000     Y         78,000
33    150,000     Y         8,000       1
48    142,000     ?         (test case)

If K = 1, then the nearest neighbor is Y, so this case is classified as Y.
If K = 3, then the three nearest neighbors give 2 Y and 1 N, which again gives Y.
KNN disadvantages:
• KNN is computationally expensive.
• KNN requires a large amount of memory to store the training data.
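A minimal from-scratch sketch that reproduces Example 6 with Euclidean distance; the array names are illustrative.

```python
# K-NN classification of the unknown case (Age=48, Loan=142,000).
import numpy as np

X = np.array([[25, 40_000], [35, 60_000], [45, 80_000], [20, 20_000],
              [35, 120_000], [52, 18_000], [23, 95_000], [40, 62_000],
              [60, 100_000], [48, 220_000], [33, 150_000]], dtype=float)
y = np.array(["N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"])
query = np.array([48, 142_000], dtype=float)

dist = np.sqrt(((X - query) ** 2).sum(axis=1))        # Euclidean distance to every row
for k in (1, 3):
    nearest = np.argsort(dist)[:k]                    # indices of the k closest examples
    labels, counts = np.unique(y[nearest], return_counts=True)
    print(f"k={k}: neighbours {y[nearest].tolist()} -> predict {labels[counts.argmax()]}")
# k=1 -> Y (the closest point is at distance ~8,000); k=3 -> 2 Y and 1 N -> Y.
```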


Example 7

For this example, we have:


• Space of all possible examples: $E = (e_1, e_2, e_3, \ldots, e_{15})$.
• Space of all features: $X = (x_1, x_2, x_3, x_4, x_5, x_6)$.
• Instance space = cardinality = $|\{X\}| = 15$.
• Concept learning/regression: we must find $f$ such that
  $Y = f(X) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$
• Prediction: $f(1, 2, 1, 1, 0, 0) = ?$
• Evaluation:
  $\mathrm{Error}(f) = \dfrac{|\{X : f(X) \neq Y\}|}{15}$

Pattern Classification Tasks


• A pattern is described by a set of features for some phenomenon that repeats


regularly based on a set of rules or conditions.
• Pattern recognition is the process of recognizing patterns using the ML algorithm.
• Pattern recognition:
1. It classifies data based on knowledge extracted from patterns and/or their
representation.
2. The task can be viewed as a two-dimensional matrix (whose axes are the examples
and the features).
3. It attempts to assign each example to one of a given set of classes (patterns).
• The process of pattern recognition matches the information received with the
information already stored, e.g., color of eyes, distance between the eyes, colors
on the clothes and speech pattern.

Unsupervised Learning
• Unsupervised Learning learns patterns in the input data when no specific output
values are supplied.
• A cluster refers to a collection of data points aggregated together because of
certain similarities.
• For instance – a taxi agent might gradually develop a concept of "good traffic
days" and "bad traffic days" without ever being given labels.
• A purely unsupervised learning agent cannot learn what to do, because it has no
information as to what constitutes a correct action or a desirable state.


Unsupervised Learning Example


Assume a basket filled with some fresh fruits (cherries, apples, bananas, and grapes). Suggest an ML technique for arranging fruits of the same type in one place.

• This time the agent does not know anything about fruits; it is the first time it has seen these fruits, so how will it arrange fruits of the same type?
• The agent will take a fruit and select some physical characteristic of that fruit (suppose it is the color).
• The agent will arrange the fruits by color. The groups will be something like this:
1. RED COLOR GROUP: apples & cherry fruits.
2. GREEN COLOR GROUP: bananas & grapes.
• If the agent adds another physical characteristic, such as size, then the groups will be something like this:
1. RED COLOR AND BIG SIZE: apple.
2. RED COLOR AND SMALL SIZE: cherry fruits.
3. GREEN COLOR AND BIG SIZE: bananas.
4. GREEN COLOR AND SMALL SIZE: grapes.

Clustering Algorithms
1. K-Means Clustering: It divides the data into a specific number of groups or clusters by
minimizing the total squared distances between the data points and the centers of each
cluster.
2. Hierarchical Clustering: It develops a hierarchy of clusters by merging or splitting them
depending on their similarity.
3. Density-Based Spatial Clustering of Applications with Noise: DBSCAN identifies
clusters as dense regions of data points separated by sparser regions.


K-Means Clustering Algorithm


• The K-Means algorithm identifies a fixed number, K, of centroids (clusters) and then allocates every data point to the nearest cluster, while keeping the clusters as compact as possible (each point as close to its centroid as possible).
• A centroid is the imaginary or real location representing the center of the cluster.
• K-Means algorithm starts with a first group of randomly selected centroids, which
are used as the beginning points for every cluster, and then performs iterative
(repetitive) calculations to optimize the positions of the centroids
• K-Means algorithm halts creating and optimizing clusters when either:
1. The centroids have stabilized — there is no change in their values because the
clustering has been successful.
2. The defined number of iterations has been achieved.

K-Means Clustering Algorithm Steps


• Start with a random set of $K$ means, $m_1^{(1)}, m_2^{(1)}, \ldots, m_K^{(1)}$.
• The algorithm proceeds by alternating between two steps:
Assignment Step: It measures the distances between the $K$ centroids and the individual data points; each $x_p$ is assigned to exactly one cluster $S_i^{(t)}$:

$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\|^2 \le \left\| x_p - m_j^{(t)} \right\|^2 \ \ \forall j,\ 1 \le j \le K \right\}$

Update Step: The algorithm updates the position of each centroid as the average of the data points belonging to its cluster:

$m_i^{(t+1)} = \dfrac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$

• It repeats the process until no centroid moves more than a given threshold.

Example 8

Example 9
Assume eight location points represented by (𝑥, 𝑦):
𝐴1(2, 10), 𝐴2(2, 5), 𝐴3(8, 4), 𝐴4(5, 8), 𝐴5(7, 5), 𝐴6(6, 4), 𝐴7(1, 2), 𝐴8(4, 9)


Assume initial cluster centers are 𝐴1(2, 10), 𝐴4(5, 8) and 𝐴7(1, 2) and the used distance
function is Manhattan distance. Use K-Means Algorithm to find the three cluster centers
after the second iteration.

Answer 9
Calculate the Manhattan distance of each point from each of the three cluster centers:
$dis(C_j, A_i) = |x_{C_j} - x_{A_i}| + |y_{C_j} - y_{A_i}|$
Iteration 1:

Given Points 𝒅𝒊𝒔(𝑪𝟏 , 𝑨𝒊 ) 𝒅𝒊𝒔(𝑪𝟐 , 𝑨𝒊 ) 𝒅𝒊𝒔(𝑪𝟑 , 𝑨𝒊 ) Point belongs to Cluster

A1(2, 10) 0 5 9 C1

A2(2, 5) 5 6 4 C3

A3(8, 4) 12 7 9 C2

A4(5, 8) 5 0 10 C2

A5(7, 5) 10 5 9 C2

A6(6, 4) 10 5 7 C2

A7(1, 2) 9 10 0 C3

A8(4, 9) 3 2 10 C2

New clusters are:

Cluster   Points                                              New centroid
C1        A1(2, 10)                                           (2, 10)
C2        A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)    ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6)
C3        A2(2, 5), A7(1, 2)                                  ((2+1)/2, (5+2)/2) = (1.5, 3.5)

Re-compute the new cluster centers (each new cluster center is computed by taking the average of all the points contained in that cluster).


Iteration 2:

Given Points 𝒅𝒊𝒔(𝑪𝟏 , 𝑨𝒊 ) 𝒅𝒊𝒔(𝑪𝟐 , 𝑨𝒊 ) 𝒅𝒊𝒔(𝑪𝟑 , 𝑨𝒊 ) Point belongs to Cluster

A1(2, 10) 0 8 7 C1

A2(2, 5) 5 5 2 C3

A3(8, 4) 12 4 7 C2

A4(5, 8) 5 3 8 C2

A5(7, 5) 10 2 7 C2

A6(6, 4) 10 2 5 C2

A7(1, 2) 9 9 2 C3

A8(4, 9) 3 5 8 C1

New clusters are:

Cluster   Points                                     New centroid
C1        A1(2, 10), A8(4, 9)                        ((2+4)/2, (10+9)/2) = (3, 9.5)
C2        A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)     ((8+5+7+6)/4, (4+8+5+4)/4) = (6.5, 5.25)
C3        A2(2, 5), A7(1, 2)                         ((2+1)/2, (5+2)/2) = (1.5, 3.5)
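A minimal numpy sketch of the assignment and update steps, run on the Example 9 points with Manhattan distance; it reproduces the centroids found above.

```python
# K-Means with Manhattan distance on the eight points of Example 9.
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centroids = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)   # A1, A4, A7

for iteration in range(2):
    # Assignment step: Manhattan distance from every point to every centroid.
    dist = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=2)
    labels = dist.argmin(axis=1)
    # Update step: each centroid becomes the mean of the points assigned to it.
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(3)])
    print(f"centroids after iteration {iteration + 1}:\n{centroids}")
# Final centroids: (3, 9.5), (6.5, 5.25), (1.5, 3.5), matching the table above.
```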

Association Rule Mining


• Association rule mining focuses on discovering interesting relationships or patterns
in transactional data.
• It is commonly used in market basket analysis and recommendations.
• An association rule is a rule that defines the dependency between two sets of objects:
  antecedent → consequent, for example {Bread, Butter} → {Milk, Coffee}.
  The consequent is a set of items that consumers are more likely to buy if they buy the items in the antecedent set.


• Itemset: An itemset is a set containing one or more items in the transaction


dataset. For instance, {}, {Milk}, {Milk, Bread}, {Tea, Ketchup}, and {Milk, Tea,
Coffee} are all itemsets.
• The importance of any association rule depends on metrics such as lift, support,
and confidence.
• Support Count of an Itemset: The support count of an itemset is the frequency with which the itemset appears in the transaction data.
• Support: It indicates an itemset's popularity within a dataset:
  $\mathrm{sup}(I) = \dfrac{\mathrm{supCount}(I)}{N}$, where $N$ is the total number of transactions.
• Frequent Itemset: It is an itemset whose support is greater than the minimum support.
• Confidence: It is important for the association rule and is defined by:
  $\mathrm{conf}(A \rightarrow C) = \dfrac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)} = \dfrac{\mathrm{supCount}(A \cup C)}{\mathrm{supCount}(A)}$
• Lift: It defines the strength of an association rule:
  $\mathrm{lift}(A \rightarrow C) = \dfrac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)\,\mathrm{sup}(C)} = \dfrac{\mathrm{conf}(A \rightarrow C)}{\mathrm{sup}(C)}$

Example 10
Consider the following transaction data:

Transaction ID   Items
T1               Milk, Bread, Coffee, Tea
T2               Milk, Bread
T3               Milk, Coffee
T4               Bread, Ketchup
T5               Milk, Tea, Sugar

1. What are the possible itemsets?
2. What is the support count for {Milk}, {Milk, Bread}, and {Milk, Ketchup}?
3. What are the frequent itemsets if the minimum support is 0.4?
4. What are the confidence and lift of the association rule {Milk, Bread} → {Coffee}?

Answer 10
The possible itemsets are:
{Milk} 4       {Ketchup} 1        {Milk, Tea} 2       {Bread, Ketchup} 1     {Milk, Bread, Tea} 1
{Bread} 3      {Sugar} 1          {Bread, Coffee} 1   {Milk, Sugar} 1        {Milk, Coffee, Tea} 1
{Coffee} 2     {Milk, Bread} 2    {Bread, Tea} 1      {Tea, Sugar} 1         {Bread, Coffee, Tea} 1
{Tea} 2        {Milk, Coffee} 2   {Coffee, Tea} 1     {Milk, Tea, Sugar} 1   {Milk, Bread, Coffee} 1

The support is:


supCount({Milk}) = 4 supCount({Milk, Bread}) = 2 supCount({Milk, Ketchup}) = 0

The frequent itemsets are:


{Milk} 0.8, {Bread} 0.6, {Coffee} 0.4, {Tea} 0.4, {Milk, Bread} 0.4, {Milk, Coffee} 0.4, {Milk, Tea} 0.4

The confidence and lift of {Milk, Bread} → {Coffee} are:

$\mathrm{conf}(\{\text{Milk, Bread}\} \rightarrow \{\text{Coffee}\}) = \dfrac{\mathrm{sup}(\{\text{Milk, Bread, Coffee}\})}{\mathrm{sup}(\{\text{Milk, Bread}\})} = \dfrac{0.2}{0.4} = 0.5$

$\mathrm{lift}(\{\text{Milk, Bread}\} \rightarrow \{\text{Coffee}\}) = \dfrac{\mathrm{conf}(\{\text{Milk, Bread}\} \rightarrow \{\text{Coffee}\})}{\mathrm{sup}(\{\text{Coffee}\})} = \dfrac{0.5}{0.4} = 1.25$
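A minimal Python sketch that reproduces these numbers from the Example 10 transactions.

```python
# Support, confidence, and lift computed directly from the transaction list.
transactions = [{"Milk", "Bread", "Coffee", "Tea"},
                {"Milk", "Bread"},
                {"Milk", "Coffee"},
                {"Bread", "Ketchup"},
                {"Milk", "Tea", "Sugar"}]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"Milk", "Bread"}, {"Coffee"}
conf = support(antecedent | consequent) / support(antecedent)
lift = conf / support(consequent)
print(support(antecedent | consequent), conf, lift)   # 0.2 0.5 1.25
```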

Example 11
Suppose you are building a face recognition program using the following:
1. Supervised Learning (Classification):
• You have a dataset of face images and other images.
• The dataset is labeled (classified) into face images and other images.
• The program's task is to classify new images based on the available dataset.
2. Unsupervised learning (Clustering):
• The goal is to infer the natural structure present within a set of data points.
• The program clusters similar data points for a new given dataset into different groups,
e.g., it can distinguish that faces are very different from landscapes, which are very
different from horses.


Apriori Algorithm
• The Apriori algorithm is the most widely used algorithm for association rule mining.
• It uses a breadth-first search strategy to generate frequent itemsets and then generates association rules from these itemsets.

Example 12
Use the Apriori algorithm on the grocery
store data with minimum support 33.34%
and confidence 60%.
Indicate the association rules that are
generated and highlight the strong ones,
sort them by confidence.

Answer 12
The minimum support of 33.34% means that the minimum support count is 2.

Pass (k)   Candidate k-itemsets and their support counts                      Frequent k-itemsets
k = 1      HotDogs(4), Buns(2), Ketchup(2), Coke(3), Chips(4)                 HotDogs, Buns, Ketchup, Coke, Chips
k = 2      {HotDogs, Buns}(2), {HotDogs, Ketchup}(1), {HotDogs, Coke}(2),     {HotDogs, Buns}, {HotDogs, Coke},
           {HotDogs, Chips}(2), {Buns, Ketchup}(1), {Ketchup, Chips}(1),      {HotDogs, Chips}, {Coke, Chips}
           {Coke, Chips}(3)
k = 3      {HotDogs, Coke, Chips}(2)                                          {HotDogs, Coke, Chips}
k = 4      {}


The association rules:

Frequent Itemset              Association rule                  Confidence
{HotDogs, Buns}               HotDogs → Buns                    2/4 = 0.5
                              Buns → HotDogs                    2/2 = 1
{HotDogs, Coke}               HotDogs → Coke                    2/4 = 0.5
                              Coke → HotDogs                    2/3 = 0.67
{HotDogs, Chips}              HotDogs → Chips                   2/4 = 0.5
                              Chips → HotDogs                   2/4 = 0.5
{Coke, Chips}                 Coke → Chips                      3/3 = 1.0
                              Chips → Coke                      3/4 = 0.75
{HotDogs, Coke, Chips}        HotDogs → (Coke ∧ Chips)          2/4 = 0.5
                              Coke → (Chips ∧ HotDogs)          2/3 = 0.67
                              Chips → (Coke ∧ HotDogs)          2/4 = 0.5
                              (HotDogs ∧ Coke) → Chips          2/2 = 1
                              (HotDogs ∧ Chips) → Coke          2/2 = 1
                              (Coke ∧ Chips) → HotDogs          2/3 = 0.67

The strong association rules (confidence ≥ 60%), sorted by confidence, are:

Association rule              Confidence
Buns → HotDogs                2/2 = 1
Coke → Chips                  3/3 = 1
(HotDogs ∧ Coke) → Chips      2/2 = 1
(HotDogs ∧ Chips) → Coke      2/2 = 1
Chips → Coke                  3/4 = 0.75
Coke → HotDogs                2/3 = 0.67
Coke → (Chips ∧ HotDogs)      2/3 = 0.67
(Coke ∧ Chips) → HotDogs      2/3 = 0.67
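The grocery-store transactions themselves are given only as a figure and are not reproduced in the text, so the six transactions in the sketch below are an assumed reconstruction chosen to match the support counts in the table above; treat them as illustrative. The sketch also enumerates candidate itemsets by brute force rather than with Apriori's candidate-pruning step.

```python
# Brute-force frequent-itemset check for Example 12 (minimum support count 2).
from itertools import combinations

transactions = [{"HotDogs", "Buns", "Ketchup"}, {"HotDogs", "Buns"},   # assumed
                {"HotDogs", "Coke", "Chips"}, {"Chips", "Coke"},       # reconstruction
                {"Chips", "Ketchup"}, {"HotDogs", "Coke", "Chips"}]    # of the figure
items = sorted(set().union(*transactions))
min_count = 2                                    # 33.34% of 6 transactions

def count(itemset):
    return sum(itemset <= t for t in transactions)

for k in range(1, len(items) + 1):
    frequent = [set(c) for c in combinations(items, k) if count(set(c)) >= min_count]
    if not frequent:
        break
    print(f"k={k}:", [sorted(f) for f in frequent])
# Strong rules follow by checking confidence, e.g.
# count({"Coke", "Chips"}) / count({"Coke"}) == 1.0  ->  Coke -> Chips is strong.
```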

Applications of Unsupervised Learning


Unsupervised learning finds applications across various domains. Some notable
applications include:


• Customer Segmentation: It groups customers based on their purchasing behavior,


allowing businesses to tailor marketing strategies.
• Anomaly Detection: By identifying abnormal patterns or outliers, unsupervised learning helps detect fraud, network intrusions, or manufacturing defects.
• Image and Text Clustering: Unsupervised learning can automatically group
similar images or texts, aiding in tasks like image organization, document
clustering, or content recommendation.
• Genome Analysis: It can analyze genetic data to identify patterns and
relationships, leading to insights in personalized medicine and genetic research.
• Social Network Analysis: Unsupervised learning can be used to identify
communities or influential individuals within social networks, enabling targeted
marketing or detecting online communities.

Reinforcement Learning

• It is about learning the optimal behavior in an environment to obtain maximum


reward (This is the most general of the three categories).
• Rather than being told what to do by a teacher, a reinforcement learning agent must
learn from reinforcement.
• For instance:
1. The lack of a tip at the end of the journey gives the agent some indication that its
behavior is undesirable.


2. A robot in an unknown terrain gets a punishment when it hits an obstacle and a


reward when it moves smoothly.
• Reinforcement learning typically includes the sub-problem of learning how the
environment works.

Example 13

Example 14


Examples of Machine Learning Problems


There are many examples of machine learning problems:
• Optical character recognition: It categorizes images of handwritten characters.
• Face detection: It finds faces in images (or indicate if a face is present)
• Spam filtering: It identifies email messages as spam or non-spam.
• Topic spotting: It categorizes news articles (politics, sports, entertainment). Google
News categorizes articles on the same story from various online news outlets.
• Medical diagnosis: It diagnoses a patient as a sufferer or non-sufferer of some
disease.
• Spoken language understanding: Within a limited domain, it determines the
meaning of something uttered by a speaker to the extent that it can be classified
into one of a fixed set of categories.
• Customer segmentation: It predicts (for instance) which customers will respond to
a particular promotion.
• Fraud detection: It identifies credit card transactions (for instance) which may be
fraudulent in nature.
• Weather prediction: It predicts (for instance) whether or not it will rain tomorrow.

Goals of Machine Learning


The primary goal of ML research is to develop general purpose algorithms of practical
value without human intervention.
• ML algorithms should be efficient in terms of time and space.
• Of primary importance, the result of learning is a prediction rule that is as accurate
as possible in the predictions that it makes.

Intelligent Systems, 2024 Chapter 2.1 Dr. Mohammad Alshamri

How can human knowledge of all kinds be represented by a computer language so that the computers can use this
knowledge for purposes of reasoning?

Knowledge-based Systems
Data
• Data refers to factual, discrete, and static things and raw observations of the given
area of interest that are not organized to convey any specific meaning.
• Data can be numbers, letters, figures, sounds, or images.

Information
• It is data within a context (data that has been shaped into a form meaningful and
useful to human beings).
For example,
1. a grade point average (GPA) is data, but
2. a student’s name coupled with his or her GPA is information.
• The recipient interprets the meaning and draws conclusions and implications from
the information.
• Information is only as good as the data from which it is derived; otherwise it is 'garbage in, garbage out', or simply GIGO.

Knowledge
It consists of data and/or information that has been organized and processed to convey
understanding, experience, or accumulated learning.
• For example, a company has found over time that students with a grade point average
over 3.0 have had the most success in its management program.
• Based on its experience, that company may decide to interview only those students with
GPAs over 3.0.


Information Types
Information type Meaning Example
Permanent It never changes, like physical laws The earth moves around the Sun
Static It is constant over a period of time Policies and procedures
Dynamic It is continuously changing Prices of shares and gold

Knowledge Components
1. Facts
2. Rules
3. Heuristics

Facts
• Facts represent (claims) a set of raw observations, alphabets, symbols, or
statements that can be true or false at the time that they are used.
"Fire is hot" tap is open
"The earth moves around the Sun" Joe Bloggs works for ACME
"Every car has a battery"
• The fact has three parts:
1. An object (also called a linguistic object).
2. The value of the linguistic object.
3. An operator to assign a value to the linguistic object (like, for example, is, are, or
mathematical operators)

• Examples: x > 0, x is positive, temperature is high, weather is hot.

• Facts can be attributes or relationships.


1. Attributes are properties of object instances (such as my car) or object classes (such as
cars and vehicles).
2. Relationships exist among instances of objects and classes of objects.
3. Attributes and relationships can be represented as a network, known as an associative or
semantic network.
• Facts can be:


1. looked up from a database.


2. already stored in computer memory.
3. determined from sensors connected to the computer.
4. obtained by prompting the user for information.
5. derived by applying rules to other facts.
• Static Fact: It is a fact that remains constant or unchanged throughout the
execution of a program or a particular inference cycle.
Static facts are usually made available to KBS at the outset.
• Transient Fact: It is a fact that applies at a specific instance only, or for a single
run of the system.
Transient facts are usually made available to KBS while the system is running and may change
frequently during the execution of a program or within a specific timeframe.
• The knowledge base may contain defaults, which can be used as facts in the
absence of transient facts to the contrary.
• Both Static and Transient facts may be described as given facts (low-level facts).
• Fact base: It is the collection of all facts which are known to the system at any
given time.

Example 1
A collection of facts about my car are:


Example 2
An example of a semantic network with an overridden default is given below. Here
attributes are treated in the same way as relationships.

Rules (IF-THEN Rules or Production Rules)


• A rule consists of two parts:
1. IF part (antecedent, premise, condition). These are the criteria or constraints that
must be satisfied for the rule to be applicable.
2. THEN part (consequent, conclusion, action): These are the tasks or conclusions
that are executed or drawn when the conditions are met.

IF <condition> THEN <conclusion>, i.e., IF P THEN Q, which is equivalent to P ⇒ Q.

IF tap is open THEN water flows
IF season is winter THEN it is cold
IF ?x works for ACME THEN ?x earns a large salary

The question mark (?) is used to indicate that x is a variable that can be replaced by a constant value.


• Rules are an elegant, expressive, straightforward, and flexible means of expressing


many types of knowledge.
• Usually, the knowledge of the expert is captured in a set of rules, each of which
encodes a small piece of the expert's knowledge.
• A rule can have multiple antecedents or consequents joined by any of the logical
operators AND, OR (or a mixture of them).
• The ordering of rules in a program ideally is not important, and it is possible to add
new rules or modify existing ones without fear of side effects.
• Rule base: It is the set of the available rules in the system.
• Shallow Rule: It is a rule that is specific to one particular situation and would not
apply to other situations (It represents shallow knowledge).
• Deep Rule: It is a valid rule under any circumstance and is not specific to a
particular situation (It represents deep knowledge).
• Rule Firing (Triggering) Process: It is the process of evaluating the conditions of each rule against a given set of data or input.
When the conditions of a rule are satisfied by the input, the rule is said to "fire" and its
associated actions are executed.
• Conflict set: It is a set of rules whose conditions are satisfied based on the input.
• Working memory: It is the memory for the input that may or may not initially
contain any data, assertions or initially known information.
• One or more given facts may satisfy the condition of a rule, resulting in the
generation of a new fact, known as a derived fact (high-level facts).
• Given the rule IF tap is open THEN water flows and the fact tap is open, the derived
fact water flows can be generated.
• The new fact is stored in computer memory and can be used to satisfy the
conditions of other rules, thereby leading to further derived facts.
• Low-level rules are those that depend on low-level facts.


• High-level rules make use of more abstract information.


• Higher-level rules are closest to providing a solution to a problem, while lower-
level rules represent the first stages toward reaching a conclusion.

Example 3
Show some examples of multiple antecedents joined by logical operators.

Answer 3
Multiple antecedents combined by AND
IF (antecedent1 AND antecedent2 … AND antecedentN) THEN consequent
IF (the season is winter AND the temperature is <0 degrees AND it is windy) THEN the weather is cold

Multiple antecedents combined by OR


IF (antecedent1 OR antecedent2 … OR antecedentN) THEN consequent
IF (the season is winter OR the temperature is <0 degrees OR it is windy) THEN it is cold

Multiple antecedents combined by AND and OR


IF (antecedent1 AND antecedent2 … OR antecedentN) THEN consequent
IF ((the season is winter AND the temperature is <0 degrees) OR it is windy) THEN it is cold

An example of a consequent with multiple clauses is:


IF antecedent THEN consequent1, consequent2, … consequentN
IF the season is winter THEN (temperature is low AND road is slippery AND forecast is snow)

Example 4
Apply Rule 1 to Fact 1 to get a derived fact.
Fact 1 Joe Bloggs works for ACME
Rule 1 IF ?x works for ACME THEN ?x earns a large salary

Answer 4
Fact 2 Joe Bloggs earns a large salary
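A minimal sketch of how the variable ?x is bound when Rule 1 is matched against Fact 1; the regular-expression representation is only an illustration, not a real rule-engine API.

```python
# Matching "IF ?x works for ACME THEN ?x earns a large salary" against a fact.
import re

fact = "Joe Bloggs works for ACME"
condition, conclusion = r"^(?P<x>.+) works for ACME$", "{x} earns a large salary"

match = re.match(condition, fact)
if match:                                            # the IF part is satisfied
    derived_fact = conclusion.format(x=match.group("x"))
    print(derived_fact)                              # "Joe Bloggs earns a large salary"
```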


Rules Classification
• Some ways to classify the rules are based on their: function, structure, or behavior.
• In terms of structure, rules can be logic, definition, or constraint rules.

Logic Rules
Logic rules are rules with a clearly recognizable IF condition and THEN
conclusion.

• The conclusion of the logic rule changes the value of something in the system.
• Example of logic rule is: IF (x = 1 AND y = 2) THEN z = 3

Definition Rules
Definition rules are rules without any fact (the fact is always true) in the IF part.

IF P (something which is always true) THEN Q = x (Assign or compute a value)


• Definition rules often comprise the majority of rule entities in most applications.
• Definition rules are unconditional, so they are often implemented as procedural
code or logical view of the database.

Constraint Rules
Constraint rules are rules without any fact in the THEN part.

• A constraint describes a violation of a relationship between data entities.


• There is no change of value within THEN data (the value or state of any entity).
• A constraint will often trigger an exception (usually an error process), such as sending
a message (for instance sending the message “x should be a positive number”).

• Example: IF (x= -1 AND y = 2) THEN Raise an exception


• The exception process will:


1. prompt the user to know that there is an error condition.


2. log the error condition to an error file.
3. alter or interrupt the flow of the process step or of the rule engine itself.
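A minimal sketch of a constraint rule whose THEN part raises an exception instead of changing any value; the function name and message are illustrative.

```python
# Constraint rule: IF (x = -1 AND y = 2) THEN raise an exception.
def check_constraint(x, y):
    if x == -1 and y == 2:                                   # IF part: the constraint is violated
        raise ValueError("x should be a positive number")    # THEN part: trigger the exception

check_constraint(3, 2)      # passes silently, no value is changed
# check_constraint(-1, 2)   # would raise ValueError and start the error process
```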

Note that
• Based on the conclusion or the consequent of a rule, rules can express:
Relation IF (x > 0) THEN (x is positive)
Recommendation IF (it is rainy) THEN take an umbrella
Directive IF (phone battery signals AND phone battery is empty) THEN (charge the phone)
Heuristic IF phone light is off THEN battery is flat

• Interdependent rules refer to a situation in a rule-based system where the


execution or behavior of one rule is dependent on the outcomes of other rules.

• The interdependencies amongst the rules define a network called an inference


network.

Example 5
The derived fact may satisfy, or partially satisfy, another rule, such as:
Rule 1 IF ?x works for ACME THEN ?x earns a large salary
Rule 2 IF (?x earns a large salary OR ?x has job satisfaction) THEN ?x has
professional contentment
• Rules 1 and 2 are interdependent since the conclusion of one can satisfy the
condition of the other.

Heuristics
▪ They are solutions that experts have employed in similar situations.
1. IF there is a total eclipse of the Sun THEN there is no daylight.
(even though the Sun is in the sky).
2. IF (It is rainy season AND a car was driven through water) THEN (The car
silencer would have water in it AND the car may not start).


Inference Network
Inference network is a network structure that represents the logical relationships
between facts, rules, or pieces of knowledge within a knowledge base .
• Inference network represents a closed world that facilitates the process of drawing
conclusions or making logical inferences based on the available information.
• Each node represents a possible state of some aspect of the world, hence a model
of the current overall state of the world can be maintained.
• Such a model is dependent on the extent of the relationships between the nodes.

• If a change occurs in one aspect of the world, many other nodes could be affected.

• Frame problem: It is the problem of determining what else has been changed in
the world model because of changing one thing.

Example 6
An example of an inference
network is given here.

Example 7
For the inference network of Example 6, assume Joe Bloggs gets a new job. What are the
changes that could happen to the network?

Answer 7
If Joe Bloggs gets a new job, the inference network suggests that the only direct change is his
salary, which could change his professional contentment and happiness.


However, in a more complex model of Joe Bloggs world, many other nodes could also be
affected.

Deduction, Abduction, and Induction


• Deduction: It is a form of reasoning in which specific conclusions are derived


logically from general premises or assumptions.
It involves applying rules to reach a necessarily true conclusion.
• Abduction: It is a form of reasoning in which the best explanation or hypothesis is
inferred to account for a set of observations or facts.
Many problems, such as diagnosis, involve reasoning in the reverse direction, i.e., we
wish to ascertain a cause, given an effect.
• Induction: It is the process of inferring a rule from a set of examples of causes and
effects.
1. If we have many examples of cause and effect, we can infer the rule (or
inference network) that links them.
2. For instance, if every employee of ACME that we have met earns a large
salary, then we might infer Rule 1.

Example 8
Use the inference network of Example 6 to generate a deduction about Joe Bloggs who
works for ACME and has a stable relationship.

Answer 8
Deduction:
IF Joe Bloggs works for ACME AND is in a stable relationship (the causes) THEN he is happy
(the effect).


Abduction: Given the observation that Joe Bloggs is happy, we can infer that Joe Bloggs
enjoys domestic bliss and professional contentment.

Knowledge-Based System (KBS)


KBS is a computer system that uses AI to analyze knowledge, data, and other
information from sources to make informed decisions and generate new knowledge.

• Knowledge-based systems typically have three components, which include:


1. Knowledge base: It is a collection of information and resources. The system
uses this as its repository for the knowledge it uses to make decisions. In its
simple form, the knowledge base contains rules and facts.
2. Inference engine: It processes data throughout the system. It acts as a search engine that locates relevant information based on the requests.
3. User interface: It represents how the knowledge-based system appears to
users on the computer. This allows users to interact with the system and
submit requests.


Rule-Based System (RBS)


• RBSs are KBSs that use a set of rules to make decisions or draw inferences.
• A rule-based system consists of a set of IF-THEN rules, a set of facts (assertions),
and some interpreter controlling the application of the rules, given the facts.
• RBS leverages explicit knowledge to make decisions, draws inferences and
provides intelligent responses in a variety of applications, ranging from expert
systems to business rule engines.

Example 9
A boiler control system produces steam to drive a turbine and generator. Water is heated
in the boiler tubes to produce a steam and water mixture that rises to the steam drum,
which is a cylindrical vessel mounted horizontally near the top of the boiler. The purpose
of the drum is to separate steam from the water. Steam is taken from the drum, passed
through the superheater, and applied to the turbine that turns the generator. Many sensors
are fitted to the drum to monitor the following parameters:
1. The temperature of the steam in the drum.
2. The level of water in the drum (monitored by the voltage output from a transducer).
3. The status of the pressure release valve (open or closed).
4. The water flow rate through the control valve.

Suggest a rule-based system to monitor the state of a power station boiler and to advise
appropriate actions.


Answer 9
Rule 1 IF transducer output is low THEN water level is low
Rule 2 IF water level is low THEN open the control valve
Rule 3 If steam pressure is low THEN start the boiler tubes
Rule 4 IF (temperature is high AND water level is low)
THEN (open control valve AND shutdown boiler tubes)
Rule 5 IF steam outlet is blocked THEN replace the outlet pipe
Rule 6 IF pressure release valve is stuck THEN steam outlet is blocked
Rule 7 IF (temperature is high AND NOT (water level is low)) THEN steam pressure is
high
Rule 8 IF steam pressure is high THEN shutdown boiler tubes
Rule 9 IF (pressure is high AND pressure release valve is closed) THEN pressure release
valve is stuck
Rule 10 IF (pressure release valve is open AND water flow rate is high) THEN steam is
escaping
Rule 11 IF steam is escaping THEN steam outlet is blocked
Rule 12 IF water flow rate is low THEN control valve is closed

• The input data to the system (sensor readings) are low-level facts; higher-level facts
are facts derived from them.
• Rules 2, 3, 4, 5, and 8 give recommendations to the boiler operators.
In a fully automated system, such rules would be able to perform their recommended actions
rather than simply making a recommendation.
• The remaining rules involve taking a low-level fact, such as a transducer reading,
and deriving a higher-level fact, such as the quantity of water in the drum.
• Rule 1 is a low-level rule since it depends on a transducer reading.
• Rule 5 is a high-level rule that uses more abstract information (It relates the
occurrence of a steam outlet blockage to a recommendation to replace a pipe).

• Most of the rules of this system are specific to one boiler arrangement and would
not apply to other situations.


• Rule 7 expresses a fundamental rule of physics (the boiling temperature of a liquid


increases with increasing applied pressure).
This is valid under any circumstances and is not specific to the boiler system. It is an example of
a deep rule expressing deep knowledge.

Rule Working Steps


• The task of interpreting and applying the rules belongs to the inference engine.
• The application of rules can be broken down as follows:
1. Selecting rules to examine — these are the available rules (rule base).
2. Determining which rules are applicable (conflict set) — The inference engine
examines each rule condition (IF) in the rule base with facts in the working memory to
form the triggered rules whose conditions are satisfied based on the working memory.
3. Selecting a rule to fire (Conflict Resolution): Apply a conflict resolution strategy if
more than one rule is applicable.
4. Rule firing: When the rule is fired, any actions specified in THEN clause are carried out.
• The action can modify the working memory (add a new fact), the rule base itself,
or do just about anything else the system programmer decides to include.
• Obtain inference chains.
• Stop (or exit) when the conclusion is added to the working memory or if there is a
rule that specifies to end the process.

Example 10
RBS has access to the transducer output and the temperature readings of the boiler control system. What is the applicable rule if the temperature is high and the transducer level is found to be LOW?

Answer 10
A sensible set of rules to examine would be Rules 1, 4, and 7, as these rules are
conditional on the boiler temperature and transducer output.


Rule 1 IF transducer output is low THEN water level is low


Rule 4 IF (temperature is high AND water level is low)
THEN (open control valve AND shut down boiler tubes)
Rule 7 IF (temperature is high AND NOT (water level is low)) THEN steam pressure is high

• If the transducer level is found to be LOW, then Rule 1 is applicable.


• If Rule 1 is selected and used to make the deduction water level is low, then the rule

is said to have fired.


• If the rule is examined but cannot fire (because the transducer reading is not low), the
rule is said to fail.
• The condition part of Rule 4 can be satisfied only if Rule 1 has been fired. For this
reason, it makes sense to examine Rule 1 before Rule 4. If Rule 1 fails, then Rule 4
need not be examined as it too will fail.

Closed-World Assumption
Under this assumption, a proposition is taken to be FALSE unless it is known to be TRUE.

Example 11
Consider a rule-based system in a medical diagnosis application.
IF patient has a fever AND patient has a cough, THEN recommend a flu test
Show the rule-firing process of this system.

Answer 11
If the input data indicates that the patient has a fever and cough, the conditions of the rule
are satisfied, and the rule is fired. The system then recommends a flu test.

Inference Chain
• It is the sequence of logical steps or reasoning processes that are followed by an
intelligent system to derive a conclusion or make an inference.


• It represents the flow of rule evaluations and activations that lead to a final
decision or outcome.

Example 12
RBS with the facts A, C, D, and E and a rule base given by Rules 1 to 5. Draw the inference chain of this system.
Rule 1: IF (A is TRUE AND C is TRUE) THEN B is TRUE
Rule 2: IF (C AND D) THEN F
Rule 3: IF (C AND D AND E) THEN X
Rule 4: IF (A AND B AND X) THEN Y
Rule 5: IF (D AND Y) THEN Z

Answer 12
The inference chain links the given facts to the conclusion Z:
(A, C) --Rule 1--> B    (C, D, E) --Rule 3--> X    (A, B, X) --Rule 4--> Y    (D, Y) --Rule 5--> Z
(Rule 2 can also fire to derive F, but F does not contribute to the chain that leads to Z.)

Conflict Resolution Strategies


Conflict resolution is the method of choosing one rule to fire from those that can be
fired.
In many intelligent systems, the order in which rules are used affects the conclusion.
• First applicable Resolution:
[1] It fires the first applicable rule.
[2] When conflicts occur, the system selects the first applicable rule.
• Priority-Based Resolution:
[1] It assigns a priority or rank to each rule, indicating its importance or precedence.
[2] When conflicts occur, the system selects the rule with the highest priority or rank.


• Specificity-Based Resolution (Longest Matching Strategy):


[1] Evaluate the specificity or granularity of rule conditions (based on the number of conditions
of the rules).
[2] This assumes that if the rule has the most conditions, then it has the most relevance to the
existing data.
[3] More specific rules (the rule with the most conditions is chosen) take precedence over
general rules.
• Weighted Resolution:
[1] Assign weights to rules to indicate their relative importance.
[2] Conflicts are resolved by considering the rule with the highest weight.
• Lexicographic Resolution:
[1] Order rules lexicographically based on specific attributes or criteria.
[2] Evaluate rules in lexicographic order, selecting the first rule that matches the conditions.
• Random Selection:
[1] Randomly select one of the conflicting rules.
[2] The system chooses a rule at random when conflicts occur.
• Temporal Resolution (Least Recently Used Strategy):
[1] Consider the temporal aspects of rules, such as their creation or modification time.
[2] The rule created or modified least recently takes precedence.
• User-Defined Resolution:
[1] Allow system users to define resolution strategies based on specific requirements.
[2] Users specify how conflicts should be resolved.
• Rule Combination:
[1] Combine the actions or conclusions of conflicting rules.
[2] Instead of selecting one rule, the system combines the actions of multiple conflicting
rules.
• Fallback Strategies:
[1] Define a fallback or default rule to be applied when conflicts cannot be resolved.
[2] If conflicts persist, the fallback rule is selected.
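
A minimal sketch of how two of these strategies (priority-based and longest-matching) might be coded. The rule representation (name, conditions, action, priority) and the priority values are illustrative assumptions for this example, not part of any particular expert-system shell.

```python
# Illustrative sketch only: the rule structure (name, conditions, action, priority)
# is an assumed representation, not part of any particular expert-system shell.

def priority_based(conflict_set):
    """Priority-based resolution: pick the rule with the highest priority value."""
    return max(conflict_set, key=lambda rule: rule["priority"])

def specificity_based(conflict_set):
    """Longest-matching strategy: pick the rule with the most conditions."""
    return max(conflict_set, key=lambda rule: len(rule["conditions"]))

conflict_set = [
    {"name": "Rule 1", "conditions": ["transducer output is low"],
     "action": "water level is low", "priority": 1},
    {"name": "Rule 4", "conditions": ["temperature is high", "water level is low"],
     "action": "open control valve AND shut down boiler tubes", "priority": 2},
]

print(priority_based(conflict_set)["name"])     # Rule 4 (higher assumed priority)
print(specificity_based(conflict_set)["name"])  # Rule 4 (two conditions versus one)
```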


Inference Engine
1. Forward-chaining (data-driven): Rules are selected and applied in response to the current
fact base.
2. Backward-chaining (goal-driven):

Forward-chaining (Data-driven) Inference Engine


• Data-driven inference engine takes the available information (the “given” facts) and
keeps using the rules to generate as many derived facts as it can.
• The output is therefore unpredictable and may have either
1. The advantage of leading to novel or innovative solutions to a problem or
2. The disadvantage of wasting time generating irrelevant information.
• The data-driven approach might typically be used for problems of interpretation,
where we wish to know whatever, the system can tell us about some data.

Example 13
Show how forward chaining is applied for the system of Example 12 to conclude Z. The
given facts are A, C, D, and E. If multiple rules can fire at a time, then fire the first rule,
which was not fired before.


Answer 13
Cycle 1:
• Matching for generating the conflict set: Match the IF part of each rule against
facts in the working memory (A, C, D, E).
Rule 1: IF (A AND C) THEN B Yes, both A and C are in the database
Rule 2: IF (C AND D) THEN F Yes, since both C and D are in the database
Rule 3: IF (C AND D AND E) THEN X Yes, since all C, D, and E are in the database
Rule 4: IF (A AND B AND X) THEN Y No, X is not in the database at this moment
Rule 5: IF (D AND Y) THEN Z No, Y is not in the database at this moment

• Conflict resolution: among Rule 1, Rule 2, and Rule 3, select the first one if not
applied earlier. Thus, Rule 1 will be fired first.
• Apply the rule (If new facts are obtained add them to working memory).
The consequent of rule 1 is B which is not in the database, so add the new fact.
• Stop condition: Z has not been reached yet, so go again to the first step.

The following illustrates the whole process:


Cycle 1
Working memory A, C, D, E
Conflict set Rule 1, Rule 2, and Rule 3
Conflict resolution Rule 1
Apply the rule The consequent of rule 1 is B which is not in the database, so add the new
fact to the database
Stop (or exit) condition Our conclusion, Z, has not been reached yet.

Cycle 2
Working memory A, B, C, D, E
Conflict set Rule 1, Rule 2, and Rule 3
Conflict resolution Rule 2
Apply the rule The consequent of rule 2 is F which is not in the database, so add the new
fact to the database
Stop (or exit) condition Our conclusion, Z, has not been reached yet


Cycle 3
Working memory A, B, C, D, E, F
Conflict set Rule 1, Rule 2, and Rule 3
Conflict resolution Rule 3
Apply the rule The consequent of rule 3 is X which is not in the database, so add the new
fact to the database
Stop (or exit) condition Our conclusion, Z, has not been reached yet

Cycle 4
Working memory A, B, C, D, E, F, X
Conflict set Rule 1, Rule 2, Rule 3, and Rule 4
Conflict resolution Rule 4
Apply the rule The consequent of rule 4 is Y which is not in the database, so add the new
fact to the database
Stop (or exit) condition Our conclusion, Z, has not been reached yet

Cycle 5
Working memory A, B, C, D, E, F, X, Y
Conflict set Rule 1, Rule 2, Rule 3, Rule 4, and Rule 5
Conflict resolution Rule 5
Apply the rule The consequent of rule 5 is Z which is not in the database, so add the new
fact to the database
Stop (or exit) condition Our conclusion, Z, has been reached. Stop
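
The whole data-driven cycle above can be reproduced with a short loop. A minimal sketch is shown below; the rule encoding (name, set of conditions, conclusion) and the "fire the first applicable rule that has not fired before" resolution follow Example 13, while all other names are illustrative assumptions.

```python
# Forward chaining over the rule base of Example 12, reproducing the trace of Example 13.
rules = [
    ("Rule 1", {"A", "C"}, "B"),
    ("Rule 2", {"C", "D"}, "F"),
    ("Rule 3", {"C", "D", "E"}, "X"),
    ("Rule 4", {"A", "B", "X"}, "Y"),
    ("Rule 5", {"D", "Y"}, "Z"),
]

working_memory = {"A", "C", "D", "E"}
goal = "Z"
fired = set()

while goal not in working_memory:
    # 1) Matching: the conflict set contains every rule whose IF-part is satisfied.
    conflict_set = [r for r in rules if r[1] <= working_memory]
    # 2) Conflict resolution: fire the first applicable rule that has not fired before.
    candidates = [r for r in conflict_set if r[0] not in fired]
    if not candidates:
        break                       # no applicable rule left; the goal cannot be derived
    name, _, conclusion = candidates[0]
    # 3) Rule firing: the THEN-part adds a new fact to the working memory.
    fired.add(name)
    working_memory.add(conclusion)
    print(name, "fired, added", conclusion)   # Rules 1..5 fire in the same order as Example 13
```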

Backward-chaining Inference Engine


• Goal-driven inference engine starts with the stated hypothesis (or goal). Then, the
inference engine tries to find evidence to prove it.
1. If the evidence does not match, then we must start over with a new hypothesis.
2. If the evidence matches, then the correct hypothesis has been made.
• Initially, only those rules that can lead directly to the fulfillment of the goal are
selected for examination. Then the system works backwards from a hypothesized
goal, attempting to prove it by linking the goal to the initial facts.


• To backward chain from a goal in the working memory, the inference engine must
follow the steps:
1. Select rules with conclusions matching the goal.
2. Replace the goal by the rule's premises. These become sub-goals.
3. Work backwards until all sub-goals are known to be true. The backtracking takes
place if:
a. The goal cannot be satisfied by the set of rules currently under consideration; or
b. The goal has been satisfied but the user wants to investigate other ways of
achieving the goal (i.e., to find other solutions).
• Example: A backward-chaining system might be presented with the proposition: a
plan exists for manufacturing a widget.
It will then attempt to ascertain the truth of this proposition by generating the plan, or it may
conclude that the proposition is false, and no plan is possible.
• This strategy is appropriate when a more tightly focused solution is required.


Example 14
Show how backward chaining is applied for the system of Example 12 to conclude Z.
The given facts are A, C, D, and E.

Answer 14
Cycle 1
Rule matching the goal The only rule with a conclusion matching the goal is Rule 5.
Rule 5: IF (D AND Y) THEN Z
Rule's premises D is in the database, but we do not have Y. Add Y as a sub-goal
Stop (or exit) condition The sub-goal is not true, so we back-chain again.
New Working memory A, C, D, E Goals Z, Y

Cycle 2
Rule matching the goal Rule 4 has Y as a conclusion.
Rule 4: IF (A AND B AND X) THEN Y
Rule's premises A and B are in the database, but we do not have X. Add B and X as sub-
goals.
Stop (or exit) condition All goals are not true, so we back-chain again
New Working memory A, C, D, E Goals Z, Y, B, X

Cycle 3
Rule matching the goal Rule 3 has X as a conclusion.
Rule 3: IF (C AND D AND E) THEN X
Rule's premises All premises C, D, and E are in the database. Remove X from goals, add it to
the database, and fire Rule 3.
Stop (or exit) condition The remaining goals are not true, so we back-chain again
New Working memory A, C, D, E, X Goals Z, Y, B

Cycle 4
Rule matching the goal Rule 1 has B as a conclusion.
Rule 1: IF (A AND C) THEN B
Rule's premises All premises A and C are in the database. Remove B from goals, add it to the
database, and fire Rule 1.
Stop (or exit) condition The remaining goals are not true, so we back-chain again


New Working memory A, B, C, D, E, X Goals Z, Y

Cycle 5 - Go back to Cycle 2


Rule matching the goal Rule 4 has Y as a conclusion.
Rule 4: IF (A AND B AND X) THEN Y
Rule's premises All premises A, B, and X are in the database. Remove Y from goals, add it to
the database, and fire Rule 4.
Stop (or exit) condition The remaining goals are not true, so we back-chain again
New Working memory A, B, C, D, E, X, Y Goals Z

Cycle 6 - Go back to Cycle 1


Rule matching the goal Rule 5 has Z as a conclusion.
Rule 5: IF (D AND Y) THEN Z
Rule's premises All premises D and Y are in the database. Remove Z from goals, add it to the
database, and fire Rule 5.
Stop (or exit) condition All goals are true, so stop the process
New Working memory A, B, C, D, E, X, Y, Z Goals
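
The same goal-driven behaviour can be sketched as a short recursive procedure over the rule base of Example 12. The function and variable names are illustrative assumptions, and the order in which sub-goals are established may differ slightly from the hand trace above (here B is proved before X because the premises of Rule 4 are examined left to right).

```python
# Goal-driven reasoning over the rule base of Example 12, in the spirit of Example 14.
rules = [
    ("Rule 1", ["A", "C"], "B"),
    ("Rule 2", ["C", "D"], "F"),
    ("Rule 3", ["C", "D", "E"], "X"),
    ("Rule 4", ["A", "B", "X"], "Y"),
    ("Rule 5", ["D", "Y"], "Z"),
]
facts = {"A", "C", "D", "E"}

def prove(goal):
    """Try to establish `goal` by back-chaining through rules whose conclusion matches it."""
    if goal in facts:
        return True
    for name, premises, conclusion in rules:
        if conclusion == goal:
            # Replace the goal by the rule's premises; these become the sub-goals.
            if all(prove(premise) for premise in premises):
                facts.add(goal)                 # fire the rule: the goal is now a known fact
                print(name, "fired,", goal, "established")
                return True
    return False                                # backtrack: no rule can establish this goal

print("Z proved:", prove("Z"))
```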

Meta-Rules
Meta knowledge is extra knowledge about the knowledge the system possesses to improve
its performance
Meta-rules are rules which are not specifically concerned with knowledge about the
application at hand, but rather with knowledge about how it should be applied.
• Meta-rules define how conflict resolution will be used, and how other aspects of
the system itself will run.
• Meta-rules are “rules about rules” (or more generally, “rules about knowledge”).
• Some examples of meta-rules might be:
Meta-Rule 1: PREFER rules about shutdown TO rules about control valves
Meta-Rule 2: PREFER high-level rules TO low-level rules


Explanation Module
• This module is made in support of RBS to explain its reasoning. This gives users
of the system confidence in the accuracy or wisdom of the system’s decisions.
• The explanation can be divided into two categories:
1. How has the conclusion been derived? (would normally be applied when the system has
completed its reasoning)
2. Why a particular line of reasoning is being followed (It is applicable while the system is
carrying out its reasoning process). This type is appropriate for an interactive intelligent
system, which involves a dialogue between a user and the computer. During such a
dialogue the user will often want to establish why particular questions are being asked.
• If either type of explanation is incorrect or impenetrable, the user is likely to
distrust or ignore the system’s findings.
• Explanation facilities are desirable for increasing user confidence in the system, as
a teaching aid and as an aid to debugging.
• The quality of explanation can be improved by placing an obligation on the rule-
writer to provide an explanatory note for each rule.

Example 15
For the boiler control system, the following would be a typical explanation for a
recommendation to replace the outlet pipe.

Answer 15
Replace outlet pipe
    BECAUSE (Rule 5) steam outlet is blocked

steam outlet is blocked
    BECAUSE (Rule 6) pressure release valve is stuck

pressure release valve is stuck
    BECAUSE (Rule 9) pressure is high AND pressure release valve is closed

pressure is high
    BECAUSE (Rule 7) temperature is high AND NOT(water level is low)

NOT(water level is low)
    BECAUSE (Rule 1, under the closed-world assumption) NOT(transducer output is low)

release valve is closed, temperature is high, and NOT(transducer output is low) are supplied facts.


Decision Trees
Decision Tree: It is a tree-structured classifier for getting all the possible
solutions to a problem based on given conditions. Each internal node corresponds
to an attribute (feature), and every terminal node corresponds to a class (label).
• Decision tree identifies the best possible course of action using a set of hierarchical
decisions on the features.

General Structure of a Decision Tree


The elements of a decision tree representation have the following meaning:
1. Each internal node tests a feature (attribute).
2. Each branch corresponds to an attribute value (Decision rule, Split criterion).
3. Each leaf node assigns a class.

• Decision tree is constructed by recursively partitioning the input data into subsets
based on the values of the input variables.
• Decision tree is used for classification of unseen test instances with the use of top-
down traversal from the root to a unique leaf.
• The algorithm stops the growth of the tree based on a stopping criterion.
• The stopping criterion could be:
1. a maximum depth for the tree,


2. a minimum number of instances in each leaf node,


3. all training examples in the leaf belong to the same class
4. or other criteria.
• The two basic operations for building the decision tree are splitting and pruning:
1. Splitting: It is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.
2. Pruning: It is the process of removing unwanted branches from the tree.

Example 1
Suppose a candidate has a job offer and wants to decide whether to accept the offer or not based on three features. Assume the features' order is Salary (>50000$), Distance from the office, and Cab facility. Suggest a decision tree for solving this problem.

Answer 1
The decision tree starts with the root
node (Salary) that splits further into the
next decision node and one leaf node
(Declined offer).
The next decision node further splits
into one decision node and one leaf
node.
Finally, the decision node splits into
two leaf nodes (Accepted offers and
Declined offer).


Example 2
A person will try to decide if he/she should go to a comedy show or not based on the registered information about the comedian. The decision tree can be used to decide if any new shows are worth attending.

Age  Experience  Rank  Nationality  Go
36   10          9     UK           NO
42   12          4     USA          NO
23   4           6     N            NO
52   4           4     USA          NO
43   21          8     USA          YES
44   14          5     UK           NO
66   3           7     N            YES
35   14          9     UK           YES
52   13          7     N            YES
35   5           9     N            YES
24   3           5     USA          NO
18   3           7     UK           YES
45   9           9     UK           YES

Example 3
Assume Ali has recorded various attributes of the weather and whether his friend Basel
played tennis or not over two weeks.

• For each example, we have five feature values: day, outlook, temperature,
humidity, and wind.
• In fact, Day is not a useful feature since it is different for every example. So, we
will focus on the other four input features.
• Given a data set, we can generate many different decision trees.


Decision Tree Working Steps


Step-1: Begin the tree with the root node which contains the complete dataset.
Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
Step-3: Divide the node into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
step-3.
Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.

Attribute Selection Measure (ASM)


• ASM is a measure to select the best attribute for splitting the tree (Selects the
attribute that maximizes the separation of the different classes among the children’s
nodes).
• A node/attribute having the highest ASM is split first.
• There are two popular techniques for ASM, which are:
1. Information Gain
2. Gini Index

Entropy
• Entropy measures the: (all having the same meaning)
1. amount of information contained in a class.
2. class impurity associated with a given attribute.
3. randomness of a given feature.
• It is highest for a feature with equally probable classes and decreases as some
classes appear more often (if entropy is high, the randomness is high).
• In general, entropy for a given feature having S classes is:
  H = \sum_{i=1}^{S} p_i \log_2 (1/p_i) = -\sum_{i=1}^{S} p_i \log_2 p_i
  where p_i = m_i / M is the probability of the i-th class.

• For a binary class, with p_1 the probability of 'No' and p_2 the probability of 'Yes':
  H = -p_1 \log_2 p_1 - p_2 \log_2 p_2
• If 𝐻 = 0, then the feature is perfectly classified (It has the same class).
• Assume a feature, F with S values to identify the class. Then each value will
represent a subset of data with some classes and hence we can find the entropy for
it, 𝐻𝐹:𝑠 .
• The total entropy for this feature is:
  H_F = \sum_{s=1}^{S} w_s \times H_{F:s}
  where w_s is the proportion of the feature value s within the feature F.

Information Gain
• Information gain measures the change in entropy after the segmentation of a
dataset (𝐷) based on an attribute, 𝐹 (How much uncertainty in 𝐷 was reduced after
splitting it based on attribute 𝐹 ).
• The information gain is:
  IG_F = H_{C:D} - \sum_{s \in S} w_s \times H_{C:F:s} = H_{C:D} - H_{C:F}
  where D_{F:s} represents the data subset created from splitting the dataset based on the value s of the attribute F, such that D = \bigcup_{s \in S} D_{F:s};
  w_s = |D_{F:s}| / |D| is the proportion of the number of elements in D_{F:s} to the number of elements in D; and
  H_{C:F:s} is the entropy of subset D_{F:s}.
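
The entropy and information-gain formulas translate directly into a few lines of code. The sketch below checks the Outlook split of the Golf dataset that is worked out by hand in Example 6 later in this chapter (IG ≈ 0.247); the function names are illustrative.

```python
from math import log2

def entropy(counts):
    """H = -sum(p_i * log2 p_i) over the class counts of one (sub)set."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """IG_F = H(parent) - sum(w_s * H(child_s))."""
    n = sum(parent_counts)
    remainder = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# Outlook split of the Golf dataset used in Example 6 below:
# 9 Yes / 5 No overall; Sunny (3 Yes, 2 No), Overcast (4 Yes, 0 No), Rainy (2 Yes, 3 No).
print(round(entropy([9, 5]), 2))                                       # 0.94
print(round(information_gain([9, 5], [[3, 2], [4, 0], [2, 3]]), 3))    # 0.247
```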


Gini Impurity Index


• The Gini impurity measures how often a randomly chosen element from the set
would be incorrectly labeled if it were randomly labeled according to the
distribution of labels in the subset (It measures the probability of misclassification of a
random sample).
• Attribute with the low Gini index should be preferred as compared to the high one.
• The Gini impurity of a node can be calculated as follows:
  G = 1 - \sum_{i=1}^{C} p_i^2
  where C is the number of classes, and p_i is the probability of a randomly chosen element in the node being labeled as class c_i.
• For a binary classification, the value of G ranges between 0 and a maximum value of 0.5 (in general, the maximum is 1 - 1/C).
• The overall Gini impurity for the split can be calculated as a weighted average of
the Gini impurities for the child nodes, where the weights are proportional to the
number of data points in each node: (𝑆 is the number of splits, if it is binary, then 𝑆 = 2)
  G_split = \sum_{s=1}^{S} w_s \times G_s

Example 4
Suppose a binary classification problem for whether a
person will buy a particular product based on his age
and income. Find the overall Gini impurity for the
first split if ASM chooses age with a threshold of 35.


Answer 4
• The left child node will contain the data where age is less than or equal to 35, and the right child node will contain the data where age is greater than 35.
• With age ≤ 35, there are two data points, one of which buys the product and one of which does not (the probability of a randomly chosen element being labeled "Yes" or "No" is 1/2, p_Yes = p_No = 0.5). Therefore, the Gini index for this branch is:
  G_{age≤35} = 1 - \sum_{i=1}^{2} p_i^2 = 1 - (0.5^2 + 0.5^2) = 0.5
• With age > 35, there are three data points, two of which buy the product (p_Yes = 2/3 ≈ 0.67) and one of which does not (p_No = 1/3 ≈ 0.33). Therefore, the Gini index for this branch is:
  G_{age>35} = 1 - \sum_{i=1}^{2} p_i^2 = 1 - (0.67^2 + 0.33^2) = 0.44
• The overall Gini impurity for the split is:
  G_split = \sum_{s=1}^{2} w_s \times G_s = w_{age≤35} \times G_{age≤35} + w_{age>35} \times G_{age>35}
  G_split = (2/5) \times 0.5 + (3/5) \times 0.444 ≈ 0.47
• The decision tree algorithm will try different thresholds and different features to
find the split that minimizes the Gini impurity.
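
A few lines of code are enough to verify the Gini arithmetic of Example 4; the helper names below are illustrative.

```python
def gini(counts):
    """G = 1 - sum(p_i^2) for the class counts of one node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(child_counts):
    """Weighted average of the children's Gini impurities."""
    n = sum(sum(child) for child in child_counts)
    return sum(sum(child) / n * gini(child) for child in child_counts)

# Example 4: age <= 35 holds (1 Yes, 1 No); age > 35 holds (2 Yes, 1 No).
print(round(gini([1, 1]), 2))                   # 0.5
print(round(gini([2, 1]), 2))                   # 0.44
print(round(gini_split([[1, 1], [2, 1]]), 2))   # 0.47
```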

Example 5
Assume a dataset of patients with information about their age, gender, blood pressure,
and cholesterol level to identify whether patients have heart disease.
1. Calculate the Gini impurity of the entire dataset.


2. Assume ASM chooses age with a threshold of 50 as the first split feature. Calculate the Gini impurity for this split using age.

            Patients  "Yes"  "No"
Dataset     500       200    300
age ≤ 50    300       100    200
age > 50    200       100    100

Answer 5
• Gini impurity of the entire dataset (p_Yes = 200/500 = 0.4, p_No = 300/500 = 0.6) is:
  G_D = 1 - \sum_{i=1}^{2} p_i^2 = 1 - (0.4^2 + 0.6^2) = 0.48
• Gini impurity of the split age ≤ 50 (p_Yes = 100/300 = 0.33, p_No = 200/300 = 0.67) is:
  G_{age≤50} = 1 - \sum_{i=1}^{2} p_i^2 = 1 - (0.33^2 + 0.67^2) = 0.44
• Gini impurity of the split age > 50 (p_Yes = p_No = 100/200 = 0.5) is:
  G_{age>50} = 1 - \sum_{i=1}^{2} p_i^2 = 1 - (0.5^2 + 0.5^2) = 0.5

Decision Tree Learning Algorithm


• Iterative Dichotomiser 3
• C4.5 (Statistical Classifier): it is an improvement to ID3. C4.5 builds decision trees
from a set of training data in the same way as ID3.
• CART (Classification and Regression Tree).

Iterative Dichotomiser 3 Algorithm


ID3, the earliest algorithm, works by selecting the best attribute to split the data at each
node using entropy and information gain.


ID3 Steps
1. Calculate the Information Gain of each feature.
2. If all rows of a feature value do not belong to the same class, split the feature into
subsets using the feature value for which the Information Gain is maximum.
3. Fix a decision tree node using the feature with the maximum Information gain.
4. If all rows belong to the same class, make the current node as a leaf node with the
class as its label.
5. Repeat for the remaining features until you run out of all features, or the decision
tree has all leaf nodes.
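
A compact recursive sketch of these steps is shown below. It assumes the dataset is given as a list of dictionaries with a "class" key and categorical attributes; it is an illustration of the ID3 idea only (no handling of continuous attributes, missing values, or pruning).

```python
from math import log2
from collections import Counter

def entropy(rows):
    """Entropy of the class labels of a list of records (each a dict with a "class" key)."""
    total = len(rows)
    counts = Counter(r["class"] for r in rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attr):
    """Information gain of splitting `rows` on the categorical attribute `attr`."""
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attributes):
    classes = [r["class"] for r in rows]
    if len(set(classes)) == 1:                       # Step 4: pure node -> leaf
        return classes[0]
    if not attributes:                               # no attribute left -> majority-class leaf
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a))   # Steps 1-3
    tree = {best: {}}
    for value in set(r[best] for r in rows):         # Step 5: recurse on each subset
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

# Usage sketch: id3(golf_rows, ["Outlook", "Temp", "Humidity", "Windy"]) would return a
# nested dictionary whose root key is "Outlook" (the attribute with maximum IG in Example 6).
```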

Example 6
Build the decision tree for the following Golf dataset.

Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      FALSE  No
Rainy     Hot   High      TRUE   No
Overcast  Hot   High      FALSE  Yes
Sunny     Mild  High      FALSE  Yes
Sunny     Cool  Normal    FALSE  Yes
Sunny     Cool  Normal    TRUE   No
Overcast  Cool  Normal    TRUE   Yes
Rainy     Mild  High      FALSE  No
Rainy     Cool  Normal    FALSE  Yes
Sunny     Mild  Normal    FALSE  Yes
Rainy     Mild  Normal    TRUE   Yes
Overcast  Mild  High      TRUE   Yes
Overcast  Hot   Normal    FALSE  Yes
Sunny     Mild  High      TRUE   No

Answer 6
The entropy of the dataset is:
  H_D = -(5/14) \log_2(5/14) - (9/14) \log_2(9/14) = 0.94
There are three categories for Outlook: Sunny, Overcast, and Rainy.
The entropy for each category is:
Outlook   No  Yes  H
Sunny     2   3    H_{Outlook:Sunny}    = -(2/5) \log_2(2/5) - (3/5) \log_2(3/5) = 0.971
Overcast  0   4    H_{Outlook:Overcast} = -(0/4) \log_2(0/4) - (4/4) \log_2(4/4) = 0
Rainy     3   2    H_{Outlook:Rainy}    = -(3/5) \log_2(3/5) - (2/5) \log_2(2/5) = 0.971
The total entropy for this feature is:
  H_{Outlook} = \sum_{s=1}^{S} w_s \times H_{Outlook:s} = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = 0.693
  IG_{Outlook} = H_D - H_{Outlook} = 0.94 - 0.693 = 0.247

The entropy for each category of Humidity (Normal and High) is:
Humidity  No  Yes  H
Normal    1   6    H_{Humidity:Normal} = -(1/7) \log_2(1/7) - (6/7) \log_2(6/7) = 0.592
High      4   3    H_{Humidity:High}   = -(4/7) \log_2(4/7) - (3/7) \log_2(3/7) = 0.985
The total entropy for this feature is:
  H_{Humidity} = \sum_{s=1}^{S} w_s \times H_{Humidity:s} = (7/14)(0.592) + (7/14)(0.985) = 0.7885
  IG_{Humidity} = H_D - H_{Humidity} = 0.94 - 0.7885 = 0.1515

The three categories for Temp are Cool, Mild, and Hot. The entropy for each category is:
Temp  No  Yes  H
Cool  1   3    H_{Temp:Cool} = -(1/4) \log_2(1/4) - (3/4) \log_2(3/4) = 0.811
Mild  2   4    H_{Temp:Mild} = -(2/6) \log_2(2/6) - (4/6) \log_2(4/6) = 0.918
Hot   2   2    H_{Temp:Hot}  = -(2/4) \log_2(2/4) - (2/4) \log_2(2/4) = 1
The total entropy for this feature is:
  H_{Temp} = \sum_{s=1}^{S} w_s \times H_{Temp:s} = (4/14)(0.811) + (6/14)(0.918) + (4/14)(1) = 0.91
  IG_{Temp} = H_D - H_{Temp} = 0.94 - 0.91 = 0.03

The two categories for Windy are TRUE and FALSE. The entropy for each category is:
Windy  No  Yes  H
TRUE   3   3    H_{Windy:TRUE}  = -(3/6) \log_2(3/6) - (3/6) \log_2(3/6) = 1
FALSE  2   6    H_{Windy:FALSE} = -(2/8) \log_2(2/8) - (6/8) \log_2(6/8) = 0.811
The total entropy for this feature is:
  H_{Windy} = \sum_{s=1}^{S} w_s \times H_{Windy:s} = (6/14)(1) + (8/14)(0.811) = 0.892
  IG_{Windy} = H_D - H_{Windy} = 0.94 - 0.892 = 0.048

Outlook has the maximum IG and hence it will be used as the first attribute for splitting.

The Rainy subset:
Outlook  Temp  Humidity  Windy  Play Golf
Rainy    Hot   High      FALSE  No
Rainy    Hot   High      TRUE   No
Rainy    Mild  High      FALSE  No
Rainy    Cool  Normal    FALSE  Yes
Rainy    Mild  Normal    TRUE   Yes

The Sunny subset:
Outlook  Temp  Humidity  Windy  Play Golf
Sunny    Mild  High      FALSE  Yes
Sunny    Cool  Normal    FALSE  Yes
Sunny    Cool  Normal    TRUE   No
Sunny    Mild  Normal    FALSE  Yes
Sunny    Mild  High      TRUE   No

The complete decision tree is:


CART (Classification And Regression Tree) for Decision Tree


• CART is an advanced decision tree where each fork is split into a predictor
variable and each node has a prediction for the target variable at the end.
• The splitting criterion for CART is based on Gini impurity (for classification) or
variance reduction (for regression).
• CART works by recursively partitioning the training data into smaller subsets
using binary splits.
• At each node of the tree, the algorithm selects a feature and a threshold that best
separates the training data into two groups, based on the values of that feature.
• The process continues recursively until a stopping criterion is met.
• Advantages
1. CART can handle mixed input data types (including numerical and categorical
data) unlike many linear combination methods like logistic regression or SVM
2. CART is a simple and intuitive algorithm that is easy to understand and interpret.
3. CART can handle missing values by imputing them with surrogate splits.
4. CART can handle multi-class classification problems (multi-class CART).

Tree Pruning
• Pruning prevents overfitting by restricting or reducing tree growth.
• Pruning can be done using pre-pruning or post-pruning.
• Pre-pruning occurs before or during the growth of the tree.
• Post-pruning: It allows the tree to grow as deep as the data will allow, and then
trim (prune) branches that do not effectively change the classification error rates.
1. Advantage: Post-pruning may not miss significant relationships between attribute
values and classes if the tree is allowed to reach its maximum depth.
2. Disadvantage: It requires additional computations, which may be wasted when
the tree needs to be trimmed back.


Rule Induction
• Rule induction is the process of deducing IF-THEN rules from a dataset.
• Decision rules explain an inherent relationship between the attributes and class
labels in a dataset.
• There are two ways:
1. Direct approach: Direct extraction from the dataset. This can be done using:
1. Sequential Covering using Repeated Incremental Pruning to Produce Error
Reduction (RIPPER) algorithm
2. Sequential Covering using Learn-One-Rule.
2. Indirect (Passive) approach: Derived from previously built decision trees from
the same dataset which is the easiest way to extract rules.

Example 7
Induce the set of rules from the decision tree of the Golf dataset of Example 6.
Answer 7
Rule 1 IF (Outlook = Overcast) THEN Play = Yes
Rule 2 IF (Outlook = Rainy) AND (Windy = FALSE) THEN Play = Yes
Rule 3 IF (Outlook = Rainy) AND (Windy = TRUE) THEN Play = No
Rule 4 IF (Outlook = Sunny) AND (Humidity = High) THEN Play = No
Rule 5 IF (Outlook = Sunny) AND (Humidity = Normal) THEN Play = Yes

Direct Rule Induction: Sequential Covering


• This approach iteratively attempts to find all the rules class by class directly from
the dataset.
• RIPPER is a popular rule-based learning algorithm that is used for constructing
rule sets from labeled training data.
• The extracted rules are tested using a validation set based on two metrics:


Rule accuracy: It is the ratio of the correct records covered by the rule (N_c) to all records covered by the rule (N):
  R_acc = N_c / N
Pruning metric (it evaluates the need for pruning the rule): It is the difference between the positive (N_p) and negative (N_n) validation records covered by the rule, divided by their total:
  P_metric = (N_p - N_n) / (N_p + N_n)

RIPPER Steps
1. The algorithm starts with the selection of class labels one by one (The first class is
usually the least-frequent class label)

2. Training stage: Develop all the rules for the selected class.
3. Validation stage: The rule model of the selected class is evaluated with a
validation dataset used for pruning to reduce generalization errors such that:
a. Iteratively remove a conjunct if it improves the pruning metric.
b. Aggregate all rules that identify the class data points to form a rule group.
4. In multi-class problems, steps 2 and 3 are repeated for the next class label.

Learn-One-Rule
1. Learn-one-rule starts with an empty rule condition set:
IF {} THEN first class
Obviously, the accuracy of this rule is the same as the proportion of “first class” data points in

the dataset.
2. Then the algorithm greedily adds conjuncts until the rule accuracy reaches 100%.
If the addition of a conjunct decreases the accuracy, then the algorithm:
a. looks for other conjuncts or
b. stops and starts the iteration of the next rule.


3. After a rule is developed, then all the data points covered by that rule are
eliminated from the dataset and then
4. If some data points are left, start creating the second rule (Step 1) and so on.
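
A rough sketch of the greedy Learn-One-Rule loop for categorical data is shown below. The representation of a rule as a dictionary of attribute=value conjuncts, and the helper names, are assumptions for illustration only.

```python
def rule_accuracy(rows, conjuncts, target_class):
    """Accuracy of the rule IF conjuncts THEN target_class over the rows it covers."""
    covered = [r for r in rows if all(r[a] == v for a, v in conjuncts.items())]
    if not covered:
        return 0.0, covered
    correct = sum(1 for r in covered if r["class"] == target_class)
    return correct / len(covered), covered

def learn_one_rule(rows, attributes, target_class):
    """Greedily add conjuncts until the rule reaches 100% accuracy or no conjunct helps."""
    conjuncts = {}
    accuracy, covered = rule_accuracy(rows, conjuncts, target_class)
    while accuracy < 1.0:
        candidates = [(a, v) for a in attributes if a not in conjuncts
                      for v in set(r[a] for r in covered)]
        if not candidates:
            break
        best = max(candidates, key=lambda av:
                   rule_accuracy(rows, {**conjuncts, av[0]: av[1]}, target_class)[0])
        new_accuracy, _ = rule_accuracy(rows, {**conjuncts, best[0]: best[1]}, target_class)
        if new_accuracy <= accuracy:
            break                                   # adding another conjunct no longer helps
        conjuncts[best[0]] = best[1]
        accuracy, covered = rule_accuracy(rows, conjuncts, target_class)
    return conjuncts, accuracy

# Sequential covering: call learn_one_rule, remove the rows the learned rule covers,
# and repeat until no data points of the selected class remain.
```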

Example 8
For the following rule, A, B, C, D are called conjuncts and Y is the class.
IF (A AND B AND C AND D) THEN Y
Discuss how rule pruning is applied for this rule.

Answer 8
Rule pruning first removes conjunct D and measures the metric value.
1. If the quality of the metric is improved conjunct D is removed.
2. If not, then the pruning is checked for CD, BCD and so on.

Example 9
Assume a dataset of two attributes (dimensions) on the
X and Y axis and two-class labels marked by “+” and
“-”. Illustrate how learn-one-rule is applied for this
dataset.
Answer 9
• The least-frequent class is “+”, therefore the
algorithm focuses on generating all rules for “+”
class.
• Learn-One-Rule starts developing the first rule
such that it should cover all “+” data points
using a rectilinear box with none or as few “-”
as possible.


• Rule r1 is developed to identify the area of four "+" in the top left corner.
  R_acc = N_c / N = 4/4 = 1
• Remove the data points covered by r1.
• Rule r2 is developed to identify the area of three "+" in the down-center area.
  R_acc = N_c / N = 3/3 = 1
• Now the rules for "-" will be developed to identify the remaining areas.

Intelligent Agents

• An agent is an assistant that takes care of specific tasks for us.
• The environment is generally the domain or world of the agent.
• Agent observes the environment through sensors and acts upon the environment using actuators.
• Agent changes the environment by performing actions to achieve its goals.
• Intelligent agents learn or use knowledge to achieve their goals.
• An agent is autonomous if its behavior is determined by its own experience (with
the ability to learn and adapt)
• Software agent is an autonomous computer program that carries out tasks on
behalf of users.


Example 1 (Thermostat Agent)

A thermostat agent senses the temperature of a physical system and performs actions to
maintain the temperature near a desired set point (a "closed loop" control device).

Percept
Percept is the agent’s perceptual input at any given instant.

Percept Sequence
It is the complete history of everything the agent has ever perceived.

Example 2 (Human Agent)


Percept Light, sound, solidity, ….
Sensors eyes, ears, skin, and other organs
Effectors (actuators) hands, legs, vocal tract, and so on
Actions Pickup, throw, speak, ….

Example 3
Agent           Sensors                                                    Actuators
Robotic agent   Cameras, infrared range finder, microphone, accelerometers, …   Various motors, grippers, wheels, speakers, …
Software agent  Keystrokes, file contents, network packets, …              Displaying on the screen, writing files, sending network packets, …

Example 4
Consider a hand-held calculator as an agent. For 2 + 5 = 7,
1. Specify the percept sequence?


2. What is the action?

Answer 4
1. Percept sequence “2 + 5 =”
2. Action is displaying “7”

Agent Function (Action Selection Function)


Agent function is an abstract mathematical description that maps any given percept
sequence (the entire percept sequence observed to date) to an action.
𝑓: 𝑃 → 𝐴
• An agent can be defined simply by its agent function which describes the agent’s
behavior.

Agent Program
Agent program implements an agent function (accepts percepts, combines them with any
stored knowledge (internal state), and returns an action).

Example 5
Assume a two-location vacuum cleaner
1. Specify the percept?
2. What are the possible actions?
3. Write the agent function?

Answer 5

1. The vacuum agent perceives which square it is in and whether there is dirt in it.


Percept = [Location ∈ {A, B}, Cleanliness ∈ {Clean, Dirty}]

2. The agent can choose to move left, move right, suck up the dirt, or do nothing.

3. Agent function: It is defined through the following rules.


IF the current square is dirty, THEN suck
IF the current square is clean, THEN move to the other square
IF the two squares are clean, THEN do nothing
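
The first two rules of this agent function translate into a few lines of code; the third rule (do nothing when both squares are clean) cannot be decided from the current percept alone, which is exactly the issue raised in Example 6 below. A minimal sketch, with assumed location and status names:

```python
def reflex_vacuum_agent(percept):
    """percept = (location, status); the action depends only on the current percept."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# A hand-driven trace of the agent function:
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", reflex_vacuum_agent(percept))
```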

Example 6
What will happen if the set of actions is Left, Right, and Suck?

Answer 6
• Once all the dirt is cleaned up, the agent will oscillate needlessly back and forth,
• If the performance measure includes a penalty of one point for each movement left
or right, the agent will fare poorly.

Example 7
List the set of actions for a thermostat agent.

Answer 7
The actions are turning the heat ON or turning the heat OFF or taking NO action.

Agent State Representation


The state of an agent describes the status of the agent and its percepts. The state can be
atomic, factored, or structured.
• Atomic state:
✓ Each state of the world is a black box that has no
internal structure.
✓ Example: finding a driving route, each state is a city.


• Factored state:
✓ Each state is defined by a set of features and each of
which has a value.
✓ Example: GPS location, amount of gas in the tank.

• Structured state:
✓ Each state is expressed in the form of objects and relations between them.
✓ Example: Natural language processor

Possible States for a Vacuum Cleaner


For n squares, the possible states are:
  P_s = n \times 2^n

Example 8
How many possible states for the two-square
vacuum cleaner.
Answer 8
The agent is in one of two locations, each of
which might or might not contain dirt.
P_s = n \times 2^n = 2 \times 2^2 = 8

States 7 and 8 are goal states


Agent Function Tabulation


It is a table showing the percept sequence of an agent and the corresponding actions.

Environment Types
Fully observable: the agent can access the complete state of the environment at each point in time.
  vs Partially observable: the agent can observe only a subset of the environment due to noisy, inaccurate, or incomplete sensor data.

Deterministic: the next state of the environment is completely determined by the current state and the agent's action.
  vs Stochastic: the next state of the environment is random in nature; it is not unique and cannot be completely determined by the agent. A partially observable environment can appear to be stochastic.
  (Strategic: the environment is deterministic except for the actions of other agents.)

Episodic: the agent's experience is divided into independent, atomic episodes in which the agent perceives and performs a single action in each episode.
  vs Sequential: previous and current decisions affect all future decisions.

Static: the environment is unchanged while an agent is deliberating (the agent does not need to keep sensing while deciding what action to take, and does not need to worry about time).
  vs Dynamic: the environment keeps changing while the agent is deciding on an action.
  Semidynamic: the environment does not change with time, but the agent's performance score does.

Discrete: at any given state there are only finitely many actions to choose from.
  vs Continuous: at any given state there are infinitely many actions to choose from.
  (Note: the discrete/continuous distinction applies to states, time, percepts, or actions.)

Competitive: the agent competes against another agent to optimize the output.
  vs Collaborative: multiple agents cooperate to produce the desired output.

Known: the output for all probable actions is given.
  vs Unknown: the agent has to gain knowledge about how the environment works.

Single agent: an agent operating by itself in an environment.
  vs Multiagent: many agents affect each other's performance measure.

Example 9
Classify the task environment of the following agents (a Pick and Place robot is used to detect defective parts from the conveyor belts).

Environment Description
Crossword Puzzle: fully observable, deterministic, sequential, static, discrete, competitive, single agent.

Chess: fully observable (the board is fully observable, and so are the opponent's moves), deterministic (only a few possible moves at the current state, and these moves can be determined), sequential, static, discrete (it has only a finite number of moves for each game), competitive (the agents compete to win the game, which is the output), single agent.

Self-Driving Cars (Taxi driver) (Roller coaster ride): partially observable (what is around the corner is not known), stochastic (the actions are not unique; they vary from time to time), sequential, dynamic (it is set in motion and the driving environment keeps changing every instant), continuous (its actions are driving, parking, etc., which cannot be numbered), collaborative (agents cooperate with each other to avoid collisions and reach their destination/goal), multiagent.

Football game: multiagent (it involves 11 players in each team).

Pick and Place robot: episodic (the decision depends on the current part; there is no dependency between current and previous decisions).

Performance Measure
1. It is an objective criterion for the success of an agent's behavior.
2. There is no fixed performance measure for all tasks and agents.
3. Intelligent agents are supposed to maximize their performance measure.

Example 10
List some performance measures for the vacuum cleaner.

Answer 10
1. Amount of dirt cleaned up,
2. Amount of time taken,
3. Amount of electricity consumed,
4. Amount of noise generated.

Example 11
Discuss the following two performance measures for vacuum cleaner
1. “The amount of dirt cleaned up in a single eight-hour shift.”


2. “Clean floor: average cleanliness over time.”

Answer 11
1. “The amount of dirt cleaned up in a single eight-hour shift.”
A rational agent can maximize this performance measure by cleaning up the dirt,
then dumping it all on the floor, then cleaning it up again, and so on.
2. “Clean floor: average cleanliness over time.”
This rewards the agent for having a clean floor. For example, one point could be
awarded for each clean square at each time step (perhaps with a penalty for
electricity consumed and noise generated).

Task Environment of an Agent


▪ In designing an agent, the first step must always be to
specify the task environment as fully as possible.
▪ This comes by specifying Performance measure,
Environment, Actuators, and Sensors (PEAS Description)

Example 12
What is the PEAS description for an automated taxi driver?

Answer 12
Agent: Taxi driver
Performance measure: Safe, fast, legal, comfortable trip, maximize profits
Environment: Roads, other traffic, pedestrians, customers
Actuators: Steering, accelerator, brake, signals, horn, display
Sensors: Cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard

Example 13
Which is a more complex problem, an automated vacuum cleaner or an automated taxi
driver? Why?


Answer 13
An automated taxi driver because there is no limit to the novel combinations of
circumstances that can arise.

Rational Agent
• Rational agent is one that does the right thing—conceptually speaking, every entry
in the table for the agent function is filled out correctly.
• The “right thing” can be specified by a performance measure defining a
numerical value for any environment history.
• Rational agent will choose actions to maximize some performance measure.

Rationality depends on 4 things:


1. Performance measure of success
2. Agent’s prior knowledge of the environment
3. Actions the agent can perform
4. Agent’s percept sequence to date

Rational Action
Whichever action maximizes the expected value of the
performance measure given the percept sequence to date.

Wumpus World Problem


▪ The wumpus world is a cave consisting of rooms connected by passageways.
▪ Lurking somewhere in the cave is the terrible wumpus, a beast that eats anyone
who enters its room.
▪ The wumpus can be shot by an agent, but the agent has only one arrow.
▪ Some rooms contain bottomless pits that will trap anyone who wanders into these
rooms (except for the wumpus, which is too big to fall in).


▪ The only mitigating feature of this bleak environment is the possibility of finding a
heap of gold.

PEAS Description
A typical PEAS description of the wumpus world is:
Performance measure: +1000 for climbing out of the cave with the gold, -1000 for falling into a pit or being eaten by the wumpus, -1 for each action taken, and -10 for using the arrow.
Environment: a 4x4 grid of rooms; the agent starts in square [1,1] facing right; the gold and the wumpus are placed randomly, and any other square may contain a pit.
Actuators: move Forward, TurnLeft, TurnRight, Grab (the gold), Shoot (the arrow), Climb (out of the cave).
Sensors: Stench (adjacent to the wumpus), Breeze (adjacent to a pit), Glitter (gold in the current square), Bump (the agent walked into a wall), Scream (the wumpus has been killed).


Agent Architectures
Reactive agent The decision-making is implemented in some form of direct mapping
from situation to action.
Logic-based agent The decision about what action to perform is made via logical deduction.
Belief-Desire- The decision-making depends upon the manipulation of data structures
Intention agents representing the beliefs, desires, and intentions of the agent.
Layered The decision-making is realized via various software layers, each of
architecture which is explicitly reasoning about the environment at different levels of
abstraction.

Agent Types
Agents can be grouped into five classes based on their degree of perceived intelligence
and capability.
• Simple Reflex Agent
• Model-based Agent.
• Goal-based Agent.
• Utility-based Agent.
• Learning Agent.


Simple Reflex Agent


• Simple Reflex agent selects an action
based only on the current state.
• The agent function is based on
the condition-action rule.
• The agent function succeeds when
the environment is fully observable.

• This agent is rational only if a correct decision is made based on current precepts.
• Example:
✓ Robotic vacuum cleaner that deliberates in an infinite loop, each percept
contains a state of a current location [clean] or [dirty] and accordingly it decides
the action whether to [suck] or [continue moving].

✓ Medical diagnosis system:


IF the patient has reddish brown spots, THEN start the treatment for measles.

What is the Model?


• A model is a simplified representation (graphical, mathematical, symbolic, physical, or
verbal) of a real-world thing on a smaller scale than the original that shows what
the real thing looks like or how it works.


Real Thing (figure omitted)  →  Model: f(x) = x^2

• The objectives of a model include


(1) to facilitate understanding by eliminating unnecessary components: Since
most real-world things are very complicated (have numerous parts) and
much too complex (parts have dense interconnections) to be comprehended
in their entirety, a model contains only those features that are of primary
importance to the model maker's purpose
(2) to aid in decision-making by simulating 'what if' scenarios,
(3) to explain, control, and predict events based on past observations.


Abstraction
• Abstraction is a process of simplification by removing detail from a representation
and replacing it with concepts.
• For example, King Khalid University, without saying its position, state, or country.

Model-based Agent

• Model-based agent maintains an internal state via a model of the world to choose
the actions.
1. Model: The knowledge about “how things happen in the world”.
2. Internal State represents percept history which is the history of all that an agent
has perceived to date.
3. Model-based agent needs memory for storing the percept history.
• Model-based agent can handle partially observable environments by keeping
track of the part of the world it cannot see now (using a model about the world).
• Example: self-steering mobile vision where it is necessary to check the percept
history to fully understand how the world is evolving.
• To update the state, the model requires information about −
✓ How the world evolves independently of the agent.
✓ How the agent’s actions affect the world.
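
A skeleton of a model-based agent program in the spirit of this description; all names here are illustrative placeholders rather than a fixed API.

```python
class ModelBasedAgent:
    """Minimal skeleton: keep an internal state, update it with a model, then pick an action."""

    def __init__(self, update_state, rules):
        self.state = {}                    # internal state: what the agent believes about the world
        self.last_action = None
        self.update_state = update_state   # the model: how the world evolves and how actions affect it
        self.rules = rules                 # condition-action rules defined over the internal state

    def program(self, percept):
        # 1) Combine the old state, the last action, the new percept, and the model.
        self.state = self.update_state(self.state, self.last_action, percept)
        # 2) Pick the first rule whose condition matches the updated internal state.
        for condition, action in self.rules:
            if condition(self.state):
                self.last_action = action
                return action
        self.last_action = "NoOp"
        return "NoOp"
```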


Goal-based Agent
• Goal-based agent has a goal and a strategy to reach that goal.
• The agent program combines the goal information with the environment model to
choose the action that improves the progress towards the goal (not necessarily the
best one).
• Goal-based agent is proactive, not reactive in its decision-making.
• Two important aspects for goal-based agents are searching and planning.
• Example:
✓ GPS system to find a path to a certain destination.
✓ Any search robot that has an initial location and wants to reach a destination.

Utility-based Agent


• Utility-based agent is the improved version of the goal-based agent.


• A utility function maps a state (or sequence of states) onto a real number, which
describes the associated degree of happiness (utility) of that state.
• Utility-based agents make decisions based on maximizing a utility value.
• Utility-based agent chooses the action that maximizes the utility of that state to
provide the best solution (It measures how good the outcome is).
• Utility-based agents are often used in applications where they must compare and
select among multiple options, such as resource allocation, scheduling, and game-
playing.
• Example:
✓ A GPS finding the shortest/fastest/safest route to a certain destination.
✓ A route recommendation system that solves for the 'best' route to reach a
destination.
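
The decision rule "pick the action that maximizes the utility of the resulting state" is essentially a one-liner; a sketch with assumed helper names:

```python
def choose_action(state, actions, result, utility):
    """Pick the action whose resulting state has the highest utility.

    `result(state, action)` is an (assumed deterministic) transition model and
    `utility(state)` maps a state onto a real number.
    """
    return max(actions, key=lambda action: utility(result(state, action)))

# e.g. a GPS agent could use utility(state) = -estimated_travel_time(state),
# so that the fastest route maximizes utility.
```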

Agent Learning
The idea behind learning is that percepts should be used not only for acting, but also for
improving the agent's ability to act in the future:
• Learning is essential for unknown environments, i.e., when designer lacks
omniscience.
• Learning is useful as a system construction method, i.e., exposing the agent to
reality rather than trying to write it down.
• Learning modifies the agent's decision mechanisms to improve performance.

Learning Agent
• Learning agent learns from its past experiences to improve its performance and has
learning capabilities via machine learning techniques.


• Learning agents follow a cycle of observing, learning, and acting based on


feedback. They:
1. Interact with their environment,
2. Learn from feedback,
3. Analyze performance,
4. Look for new ways to improve performance and
5. Modify their behavior for future interactions.
• Learning agent is the only agent that can perform in every type of environment.
• A learning agent has mainly four conceptual components: Critic, Learning element,
Performance element, and Problem generator.

• Learning element: It is responsible for making improvements by learning from


the environment.
1. Learning element modifies the performance element so that it makes better
decisions.
2. Learning element takes feedback from the critic.
The design of a learning element is affected by:
1. Which components of the performance element are to be learned?
2. What feedback is available to learn these components?
3. What representation is used for the components?


• Critic: It is designed to tell the learning element how well the agent is doing with
respect to a fixed performance standard.
1. The critic employs a fixed standard of performance which is necessary because the
percepts themselves do not indicate the agent's success.
2. For example, a chess program may receive a percept indicating that it has
checkmated its opponent, but it needs a performance standard to know that this is
a good thing; the percept itself does not say so.
3. It is important that the performance standard is a fixed measure that is
conceptually outside the agent. Otherwise, the agent could adjust its performance
standards to meet its behavior.
• Performance element: It is responsible for selecting and executing external actions (It
decides what actions to take) based on the information from the learning element.
• Problem generator: It is responsible for suggesting actions that will lead to new and
informative experiences for the learning element to improve its performance.


Goal-based Problem-Solving Agent


Problem Solving Agent tries to come up with a sequence of actions that will bring the
environment into a desired state

Important Terms
1. States: The possible world states, 𝑆 = {𝑠1 , 𝑠2 , 𝑠3 , … }
2. Initial state: s0
3. Actions: 𝐴 = {𝑎1 , 𝑎2 , 𝑎3 , … }
Given a state 𝑠, 𝐀𝐂𝐓𝐈𝐎𝐍𝐒(𝐬) returns the set of actions that can be executed for a
state 𝐬. We say that each of these actions is applicable in 𝑠.
4. Transition model (𝝆): This model describes what each action does for transiting
the agent from one state to another (Actions cause transitions between states).
𝜌: 𝑆 × 𝐴 → 𝑆
• Transition model is specified by a function 𝐑𝐄𝐒𝐔𝐋𝐓(𝐬, 𝐚) that returns the
state that results from doing action 𝐚 in state 𝐬.
• 𝐑𝐄𝐒𝐔𝐋𝐓(𝐬, 𝐚) means the agent knows the consequences of its actions.
5. Successors: 𝑠𝑢𝑐(𝐬) is the set of states reachable from a given state, 𝐬, by a single
action (Each action changes the state).
6. StepCost(s, a, s ′ ): It is the cost of taking action 𝐚 in state 𝐬 to reach state 𝐬 ′ .
7. Path (P): It is a sequence of states connected by a sequence of actions from one state to another:
   s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} s_2 \xrightarrow{a_3} \dots \xrightarrow{a_N} s_N, such that s_N is a goal state.
   P = [s_0, a_1, s_1, a_2, s_2, \dots, a_N, s_N] with \rho(s_{i-1}, a_i) = s_i for all i \in \{1, \dots, N\}.
8. Goal test (G) function: It determines whether a given state is a goal state or not
(𝐺: 𝑆 → 𝑏𝑜𝑜𝑙).
9. Path cost: It is a function that assigns a cost to each path (The cost of a path is the
sum of the costs of individual actions along the path).


10. Solution: It is the sequence of actions and states the agent takes from the initial
state to the final (goal) state.
• The solution quality is measured by the path cost function.
11. Optimal solution: It is the solution that has the lowest path cost among all
solutions.
12. Search: The process of looking for a solution sequence, involving a systematic
exploration of alternative actions.
13. State accessibility: It describes whether the agent can determine, via its sensors, which state it is in.
14. State space: the set of all states reachable from the initial state by any sequence of actions. It is defined by the initial state, the actions, and the transition model.

Well-Defined Problem Formulation


• Problem Formulation is the process of deciding what actions and states to
consider, given a goal and an initial state.
• A problem is a tuple (𝑆, 𝑠0 , 𝐴, 𝜌, 𝐺, 𝑃)
• Goal formulation: It defines a set of one or more (desirable) world states. The goal
is set based on the current situation and the agent’s performance measure.
• Problem Space Graph − It represents the problem space; states are shown by
nodes and operators are shown by edges.
• Intelligent agents are supposed to act in such a way that the environment goes
through a sequence of states that maximizes the performance measure.

Example 1
Water Jug Problem: You have a two-gallon jug and a one-gallon jug; neither has any measuring marks on it at all. Initially both are empty. You need to get exactly one
gallon into the two-gallon jug. Formulate this problem.


Answer 1
• A state is defined by the content of each jug, 𝑠 = (2_𝑔𝑎𝑙𝑙𝑜𝑛 𝑗𝑢𝑔, 1_𝑔𝑎𝑙𝑙𝑜𝑛 𝑗𝑢𝑔).
• 𝑺 = {0,1,2} × {0,1} = {(0,0), (1,0), (2,0), (0,1), (1,1), (2,1)}
• Initial state is 𝑠0 = (0,0)
• Goal: 𝐺 = {(1,0), (1,1)}
• 𝑨 = {𝑓2, 𝑓1, 𝑒2, 𝑒1, 𝑡21, 𝑡12}: 𝑓2 fill jug 2, 𝑓1 fill jug 1, 𝑒2 empty jug 2, 𝑒1 empty jug 1, 𝑡21 pour from the 2-gallon jug into the 1-gallon jug (until it is full or the source is empty), and 𝑡12 pour from the 1-gallon jug into the 2-gallon jug.
• 𝝆 is given by the following diagram and table.

A graphical view of the transition function (initial state shaded, goal states outlined bold):
• Path cost is the number of actions in the path.
• There are an infinite number of solutions. Example solutions are:
[𝑓1, 𝑓2, 𝑒2, 𝑡12] [𝑓1, 𝑒1, 𝑓2, 𝑡21, 𝑡12, 𝑓1, 𝑒2, 𝑡12] [𝑓2, 𝑡21]
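To make this formulation concrete, here is a minimal Python sketch of the water jug problem solved with breadth-first search; the function names (successors, bfs) and the state encoding are illustrative choices, not part of the lecture material.

```python
from collections import deque

GOALS = {(1, 0), (1, 1)}          # exactly one gallon in the 2-gallon jug
CAP2, CAP1 = 2, 1                  # jug capacities

def successors(state):
    """Return (action, next_state) pairs for the water jug problem."""
    j2, j1 = state
    moves = [
        ("f2", (CAP2, j1)),                                   # fill jug 2
        ("f1", (j2, CAP1)),                                   # fill jug 1
        ("e2", (0, j1)),                                      # empty jug 2
        ("e1", (j2, 0)),                                      # empty jug 1
        ("t21", (j2 - min(j2, CAP1 - j1), j1 + min(j2, CAP1 - j1))),  # pour 2 -> 1
        ("t12", (j2 + min(j1, CAP2 - j2), j1 - min(j1, CAP2 - j2))),  # pour 1 -> 2
    ]
    return [(a, s) for a, s in moves if s != state]           # drop no-op moves

def bfs(start=(0, 0)):
    """Breadth-first search: returns a shortest action sequence to a goal state."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state in GOALS:
            return path
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

print(bfs())   # ['f2', 't21'] -- one of the shortest solutions (path cost 2)
```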

Example 2
• 8-puzzle: States = tile configurations; Actions = Up, Down, Left, Right.
• 8-queens (incremental formulation): States = partial board configurations; Actions = add queen, remove queen.
• 8-queens (complete-state formulation): States = board configurations; Actions = move queen.
• TSP: States = partial tours; Actions = add next city, pop last city.


• Theorem Proving: States = collection of known theorems; Actions = rules of inference.
• Vacuum World: States = current location and status of all rooms; Actions = Left, Right, Suck, No operation.
• Road Navigation (Route Finding): States = intersections; Actions = road segments.
• Internet Searching: States = pages; Actions = follow a link.
• Counterfeit Coin Problem: States = a given weighing; Actions = outcome of the weighing (less, equal, greater).
• Incremental formulation starts with an empty state and involves operators that augment the state description

• A complete state formulation starts with all 8 queens on the board and moves them around

Problem Types
There are four essentially different types of problems.
• Single state problem.
• Multiple state problem.
• Contingency problem.
• Exploration problem.

Single State Problem


States: Complete world state knowledge.
Actions: Complete action knowledge.
State: Each state is known exactly after any sequence of actions.
State accessibility: The agent knows exactly which state it will be in through its sensors.
Consequences of actions: The consequences of actions are known to the agent.
Goal: For each known initial state, there is a unique goal state that is guaranteed to be reachable via an action sequence.
Prediction: Exact prediction is possible.
Examples: Deterministic and fully observable environments, such as the vacuum world.


Multiple State Problem (Sensorless) (Conformant)


States: Incomplete world state knowledge.
Actions: Incomplete action knowledge.
State: It is not known exactly, but is limited to a set of possible states after each action (the agent only knows which group of world states it is in).
State accessibility: The agent cannot know exactly which state it is in through its sensors, but reasoning can be used to determine the set of possible states.
Consequences of actions: The consequences of actions are not completely known to the agent, as the actions or the environment might exhibit randomness.
Goal: Due to ignorance, there may be no fixed action sequence that leads to the goal.
Prediction: Semi-exact prediction is possible.
Examples: Non-observable problems, such as the vacuum world without sensors.

Contingency Problem
State: The state is unknown in advance; it may depend on the outcome of actions and on changes in the environment.
State accessibility: Some essential information may be obtained through sensors only at execution time.
Consequences of actions: The consequences of actions may not be known at planning time.
Goal: Instead of single action sequences, there are trees of actions.
Prediction: Exact prediction is impossible: it is impossible to define a complete sequence of actions that constitutes a solution in advance, because information about the intermediary states is unknown.
Examples: Nondeterministic and/or partially observable problems.

Exploration Problem
State: The set of possible states may be unknown.
State accessibility: Some essential information may be obtained through sensors only at execution time.
Consequences of actions: The consequences of actions may not be known at planning time.
Goal: The goal cannot be completely formulated in advance because states and consequences may not be known at planning time.
Prediction: The effects of actions are unknown.
Examples: Problems with an unknown state space.


Example 3
1. What are ACTIONS(𝑠5 ) and ACTIONS(𝑠6 )?
2. What is 𝐑𝐄𝐒𝐔𝐋𝐓𝐒(𝐬𝟓 , 𝐫𝐢𝐠𝐡𝐭)?
3. What are 𝑠𝑢𝑐(𝐬𝟓 ) and 𝑠𝑢𝑐(𝐬𝟔 )?
(Figure: vacuum-world states 𝑠5 and 𝑠6)
Answer 3
This simplest case is called a single-state problem.
𝐀𝐂𝐓𝐈𝐎𝐍𝐒(𝐬𝟓 ) = {right, suck}
𝐀𝐂𝐓𝐈𝐎𝐍𝐒(𝐬𝟔 ) = {left, suck}
𝐑𝐄𝐒𝐔𝐋𝐓𝐒(𝐬𝟓 , 𝐫𝐢𝐠𝐡𝐭) = 𝑠6
(Figure: vacuum-world state 𝑠8)
𝑠𝑢𝑐(𝐬𝟓 ) = {𝐬𝟓 , 𝐬𝟔 }
𝑠𝑢𝑐(𝐬𝟔 ) = {𝐬𝟓 , 𝐬𝟖 }

Example 4
What is the state space of the Vacuum World domain if the actions are Left, Right, and Suck?

Navigation Problem
Given an initial state and a goal state (or states) defined in the same environment, the system should use its knowledge (prior knowledge if available, or accumulated knowledge) to plan and execute a feasible trajectory from the start state to a goal state.


Navigation Problem Definition


1. The state corresponds to being in a particular city along the way: 𝑠 = IN(city)
2. Initial state: s0
3. StateSpace: 𝐒𝐭𝐚𝐭𝐞𝐒𝐩𝐚𝐜𝐞 = { IN(city1 ), IN(city2 ), … , IN(cityn )}
4. The action corresponds to the movement to another city: 𝑎 = Go(city)
5. ACTIONS(s): {𝑎1 , 𝑎2 , 𝑎3 …}. It is the set of all possible actions that can be
applied to current state, 𝑠.
6. Transition model: It is a model to get the target state based on the current state
and the action performed. For our example, RESULT(s, a) = s ′ is the transition
model where s ′ is the new state.
7. StepCost(s, a, s ′ ): The number of KM or the number of minutes for one step.
8. GoalTest(): It is a binary measure, T/F.
9. PathCost(𝑠1 → 𝑠2 → 𝑠3 → … → 𝑠𝑁 via 𝑎1, 𝑎2, … , 𝑎𝑁−1): The sum of the costs of the individual steps:
PathCost = StepCost(𝑠1, 𝑎1, 𝑠2) + StepCost(𝑠2, 𝑎2, 𝑠3) + … + StepCost(𝑠𝑁−1, 𝑎𝑁−1, 𝑠𝑁)

Example 5
Draw the Problem Space Graph of navigation in Romania.


Example 6
Assume you are in Arad and you want to reach Bucharest
What is a state? 𝑠 = IN(city)
What is an action? a = GO(city)
What is the initial state? s0 = IN(Arad)
What is 𝐀𝐂𝐓𝐈𝐎𝐍𝐒(𝐈𝐍(𝐀𝐫𝐚𝐝))? The applicable actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}
What is the goal? s𝑁 = {IN(Bucharest)}
Transition model RESULT(IN(Arad), Go(Zerind)) = IN(Zerind)
Successors (𝐈𝐍(𝐀𝐫𝐚𝐝)) {IN(Sibiu), IN(Timisoara), IN(Zerind)}

Example 7
Calculate the path cost of [Oradea → Sibiu → Fagaras → Bucharest] if the step cost is:
1. 1 unit per step.
2. Measured in Km?

Answer 7
1. Path cost = 1+1+1=3 units
2. Path cost = 151 + 99 + 211 = 461 Km
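A small sketch of how the path cost is accumulated from step costs, using the distances quoted in Answer 7 (Oradea-Sibiu 151 km, Sibiu-Fagaras 99 km, Fagaras-Bucharest 211 km); the dictionary and function names are illustrative.

```python
# Step costs in km for the road segments used in Answer 7.
STEP_COST_KM = {
    ("Oradea", "Sibiu"): 151,
    ("Sibiu", "Fagaras"): 99,
    ("Fagaras", "Bucharest"): 211,
}

def path_cost(path, step_cost):
    """Sum the step costs along a path given as a list of cities."""
    return sum(step_cost[(a, b)] for a, b in zip(path, path[1:]))

route = ["Oradea", "Sibiu", "Fagaras", "Bucharest"]
print(path_cost(route, STEP_COST_KM))                  # 461 (km)
print(path_cost(route, {k: 1 for k in STEP_COST_KM}))  # 3 (one unit per step)
```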


Simple “Formulate, Search, Execute” Design


The problem undergoes:
1. Problem formulation
2. Search process: A search algorithm takes a problem as input and returns a
solution in the form of an action sequence.
3. Execution phase: Once a solution is found, the actions it recommends can be
carried out.

Search Process
Search is the process of examining different possible sequences of actions that lead to
goal state(s) and then choosing the best one.

Search Terminologies
• Problem Space − It is the environment in which the search takes place. (A set of
states and set of operators to change those states)
• Problem Instance − It is Initial state + Goal state.
• Depth of a problem − Length of the shortest path or the shortest sequence of
operators from initial state to goal state.


• Space Complexity − The maximum number of nodes that are stored in memory.
• Time Complexity − The maximum number of nodes that are created.
• Search cost: It is the time and storage requirements to find a solution.

Measuring Problem-solving Performance


The effectiveness of a search can be measured in at least three ways:
• Does it find a solution?
• Is it a good solution? (Low cost)
• What is the time and memory required to find a solution? (Search cost)

Example 8
For the 8-puzzle problem, specify the problem components.

Answer 8
1. States: Tile configurations (location of blank and location of the 8 tiles)
2. Initial State: Initial configuration of the puzzle.
3. Goal formulation: as shown in the figure.
4. Goal test: Match the given state to the Goal state
5. Actions: move blank to Left, Right, Up, or Down.
6. Transition model: Move one tile to the blank. This will move the blank.
7. Path Cost: The total cost is the length of path as each step costs 1 unit.
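As an illustration of these components, the following is a minimal Python sketch of the 8-puzzle actions and transition model; the state encoding (a tuple of nine entries in row-major order, with 0 for the blank) and the goal configuration used here are assumptions for the example.

```python
# State: tuple of 9 tiles in row-major order, 0 marks the blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

MOVES = {"Up": -3, "Down": +3, "Left": -1, "Right": +1}   # index shift of the blank

def actions(state):
    """Actions that move the blank without leaving the 3x3 board."""
    row, col = divmod(state.index(0), 3)
    acts = []
    if row > 0: acts.append("Up")
    if row < 2: acts.append("Down")
    if col > 0: acts.append("Left")
    if col < 2: acts.append("Right")
    return acts

def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    j = i + MOVES[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

start = (1, 0, 2, 3, 4, 5, 6, 7, 8)
print(actions(start))                  # ['Down', 'Left', 'Right']
print(result(start, "Left") == GOAL)   # True: one step, path cost 1
```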

Example 9
The 5-Queens problem requires arranging 5 queens on
an 5 × 5 (chess) board such as the queens do not attack
each other. Specify the problem components.


Answer 9
1. States: 0 to 5 queens arranged on the chess board.
2. A state is 𝑠 = (𝑝1, 𝑝2, 𝑝3, 𝑝4, 𝑝5), where 𝑝𝑖 is the row position of the queen in column 𝑖 (0 if the column is empty).
Example: 𝑠𝑖 = (3, 5, 0, 0, 0): there are queens in row 3 (column 1) and row 5 (column 2); columns 3 to 5 are empty.
3. Initial state: No queen on board, 𝑠0 = (0,0,0,0,0).
4. Goal formulation: A configuration where no queen attacks another.
5. Goal test: 5 queens on the board such that no queen attacks another.
6. Actions: place a queen on an empty square.
7. Transition model: place a queen on an empty square such that no queen attacks
another.
8. Step cost: 0 (we are only interested in the solution). Note that every goal state is
reached after exactly 5 actions.
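A minimal sketch of the goal test for this formulation, with the state encoded as above (a tuple of row positions per column, 0 meaning the column is still empty); the helper names are illustrative.

```python
def attacks(r1, c1, r2, c2):
    """Two queens attack each other on the same row or the same diagonal
    (the same column is impossible in this encoding)."""
    return r1 == r2 or abs(r1 - r2) == abs(c1 - c2)

def goal_test(state):
    """True when all 5 queens are placed and no pair attacks each other."""
    if 0 in state:                       # some column is still empty
        return False
    pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    return all(not attacks(state[i], i, state[j], j) for i, j in pairs)

print(goal_test((3, 5, 0, 0, 0)))   # False: only two queens placed
print(goal_test((1, 3, 5, 2, 4)))   # True: a valid 5-queens solution
```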

Missionaries and Cannibals Problem


It is a classic river-crossing logic puzzle that derives from the famous Jealous Husbands
problem.
• There are three missionaries and three cannibals on the left bank of a river. They
wish to cross over to the right bank.
• The Constraints are:
1. Whenever cannibals outnumber missionaries on either bank, the missionaries will become the cannibals' dinner!
2. Boat can hold only two people.
3. Boat cannot travel empty.
• The minimal number of crossings needed to ferry 𝑛 ≥ 3 missionaries and 𝑛 cannibals across a river, using a two-person boat and bank-to-bank crossings, is 4𝑛 − 1.


Example 10 (Missionaries and Cannibals)


Plan an action sequence of crossings for this problem that will take everyone safely to the
opposite bank.

Answer 10
1. States: A state is described by:
(𝑀𝐿 , 𝐶𝐿 , 𝐵) 𝑴𝑳 : Number of missionaries on the left bank.
𝑪𝑳 : Number of cannibals on the left bank.
𝑩: Location of boat (𝐿, 𝑅).
2. Initial state: (3,3, 𝐿).
3. Goal: (0, 0, R)
4. Goal test: All missionaries and cannibals are safely on the right bank.
5. Operator: A move is represented by the number of missionaries and the number
of cannibals taken in the boat at one time. There are 5 possible combinations:
(2 Missionaries, 0 Cannibals)
(1 Missionary, 0 Cannibals)
(1 Missionary, 1 Cannibal)
(0 Missionary, 1 Cannibal)
(0 Missionary, 2 Cannibals)
6. Transition model: move the boat with someone on it.
7. Path cost: The number of crossings.
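The following sketch, written under the usual assumption that the "outnumbered" constraint applies on both banks, shows the state-validity check and the transition model for this formulation; names such as is_valid and result are illustrative.

```python
BOAT_LOADS = [(2, 0), (1, 0), (1, 1), (0, 1), (0, 2)]   # (missionaries, cannibals)

def is_valid(state):
    """Missionaries are never outnumbered by cannibals on either bank."""
    ml, cl, _ = state
    mr, cr = 3 - ml, 3 - cl                 # people on the right bank
    if not (0 <= ml <= 3 and 0 <= cl <= 3):
        return False
    left_ok = ml == 0 or ml >= cl
    right_ok = mr == 0 or mr >= cr
    return left_ok and right_ok

def result(state, load):
    """Move the boat with the given load to the opposite bank."""
    ml, cl, boat = state
    m, c = load
    return (ml - m, cl - c, "R") if boat == "L" else (ml + m, cl + c, "L")

s0 = (3, 3, "L")
print(is_valid(result(s0, (0, 2))))   # True: two cannibals cross first
print(is_valid(result(s0, (1, 0))))   # False: 2 M vs 3 C left on the left bank
```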

Example 11
What is the Goal test for chess?

Answer 11
The goal is to reach a state called “checkmate,” where the opponent’s King is under
attack and cannot escape.


Donald Knuth Problem (1964)


Knuth conjectured that, starting with the number 4, a sequence of
factorial, square root, and a floor operation will reach any desired
positive integer.

Example 12
How to reach 5 from 4 of Knuth problem?

Answer 12
The problem definition is very simple:
▪ States: Positive numbers.
▪ Initial state: 4.
▪ Goal: 5
▪ Goal test: State if we reach the desired positive integer.
▪ Actions: Apply factorial, square root, or floor operation (factorial for integers
only).
▪ Transition model: As given by the mathematical definitions of the operations.
▪ Path cost: number of factorials and square roots.
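One known way to reach 5 is to apply factorial twice ((4!)! = 24!) and then take square roots, flooring after each one, until the value drops to 5. The sketch below checks this chain using math.factorial and math.isqrt (integer square root, i.e. square root followed by floor, both of which are allowed actions).

```python
import math

value = math.factorial(math.factorial(4))   # (4!)! = 24!
steps = ["factorial", "factorial"]
while value > 5:                            # repeatedly take sqrt then floor
    value = math.isqrt(value)
    steps += ["sqrt", "floor"]

print(value)                  # 5 -- the goal is reached
print(steps.count("sqrt"))    # 5 square roots were needed
```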

Example 13 [VLSI Layout]


This problem requires positioning millions of components and connections
on a chip to minimize area, minimize circuit delays, minimize stray capacitances, and
maximize manufacturing yield.


Fuzzy Sets
A set is a Many that allows itself to be thought of as a One. Georg Cantor.

Crisp (Classical) (Clear) Set


• A crisp set is a set in which each element either belongs to the set or does not.
• A crisp set has sharp and rigid boundaries.
• The order of elements does not matter.

Which rose is red?

Example 1 (Fuzzy words)


Example 2
Suppose we want to represent the following with the classical set theory:
• Intelligent students in a class
• Tall persons
• Healthy person
• Comfortable houses
• Temperature

Example 3
The comparison between bipolar and MOS technology is fuzzy:
• Integration: Bipolar = Low, MOS = Very high
• Power: Bipolar = High, MOS = Low
• Cost: Bipolar = Low, MOS = Low

Fuzzy (Vague) Set


• It is a set containing elements that have varying degrees of membership.
• Fuzzy set is a generalization of the classical set theory where the boundaries of
the set are NOT sharp.
• Elements in a fuzzy set can also be members of other fuzzy sets.

𝐴 = {(𝑥1 , 𝜇𝐴 (𝑥1 )), (𝑥2 , 𝜇𝐴 (𝑥2 )), (𝑥3 , 𝜇𝐴 (𝑥3 )), … }

Universe of Discourse
▪ The universe of discourse, 𝑋, is the space of all elements which can be either
continuous or discrete.
▪ Any fuzzy set 𝐴 defined on a universe of discourse 𝑋 is a subset of that universe.


Membership (Indicator) (Characteristic) Function


• Membership function is a mapping from some given universe of discourse to the
unit interval, [0,1], indicating the membership of an element to a set 𝐴.
𝜇𝐴 : 𝑋 → [0, 1]
𝜇𝐴(𝑥) is called the membership degree of 𝑥 in 𝐴.
• Membership function characterizes the fuzziness of fuzzy sets.
• A fuzzy set is totally characterized by its membership function
• For classical set theory, the membership is either 0 (not owned) or 1 (full part)
• For fuzzy set theory, the membership can be any value between 0 and 1.

Shape of Membership Function


The shape of the membership function is usually chosen arbitrarily by following the advice of the expert or by statistical studies. The shape can be:
• Triangular
• Trapezoidal
• Sigmoid (S-shaped)
• Exponential
• Gaussian (bell-shaped)
• or any other form
(Figures: triangular, trapezoidal, bell-shaped, and sigmoid membership functions)

Example 4
Represent the group of young people using crisp and fuzzy sets.


The set of young people is:


𝐵 = {set of young people}
We can represent it using crisp set as:
𝐵 = [0,20]
We can also represent it using a fuzzy set, as shown beside. Here we have a gradual membership to the set B.
This way, a 25-year-old would still be young to a degree of 50 percent.
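A minimal sketch of such a gradual membership function for "young"; the breakpoints (full membership up to age 20, falling linearly to zero at 30) are assumptions chosen so that a 25-year-old gets a degree of 0.5, as in the example.

```python
def mu_young(age, full=20, zero=30):
    """Membership degree in the fuzzy set 'young':
    1 up to `full`, falling linearly to 0 at `zero`."""
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)

print(mu_young(18))   # 1.0 -> definitely young
print(mu_young(25))   # 0.5 -> young to a degree of 50 percent
print(mu_young(35))   # 0.0 -> not young
```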

Example 5
The triangular membership function, with left foot 𝑎, peak 𝑏, and right foot 𝑐, has the mathematical form:
𝜇(𝑥) = 0 for 𝑥 ≤ 𝑎;  𝜇(𝑥) = (𝑥 − 𝑎)/(𝑏 − 𝑎) for 𝑎 ≤ 𝑥 ≤ 𝑏;  𝜇(𝑥) = (𝑐 − 𝑥)/(𝑐 − 𝑏) for 𝑏 ≤ 𝑥 ≤ 𝑐;  𝜇(𝑥) = 0 for 𝑥 ≥ 𝑐
and its graphical form is the corresponding triangle with corners at 𝑎, 𝑏, and 𝑐.

Example 6

(Figures: the identity (characteristic) function of a crisp set versus the membership function of a fuzzy set, and the graphical representation of a crisp set and a fuzzy set)


Example 7

A person of height 1.79m would belong to both tall and short fuzzy sets with a particular degree of
membership.

Important Terminology
▪ Height of 𝑨 [ℎ(𝐴)]: It is the least upper bound (supremum) of its membership function, ℎ(𝐴) = sup{𝜇𝐴(𝑥) ∶ 𝑥 ∈ 𝑋}.
▪ Support of A [𝑠𝑢𝑝𝑝(𝐴)]: It is the set of elements of 𝑋 that belong to 𝐴 to at least some degree.
𝑠𝑢𝑝𝑝(𝐴) = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴(𝑥) > 0}
▪ Kernel of A: It is the set of elements of 𝑋 belonging entirely to 𝐴.
𝑘𝑒𝑟𝑛𝑒𝑙(𝐴) = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴 (𝑥) = 1}
▪ 𝜶-cut of A: It is the classical subset of elements with a membership degree greater
than or equal to 𝛼.
𝐴𝛼 = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴 (𝑥) ≥ 𝛼}


Example 8

Fuzzy Set Operations


• Complement: The complement of 𝐴 is defined by:
𝐴𝑐 = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴𝑐 (𝑥) = 1 − 𝜇𝐴 (𝑥)}
• Intersection: The intersection of 𝐴 and 𝐵 is defined by:
𝐴 ∩ 𝐵 = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴∩𝐵 (𝑥) = min(𝜇𝐴 (𝑥), 𝜇𝐵 (𝑥))}
• Union: The union of 𝐴 and 𝐵 is defined by:
𝐴 ∪ 𝐵 = {𝑥 ∈ 𝑋 ∶ 𝜇𝐴∪𝐵 (𝑥) = max(𝜇𝐴 (𝑥), 𝜇𝐵 (𝑥))}

Example 9

Find 𝐴 ∪ 𝐵, 𝐴 ∩ 𝐵, and
𝐴𝑐 ?

A: fuzzy interval between 5 and 8 B: A fuzzy number about 4.


Example 10
𝑋 = {1,2,3,4}
𝐴 = {(1,0.4), (2,0.6), (3,0.7), (4,0.8)}
𝐵 = {(1,0.3), (2,0.65), (3,0.4), (4,0.1)}

Complement of 𝐴 𝐴𝑐 = {(1,0.6), (2,0.4), (3,0.3), (4,0.2)}

Union 𝐴 ∪ 𝐵 = {(1,0.4), (2,0.65), (3,0.7), (4,0.8)}

Intersection 𝐴 ∩ 𝐵 = {(1,0.3), (2,0.6), (3,0.4), (4,0.1)}
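A minimal sketch of these operations on discrete fuzzy sets, using the data of Example 10; representing a fuzzy set as a Python dict from element to membership degree is an implementation choice for this example.

```python
A = {1: 0.4, 2: 0.6, 3: 0.7, 4: 0.8}
B = {1: 0.3, 2: 0.65, 3: 0.4, 4: 0.1}

def complement(A):
    """1 - mu for every element."""
    return {x: round(1 - mu, 2) for x, mu in A.items()}

def union(A, B):
    """Element-wise max of the membership degrees."""
    return {x: max(A.get(x, 0), B.get(x, 0)) for x in sorted(A.keys() | B.keys())}

def intersection(A, B):
    """Element-wise min of the membership degrees."""
    return {x: min(A.get(x, 0), B.get(x, 0)) for x in sorted(A.keys() | B.keys())}

print(complement(A))       # {1: 0.6, 2: 0.4, 3: 0.3, 4: 0.2}
print(union(A, B))         # {1: 0.4, 2: 0.65, 3: 0.7, 4: 0.8}
print(intersection(A, B))  # {1: 0.3, 2: 0.6, 3: 0.4, 4: 0.1}
```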

Simple Difference in Sets

𝜇𝐴/𝐵 (𝑥) = 𝜇𝐴∩𝐵𝑐 (𝑥) = min(𝜇𝐴 (𝑥), 1 − 𝜇𝐵 (𝑥))

(Figures: the set difference illustrated for crisp sets and for fuzzy sets)

Example 11
𝑋 = {𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 }
𝐴 = {(𝑥1 , 0.2), (𝑥2 , 0.7), (𝑥3 , 1)}
𝐵 = {(𝑥1 , 0.5), (𝑥2 , 0.3), (𝑥3 , 1), (𝑥4 , 0.1)}

Complement of 𝐵: 𝐵𝑐 = {(𝑥1, 0.5), (𝑥2, 0.7), (𝑥4, 0.9)}

Difference 𝐴/𝐵 = 𝐴 ∩ 𝐵𝑐 = {(𝑥1 , 0.2), (𝑥2 , 0.7)}

Distance between two Fuzzy Sets


Hamming Distance: 𝑑ℎ(𝐴, 𝐵) = ∑_{i=1}^{n} |𝜇𝐴(𝑥i) − 𝜇𝐵(𝑥i)|, 𝑥i ∈ 𝑋
Euclidean Distance: 𝑑𝑒(𝐴, 𝐵) = √( ∑_{i=1}^{n} (𝜇𝐴(𝑥i) − 𝜇𝐵(𝑥i))² ), 𝑥i ∈ 𝑋


Example 12
𝑋 = {𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 }
𝐴 = {(𝑥1 , 0.2), (𝑥2 , 0.7), (𝑥3 , 1)}
𝐵 = {(𝑥1 , 0.5), (𝑥2 , 0.3), (𝑥3 , 1), (𝑥4 , 0.1)}

Hamming Distance: 𝑑ℎ(𝐴, 𝐵) = 0.3 + 0.4 + 0 + 0.1 = 0.8
Euclidean Distance: 𝑑𝑒(𝐴, 𝐵) = √(0.3² + 0.4² + 0² + 0.1²) = √0.26 ≈ 0.51
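A small sketch computing both distances for the sets of Example 12, with the same dict representation as in the earlier sketch.

```python
import math

X = ["x1", "x2", "x3", "x4"]
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.3, "x3": 1.0, "x4": 0.1}

def hamming(A, B, X):
    """Sum of absolute differences of the membership degrees."""
    return sum(abs(A.get(x, 0) - B.get(x, 0)) for x in X)

def euclidean(A, B, X):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((A.get(x, 0) - B.get(x, 0)) ** 2 for x in X))

print(round(hamming(A, B, X), 2))    # 0.8
print(round(euclidean(A, B, X), 2))  # 0.51
```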

Disjoint Sets
Two fuzzy sets 𝐴, and 𝐵 are disjoint if and only if: (all the following are equivalent)
∀𝑥 ∈ 𝑋 ∶ 𝜇𝐴 (𝑥) = 0 ∨ 𝜇𝐵 (𝑥) = 0 Every element has zero membership value to A or B
∀𝑥 ∈ 𝑋 ∶ min(𝜇𝐴 (𝑥), 𝜇𝐵 (𝑥)) = 0 For every element, the minimum membership value is zero

∄𝑥 ∈ 𝑋 ∶ 𝜇𝐴 (𝑥) > 0 ∧ 𝜇𝐵 (𝑥) > 0 No element has positive membership value to both sets

Linguistic Variables
• Linguistic variables are variables whose values are words or sentences in a natural
or artificial language.
• Example: Linguistic variable: speed - Linguistic values: slow, medium, fast

Example 13

Linguistic Hedges
• Linguistic hedges are special linguistic terms by which other linguistic terms are
modified.


• Hedges modify the shape of fuzzy sets, fuzzy truth values, fuzzy probabilities, or
fuzzy predicates.
• Hedges include adverbs such as very, somewhat, more or less, fairly, and slightly.

Example 14
Use hedge “very” for the proposition “x is tall” and “x is short”
Answer 14
These propositions may be modified by the hedge “very”, for example:
“x is very tall”
“x is very short”

Example 15
The hedge and its shape function, for some common hedges in FL, are:
• Little: h_little(x) = (μ(x))^1.3
• Slightly: h_slightly(x) = (μ(x))^1.7
• Very: h_very(x) = (μ(x))^2
• Extremely: h_extremely(x) = (μ(x))^3
• Very very: h_very_very(x) = (μ(x))^4
• Somewhat (“close to”): h_somewhat(x) = (μ(x))^1.2
• More or less (“about”, “near”, “too”, “approximately”)
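A small sketch applying these power-function hedges to a membership degree; the exponents follow the table above, and the hedge names are used only as dictionary keys.

```python
HEDGES = {
    "little": 1.3, "slightly": 1.7, "very": 2, "extremely": 3,
    "very very": 4, "somewhat": 1.2,
}

def apply_hedge(hedge, mu):
    """Modify a membership degree mu in [0, 1] with a power-function hedge."""
    return mu ** HEDGES[hedge]

mu_tall = 0.8
print(round(apply_hedge("very", mu_tall), 2))       # 0.64 -> 'very tall'
print(round(apply_hedge("extremely", mu_tall), 2))  # 0.51 -> 'extremely tall'
```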


Fuzzy Logic Systems (FLS)


Logic
Logic is the study of correct reasoning.

Fuzzy Logic
Fuzzy Logic is a form of multi-valued logic derived from fuzzy set theory to deal with
reasoning that is approximate rather than precise (resembles human reasoning).
• Human decision making includes a range of possibilities between YES and NO, such as: CERTAINLY YES, POSSIBLY YES, CANNOT SAY, POSSIBLY NO, CERTAINLY NO.

Fuzzy Logic System


FLS produces acceptable but definite output in response to incomplete, ambiguous, distorted, or inaccurate (fuzzy) input (it gives a nonlinear mapping of an input data set to a scalar output).

Elements of Fuzzy Logic System


General Steps for Building a Fuzzy Logic System


1. Define linguistic variables and values.
2. Construct membership functions for each value (define the meaning of the input and
output values used in the rules)

3. Construct knowledge base of rules.


4. Convert crisp data into fuzzy data sets using membership functions (Fuzzification).
5. Evaluate rules in the rule base (Inference Engine).
6. Combine results from each rule (Inference Engine).
7. Convert output data into non-fuzzy values (Defuzzification).

Defining Linguistic Variables and Values


1. The shape of the fuzzy set is generally less important than the number of curves
and their placement.
2. Generally, from three to seven curves are appropriate to cover the required range
of an input value or the "universe of discourse" in fuzzy jargon.
3. The statement temperature is low is an example of a fuzzy statement involving a
fuzzy set (low temperature) and a fuzzy variable (temperature).

Fuzzy Rules
• Fuzzy rules are a collection of linguistic statements (IF-THEN rules) that describe
how the FLS should decide regarding classifying an input or controlling an output.
IF (temperature is high AND humidity is high) THEN room is hot
IF wind is strong THEN sailing is good.
IF project duration is long THEN completion risk is high.
IF speed is slow THEN stopping distance is short.
• The number of fuzzy rules required is dependent on:
1. The number of variables,
2. The number of fuzzy sets, and
3. The ways in which the variables are combined in the fuzzy rule conditions.


Defining Fuzzy Rule Result


• If several rules affect the same fuzzy set of the same fuzzy variable, they are
equivalent to a single rule whose conditions are joined by the disjunction OR. For
example, these two rules:
Rule 1: IF temperature is high THEN pressure is high

Rule 2: IF water level is high THEN pressure is high

are equivalent to this single rule:


Rule 3: IF (temperature is high OR water level is high) THEN pressure is high

• If a fuzzy rule has multiple antecedents, the fuzzy operator (AND or OR) is used to
obtain a single number that represents the result of the antecedent evaluation.
This number (the truth value) is then applied to the consequent membership function.

Fuzzification (Fuzzifier) Module


This module transforms the system inputs (crisp numbers) into fuzzy sets by generating
membership values for a fuzzy variable using membership functions.
• From the crisp inputs, the system determines the degree to which these inputs
belong to each of the appropriate fuzzy sets.
• The inputs are mapped into fuzzy numbers by drawing a line up from the inputs to
the input membership functions above and marking the intersection point.

Example 1
Assume a fuzzy system with the following fuzzy sets.

What is the fuzzifier output if the temperature is 350°C and the water level is 1.2m.


Answer 1
• The temperature, 350°C, is a member of both fuzzy sets high and medium.
• The possibility that the temperature is high is 𝜇𝐻𝑇 = 0.75 and the possibility that
the temperature is medium is 𝜇𝑀𝑇 = 0.25.
• The water level of 1.2m is a member of both fuzzy sets low and medium.
• The possibility that the water level is low is 𝜇𝐿𝑊 = 0.6 and the possibility that the
water level is medium is 𝜇𝑀𝑊 = 0.4.

Fuzzy Inference Engine


• It simulates the human reasoning process by making fuzzy inference on the inputs
from the Fuzzifier and the fuzzy rule base.
• This engine evaluates fuzzy rules and applies them through the following steps:
1. Inference Process sees which of the rules can be fired and to what degree. All rules may
fire to a degree between zero and unity. The consequence of a fuzzy rule is computed
using two steps:
• Computing the rule strength by combining the fuzzified inputs using the fuzzy
combination process.
• Clipping the output membership function at the rule strength.
2. Composition Process is an “averaging” procedure to compute the effective contribution
of each fuzzy rule. In fact, it combines the outputs of all fuzzy rules to obtain one fuzzy
output distribution.


• There are two basic inference engines for fuzzy systems.


1. Mamdani Inference Engine. This we will discuss here.

2. Sugeno Inference Engine.

Example 2

De-Fuzzification (De-fuzzifier) Module


• This module transforms the fuzzy set obtained by the inference engine into a crisp
value that can be used in the “real” world (The final output of a fuzzy system must be a
crisp number).

• The common technique for de-fuzzifying is Centroid technique which takes the
output distribution and finds its center of mass to come up with one crisp number.
• Centroid technique [Center of gravity (COG)] finds the point where a vertical
line would slice the aggregate set into two equal masses.
𝐶𝑂𝐺 = ∫_{a}^{b} 𝑥 𝜇𝐴(𝑥) 𝑑𝑥 / ∫_{a}^{b} 𝜇𝐴(𝑥) 𝑑𝑥 = ∑_{x=a}^{b} 𝑥 𝜇𝐴(𝑥) / ∑_{x=a}^{b} 𝜇𝐴(𝑥)


Example 3
Assume the following fuzzy control system.
Rule 1: IF project funding is adequate OR project staffing is small THEN risk is low
IF x is A3 OR y is B1 THEN z is C1
Rule 2: IF project funding is marginal AND project staffing is large THEN risk is normal
IF x is A2 AND y is B2 THEN z is C2
Rule 3: IF project funding is inadequate THEN risk is high
IF x is A1 THEN z is C3

Show how the Mamdani system works if the inputs are project funding (x1), and project
staffing (y1).

Answer 3
Step 1: Fuzzification
1. Take the crisp inputs, project funding (x1), and project staffing (y1)
2. Determine the degree to which these inputs belong to each of the appropriate
fuzzy sets.
3. The fuzzified inputs are: 𝜇(𝑥=𝐴1) = 0.5, 𝜇(𝑥=𝐴2) = 0.2, 𝜇(𝑦=𝐵1) = 0.1, and
𝜇(𝑦=𝐵2) = 0.7

Step 2: Rule Evaluation


Apply the fuzzified inputs to the antecedents of the fuzzy rules.


To evaluate the disjunction of the rule 1 antecedents,


𝜇(𝐴3∪𝐵1) = max{𝜇𝐴3 , 𝜇𝐵1 } = max{0, 0.1} = 0.1

To evaluate the conjunction of the rule 2 antecedents,


𝜇(𝐴2∩𝐵2) = min{𝜇𝐴2 , 𝜇𝐵2 } = min{0.2, 0.7} = 0.2

Step 3: Aggregation of the Rule Outputs


• Aggregation is the process of unification of the outputs of all rules.
• We take the membership functions of all rule consequents previously clipped or
scaled and combine them into a single fuzzy set.


Step 4: Defuzzification
Divide the aggregated curve
into slots and then apply
COG.
𝐶𝑂𝐺 = ∑_{x=a}^{b} 𝑥 𝜇𝐴(𝑥) / ∑_{x=a}^{b} 𝜇𝐴(𝑥)

Example 4
Assume a fuzzy system with the following rules.
Rule 1: IF temperature is high THEN pressure is high
Rule 2: IF temperature is medium THEN pressure is medium
Rule 3: IF temperature is low THEN pressure is low
Rule 4: IF temperature is high AND water level is NOT low THEN pressure is high

What is the possibility of the pressure variable if the measured temperature is 350°C and
the water level is 1.2m.

Answer 4
Step 1: Fuzzification
• The temperature, 350°C, is a member of both fuzzy sets high and medium (𝜇𝐻𝑇 =
0.75, 𝜇𝑀𝑇 = 0.25).
• For a water level of 1.2m, the possibility that the water level is low is 𝜇𝐿𝑊 = 0.6
and the possibility that the water level is medium is 𝜇𝑀𝑊 = 0.4.


• The possibility of the water level not being low is: 𝜇NOT LW(1.2m) = 1 − 𝜇LW(1.2m) = 1 − 0.6 = 0.4

Step 2: Rule Evaluation


• As a result of firing the rules, the pressure will be somewhat high and somewhat
medium.
• Apply the fuzzified inputs to the antecedents of the fuzzy rules. Rule 1 and Rule 2 will both be fired. The clipped outputs are 𝜇HP = 0.75 and 𝜇MP = 0.25.
• To evaluate the conjunction of the rule 4 antecedents:
𝜇(HT ∩ NOT LW) = min{𝜇HT, 𝜇NOT LW} = min{0.75, 0.4} = 0.4

• Thus, the possibility that the pressure is high, 𝜇𝐻𝑃 = 0.4, if it has not already been
set to a higher value.
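A minimal sketch of this rule-evaluation step, using min for AND, 1 − μ for NOT, and max to combine rules that share a consequent; the variable names simply mirror Example 4 and are not part of any library.

```python
# Fuzzified inputs from Step 1 (temperature 350 C, water level 1.2 m)
mu_HT, mu_MT, mu_LT = 0.75, 0.25, 0.0   # temperature: high / medium / low
mu_LW = 0.6                              # water level: low
mu_NOT_LW = 1 - mu_LW                    # water level is NOT low

# Rule strengths (clipping levels for the output fuzzy sets)
rule1 = mu_HT                            # IF temp high THEN pressure high
rule2 = mu_MT                            # IF temp medium THEN pressure medium
rule3 = mu_LT                            # IF temp low THEN pressure low
rule4 = min(mu_HT, mu_NOT_LW)            # IF temp high AND water NOT low THEN pressure high

# Rules with the same consequent are combined with max
mu_HP = max(rule1, rule4)                # 0.75 (rule 4's 0.4 does not raise it)
mu_MP = rule2                            # 0.25
mu_LP = rule3                            # 0.0
print(mu_HP, mu_MP, mu_LP)               # 0.75 0.25 0.0
```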

Example 5: FL for Fuzzy Automatic Washing Machine


FL optimizes the life span of the washing machine by controlling the following:
1. water intake,
2. water temperature,
3. wash time,
4. rinse performance, and
5. spin speed.
The important three factors for this control system are:
1. Water level adjustment: FL detects the type and amount of laundry in the drum
and allows only as much water to enter the machine as is really needed for the
loaded amount. So, less water will heat up quicker - which means less energy
consumption.
2. Foam detection: FL compensates for too much foam by an additional rinse cycle.
3. Imbalance compensation: In the event of imbalance, FL calculates the maximum
possible speed, sets this speed, and starts spinning.


Example 6
Show how to build a rule base for a simple Air Conditioner FLS that controls the AC by comparing the room temperature and the target temperature value.

Answer 6
• Typically, the air conditioner has a fan which blows/cools/circulates fresh air and
has a cooler which is under thermostatic control.
• The amount of air being compressed is proportional to the ambient temperature.

Step 1: Define linguistic variables and values


Linguistic variable: Temperature; linguistic values: {very_cold, cold, warm, hot, very_hot}
Step 2: Construct membership functions for the fuzzy variable
The membership functions of the temperature variable can be one of the following:

Step 3: Construct knowledge base rules


Create a matrix of room temperature values versus target temperature values that an air conditioning
system is expected to provide.


Step 4: Build a set of rules into the knowledge base in the form of IF-THEN structures.
Rule 1: IF (temperature is very_cold AND target is cold) THEN heat
Rule 2: IF ((temperature is very_cold OR temperature is cold) AND target is warm) THEN heat
Rule 3: IF ((temperature is very_cold OR temperature is cold OR temperature is warm) AND
target is hot) THEN heat
Rule 4: IF ((temperature is very_cold OR temperature is cold OR temperature is warm OR
temperature is hot) AND target is very_hot) THEN heat
Rule 5: IF (temperature is very_hot AND target is hot) THEN cool
Rule 6: IF ((temperature is very_hot OR temperature is hot) AND target is warm) THEN cool
Rule 7: IF ((temperature is very_hot OR temperature is hot OR temperature is warm) AND
target is cold) THEN cool
Rule 8: IF ((temperature is very_hot OR temperature is hot OR temperature is warm OR
temperature is cold) AND target is very_cold) THEN cool
Rule 9: IF ((temperature is hot OR temperature is very_hot) AND target is warm) THEN cool
Rule 10: IF temperature is very_cold AND target is very_cold THEN nochange
Rule 11: IF temperature is cold AND target is cold THEN nochange
Rule 12: IF temperature is warm AND target is warm THEN nochange
Rule 13: IF temperature is hot AND target is hot THEN nochange
Rule 14: IF temperature is very_hot AND target is very_hot THEN nochange

Example 7: Fuzzy Air-Conditioner


Show how to build a rule base for air-conditioner to control its speed for the enclosed
space.

Answer 7
1. The room temperature may be defined with five fuzzy sets, cold, cool, pleasant,
warm, and hot.


2. The corresponding speeds of the motor controlling the fan on the air-conditioner
have five graduations: minimal, slow, medium, fast, and blast fuzzy sets.

3. The rules governing the air-conditioner may be defined as follows:

Example 8
What is the output if the air-conditioner is required to operate at 16°C?

Answer 8
1. Fuzzification: 16°C corresponds to the Cool and Pleasant fuzzy sets, with 𝜇Cool = 0.3 and 𝜇Pleasant = 0.3.
2. Inference: Check the rules which contain the above linguistic values. Rule 2 and
rule 3 will be fired. The clipped outputs of the speed fuzzy variable are 𝜇𝑆𝑙𝑜𝑤 =
0.3, and 𝜇𝑀𝑒𝑑𝑖𝑢𝑚 = 0.3.


3. Composition: Create a new output membership function from the alpha-levelled (clipped) Slow and Medium fuzzy sets.

4. Defuzzification: Examine the fuzzy sets of Slow and Medium and obtain a speed
value.
𝐶𝑂𝐺 = ∑_{x=a}^{b} 𝑥 𝜇𝐴(𝑥) / ∑_{x=a}^{b} 𝜇𝐴(𝑥) = ((20 + 30 + 40 + 50) × 0.3) / (4 × 0.3) = 35
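A small sketch of this discrete centre-of-gravity calculation; the sample points (20, 30, 40, 50, each at degree 0.3) follow the example, and the function name is illustrative.

```python
def cog(points):
    """Discrete centre of gravity: sum(x * mu) / sum(mu)."""
    num = sum(x * mu for x, mu in points)
    den = sum(mu for _, mu in points)
    return num / den

aggregated = [(20, 0.3), (30, 0.3), (40, 0.3), (50, 0.3)]
print(cog(aggregated))   # 35.0 -> defuzzified fan speed
```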
