
Module 3

Decision Trees
Dr. Mahesh G Huddar
Dept. of Computer Science and Engineering

Watch Video Tutorial at https://www.youtube.com/@MaheshHuddar


Decision Trees
▪ Decision trees are one of the most widely used classification algorithms.
▪ Features:
  ▪ A method for approximating discrete-valued target functions (including Boolean functions)
  ▪ Learned functions are represented as decision trees (or as sets of if-then rules)
  ▪ An expressive hypothesis space, including disjunctions
  ▪ Robust to noisy data

Decision Tree for Boolean Functions
• Every variable in a Boolean function, such as A, B or C, has two possible values: True and False.
• Every Boolean function evaluates to either True or False.
• If the Boolean function is True we write YES (Y); if it is False we write NO (N).
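As a small illustration of this idea, the sketch below (not from the slides; names are illustrative) writes the decision tree for the Boolean function A AND B as nested attribute tests, with each return statement acting as a leaf labelled Y or N:

```python
# A minimal sketch: the decision tree for A AND B as nested attribute tests.
# Each "if" is an internal node; each return is a leaf labelled Y (true) or N (false).
def and_tree(a: bool, b: bool) -> str:
    if a:            # test attribute A at the root
        if b:        # test attribute B on the A = True branch
            return "Y"
        return "N"
    return "N"       # the A = False branch is already a leaf: the function is false

if __name__ == "__main__":
    for a in (True, False):
        for b in (True, False):
            print(a, b, "->", and_tree(a, b))
```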


[Figure slides: worked decision-tree constructions for several Boolean functions]
Decision Tree Representation (PlayTennis)

[Figure: the PlayTennis decision tree]

Example instance: Outlook = Sunny, Temp = Hot, Humidity = High, Wind = Strong → classified as No

Decision tree expressivity
• Decision trees represent a disjunction of conjunctions of constraints on the values of attributes:
  (Outlook = Sunny ∧ Humidity = Normal)
  ∨ (Outlook = Overcast)
  ∨ (Outlook = Rain ∧ Wind = Weak)


Decision tree representation (PlayTennis)
• Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.
• An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example.
• This process is then repeated for the subtree rooted at the new node.
• In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.

APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING
Although a variety of decision tree learning methods have been developed with somewhat differing capabilities and requirements, decision tree learning is generally best suited to problems with the following characteristics:

1. Instances are represented by attribute-value pairs. Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically).

2. The target function has discrete output values. The decision tree is usually used for Boolean classification (e.g., yes or no). Decision tree methods easily extend to learning functions with more than two possible output values. A more substantial extension allows learning target functions with real-valued outputs, though the application of decision trees in this setting is less common.

3. Disjunctive descriptions may be required. Decision trees naturally represent disjunctive expressions.

4. The training data may contain errors. Decision tree learning methods are robust to errors, both errors in the classifications of the training examples and errors in the attribute values that describe these examples.

5. The training data may contain missing attribute values. Decision tree methods can be used even when some training examples have unknown values (e.g., if the Humidity of the day is known for only some of the training examples).

• Many practical problems have been found to fit these characteristics.
• Decision tree learning has therefore been applied to problems such as learning to classify medical patients by their disease, equipment malfunctions by their cause, and loan applicants by their likelihood of defaulting on payments.
• Such problems, in which the task is to classify examples into one of a discrete set of possible categories, are often referred to as classification problems.
THE BASIC DECISION TREE LEARNING ALGORITHM
• Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees.
• This approach is exemplified by the ID3 algorithm (Quinlan 1986) and its successor C4.5 (Quinlan 1993), which form the primary focus of our discussion here.
• The basic algorithm for decision tree learning presented here corresponds approximately to the ID3 algorithm.
• Next, we consider a number of extensions to this basic algorithm, including extensions incorporated into C4.5 and other more recent algorithms for decision tree learning.

CONSTRUCTING DECISION TREE – ID3 ALGORITHM
Which Attribute Is the Best Classifier?
• The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree.
• We would like to select the attribute that is most useful for classifying examples.
• What is a good quantitative measure of the worth of an attribute? We will define a statistical property, called information gain, that measures how well a given attribute separates the training examples according to their target classification.
• ID3 uses this information gain measure to select among the candidate attributes at each step while growing the tree.
CONSTRUCTING DECISION TREE – ID3 ALGORITHM
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES

• Entropy characterizes the (im)purity of an arbitrary collection of examples.
• Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is

      Entropy(S) = − p+ log2 p+ − p− log2 p−

• where p+ is the proportion of positive examples in S and p− is the proportion of negative examples in S.
• In all calculations involving entropy we define 0 log 0 to be 0.

Examples (S is a collection of training examples):
  Entropy([14+, 0−]) = − (14/14) log2(14/14) − 0 log2 0 = 0
  Entropy([9+, 5−])  = − (9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
  Entropy([7+, 7−])  = − (7/14) log2(7/14) − (7/14) log2(7/14) = 1/2 + 1/2 = 1
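A minimal sketch (not part of the slides) that reproduces the entropy values above for a Boolean-classified collection with p positive and n negative examples:

```python
import math

def entropy(p: int, n: int) -> float:
    """Entropy of a collection with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:                       # convention: 0 * log2(0) = 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(entropy(14, 0))   # 0.0
print(entropy(9, 5))    # ~0.940
print(entropy(7, 7))    # 1.0
```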
CONSTRUCTING DECISION TREE – ID3 ALGORITHM
INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY
• Given entropy as a measure of the impurity in a collection of training examples, we can now define a measure of the effectiveness of an attribute in classifying the training data.
• The information gain is simply the expected reduction in entropy caused by partitioning the examples according to this attribute.
• More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as

      Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

• where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v (i.e., S_v = {s ∈ S | A(s) = v}).
CONSTRUCTING DECISION TREE – ID3 ALGORITHM
• For example, suppose S is a collection of training-example days described by
attributes including Wind, which can have the values Weak or Strong.
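A hedged sketch of this definition in code (not from the slides), building on the entropy() helper in the previous sketch; examples are assumed to be plain dicts such as {'Outlook': 'Sunny', ..., 'PlayTennis': 'No'}, and the target attribute name "PlayTennis" is an assumption for this example:

```python
from collections import Counter

def entropy_of(examples, target="PlayTennis"):
    counts = Counter(ex[target] for ex in examples)
    return entropy(counts.get("Yes", 0), counts.get("No", 0))  # entropy() from the earlier sketch

def information_gain(examples, attribute, target="PlayTennis"):
    total = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        remainder += len(subset) / total * entropy_of(subset, target)
    return entropy_of(examples, target) - remainder
```

With the 14-day table of the numerical example below loaded as such dicts, information_gain(rows, "Wind") returns about 0.048, matching the hand computation that follows.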

• Information gain is precisely the measure used by ID3 to select the best attribute at each step in growing the tree.
• The use of information gain to evaluate the relevance of attributes is illustrated in the numerical example below, where the information gain of several attributes (including Humidity and Wind) is computed in order to determine which is the better attribute for classifying the training examples.

DECISION TREE – ID3 ALGORITHM NUMERICAL EXAMPLE

Day   Outlook    Temp   Humidity   Wind     PlayTennis
D1    Sunny      Hot    High       Weak     No
D2    Sunny      Hot    High       Strong   No
D3    Overcast   Hot    High       Weak     Yes
D4    Rain       Mild   High       Weak     Yes
D5    Rain       Cool   Normal     Weak     Yes
D6    Rain       Cool   Normal     Strong   No
D7    Overcast   Cool   Normal     Strong   Yes
D8    Sunny      Mild   High       Weak     No
D9    Sunny      Cool   Normal     Weak     Yes
D10   Rain       Mild   Normal     Weak     Yes
D11   Sunny      Mild   Normal     Strong   Yes
D12   Overcast   Mild   High       Strong   Yes
D13   Overcast   Hot    Normal     Weak     Yes
D14   Rain       Mild   High       Strong   No
Attribute: Outlook
Values(Outlook) = {Sunny, Overcast, Rain}
S = [9+, 5−]            Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
S_Sunny = [2+, 3−]      Entropy(S_Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.971
S_Overcast = [4+, 0−]   Entropy(S_Overcast) = 0
S_Rain = [3+, 2−]       Entropy(S_Rain) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.971
Gain(S, Outlook) = Entropy(S) − (5/14)·Entropy(S_Sunny) − (4/14)·Entropy(S_Overcast) − (5/14)·Entropy(S_Rain)
                 = 0.94 − (5/14)(0.971) − (4/14)(0) − (5/14)(0.971) = 0.2464

Attribute: Temp
Values(Temp) = {Hot, Mild, Cool}
S_Hot = [2+, 2−]        Entropy(S_Hot) = 1.0
S_Mild = [4+, 2−]       Entropy(S_Mild) = −(4/6) log2(4/6) − (2/6) log2(2/6) = 0.9183
S_Cool = [3+, 1−]       Entropy(S_Cool) = −(3/4) log2(3/4) − (1/4) log2(1/4) = 0.8113
Gain(S, Temp) = 0.94 − (4/14)(1.0) − (6/14)(0.9183) − (4/14)(0.8113) = 0.0289

Attribute: Humidity
Values(Humidity) = {High, Normal}
S_High = [3+, 4−]       Entropy(S_High) = −(3/7) log2(3/7) − (4/7) log2(4/7) = 0.9852
S_Normal = [6+, 1−]     Entropy(S_Normal) = −(6/7) log2(6/7) − (1/7) log2(1/7) = 0.5916
Gain(S, Humidity) = 0.94 − (7/14)(0.9852) − (7/14)(0.5916) = 0.1516

Attribute: Wind
Values(Wind) = {Strong, Weak}
S_Strong = [3+, 3−]     Entropy(S_Strong) = 1.0
S_Weak = [6+, 2−]       Entropy(S_Weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.8113
Gain(S, Wind) = 0.94 − (6/14)(1.0) − (8/14)(0.8113) = 0.0478

Summary of gains at the root:
  Gain(S, Outlook)  = 0.2464   ← maximum gain, so Outlook is selected as the root attribute
  Gain(S, Temp)     = 0.0289
  Gain(S, Humidity) = 0.1516
  Gain(S, Wind)     = 0.0478

[Figure: partial tree after the first split, with Outlook at the root and branches Sunny, Overcast (all Yes) and Rain]
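The recursive procedure the slides walk through by hand can be sketched as follows (a hedged sketch, not the slides' own code), reusing the entropy_of() and information_gain() helpers from the earlier sketches; attribute and value names follow the PlayTennis table and the tree is a nested dict:

```python
from collections import Counter

def id3(examples, attributes, target="PlayTennis"):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                  # all examples share one class -> leaf
        return labels[0]
    if not attributes:                         # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # choose the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```

Applied to the 14-day table above, this selects Outlook at the root (gain 0.2464), then Humidity under the Sunny branch and Wind under the Rain branch, reproducing the tree derived by hand in the following slides.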


Branch Outlook = Sunny: examples {D1, D2, D8, D9, D11}

Day   Temp   Humidity   Wind     PlayTennis
D1    Hot    High       Weak     No
D2    Hot    High       Strong   No
D8    Mild   High       Weak     No
D9    Cool   Normal     Weak     Yes
D11   Mild   Normal     Strong   Yes

Attribute: Temp          Values(Temp) = {Hot, Mild, Cool}
S_Sunny = [2+, 3−]       Entropy(S_Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.97
S_Hot = [0+, 2−]         Entropy(S_Hot) = 0.0
S_Mild = [1+, 1−]        Entropy(S_Mild) = 1.0
S_Cool = [1+, 0−]        Entropy(S_Cool) = 0.0
Gain(S_Sunny, Temp) = 0.97 − (2/5)(0.0) − (2/5)(1.0) − (1/5)(0.0) = 0.570

Attribute: Humidity      Values(Humidity) = {High, Normal}
S_High = [0+, 3−]        Entropy(S_High) = 0.0
S_Normal = [2+, 0−]      Entropy(S_Normal) = 0.0
Gain(S_Sunny, Humidity) = 0.97 − (3/5)(0.0) − (2/5)(0.0) = 0.97

Attribute: Wind          Values(Wind) = {Strong, Weak}
S_Strong = [1+, 1−]      Entropy(S_Strong) = 1.0
S_Weak = [1+, 2−]        Entropy(S_Weak) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.9183
Gain(S_Sunny, Wind) = 0.97 − (2/5)(1.0) − (3/5)(0.9183) = 0.0192

Summary for the Sunny branch:
  Gain(S_Sunny, Temp)     = 0.570
  Gain(S_Sunny, Humidity) = 0.97   ← maximum gain, so Humidity is selected under the Sunny branch
  Gain(S_Sunny, Wind)     = 0.0192

  Humidity = High   → {D1, D2, D8} : No
  Humidity = Normal → {D9, D11}    : Yes
Branch Outlook = Rain: examples {D4, D5, D6, D10, D14}

Day   Temp   Humidity   Wind     PlayTennis
D4    Mild   High       Weak     Yes
D5    Cool   Normal     Weak     Yes
D6    Cool   Normal     Strong   No
D10   Mild   Normal     Weak     Yes
D14   Mild   High       Strong   No

Attribute: Temp          Values(Temp) = {Hot, Mild, Cool}
S_Rain = [3+, 2−]        Entropy(S_Rain) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.97
S_Hot = [0+, 0−]         Entropy(S_Hot) = 0.0
S_Mild = [2+, 1−]        Entropy(S_Mild) = −(2/3) log2(2/3) − (1/3) log2(1/3) = 0.9183
S_Cool = [1+, 1−]        Entropy(S_Cool) = 1.0
Gain(S_Rain, Temp) = 0.97 − (0/5)(0.0) − (3/5)(0.9183) − (2/5)(1.0) = 0.0192

Attribute: Humidity      Values(Humidity) = {High, Normal}
S_High = [1+, 1−]        Entropy(S_High) = 1.0
S_Normal = [2+, 1−]      Entropy(S_Normal) = −(2/3) log2(2/3) − (1/3) log2(1/3) = 0.9183
Gain(S_Rain, Humidity) = 0.97 − (2/5)(1.0) − (3/5)(0.9183) = 0.0192

Attribute: Wind          Values(Wind) = {Strong, Weak}
S_Strong = [0+, 2−]      Entropy(S_Strong) = 0.0
S_Weak = [3+, 0−]        Entropy(S_Weak) = 0.0
Gain(S_Rain, Wind) = 0.97 − (2/5)(0.0) − (3/5)(0.0) = 0.97

Summary for the Rain branch:
  Gain(S_Rain, Temp)     = 0.0192
  Gain(S_Rain, Humidity) = 0.0192
  Gain(S_Rain, Wind)     = 0.97   ← maximum gain, so Wind is selected under the Rain branch

Final decision tree:
  Outlook = Sunny    → Humidity = High   → No   {D1, D2, D8}
                       Humidity = Normal → Yes  {D9, D11}
  Outlook = Overcast → Yes  {D3, D7, D12, D13}
  Outlook = Rain     → Wind = Strong → No   {D6, D14}
                       Wind = Weak   → Yes  {D4, D5, D10}
DECISION TREE EXAMPLE (Example 2 – ID3 Solved Example)

Instance   Classification   a1   a2
1          +                T    T
2          +                T    T
3          −                T    F
4          +                F    F
5          −                F    T
6          −                F    T

1. What is the entropy of this collection of training examples with respect to the target function classification?
2. What is the information gain of a1 and a2 relative to these training examples?
3. Draw the decision tree for the given dataset.

Attribute: a1            Values(a1) = {T, F}
S = [3+, 3−]             Entropy(S) = 1.0
S_T = [2+, 1−]           Entropy(S_T) = −(2/3) log2(2/3) − (1/3) log2(1/3) = 0.9183
S_F = [1+, 2−]           Entropy(S_F) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.9183
Gain(S, a1) = 1.0 − (3/6)(0.9183) − (3/6)(0.9183) = 0.0817

Attribute: a2            Values(a2) = {T, F}
S_T = [2+, 2−]           Entropy(S_T) = 1.0
S_F = [1+, 1−]           Entropy(S_F) = 1.0
Gain(S, a2) = 1.0 − (4/6)(1.0) − (2/6)(1.0) = 0.0

Gain(S, a1) = 0.0817  ← maximum gain, so a1 is selected as the root
Gain(S, a2) = 0.0

Resulting decision tree:
  a1 = T → a2 = T → {1, 2} : +
           a2 = F → {3}    : −
  a1 = F → a2 = T → {5, 6} : −
           a2 = F → {4}    : +
DECISION TREE EXAMPLE (Example 3 – ID3 Solved Example)

Instance   a1      a2     a3       Classification
1          True    Hot    High     No
2          True    Hot    High     No
3          False   Hot    High     Yes
4          False   Cool   Normal   Yes
5          False   Cool   Normal   Yes
6          True    Cool   High     No
7          True    Hot    High     No
8          True    Hot    Normal   Yes
9          False   Cool   Normal   Yes
10         False   Cool   High     Yes

1. Construct the decision tree for the given data using the ID3 algorithm.

Attribute: a1            Values(a1) = {True, False}
S = [6+, 4−]             Entropy(S) = −(6/10) log2(6/10) − (4/10) log2(4/10) = 0.9709
S_True = [1+, 4−]        Entropy(S_True) = −(1/5) log2(1/5) − (4/5) log2(4/5) = 0.7219
S_False = [5+, 0−]       Entropy(S_False) = 0.0
Gain(S, a1) = 0.9709 − (5/10)(0.7219) − (5/10)(0.0) = 0.6099

Attribute: a2            Values(a2) = {Hot, Cool}
S_Hot = [2+, 3−]         Entropy(S_Hot) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.9709
S_Cool = [4+, 1−]        Entropy(S_Cool) = −(4/5) log2(4/5) − (1/5) log2(1/5) = 0.7219
Gain(S, a2) = 0.9709 − (5/10)(0.9709) − (5/10)(0.7219) = 0.1245

Attribute: a3            Values(a3) = {High, Normal}
S_High = [2+, 4−]        Entropy(S_High) = −(2/6) log2(2/6) − (4/6) log2(4/6) = 0.9183
S_Normal = [4+, 0−]      Entropy(S_Normal) = 0.0
Gain(S, a3) = 0.9709 − (6/10)(0.9183) − (4/10)(0.0) = 0.4199

Gain(S, a1) = 0.6099  ← maximum gain, so a1 is selected as the root
Gain(S, a2) = 0.1245
Gain(S, a3) = 0.4199

  a1 = True  → {1, 2, 6, 7, 8}   (to be split further)
  a1 = False → {3, 4, 5, 9, 10}  → Yes

Branch a1 = True: examples {1, 2, 6, 7, 8}

Instance   a2     a3       Classification
1          Hot    High     No
2          Hot    High     No
6          Cool   High     No
7          Hot    High     No
8          Hot    Normal   Yes

S_a1=True = [1+, 4−]     Entropy(S_a1=True) = −(1/5) log2(1/5) − (4/5) log2(4/5) = 0.7219

Attribute: a2            Values(a2) = {Hot, Cool}
S_Hot = [1+, 3−]         Entropy(S_Hot) = −(1/4) log2(1/4) − (3/4) log2(3/4) = 0.8113
S_Cool = [0+, 1−]        Entropy(S_Cool) = 0.0
Gain(S_a1=True, a2) = 0.7219 − (4/5)(0.8113) − (1/5)(0.0) = 0.0729

Attribute: a3            Values(a3) = {High, Normal}
S_High = [0+, 4−]        Entropy(S_High) = 0.0
S_Normal = [1+, 0−]      Entropy(S_Normal) = 0.0
Gain(S_a1=True, a3) = 0.7219 − (4/5)(0.0) − (1/5)(0.0) = 0.7219

Gain(S_a1=True, a2) = 0.0729
Gain(S_a1=True, a3) = 0.7219  ← maximum gain, so a3 is selected under the a1 = True branch

Final decision tree:
  a1 = True  → a3 = High   → {1, 2, 6, 7} : No
               a3 = Normal → {8}          : Yes
  a1 = False → {3, 4, 5, 9, 10} : Yes
When to use Decision Trees
▪ Problem characteristics:
  ▪ Instances can be described by attribute-value pairs
  ▪ The target function is discrete-valued
  ▪ A disjunctive hypothesis may be required
  ▪ Possibly noisy training data samples
    ▪ Robust to errors in the training data
    ▪ Missing attribute values are tolerated
▪ Typical classification problems:
  ▪ Equipment classification
  ▪ Medical diagnosis
  ▪ Credit risk analysis
  ▪ Several tasks in natural language processing
Issues in decision tree learning
▪ Overfitting
  ▪ Reduced-error pruning
  ▪ Rule post-pruning
▪ Continuous-valued attributes
▪ Alternative measures for selecting attributes
▪ Handling training examples with missing attribute values
▪ Handling attributes with different costs
▪ Improving computational efficiency
▪ Most of these improvements are incorporated in C4.5 (Quinlan, 1993)
Overfitting: definition
• Building trees that "adapt too much" to the training examples may lead to "overfitting".
• Consider the error of hypothesis h over
  – the training data: errorD(h)                      (empirical error)
  – the entire distribution X of the data: errorX(h)  (expected error)
• Hypothesis h overfits the training data if there is an alternative hypothesis h' ∈ H such that
      errorD(h) < errorD(h')   and   errorX(h') < errorX(h)
  i.e., h' behaves better over unseen data.

Overfitting in decision tree learning
[Figure: accuracy on the training data and on unseen data as the tree grows]
Avoid overfitting in Decision Trees
▪ Two strategies:
  1. Stop growing the tree earlier, before it perfectly classifies the training data
  2. Allow the tree to overfit the data, and then post-prune the tree
▪ Training and validation set: split the data in two parts (training and validation) and use the validation set to assess the utility of post-pruning
  • Reduced-error pruning
  • Rule post-pruning
Reduced-error pruning (Quinlan 1987)
▪ Each node is a candidate for pruning
▪ Pruning consists of removing the subtree rooted at a node: the node becomes a leaf and is assigned the most common classification of the examples associated with it
▪ Nodes are removed only if the resulting tree performs no worse on the validation set
▪ Nodes are pruned iteratively: at each iteration the node whose removal most increases accuracy on the validation set is pruned
▪ Pruning stops when no further pruning increases accuracy
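The slides describe the procedure abstractly; below is a hedged sketch of a simplified bottom-up variant, assuming the nested-dict trees produced by the earlier id3() sketch and a held-out validation list in the same dict form. All helper names are illustrative, not a fixed API.

```python
from collections import Counter

def majority_class(examples, target="PlayTennis"):
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def classify(tree, example, default):
    while isinstance(tree, dict):                       # walk down until a leaf label
        attr = next(iter(tree))
        tree = tree[attr].get(example[attr], default)
    return tree

def accuracy(tree, examples, target, default):
    hits = sum(classify(tree, ex, default) == ex[target] for ex in examples)
    return hits / len(examples)

def reduced_error_prune(tree, train, validation, target="PlayTennis"):
    if not isinstance(tree, dict):                      # already a leaf
        return tree
    default = majority_class(train, target)
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():           # prune children bottom-up
        subset = [ex for ex in train if ex[attr] == value] or train
        tree[attr][value] = reduced_error_prune(subtree, subset, validation, target)
    leaf = majority_class(train, target)                # candidate replacement leaf
    if accuracy(leaf, validation, target, default) >= accuracy(tree, validation, target, default):
        return leaf                                     # keep the prune: no worse on validation
    return tree
```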
Effect of reduced error pruning

[Figure: accuracy on training and validation data before and after reduced-error pruning]

Rule post-pruning
1. Create the decision tree from the training set
2. Convert the tree into an equivalent set of rules
   – Each path from the root to a leaf corresponds to a rule
   – Each node along a path corresponds to a precondition
   – Each leaf classification becomes the postcondition
3. Prune (generalize) each rule by removing those preconditions whose removal improves accuracy over the validation set
4. Sort the rules in estimated order of accuracy, and consider them in this sequence when classifying new instances
Converting to rules
• Convert the tree to rules (one rule for each path from the root to a leaf)
• For each antecedent in a rule, remove it if the error rate on the validation set does not increase
• Sort the final rule set by accuracy

Rules obtained from the PlayTennis tree:
  Outlook=Sunny ∧ Humidity=High    → No
  Outlook=Sunny ∧ Humidity=Normal  → Yes
  Outlook=Overcast                 → Yes
  Outlook=Rain ∧ Wind=Strong       → No
  Outlook=Rain ∧ Wind=Weak         → Yes

Compare the first rule to its generalizations:
  Outlook=Sunny  → No
  Humidity=High  → No
Calculate the accuracy of the three rules on the validation set and pick the best version.
Why converting to rules?
▪ Each distinct path produces a different rule: a condition removal may be based on a local (contextual) criterion, whereas node pruning is global and affects all the rules
▪ In rule form, tests are not ordered and there is no book-keeping involved when conditions (nodes) are removed
▪ Converting to rules improves readability for humans
Dealing with continuous-valued attributes
▪ So far we have assumed discrete values for attributes and for the outcome.
▪ Given a continuous-valued attribute A, dynamically create a new Boolean attribute Ac:
      Ac = True if A < c, False otherwise
▪ How do we determine the threshold value c?
▪ Example: Temperature in the PlayTennis example. Sort the examples according to Temperature:

  Temperature   40   48   60   72   80   90
  PlayTennis    No   No   Yes  Yes  Yes  No

▪ Determine candidate thresholds by averaging consecutive values where there is a change in classification: (48+60)/2 = 54 and (80+90)/2 = 85.
▪ Evaluate the candidate thresholds (attributes) according to information gain. The best here is Temperature>54. The new attribute then competes with the other attributes.
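A minimal sketch of this threshold-selection step (the data values come from the table above; variable names are illustrative): sort by the continuous attribute, then propose a cut halfway between every pair of adjacent examples whose classification changes.

```python
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

pairs = sorted(zip(temps, labels))
candidates = [
    (a + b) / 2
    for (a, la), (b, lb) in zip(pairs, pairs[1:])
    if la != lb                      # classification changes between neighbours
]
print(candidates)   # [54.0, 85.0] -> each becomes a Boolean attribute Temperature > c
```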
A further example dataset with a continuous attribute (Taxable Income):

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes
Handling incomplete training data
▪ How do we cope with the problem that the value of some attribute may be missing?
▪ Example: Blood-Test-Result in a medical diagnosis problem
▪ The strategy: use the other examples to guess the missing value
  1. Assign the value that is most common among the training examples at the node
  2. Assign a probability to each value, based on observed frequencies, and assign values to the missing attribute according to this probability distribution
▪ Missing values in new instances to be classified are treated in the same way, and the most probable classification is chosen (C4.5)
Handling attributes with different costs
• Instance attributes may have an associated cost: we would prefer decision trees that use low-cost attributes
• ID3 can be modified to take costs into account, e.g. by replacing the gain with:
  1. Tan and Schlimmer (1990):    Gain²(S, A) / Cost(S, A)
  2. Nunez (1988):                (2^Gain(S, A) − 1) / (Cost(A) + 1)^w,  with w ∈ [0, 1]
Inductive Bias - ID3 Decision Tree Learning
• Given a collection of training examples, there are typically many decision trees consistent with these examples.
• Describing the inductive bias of ID3 therefore consists of describing the basis by which it chooses one of these consistent hypotheses over the others.
• Which of these decision trees does ID3 choose?
• It chooses the first acceptable tree it encounters in its simple-to-complex, hill-climbing search through the space of possible trees.

• Approximate inductive bias of ID3:
  – Shorter trees are preferred over larger trees.
• A closer approximation to the inductive bias of ID3:
  – Shorter trees are preferred over longer trees.
  – Trees that place high-information-gain attributes close to the root are preferred over those that do not.

• Why prefer short hypotheses?
• Arguments in favor:
  – There are fewer short hypotheses than long ones
  – If a short hypothesis fits the data, it is unlikely to be a coincidence
• Arguments against:
  – Not every short hypothesis is a reasonable one
• Occam's razor: "The simplest explanation is usually the best one."
  – A principle usually attributed to the 14th-century English logician and Franciscan friar, William of Ockham.
  – The term razor refers to the act of shaving away unnecessary assumptions to get to the simplest explanation.
Two kinds of biases
▪ Preference (or search) biases
  ▪ ID3 searches incompletely through a complete hypothesis space, from simple to complex hypotheses, until its termination condition is reached.
▪ Restriction (or language) biases
  ▪ The hypothesis space itself is restricted but is searched completely; Candidate-Elimination's search, for example, is complete.
Artificial Neural Networks

Dr. Mahesh G Huddar
Dept. of Computer Science and Engineering
Artificial Neural Networks
• Introduction
• Neural Network Representation
• Appropriate Problems for Neural Network Learning
• Perceptrons
• Multilayer Networks and the BACKPROPAGATION Algorithm
• Remarks on the BACKPROPAGATION Algorithm
Artificial Neural Networks
• ANN learning is well suited to problems in which the training data corresponds to noisy, complex data (e.g., inputs from cameras or microphones).
• It can also be used for problems with symbolic representations.
• It is most appropriate for problems where:
  – Instances have many attribute-value pairs
  – The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes
  – The training examples may contain errors
  – Long training times are acceptable
  – Fast evaluation of the learned target function may be required
  – The ability of humans to understand the learned target function is not important

Appropriate Problems – for ANN
• Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example. These input attributes may be highly correlated or independent of one another. Input values can be any real values.
• The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. For example, in the ALVINN system the output is a vector of 30 attributes, each corresponding to a recommendation regarding the steering direction. The value of each output is some real number between 0 and 1, which in this case corresponds to the confidence in predicting the corresponding steering direction. We can also train a single network to output both the steering command and a suggested acceleration, simply by concatenating the vectors that encode these two output predictions.
• The training examples may contain errors. ANN learning methods are quite robust to noise in the training data.
• Long training times are acceptable. Network training algorithms typically require longer training times than, say, decision tree learning algorithms. Training times can range from a few seconds to many hours, depending on factors such as the number of weights in the network, the number of training examples considered, and the settings of various learning algorithm parameters.
• Fast evaluation of the learned target function may be required. Although ANN learning times are relatively long, evaluating the learned network, in order to apply it to a subsequent instance, is typically very fast. For example, ALVINN applies its neural network several times per second to continually update its steering command as the vehicle drives forward.
• The ability of humans to understand the learned target function is not important. The weights learned by neural networks are often difficult for humans to interpret. Learned neural networks are less easily communicated to humans than learned rules.
Neural Network History
• History traces back to the 1950s, but neural networks became popular in the 1980s with work by Rumelhart, Hinton, and McClelland
  – A General Framework for Parallel Distributed Processing, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition
• Peaked in the 1990s:
  – Hundreds of variants
  – Less a model of the actual brain than a useful tool, but still some debate
• Numerous applications
  – Handwriting, face, and speech recognition
  – Vehicles that drive themselves
  – Models of reading, sentence production, dreaming
• Debate for philosophers and cognitive scientists
  – Can human consciousness or cognitive abilities be explained by a connectionist model, or does it require the manipulation of symbols?
Biological Motivation
• The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex webs of interconnected neurons.
• The human information processing system consists of brain neurons: basic building-block cells that communicate information to and from various parts of the body.
• The simplest model of a neuron treats it as a threshold unit, a processing element (PE).
• It collects inputs and produces an output if the sum of the inputs exceeds an internal threshold value.
• The human brain is made up of billions of simple processing units – neurons.
• Inputs are received on dendrites, and if the input levels are over a threshold, the neuron fires, passing a signal through the axon to the synapse, which then connects to another neuron.

Simplest Neural Network
[Figure slides omitted]
FIND-S / Candidate-Elimination (recap)

S0: ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
S1: ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
S2 = S3: ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
S4: ⟨Sunny, Warm, ?, Strong, ?, ?⟩

G4: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
G3: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, Normal, ?, ?, ?⟩, ⟨?, ?, ?, ?, Cool, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
G0 = G1 = G2: {⟨?, ?, ?, ?, ?, ?⟩}
Decision Trees
• Decision trees represent a disjunction of conjunctions of constraints on the values of attributes:
  (Outlook = Sunny ∧ Humidity = Normal) => Yes
  (Outlook = Overcast)                  => Yes
  (Outlook = Rain ∧ Wind = Weak)        => Yes
Artificial Neurons
• Artificial neurons are based on biological neurons.
• Each neuron in the network receives one or more inputs.
• An activation function is applied to the inputs, which determines the output of the neuron – the activation level.
• A typical activation function works as follows:

      X = Σ_{i=1..n} w_i x_i          Y = +1 if X ≥ t, 0 if X < t

• Each node i has a weight w_i associated with it.
• The input to node i is x_i.
• t is the threshold.
• So if the weighted sum of the inputs to the neuron is above the threshold, the neuron fires.
• The charts (figures omitted) show three typical activation functions.
PERCEPTRON
• One type of ANN system is based on a unit called a perceptron.
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and −1 otherwise.
• More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

      o(x1, ..., xn) = 1 if w0 + w1·x1 + w2·x2 + ... + wn·xn > 0, and −1 otherwise

• where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output.
• Sometimes we write the perceptron function as

      o(x) = sgn(w · x),   where sgn(y) = 1 if y > 0 and −1 otherwise

• Learning a perceptron involves choosing values for the weights w0, ..., wn.
• Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.
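A hedged sketch of this output function (not from the slides): w0 plays the role of the threshold term, with an implicit x0 = 1. Note that the perceptron convention used here outputs ±1, whereas the AND/OR walkthroughs below report 0/1 against an explicit threshold; the two views are equivalent when threshold t is encoded as w0 = −t.

```python
def perceptron_output(w, x):
    """w: weights [w0, w1, ..., wn]; x: inputs [x1, ..., xn]. Returns +1 or -1."""
    total = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if total > 0 else -1

# Example: the AND function with input weights 0.6 and 0.6 and threshold 1
# (threshold 1 encoded as w0 = -1), matching the walkthrough that follows.
w_and = [-1.0, 0.6, 0.6]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron_output(w_and, [a, b]))
```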


The Perceptron Training Rule
• One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example.
• This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly.
• Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule

      wi ← wi + Δwi,   where Δwi = η (t − o) xi
The Perceptron Training Rule

• Here t is the target output for the current training example, o is the output generated by the perceptron, and η is a positive constant called the learning rate.
• The role of the learning rate is to moderate the degree to which weights are changed at each step.
• It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
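A small numeric illustration of one application of the rule (the specific values are chosen for the example, not taken from the slides):

```python
eta  = 0.1                       # learning rate
w    = [0.0, 0.6, 0.6]           # [w0, w1, w2]
x    = [1, 0]                    # current example inputs (x1, x2)
t, o = 1, -1                     # target output and (incorrect) perceptron output

delta = [eta * (t - o) * xi for xi in [1] + x]   # x0 = 1 for the bias weight w0
w = [wi + di for wi, di in zip(w, delta)]
print(w)    # [0.2, 0.8, 0.6] -> weights on active inputs are pushed upward
```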
The Perceptron Training Rule
A single perceptron can be used to represent many Boolean functions weights 0.6 and 0.6
AND function
If A=0 & B=0 → 0*0.6 + 0*0.6 = 0
r
da= 0
This is not greater than the threshold of 1, so the output
d
u
hH
If A=0 & B=1 → 0*0.6 + 1*0.6 = 0.6
s
a heoutput = 0
This is not greater than the threshold, so the
If A=1 & B=0 → 1*0.6 + 0*0.6 = 0.6
M
This is not greater than the threshold, so the output = 0
If A=1 & B=1 → 1*0.6 + 1*0.6 = 1.2
This exceeds the threshold, so the output = 1
Watch Video Tutorial at https://www.youtube.com/@MaheshHuddar
The Perceptron Training Rule
A single perceptron can be used to represent many Boolean functions weights 1.2 and 0.6
AND function
If A=0 & B=0 → 0*1.2 + 0*0.6 = 0
r
da= 0
This is not greater than the threshold of 1, so the output
d
u
hH
If A=0 & B=1 → 0*1.2 + 1*0.6 = 0.6
s
a heoutput = 0
This is not greater than the threshold, so the
If A=1 & B=0 → 1*1.2 + 0*0.6 = 1.2
M
This is greater than the threshold, so the output = 1
But the expected output is 0

Watch Video Tutorial at https://www.youtube.com/@MaheshHuddar


The Perceptron Training Rule
A single perceptron can be used to represent many Boolean functions weights 1.2 and 0.6
AND function

dar
u d
h H
hes
M a

Watch Video Tutorial at https://www.youtube.com/@MaheshHuddar


The Perceptron Training Rule
A single perceptron can be used to represent many Boolean functions weights 0.7 and 0.6
AND function
If A=0 & B=0 → 0*0.7 + 0*0.6 = 0
This is not greater than the threshold of 1, so the output = 0
If A=0 & B=1 → 0*0.7 + 1*0.6 = 0.6
dar
This is not greater than the threshold, so the output = 0
u d
If A=1 & B=0 → 1*0.7 + 0*0.6 = 0.7
h H
hes
a
This is greater than the threshold, so the output = 0
If A=1 & B=0 → 1*0.7 + 1*0.6 = 1.3 M
This is greater than the threshold, so the output = 1

0.7

0.6
Watch Video Tutorial at https://www.youtube.com/@MaheshHuddar
The Perceptron Training Rule
A single perceptron can be used to represent many Boolean functions. Consider the OR
function with weights w1 = 1.1 and w2 = 1.1 and a threshold of 1:
If A=0 & B=0 → 0*1.1 + 0*1.1 = 0.   This is not greater than the threshold of 1, so the output = 0.
If A=0 & B=1 → 0*1.1 + 1*1.1 = 1.1. This is greater than the threshold, so the output = 1.
If A=1 & B=0 → 1*1.1 + 0*1.1 = 1.1. This is greater than the threshold, so the output = 1.
If A=1 & B=1 → 1*1.1 + 1*1.1 = 2.2. This is greater than the threshold, so the output = 1.
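To make the arithmetic above concrete, here is a minimal Python sketch of such a threshold
perceptron (not from the original slides); the weights 0.6/0.6 for AND and 1.1/1.1 for OR and the
threshold of 1 are the values used in the examples above.

    def perceptron(inputs, weights, threshold=1.0):
        # Threshold unit: output 1 if the weighted sum exceeds the threshold, else 0.
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return 1 if weighted_sum > threshold else 0

    # AND with weights 0.6, 0.6 and OR with weights 1.1, 1.1 (threshold = 1)
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", perceptron((a, b), (0.6, 0.6)),
                        "OR:", perceptron((a, b), (1.1, 1.1)))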
The Perceptron Training Rule
Perceptron_training_rule(X, η)          X: training data;  η: learning rate (a small positive constant, e.g., 0.1)
  initialize w (each wi ← an initial small random value)
  repeat
    for each training instance (x, tx) ∈ X
      compute the output ox = sgn(w · x)
      if (tx ≠ ox)
        for each wi
          Δwi ← η(tx − ox)xi
          wi ← wi + Δwi
        end for
      end if
    end for
  until all the training instances in X are correctly classified
  return w

Examples
• If x is correctly classified, tx − ox = 0, so there is no update.
• If ox = −1 but tx = 1, then tx − ox > 0, so wi is increased if xi > 0 and decreased otherwise; w · x is increased.
• If ox = 1 but tx = −1, then tx − ox < 0, so wi is decreased if xi > 0 and increased otherwise; w · x is decreased.
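A minimal runnable Python sketch of the same procedure (not from the original slides); it assumes
bipolar targets tx ∈ {−1, +1}, an explicit bias weight w0 with a constant input of 1, and a cap on
the number of passes as a safeguard.

    import random

    def sgn(z):
        return 1 if z >= 0 else -1

    def perceptron_training_rule(X, eta=0.1, max_epochs=1000):
        # X is a list of (x, t) pairs; x is a tuple of inputs, t is -1 or +1.
        n = len(X[0][0])
        w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]   # w[0] is the bias weight
        for _ in range(max_epochs):
            all_correct = True
            for x, t in X:
                o = sgn(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
                if o != t:
                    all_correct = False
                    w[0] += eta * (t - o)                 # bias input is always 1
                    for i, xi in enumerate(x, start=1):
                        w[i] += eta * (t - o) * xi        # Δwi = η(t − o)xi
            if all_correct:
                break
        return w

    # Learn the AND function, with targets encoded as -1/+1
    data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
    print(perceptron_training_rule(data))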
Representational Power of Perceptrons
[Figure: (a) a set of training examples that is linearly separable, together with a perceptron
decision surface that separates them; (b) a set of training examples (the XOR function) that is
not linearly separable.]


Representational Power of Perceptrons
• Perceptrons can represent all of the primitive boolean functions AND, OR, NAND, and NOR.
• Unfortunately, however, some boolean functions cannot be represented by a single perceptron,
  such as the XOR function, whose value is 1 if and only if x1 != x2.
• Note the set of linearly nonseparable training examples shown in Figure (b) corresponds to this
  XOR function.



Representational Power of Perceptrons
• The ability of perceptrons to represent AND, OR, NAND, and NOR is important because every
  boolean function can be represented by some network of interconnected units based on these
  primitives.
• In fact, every boolean function can be represented by some network of perceptrons only two
  levels deep, in which the inputs are fed to multiple units, and the outputs of these units are then
  input to a second, final stage.
• One way is to represent the boolean function in disjunctive normal form (i.e., as the disjunction
  (OR) of a set of conjunctions (ANDs) of the inputs and their negations); a small sketch of this
  idea for XOR is given below.
• Note that the input to an AND perceptron can be negated simply by changing the sign of the
  corresponding input weight.
• Because networks of threshold units can represent a rich variety of functions and because single
  units alone cannot, we will generally be interested in learning multilayer networks of threshold
  units.
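As one illustration of the two-level idea, XOR can be written in disjunctive normal form as
(x1 AND NOT x2) OR (NOT x1 AND x2). The Python sketch below (not from the original slides) builds
it from threshold units; the particular weights and thresholds are just one workable choice.

    def unit(inputs, weights, threshold):
        # Threshold unit: 1 if the weighted sum exceeds the threshold, else 0.
        return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

    def xor(x1, x2):
        # First level: two AND-like units, each negating one input via a negative weight.
        h1 = unit((x1, x2), (1.0, -1.0), 0.5)    # fires only for x1=1, x2=0
        h2 = unit((x1, x2), (-1.0, 1.0), 0.5)    # fires only for x1=0, x2=1
        # Second level: an OR unit over the two first-level outputs.
        return unit((h1, h2), (1.0, 1.0), 0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor(a, b))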





Gradient Descent and the Delta Rule
• Although the perceptron rule finds a successful weight vector when the training examples are
linearly separable, it can fail to converge if the examples are not linearly separable.
• A second training rule, called the delta rule, is designed to overcome this difficulty.

• If the training examples are not linearly separable, the delta rule converges toward a best-fit
  approximation to the target concept.
• The key idea behind the delta rule is to use gradient descent to search the hypothesis space of
  possible weight vectors to find the weights that best fit the training examples.
• This rule is important because gradient descent provides the basis for the BACKPROPAGATION
  algorithm, which can learn networks with many interconnected units.
• It is also important because gradient descent can serve as the basis for learning algorithms that
  must search through hypothesis spaces containing many different types of continuously
  parameterized hypotheses.


Gradient Descent and the Delta Rule
• The delta training rule is best understood by considering the task of training an
  unthresholded perceptron; that is, a linear unit for which the output o is given by

      o(x) = w · x = w0 + w1x1 + w2x2 + ... + wnxn

• Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.
• In order to derive a weight learning rule for linear units, let us begin by specifying a
  measure for the training error of a hypothesis (weight vector), relative to the training
  examples.
• Although there are many ways to define this error, one common measure is

      E(w) = ½ Σd∈D (td − od)²

• where D is the set of training examples, td is the target output for training example d, and
  od is the output of the linear unit for training example d.


Derivation of Gradient Descent Rule

• The gradient of E with respect to the weight vector w is the vector of partial derivatives

      ∇E(w) ≡ [∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn]

  and the gradient descent training rule is

      w ← w + Δw,   where   Δw = −η ∇E(w)

• Here η is a positive constant called the learning rate, which determines the step size in the
  gradient descent search. The negative sign is present because we want to move the weight
  vector in the direction that decreases E.
• This training rule can also be written in its component form

      wi ← wi + Δwi,   where   Δwi = −η ∂E/∂wi



Derivation of Gradient Descent Rule

• The vector of ∂E/∂wi derivatives that make up the gradient can be obtained by differentiating
  the error E defined above:

      ∂E/∂wi = ∂/∂wi ½ Σd∈D (td − od)²
             = ½ Σd 2(td − od) ∂/∂wi (td − od)
             = Σd (td − od) ∂/∂wi (td − w · xd)
             = Σd (td − od)(−xid)

  where xid denotes the input component xi for training example d.
• Substituting into the component form of the training rule gives the gradient descent rule for
  the linear unit:

      Δwi = η Σd∈D (td − od) xid
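A minimal Python sketch of this batch gradient descent rule for a linear unit (not from the slides;
the learning rate, number of epochs, and training data below are illustrative assumptions).

    def gradient_descent_linear_unit(examples, eta=0.05, epochs=1000):
        # examples: list of (x, t) pairs; x is a tuple of inputs, t is a real-valued target.
        n = len(examples[0][0])
        w = [0.0] * (n + 1)                       # w[0] is the bias weight (constant input 1)
        for _ in range(epochs):
            delta = [0.0] * (n + 1)
            for x, t in examples:
                o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))   # linear unit output
                delta[0] += eta * (t - o)
                for i, xi in enumerate(x, start=1):
                    delta[i] += eta * (t - o) * xi                    # accumulate over all of D
            w = [wi + dwi for wi, dwi in zip(w, delta)]               # one update per pass through D
        return w

    # Noise-free data generated by t = 0.5 + 2*x1 - 1*x2; the learned weights approximate (0.5, 2, -1)
    data = [((1, 2), 0.5), ((2, 1), 3.5), ((0, 1), -0.5), ((3, 0), 6.5)]
    print(gradient_descent_linear_unit(data))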





STOCHASTIC APPROXIMATION TO GRADIENT DESCENT
• Gradient descent is an important general paradigm for learning.
• It is a strategy for searching through a large or infinite hypothesis space that can be
applied whenever

1. the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a
   linear unit), and
2. the error can be differentiated with respect to these hypothesis parameters.
• The key practical difficulties in applying gradient descent are
1. converging to a local minimum can sometimes be quite slow (i.e., it can require many
thousands of gradient descent steps), and
2. if there are multiple local minima in the error surface, then there is no guarantee that
the procedure will find the global minimum.
STOCHASTIC APPROXIMATION TO GRADIENT DESCENT
• Stochastic (incremental) gradient descent alleviates these difficulties by updating the weights
  incrementally, following the calculation of the error for each individual training example,
  rather than first summing the error over the entire set D.
• Each incremental step descends the gradient of the per-example error Ed(w) = ½ (td − od)²,
  giving the update Δwi = η(t − o)xi for the linear unit; this update is known as the delta rule
  (also called the LMS or Widrow-Hoff rule).


Gradient Descent vs Stochastic Gradient Descent
• Standard gradient descent: the error is summed over all examples before the weights are
  updated. Stochastic gradient descent: the weights are updated after each individual training
  example is examined.
• Standard gradient descent: summing over multiple examples requires more computation per
  weight-update step. Stochastic gradient descent: less computation per update step, since each
  step uses only a single example.
• Standard gradient descent: can have difficulty when there are multiple local minima in the error
  surface. Stochastic gradient descent: uses the per-example error Ed(w) rather than E(w), which
  can sometimes help it avoid falling into a local minimum.
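For comparison with the batch sketch given earlier, here is the stochastic version in Python
(again illustrative, not from the slides); the only change is that the weights are updated inside
the loop over the examples.

    def stochastic_gradient_descent_linear_unit(examples, eta=0.05, epochs=1000):
        n = len(examples[0][0])
        w = [0.0] * (n + 1)                      # w[0] is the bias weight
        for _ in range(epochs):
            for x, t in examples:
                o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
                w[0] += eta * (t - o)            # update immediately after this example
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (t - o) * xi   # delta rule: Δwi = η(t − o)xi
        return w

    data = [((1, 2), 0.5), ((2, 1), 3.5), ((0, 1), -0.5), ((3, 0), 6.5)]
    print(stochastic_gradient_descent_linear_unit(data))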


MULTILAYER NETWORKS
• Multilayer neural networks can classify a range of functions, including ones that are not
  linearly separable.
• Each input layer neuron connects to all neurons in the hidden layer.
• The neurons in the hidden layer connect to all neurons in the output layer.


MULTILAYER NETWORKS
A Differentiable Threshold Unit
• What type of unit shall we use as the basis for constructing multilayer networks?
• At first we might be tempted to choose the linear units discussed in the previous
  section, for which we have already derived a gradient descent learning rule.
• However, multiple layers of cascaded linear units still produce only linear functions,
  and we prefer networks capable of representing highly nonlinear functions.
• The perceptron unit is another possible choice, but its discontinuous threshold makes
  it undifferentiable and hence unsuitable for gradient descent.
• What we need is a unit whose output is a nonlinear function of its inputs, but whose
output is also a differentiable function of its inputs.
• One solution is the sigmoid unit, a unit very much like a perceptron, but based on a smoothed,
  differentiable threshold function.
MULTILAYER NETWORKS
• The sigmoid unit is illustrated in the figure below. Like the perceptron, the sigmoid
  unit first computes a linear combination of its inputs, then applies a threshold to
  the result.
• In the case of the sigmoid unit, however, the threshold output is a continuous
  function of its input.
[Figure: the sigmoid threshold unit.]



MULTILAYER NETWORKS
More precisely, the sigmoid unit computes its output o as

      o = σ(w · x),   where   σ(y) = 1 / (1 + e^(−y))

• σ is often referred to as the sigmoid or logistic function. Its output ranges between 0 and 1,
  increasing monotonically with its input.
• The sigmoid function has the useful property that its derivative is easily expressed in terms of
  its output:  dσ(y)/dy = σ(y)·(1 − σ(y)).
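A small Python sketch of a sigmoid unit (illustrative; the inputs and weights are made-up values).

    import math

    def sigmoid(y):
        # Logistic function: squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + math.exp(-y))

    def sigmoid_unit(x, w):
        # w[0] is the bias weight, applied to an implicit constant input of 1.
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        return sigmoid(net)

    o = sigmoid_unit((1.0, 0.5), (-0.3, 0.8, 0.4))
    print(o, o * (1 - o))   # the unit's output and the value of dσ/dnet at that point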



The BACKPROPAGATION Algorithm
• Multilayer neural networks learn in the same way as perceptrons.
• However, there are many more weights, and it is important to assign credit (or blame) correctly
  when changing weights.
• E sums the errors over all of the network output units:

      E(w) = ½ Σd∈D Σk∈outputs (tkd − okd)²



The BACKPROPAGATION Algorithm



Consider a small feed-forward network with three inputs x1, x2, x3, two hidden units (4 and 5),
and two output units (6 and 7). Every input feeds both hidden units, and both hidden units feed
both output units.

Inputs and weights into hidden unit 4:  x41, x42, x43 with weights w41, w42, w43
Inputs and weights into hidden unit 5:  x51, x52, x53 with weights w51, w52, w53
Inputs and weights into output unit 6:  x64, x65 with weights w64, w65
Inputs and weights into output unit 7:  x74, x75 with weights w74, w75

Forward pass:
O4 = σ(x41·w41 + x42·w42 + x43·w43)
O5 = σ(x51·w51 + x52·w52 + x53·w53)
The hidden-unit outputs become the inputs to the output layer:  x64 = x74 = O4  and  x65 = x75 = O5
O6 = σ(x64·w64 + x65·w65)
O7 = σ(x74·w74 + x75·w75)
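A short Python sketch of this forward pass (the input values and weights below are made-up,
purely for illustration).

    import math

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def forward(x, w_hidden, w_output):
        # x: the three inputs; w_hidden: weights into units 4 and 5; w_output: weights into units 6 and 7.
        o_hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        o_output = [sigmoid(sum(w * oh for w, oh in zip(ws, o_hidden))) for ws in w_output]
        return o_hidden, o_output

    x = (1.0, 0.0, 1.0)
    w_hidden = [(0.2, -0.1, 0.4),    # w41, w42, w43
                (0.7, -1.2, 1.2)]    # w51, w52, w53
    w_output = [(1.1, 0.1),          # w64, w65
                (0.3, 1.7)]          # w74, w75
    print(forward(x, w_hidden, w_output))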




Back Propagation Algorithm
• Create a feed-forward network with ni inputs, nhidden hidden units, and nout output units.
• Initialize all network weights to small random numbers.
• Until the termination condition is met, Do
  • For each (x, t) in training examples, Do
    Propagate the input forward through the network:
    1. Input the instance x to the network and compute the output ou of every unit u in the network.
    Propagate the errors backward through the network:
    2. For each output unit k, calculate its error term δk:   δk ← ok(1 − ok)(tk − ok)
    3. For each hidden unit h, calculate its error term δh:   δh ← oh(1 − oh) Σk∈Downstream(h) wkh·δk
    4. Update each network weight wji:   wji ← wji + η·δj·xji
(A runnable sketch of these steps is given below.)
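Below is a compact runnable Python sketch of one such training step for the 3-2-2 network of the
earlier example (illustrative only: the learning rate, initial weights, and training pair are
assumptions, and only a single update is shown).

    import math, random

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def train_step(x, t, w_hidden, w_output, eta=0.5):
        # Forward pass
        o_h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        o_o = [sigmoid(sum(w * oh for w, oh in zip(ws, o_h))) for ws in w_output]
        # Error term for each output unit k: δk = ok(1 − ok)(tk − ok)
        delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_o, t)]
        # Error term for each hidden unit h: δh = oh(1 − oh) Σk w_kh δk
        delta_h = [oh * (1 - oh) * sum(w_output[k][h] * delta_o[k] for k in range(len(o_o)))
                   for h, oh in enumerate(o_h)]
        # Weight updates: w_ji ← w_ji + η δj x_ji
        for k in range(len(w_output)):
            w_output[k] = [w + eta * delta_o[k] * o_h[h] for h, w in enumerate(w_output[k])]
        for h in range(len(w_hidden)):
            w_hidden[h] = [w + eta * delta_h[h] * x[i] for i, w in enumerate(w_hidden[h])]
        return w_hidden, w_output

    random.seed(0)
    w_hidden = [[random.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(2)]
    w_output = [[random.uniform(-0.05, 0.05) for _ in range(2)] for _ in range(2)]
    w_hidden, w_output = train_step((1.0, 0.0, 1.0), (1.0, 0.0), w_hidden, w_output)
    print(w_hidden, w_output)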



Derivation of Back Propagation Algorithm
• To derive the equation for updating weights in the back propagation algorithm, we use the
  stochastic gradient descent rule.
• Stochastic gradient descent involves iterating through the training examples one at a time;
  for each training example d we descend the gradient of the error Ed with respect to this
  single example.
• In other words, for each training example d every weight wji is updated by adding to it Δwji.
• That is,

      Δwji = −η ∂Ed/∂wji



Derivation of Back Propagation Algorithm

• where Ed is the error on training example d, that is, half the squared difference between the
  target output and the actual output, summed over all output units in the network:

      Ed(w) = ½ Σk∈outputs (tk − ok)²

• Here outputs is the set of output units in the network, tk is the target value of unit k for
  training example d, and ok is the output of unit k given training example d.



Derivation of Back Propagation Algorithm
Notation Used:
𝒙𝒋𝒊 = the ith input to unit j
𝒘𝒋𝒊 = the weight associated with the ith input to unit j
netj = Σi wji xji (the weighted sum of inputs for unit j)
oj = the output computed by unit j
tj = the target output for unit j
σ = the sigmoid function
outputs = the set of units in the final layer of the network
Downstream(j) = the set of units whose immediate inputs include the output of unit j



Derivation of Back Propagation Algorithm

• To begin, notice that weight wji can influence the rest of the network only through netj.
  Therefore, we can use the chain rule to write

      ∂Ed/∂wji = (∂Ed/∂netj) · (∂netj/∂wji)

• Since netj = Σi wji xji, we have ∂netj/∂wji = xji, and therefore

      Δwji = −η ∂Ed/∂wji = −η (∂Ed/∂netj) xji

• Our remaining task is to derive a convenient expression for ∂Ed/∂netj.


Derivation of Back Propagation Algorithm
To derive a convenient expression for ∂Ed/∂netj, we consider two cases in turn:
• Case 1, where unit j is an output unit for the network, and
• Case 2, where unit j is an internal (hidden) unit of the network.



Derivation of Back Propagation Algorithm
Case 1: Training Rule for Output Unit Weights
• Just as wji can influence the rest of the network only through netj, netj can influence the
  network only through oj. Therefore, we can invoke the chain rule again to write

      ∂Ed/∂netj = (∂Ed/∂oj) · (∂oj/∂netj)

• Since oj = σ(netj), the second factor is just the derivative of the sigmoid function:

      ∂oj/∂netj = ∂σ(netj)/∂netj = σ(netj)·(1 − σ(netj)) = oj(1 − oj)



Derivation of Back Propagation Algorithm
Case 1: Training Rule for Output Unit Weights

• For the first factor, only the term for unit j in Ed = ½ Σk∈outputs (tk − ok)² depends on oj, so

      ∂Ed/∂oj = −(tj − oj)

• Putting the two factors together,

      ∂Ed/∂netj = −(tj − oj)·oj(1 − oj)

• and the weight update for an output unit becomes

      Δwji = −η (∂Ed/∂netj) xji = η (tj − oj)·oj(1 − oj)·xji = η δj xji,
      where the error term is δj = (tj − oj)·oj(1 − oj)



Derivation of Back Propagation Algorithm
Case 2: Training Rule for Hidden Unit Weights

• For a hidden unit j, netj can influence the network outputs (and therefore Ed) only through the
  units in Downstream(j). Writing δk for −∂Ed/∂netk (the error term already derived for each unit
  k downstream of j), the chain rule gives

      ∂Ed/∂netj = Σk∈Downstream(j) (∂Ed/∂netk)·(∂netk/∂netj)
                = Σk∈Downstream(j) (−δk)·(∂netk/∂oj)·(∂oj/∂netj)

• Since netk contains the term wkj·oj (because xkj = oj), we have ∂netk/∂oj = wkj, and as before
  ∂oj/∂netj = oj(1 − oj). Therefore

      ∂Ed/∂netj = −oj(1 − oj) Σk∈Downstream(j) δk·wkj

• Defining the hidden-unit error term δj = oj(1 − oj) Σk∈Downstream(j) δk·wkj, the weight update
  for a hidden unit takes the same form as before:

      Δwji = η δj xji





Neural Network Representations - ALVINN
• ALVINN is a prototypical example of ANN learning: it uses a learned ANN to steer an
  autonomous vehicle driving at normal speeds on public highways.
• The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a
  forward-pointing camera mounted on the vehicle.
• The network output is the direction in which the vehicle is steered. The ANN is trained to
  mimic the observed steering commands of a human driving the vehicle for approximately 5 minutes.
• ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per hour
  and for distances of 90 miles on public highways.
Neural Network Representations - ALVINN
• The figure illustrates the neural network representation used in one version of the ALVINN
  system, and illustrates the kind of representation typical of many ANN systems.
• The network is shown on the left side of the figure, with the input camera image depicted below
  it. Each node (i.e., circle) in the network diagram corresponds to the output of a single network
  unit, and the lines entering the node from below are its inputs.
• As can be seen, there are four units that receive inputs directly from all of the 30 x 32 pixels
  in the image. These are called "hidden" units because their output is available only within the
  network and is not available as part of the global network output.
• Each of these four hidden units computes a single real-valued output based on a weighted
  combination of its 960 inputs.
• These hidden unit outputs are then used as inputs to a second layer of 30 "output" units.
• Each output unit corresponds to a particular steering direction, and the output values of these
  units determine which steering direction is recommended most strongly.



Neural Network Representations - ALVINN
• The diagrams on the right side of the figure depict the learned weight values associated with
  one of the four hidden units in this ANN.
• The large matrix of black and white boxes on the lower right depicts the weights from the
  30 x 32 pixel inputs into the hidden unit.
• Here, a white box indicates a positive weight, a black box a negative weight, and the size of
  the box indicates the weight magnitude.
• The smaller rectangular diagram directly above the large matrix shows the weights from this
  hidden unit to each of the 30 output units.
[Figure: the ALVINN network and its learned weights as described above: the input camera image,
the four hidden units, the 30 output units, and the weight diagrams for one hidden unit.]

