
MACHINE LEARNING

Lecture 07
Dr. Samana Batool
DECISION TREES
PARAMETRIC ML ALGORITHMS
Assumptions can greatly simplify the learning process, but can also limit what can be learned.
Algorithms that simplify the function to a known form are called parametric machine learning algorithms.
The algorithms involve two steps:
1. Select a form for the function.
2. Learn the coefficients for the function from the training data (a minimal sketch of these two steps follows the examples below).
Examples: Logistic Regression, Linear Regression, Linear Discriminant Analysis, Perceptron, Naive
Bayes, Simple Neural Networks
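As an illustration of the two steps, here is a minimal sketch (not from the slides; the data are invented) that assumes a linear form y = w0 + w1·x and learns the two coefficients from training data with NumPy's least-squares solver:

```python
import numpy as np

# Step 1: assume a functional form, here y = w0 + w1 * x (a straight line).
# Step 2: learn the coefficients w0 and w1 from the training data.

# Toy training data (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Design matrix with a column of ones for the intercept w0.
X = np.column_stack([np.ones_like(x), x])

# Least-squares fit returns the coefficient vector [w0, w1].
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
w0, w1 = coeffs
print(f"learned form: y = {w0:.2f} + {w1:.2f} * x")
```

Once the coefficients are learned, the training data can be discarded; the model is fully described by the chosen form and its few parameters.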
Benefits of Parametric Machine Learning Algorithms:
•Simpler: Easier to understand and interpret results.
•Speed: Very fast to learn from data.
•Less Data: Do not require as much training data and can work well even if the fit is not perfect.
Limitations of Parametric Machine Learning Algorithms:
•Constrained: By choosing a functional form these methods are highly constrained to the specified
form.
•Limited Complexity: The methods are more suited to simpler problems.
•Poor Fit: In practice the methods are unlikely to match the underlying mapping function.
NON-PARAMETRIC ML ALGORITHMS
Algorithms that do not make strong assumptions about the form of the mapping function are called
nonparametric machine learning algorithms. By not making assumptions, they are free to learn any
functional form from the training data.
Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you
don’t want to worry too much about choosing just the right features.
Examples: k-Nearest Neighbors, Decision Trees, Support Vector Machines
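For contrast, a non-parametric method keeps the training data itself rather than compressing it into a fixed set of coefficients. Below is a minimal sketch of k-nearest-neighbours classification in plain NumPy; the data are invented for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k closest samples.
    nearest_labels = y_train[np.argsort(dists)[:k]]
    # Majority vote over those labels.
    values, counts = np.unique(nearest_labels, return_counts=True)
    return values[np.argmax(counts)]

# Toy training data (invented for illustration): two classes in 2-D.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([4.9, 5.1]), k=3))  # -> 1
```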
Benefits of Nonparametric Machine Learning Algorithms:
•Flexibility: Capable of fitting a large number of functional forms.
•Power: No assumptions (or weak assumptions) about the underlying function.
•Performance: Can result in higher-performance models for prediction.
Limitations of Nonparametric Machine Learning Algorithms:
•More data: Require a lot more training data to estimate the mapping function.
•Slower: A lot slower to train, as they often have far more parameters to estimate.
•Overfitting: More of a risk of overfitting the training data, and it is harder to explain why specific predictions are made.
CLASSIFICATION
•The classification of an unknown input vector is done by traversing the tree from the root node to a leaf node.
•A record enters the tree at the root node.
•At the root node, a test is applied to determine which child node the record will encounter next.
•This process is repeated until the record arrives at a leaf node.
•All the records that end up at a given leaf of the tree are classified in the same way.
•There is a unique path from the root to each leaf.
•The path is a rule which is used to classify the records (a minimal traversal sketch follows this list).
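The node structure, feature names, and class labels in the sketch below are hypothetical, chosen only to illustrate how a record is routed from the root to a leaf:

```python
# Each internal node stores a test; each leaf stores a class label.
# A record is routed left or right by the test until it reaches a leaf.
tree = {
    "feature": "outlook", "value": "sunny",        # test applied at the root
    "left":  {"leaf": "no"},                       # branch taken if the test is true
    "right": {"feature": "wind", "value": "strong",
              "left":  {"leaf": "no"},
              "right": {"leaf": "yes"}},
}

def classify(node, record):
    """Traverse from the root to a leaf and return the leaf's class label."""
    while "leaf" not in node:
        test_true = record[node["feature"]] == node["value"]
        node = node["left"] if test_true else node["right"]
    return node["leaf"]

print(classify(tree, {"outlook": "rain", "wind": "weak"}))  # -> "yes"
```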
[Figure: two candidate splits of a parent node with class counts (40, 60). One split, labelled "No Improvement", produces children with counts (28, 42) and (12, 18), preserving the parent's 40:60 class proportions; the other, labelled "Perfect Split", separates the two classes completely.]

$$\mathrm{Entropy}(T) = -\sum_{l=1}^{k} p_l \log_2 p_l$$

Min Entropy = 0 (No impurity)
Max Entropy = 1 (Max impurity for binary classes)

$$\mathrm{Gini}(T) = 1 - \sum_{l=1}^{k} p_l^2$$

Min Gini index = 0 (No impurity)
Max Gini index = 0.5 (Max impurity for binary classes)
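As a quick check of these extreme values, here is a minimal sketch that computes both impurity measures from a node's class counts (the counts in the example calls are invented):

```python
import math

def entropy(counts):
    """Entropy(T) = -sum_l p_l * log2(p_l); empty classes contribute 0."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini(T) = 1 - sum_l p_l^2."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(entropy([50, 50]), gini([50, 50]))   # 1.0 0.5 -> maximum impurity for two classes
print(entropy([100, 0]), gini([100, 0]))   # 0.0 0.0 -> pure node, no impurity
```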
INFORMATION GAIN

$$IG = I - \frac{N_{left}}{N}\, I_{left} - \frac{N_{right}}{N}\, I_{right}$$

IG – Information Gain
I – Impurity calculated on the parent node (Gini or Entropy)
I_left – Impurity calculated on the left child node
I_right – Impurity calculated on the right child node
N – Total no. of samples
N_left – No. of samples at the left child node
N_right – No. of samples at the right child node
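Written as code, the formula might look like the following sketch (Gini impurity is used for I here, but entropy could be substituted):

```python
def gini(counts):
    """Gini impurity of a node, given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, left, right):
    """IG = I(parent) - N_left/N * I(left) - N_right/N * I(right)."""
    n, n_left, n_right = sum(parent), sum(left), sum(right)
    return gini(parent) - (n_left / n) * gini(left) - (n_right / n) * gini(right)
```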
INFORMATION GAIN FOR A1
At the root node: $I = 1 - \left(\tfrac{29}{64}\right)^2 - \left(\tfrac{35}{64}\right)^2 = 0.496$

At the left node: $I_{left} = 1 - \left(\tfrac{21}{26}\right)^2 - \left(\tfrac{5}{26}\right)^2 = 0.310$

At the right node: $I_{right} = 1 - \left(\tfrac{8}{38}\right)^2 - \left(\tfrac{30}{38}\right)^2 = 0.332$

$$IG = I - \frac{N_{left}}{N}\, I_{left} - \frac{N_{right}}{N}\, I_{right} = 0.496 - \frac{26}{64}(0.310) - \frac{38}{64}(0.332) = 0.496 - 0.324 = 0.172$$
INFORMATION GAIN FOR A2
At the root node: $I = 1 - \left(\tfrac{29}{64}\right)^2 - \left(\tfrac{35}{64}\right)^2 = 0.496$

At the left node: $I_{left} = 1 - \left(\tfrac{18}{51}\right)^2 - \left(\tfrac{33}{51}\right)^2 = 0.457$

At the right node: $I_{right} = 1 - \left(\tfrac{11}{13}\right)^2 - \left(\tfrac{2}{13}\right)^2 = 0.260$

$$IG = I - \frac{N_{left}}{N}\, I_{left} - \frac{N_{right}}{N}\, I_{right} = 0.496 - \frac{51}{64}(0.457) - \frac{13}{64}(0.260) = 0.496 - 0.417 = 0.079$$

Since A1 yields the larger information gain (0.172 vs. 0.079), A1 is the better attribute to split on.
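Feeding the class counts of the two candidate splits into the hypothetical information_gain() helper sketched earlier reproduces these values:

```python
# Reuses gini() and information_gain() from the sketch above.
ig_a1 = information_gain(parent=[29, 35], left=[21, 5], right=[8, 30])
ig_a2 = information_gain(parent=[29, 35], left=[18, 33], right=[11, 2])
print(round(ig_a1, 3), round(ig_a2, 3))  # 0.172 0.079 -> A1 gives the larger gain
```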
[Figure: error rate (y-axis) versus number of split nodes (x-axis), with one curve for the training data and one for the evaluation data.]


[Figure: example of a decision tree partitioning a two-dimensional feature space, with Experience (0 to 5 to 20 years) on one axis and Performance (0% to 80% to 100%) on the other, into regions R1, R2, and R3.]
TRAINING DATA EXAMPLE: THE GOAL IS TO PREDICT WHETHER THIS PLAYER WILL PLAY TENNIS
