6.2 Unit-2 ML Handouts
L-10
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Introduction: (Maximum 5 sentences): Decision trees are among the most powerful and popular tools for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
Prerequisite knowledge for Complete understanding and learning of Topic: (Max. Four
important topics)
Theory of computation basics
Non- Deterministic Finite Automata
Deterministic Finite Automata
Detailed content of the Lecture:
Construction of Decision Tree: A tree can be “learned” by splitting the source set into
subsets based on an attribute value test. This process is repeated on each derived subset in a
recursive manner called recursive partitioning.
The recursion is complete when all the examples in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions.
The construction of a decision tree classifier does not require any domain knowledge or
parameter setting, and therefore is appropriate for exploratory knowledge discovery. Decision
trees can handle high-dimensional data.
In general, decision tree classifiers have good accuracy. Decision tree induction is a typical inductive approach to learning knowledge for classification.
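To make recursive partitioning concrete, the following minimal Python sketch (not part of the original handout; the splitting criterion is left abstract as a choose_attribute callable) grows a tree by repeatedly splitting until each subset is pure or no attributes remain:

    from collections import Counter

    def build_tree(examples, attributes, target, choose_attribute):
        """Recursive partitioning: split `examples` on one attribute at a time.
        `choose_attribute` is any splitting criterion (e.g. information gain or Gini)."""
        labels = [ex[target] for ex in examples]
        # Stop when all examples share the same target value (a pure node) ...
        if len(set(labels)) == 1:
            return labels[0]
        # ... or when no attributes are left to split on: predict the majority class.
        if not attributes:
            return Counter(labels).most_common(1)[0][0]
        best = choose_attribute(examples, attributes, target)
        node = {best: {}}
        for value in set(ex[best] for ex in examples):
            subset = [ex for ex in examples if ex[best] == value]
            remaining = [a for a in attributes if a != best]
            node[best][value] = build_tree(subset, remaining, target, choose_attribute)
        return node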
Decision Tree Representation: Decision trees classify instances by sorting them down the
tree from the root to some leaf node, which provides the classification of the instance.
An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of that attribute in the given example, as shown in the figure. This process is then repeated for the subtree rooted at the new node. The decision tree in the figure classifies a particular morning according to whether it is suitable for playing tennis and returns the classification associated with the leaf that is reached (in this case, Yes or No). For example, an instance such as (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong) would be sorted down the leftmost branch of this decision tree and would therefore be classified as a negative instance.
In other words, we can say that the decision tree represents a disjunction of conjunctions of
constraints on the attribute values of instances.
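For example, the PlayTennis tree from the Mitchell text cited below corresponds to the expression (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak). A minimal sketch of sorting an instance down such a tree, assuming the nested-dictionary representation used in the sketch above (the hand-built tree here is illustrative only):

    def classify(tree, instance):
        """Sort an instance down the tree from the root to a leaf and return its label."""
        # A leaf node is any non-dict value: the class label itself.
        if not isinstance(tree, dict):
            return tree
        attribute = next(iter(tree))        # attribute tested at this node
        value = instance[attribute]         # outcome of the test for this instance
        return classify(tree[attribute][value], instance)

    # Hypothetical usage with a hand-built PlayTennis tree:
    play_tennis_tree = {"Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }}
    print(classify(play_tennis_tree,
                   {"Outlook": "Sunny", "Humidity": "High", "Wind": "Strong"}))  # -> "No"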
Gini Index:
Gini Index is a score that evaluates how good a split is among the classified groups. The Gini index takes values between 0 and 1: 0 means all observations at a node belong to a single class (a pure node), while values approaching 1 mean the observations are distributed randomly across the classes.
We therefore want the Gini index to be as low as possible. The Gini index is the evaluation metric we shall use to evaluate our decision tree model.
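For reference, the Gini index of a set of labels is 1 - Σ p_i², where p_i is the proportion of class i, and a candidate split is scored by the size-weighted average over the subsets it produces. A minimal sketch (not part of the original handout):

    from collections import Counter

    def gini(labels):
        """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
        n = len(labels)
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    def gini_of_split(subsets):
        """Size-weighted Gini index of a split, given as a list of label subsets."""
        total = sum(len(s) for s in subsets)
        return sum(len(s) / total * gini(s) for s in subsets)

    print(gini(["Yes", "Yes", "Yes"]))        # 0.0  (pure node)
    print(gini(["Yes", "No", "Yes", "No"]))   # 0.5  (maximally mixed for two classes)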
Video Content / Details of website for further learning (if any):
https://lecturenotes.in/notes/24274-note-for-machine-learning-ml-by-new-swaroop
https://www.youtube.com/watch?v=IpGxLWOIZy4
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-11
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Missing values in the data also do not influence the process of building a decision tree to any considerable extent.
Video Content/Details of website for further learning (if any):
https://lecturenotes.in/notes/24274-note-for-machine-learning-ml-by-new-swaroop
https://www.youtube.com/watch?v=ukzFI9rgwfU
Important Books/Journals for further learning including the page nos.:
Tom Mitchell, Machine Learning, Tata McGraw-Hill, 1997
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-12
LECTURE HANDOUTS
IT IV/VII-A
Course Name with Code : Machine Learning -
Date of Lecture:
Topic of Lecture: Picking the best splitting attribute entropy and information gain
Introduction: (Maximum 5 Sentences): A decision tree is a supervised learning algorithm used for both classification and regression problems. Simply put, it takes the form of a tree with branches representing the potential answers to a given question. Several metrics are used to train decision trees; one of them is information gain. In this lecture, we will learn how information gain is computed and how it is used to train decision trees.
Prerequisite knowledge for Complete understanding and learning of Topic: (Max. Four
important topics)
Concepts of Supervised Learning
Application of supervised learning
Detailed content of the Lecture:
Decision trees are one of the predictive modeling approaches used in machine learning. As a predictive model, a decision tree travels from observations about an item (represented by the branches) to conclusions about the item's target value (represented by the leaves).
A decision tree's main idea is to locate the features that contain the most information about the target feature and then split the dataset along their values. The feature that best removes the uncertainty about the target feature is the most informative one. The search for the most informative attribute continues until only pure leaf nodes remain.
To put it another way, a high degree of disorder indicates a low level of purity. Entropy is a measure of disorder that ranges from 0 to 1 for two classes; it can exceed 1 when more classes are present in the data, but its interpretation remains the same.
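A minimal sketch of entropy and information gain (not part of the original handout; it assumes the dataset is a list of dictionaries mapping attribute names to values):

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy H = -sum(p_i * log2(p_i)) over the class proportions p_i."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, attribute, target):
        """Reduction in entropy of the target obtained by splitting on `attribute`."""
        labels = [ex[target] for ex in examples]
        before = entropy(labels)
        after = 0.0
        for value in set(ex[attribute] for ex in examples):
            subset = [ex[target] for ex in examples if ex[attribute] == value]
            after += len(subset) / len(examples) * entropy(subset)
        return before - after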
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-13
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Topic of Lecture: Searching for simple trees and computational complexity
Introduction: (Maximum 5 Sentences): The time complexity of creating an (empty) tree is O(1). The time complexity of searching for, inserting, or deleting a node depends on the height h of the tree, so the worst case is O(h), which degenerates to O(n) for a skewed tree (see the sketch after the prerequisites below).
Prerequisite knowledge for Complete understanding and learning of Topic: (Max. Four
important topics)
Learning a single class
Concepts of supervised learning
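To illustrate the role of the height h, the sketch below (an illustration using a simple binary-tree node class, not part of the original handout) shows that a search follows a single root-to-leaf path, so its cost is proportional to the height: about O(log n) for a balanced tree, but O(n) for a fully skewed one:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def search(node, key):
        """Binary-search-tree lookup: follows one root-to-leaf path, so worst case O(h)."""
        while node is not None:
            if key == node.key:
                return node
            node = node.left if key < node.key else node.right
        return None

    def height(node):
        """Height of the tree: O(log n) when balanced, O(n) when fully skewed."""
        if node is None:
            return 0
        return 1 + max(height(node.left), height(node.right))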
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-14
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Topic of Lecture: Occam's razor, overfitting, noisy data, and pruning
Introduction: (Maximum 5 Sentences): Ockham's razor (also spelled Occam's razor, pronounced AHK-uhmz RAY-zuhr) is the idea that, in trying to understand something, getting unnecessary information out of the way is the fastest way to the truth or to the best explanation.
Well, there can be many decision trees that are consistent with a given set of training examples, but the inductive bias of the ID3 algorithm results in a preference for simpler (or shorter) trees.
This preference bias of ID3 arises from the fact that there is an ordering of the hypotheses in its search strategy. It leads to the additional bias that attributes with high information gain are preferred closer to the root. Therefore, there is a definite order the algorithm follows until it terminates on reaching a hypothesis that is consistent with the training data.
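A minimal sketch of how this bias is realized: ID3 greedily picks the attribute with the highest information gain at every node, which is why high-gain attributes end up nearest the root (this reuses the hypothetical information_gain helper sketched in the L-12 handout above):

    def choose_attribute(examples, attributes, target):
        """ID3's greedy step: select the attribute with the highest information gain.
        Assumes the information_gain(examples, attribute, target) helper sketched earlier."""
        return max(attributes, key=lambda a: information_gain(examples, a, target))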
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-15
LECTURE HANDOUTS
IT IV/VII-A
Introduction: (Maximum 5 Sentences): Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model (see the brief sketch after the prerequisites below).
Prerequisite knowledge for Complete understanding and learning of Topic: (Max. Four
important topics)
Concepts of Supervised Learning
Application of supervised learning
Probability and Inference
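As a minimal illustration of strategically combining classifiers, a majority-vote ensemble can be sketched as follows (a generic sketch, not a specific method from this handout; the toy experts are illustrative only):

    from collections import Counter

    def majority_vote(classifiers, x):
        """Combine an ensemble of classifiers by letting each one vote on the label of x."""
        votes = [clf(x) for clf in classifiers]
        return Counter(votes).most_common(1)[0][0]

    # Hypothetical usage with three toy "experts" (threshold rules on a single number):
    experts = [lambda x: "pos" if x > 0 else "neg",
               lambda x: "pos" if x > 1 else "neg",
               lambda x: "pos" if x > -1 else "neg"]
    print(majority_vote(experts, 0.5))  # -> "pos" (two of the three experts vote "pos")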
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-16
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Topic of Lecture: Measuring the accuracy of learned hypotheses
Introduction: (Maximum 5 Sentences): Ensemble learning refers to the process of combining multiple models, such as classifiers or experts, into a committee in order to solve a computational problem. The main objective of using ensemble learning is to improve model performance, such as classification and prediction accuracy.
Prerequisite knowledge for Complete understanding and learning of Topic: (Max. Four
important topics)
Concepts of Supervised Learning
Bayes rule
Detailed content of the Lecture:
Labeled data can be expensive to acquire in several application domains, including medical
imaging, robotics, and computer vision.
To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on the fly. This active sampling process can benefit from a statistical function model, which is typically captured by a Gaussian process (GP).
While most GP-based AL approaches rely on a single kernel function, the present contribution
advocates an ensemble of GP models with weights adapted to the labeled data collected
incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based
on the uncertainty and disagreement rules.
An adaptively weighted ensemble of EGP-based acquisition functions is also introduced to
further robustify performance. Extensive tests on synthetic and real datasets showcase the merits
of the proposed EGP-based approaches with respect to the single GP-based AL alternatives.
Active Learning (AL) is an emerging field of machine learning focusing on creating a closed loop of
learner (statistical model) and oracle (expert able to label examples) in order to exploit the vast
amounts of accessible unlabeled datasets in the most effective way from the classification point of
view.
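A generic uncertainty-sampling sketch of this learner/oracle loop (this is not the ensemble-GP method described above; predict_proba stands for any callable returning per-class probabilities):

    import numpy as np

    def most_uncertain(predict_proba, X_unlabeled, k=1):
        """Pick the k unlabeled instances whose predicted class probabilities are least confident."""
        proba = predict_proba(X_unlabeled)      # shape (n_samples, n_classes)
        confidence = proba.max(axis=1)          # probability of the most likely class
        return np.argsort(confidence)[:k]       # indices of the least confident instances

    # One round of the loop: the oracle labels the chosen instances, they move to the
    # labeled pool, and the model is retrained before the next query.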
This paper analyzes the problem of multiclass active learning methods and proposes to approach it
in a new way through substitution of the original concept of predefined utility function with an
ensemble of learners.
As opposed to known ensemble methods in AL, where learners vote for a particular example, we use them as black-box mechanisms whose current competence value we try to model using an adaptive training scheme.
We show that modeling this problem as a multi-armed bandit problem and applying even very basic
strategies bring significant improvement to the AL process.
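As an illustration of such basic strategies, an epsilon-greedy bandit could decide which learner (arm) to trust at each step (a generic sketch, not the scheme evaluated in the cited work):

    import random

    def epsilon_greedy(estimated_rewards, epsilon=0.1):
        """Pick a random arm with probability epsilon; otherwise pick the arm with the
        best estimated reward so far (e.g. each arm is one learner in the ensemble)."""
        if random.random() < epsilon:
            return random.randrange(len(estimated_rewards))
        return max(range(len(estimated_rewards)), key=lambda i: estimated_rewards[i])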
Video Content/Details of website for further learning(if any):
https://lecturenotes.in/notes/24274-note-for-machine-learning-ml-by-new-swaroop
https://www.youtube.com/watch?v=WpxKSK2a0
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-17
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Topic of Lecture: Comparing learning algorithms: cross validation
Introduction:(Maximum 5 Sentences):
Estimating the accuracy of a learned hypothesis requires distinguishing between the true error of a hypothesis and its estimated or sample error. One is the error rate of the hypothesis over the sample of data that is available; the other is the error rate of the hypothesis over the entire unknown distribution D of examples.
Two difficulties arise. First, if the hypothesis is evaluated over the training examples themselves, the estimate of its accuracy will be optimistically biased. Second, even if the hypothesis accuracy is measured over an unbiased set of test instances independent of the training examples, the measured accuracy can still differ from the true accuracy, depending on the makeup of the particular set of test examples. The expected variance increases as the number of test examples decreases.
When evaluating a learned hypothesis, we want to know how accurately it will classify future instances, and also the probable error in that accuracy estimate. There is some space X of possible instances over which the target function is defined, and we assume that different instances of X may be encountered with different frequencies, modeled by an unknown probability distribution D.
The following two questions are of particular relevance in this context:
1. Given a hypothesis h and a data sample containing n examples drawn at random according to the distribution D, what is the best estimate of the accuracy of h over future instances drawn from the same distribution?
2. What is the probable error in this accuracy estimate?
True Error and Sample Error:
We must distinguish between two notions of accuracy or, to put it another way, of error. One is the hypothesis's error rate over the available data sample; the other is its error rate over the complete, unknown distribution D of examples. These are referred to as the sample error and the true error, respectively. The sample error of a hypothesis with respect to some sample S of examples drawn from X is the fraction of S that the hypothesis misclassifies.
Sample Error:
The sample error of hypothesis h with respect to target function f and data sample S, denoted error_S(h), is
error_S(h) = (1/n) Σ_{x ∈ S} δ(f(x), h(x))
where n is the number of examples in S, and δ(f(x), h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.
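This definition translates directly into code; the sketch below assumes the hypothesis h and target function f are plain callables:

    def sample_error(h, f, S):
        """Fraction of the sample S on which hypothesis h disagrees with target function f."""
        return sum(1 for x in S if h(x) != f(x)) / len(S)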
Course Teacher
Verified by HoD
MUTHAYAMMAL ENGINEERING COLLEGE
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram-637408, Namakkal Dist., TamilNadu
L-18
LECTURE HANDOUTS
IT IV/VII-A
Date of Lecture:
Topic of Lecture: Learning curves and statistical hypothesis testing
Why cross-validation?
CV provides the ability to estimate model performance on unseen data that was not used during training. Data scientists rely on cross-validation for several reasons when building Machine Learning (ML) models: for instance, tuning the model hyperparameters, testing different properties of the overall dataset, and iterating the training process. It also matters when the training dataset is small, where the way the data is split into training, validation, and testing sets significantly affects the resulting accuracy. These reasons overlap, but in short, CV makes more efficient use of limited data and gives a more reliable estimate of generalization performance than a single train/test split.
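A minimal manual k-fold cross-validation sketch (the train and evaluate callables are placeholders for any model-fitting and scoring routines):

    def k_fold_cross_validation(data, k, train, evaluate):
        """Split `data` into k folds; each fold is held out once for evaluation while the
        model is trained on the remaining folds. Returns the mean held-out score."""
        folds = [data[i::k] for i in range(k)]      # simple round-robin split
        scores = []
        for i in range(k):
            held_out = folds[i]
            training = [x for j, fold in enumerate(folds) if j != i for x in fold]
            model = train(training)
            scores.append(evaluate(model, held_out))
        return sum(scores) / k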
Course Teacher
Verified by HoD