Bengaluru
Module 2
Supervised machine learning algorithms Part 2
– Classification models
Agenda
• Classification models
• Decision Tree algorithms using Entropy and Gini Index
as measures of node impurity
• Model evaluation metrics for classification algorithms
• Cohen's Kappa Statistic
• Multi-class classification
• Class Imbalance problem
• Naïve Bayes Classifiers
• Naïve Bayes model for sentiment classification – An
Introduction
Artificial Intelligence
Classification models
What is a classification algorithm?
• A classification algorithm is a supervised learning technique in
which a program learns from a given dataset of observations
and then assigns each new observation to one of a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam,
cat or dog.
• Classes are also called targets, labels, or categories.
• Unlike regression, the output variable of classification is a
category, not a numeric value, such as "Green or Blue" or
"fruit or animal".
• Since classification is a supervised learning technique, it
takes labeled input data, meaning each input comes with its
corresponding output.
• In a classification algorithm, a function f maps the input
variable x to a discrete output y, i.e.,
y = f(x), where y = categorical output
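The mapping y = f(x) can be illustrated with a hand-written rule. This is only a sketch: the function name `classify_message`, its two features, and the threshold are hypothetical, and in practice f is learned from labeled data rather than coded by hand.

```python
def classify_message(num_links: int, has_greeting: bool) -> str:
    """Toy f(x): maps input features x to a categorical label y."""
    # Hypothetical rule chosen for illustration only.
    if num_links >= 3 and not has_greeting:
        return "Spam"
    return "Not Spam"


# The output is always one of a fixed set of categories, never a number.
print(classify_message(5, False))
print(classify_message(0, True))
```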
Types of classifications
Learners in classification problems
1. Lazy Learners
• Store the training dataset and wait until they receive the test
dataset.
• Classification is then done on the basis of the most closely
related data stored in the training dataset.
• They take less time in training but more time for predictions.
• Examples: K-NN algorithm, case-based reasoning
2. Eager Learners
• Eager learners develop a classification model from the training
dataset before receiving a test dataset.
• Unlike lazy learners, eager learners take more time in learning
and less time in prediction.
• Examples: Decision Trees, Naïve Bayes, ANN.
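The lazy-learner behaviour can be sketched with a minimal K-NN classifier. The class name `KNNClassifier` and the toy data are assumptions made for illustration. Note that `fit` only stores the data, while all distance computation happens at prediction time.

```python
from collections import Counter
import math


class KNNClassifier:
    """Lazy learner: fit() only stores data; the work happens in predict_one()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the labeled examples.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Rank all stored points by Euclidean distance to the query point.
        neighbours = sorted(
            zip(self.X, self.y), key=lambda pair: math.dist(pair[0], x)
        )
        top_labels = [label for _, label in neighbours[: self.k]]
        # Majority vote among the k nearest neighbours.
        return Counter(top_labels).most_common(1)[0][0]


# Toy data: two well-separated groups (illustrative values only).
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = KNNClassifier(k=3).fit(X_train, y_train)
print(model.predict_one((1.1, 1.0)))
print(model.predict_one((5.1, 5.0)))
```

An eager learner would instead do its heavy computation inside `fit` and keep only the resulting model.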
Types of classification algorithms
• Linear Models
  • Logistic Regression
  • Support Vector Machines
• Non-linear Models
  • K-Nearest Neighbours
  • Kernel SVM
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification
Methods for Evaluating a classification
model
Confusion Matrix
• The confusion matrix gives us a matrix/table as output that
describes the performance of the model.
• It is also known as the error matrix.
• The matrix summarizes the prediction results, showing the total
number of correct and incorrect predictions. It looks like the
table below:
                     Actual Positive    Actual Negative
Predicted Positive   True Positive      False Positive
Predicted Negative   False Negative     True Negative
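The four cells of the confusion matrix can be counted directly from paired actual and predicted labels. The sketch below assumes a binary problem with "Yes" as the positive class. The helper name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Illustrative labels only.
actual = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes"]

counts = confusion_counts(actual, predicted)
correct = counts["TP"] + counts["TN"]    # total correct predictions
incorrect = counts["FP"] + counts["FN"]  # total incorrect predictions
print(counts, correct, incorrect)
```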
AUC-ROC curve
Use cases of classification algorithms
Decision Tree algorithms using
Entropy and Gini Index
What is a decision tree?
• A supervised learning technique that can be used for both
classification and regression problems, but it is mostly
preferred for solving classification problems.
• Contains two types of nodes: Decision Nodes and Leaf Nodes.
• Decision nodes are used to make a decision and have
multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
• It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent
the decision rules, and each leaf node represents the
outcome.
• The decisions or tests are performed on the basis of the
features of the given dataset.
• It is a graphical representation for getting all the
possible solutions to a problem/decision based on given
conditions.
• To build a tree, we use the CART algorithm, which
stands for Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the
answer (Yes/No), further splits the tree into subtrees.
Significance of decision tree
Decision Tree Terminologies
Root Node: The node where the decision tree starts. It
represents the entire dataset, which then gets divided into two
or more homogeneous sets.
Leaf Node: A final output node; the tree cannot be split
further after reaching a leaf node.
Parent & Child node: A node that is split into sub-nodes is
the parent of those sub-nodes, and the sub-nodes are its child
nodes. The root node is the topmost parent.
Decision tree algorithm working
• Step-1: Begin the tree with the root node, say S, which
contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute
Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values
of the best attribute.
• Step-4: Generate the decision tree node, which contains the
best attribute.
• Step-5: Recursively make new decision trees using the subsets
of the dataset created in Step-3. Continue this process until a
stage is reached where the nodes cannot be classified further;
such a final node is called a leaf node.
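The five steps can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the CART implementation: it uses information gain as the attribute selection measure and splits on every value of a categorical attribute, which is closer to ID3 than to CART's binary splits. The function names and the toy weather data are hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    # Parent entropy minus the weighted entropy of the subsets.
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    # Leaf node: the subset is pure, or no attributes are left to test.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the best attribute using the selection measure.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    # Step 4: create the decision node for that attribute.
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    # Steps 3 and 5: split into subsets by value and recurse on each.
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node


def predict(tree, row):
    # Follow branches until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Step 1: the root starts with the complete (toy) dataset S.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay", "stay"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))
```

On this toy data the root tests `outlook`, and only the rainy branch needs a further test on `windy`.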
Illustrative Example
Attribute Selection Measures: Entropy
Attribute Selection Measures: Information
Gain
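As a hedged illustration of these two measures, the sketch below uses the standard definitions: entropy is the negative sum of p·log2(p) over the class proportions p, and information gain is the parent's entropy minus the size-weighted entropy of the child subsets. The function names and the sample labels are assumptions for illustration.

```python
from collections import Counter
import math


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the class proportions p."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Illustrative node: 5 "yes" and 5 "no", split into two mostly-pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(entropy(parent))  # 1.0 for a 50/50 split
gain = information_gain(parent, [left, right])
print(gain)  # roughly 0.278
```

The attribute whose split yields the highest information gain is chosen for the node.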
Attribute Selection Measures: Gini Index
• The CART algorithm uses the Gini index as its attribute
selection measure and creates only binary splits.
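A binary split scored by the Gini index can be sketched as follows, using the standard definition Gini = 1 − Σ p² over the class proportions p. The threshold search over midpoints is one common way to form binary splits on a numeric feature. It is shown as an assumption, not as the exact CART procedure, and the names and data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini index: 1 - sum(p^2) over the class proportions p."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    # Size-weighted impurity of the two sides of a binary split.
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_binary_split(values, labels):
    """Try each midpoint between sorted distinct values as a threshold."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Illustrative numeric feature with two clearly separated classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(labels))  # 0.5 for a 50/50 node
threshold, score = best_binary_split(values, labels)
print(threshold, score)
```

The split with the lowest weighted Gini index is preferred; a score of 0 means both sides are pure.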
Numerical example –
Decision Tree (Entropy, Gini Impurity & Information
Gain)
Computation time is reduced with Gini impurity because, unlike
entropy, it does not use a logarithmic function.
Advantages & Disadvantages of the Decision Tree