ML Unit-4
• Strengths of SVM
o SVM can be used for both classification and regression (a short code sketch after these lists illustrates both uses).
o It is robust, i.e. it is not much affected by noisy data or outliers.
o Its prediction accuracy is generally high.
• Weaknesses of SVM
o In its basic form, SVM is applicable only to binary classification, i.e. problems
with exactly two classes.
o The SVM model is complex and behaves almost like a black box when it deals with
a high-dimensional data set, so in such cases it is very difficult to interpret
the model.
o It is slow for large data sets, i.e. data sets with either a large number of
features or a large number of instances.
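As a minimal illustration of the first strength, the sketch below trains an SVM classifier (SVC) and an SVM regressor (SVR) with scikit-learn (assumed available); the data sets and parameter values are only illustrative, not a recommended setup.

# Minimal sketch: SVM used for classification (SVC) and for regression (SVR).
from sklearn.svm import SVC, SVR
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split

# Classification on a binary (two-class) data set
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0)            # RBF kernel allows non-linear boundaries
clf.fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression on a continuous target
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = SVR(kernel="rbf", C=10.0)
reg.fit(X_tr, y_tr)
print("regression R^2:", reg.score(X_te, y_te))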
4.3.3 K-Nearest Neighbours Algorithm
The K-nearest neighbours (K-NN) algorithm is a type of supervised ML algorithm that
can be used for both classification and regression.
How does K-NN work?
The working of K-NN can be explained with the steps below (a short code sketch follows the list):
o Step 1: Select the number K of neighbours.
o Step 2: Calculate the Euclidean distance from the new data point to every point in
the training data.
o Step 3: Take the K nearest neighbours according to the calculated distances.
o Step 4: Among these K neighbours, count the number of data points in each
category.
o Step 5: Assign the new data point to the category with the largest number of
neighbours among the K.
o Step 6: Our model is ready.
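The sketch below is a minimal from-scratch illustration of these steps in Python with NumPy (assumed available); the function name knn_predict and the tiny data set are hypothetical.

# Minimal sketch of the K-NN steps above (illustrative, not optimised).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the K nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the labels among the K neighbours and take the majority
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical two-class training data
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))   # -> 0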
Why is the K-NN algorithm called a lazy learner?
Eager learners follow the general steps of machine learning, i.e. they perform an
abstraction of the information obtained from the input data and then follow it
with a generalization step. As we have seen, the K-NN algorithm skips these steps
completely: it simply stores the training data and directly applies the philosophy
of nearest-neighbour finding to arrive at the classification. So, for K-NN, no
learning happens in the real sense, and K-NN therefore falls under the category of
lazy learners.
• Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and
each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any
further branches.
• The decisions or the tests are performed based on features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
• To build the tree, we use the CART algorithm, which stands for Classification and
Regression Tree (a short code sketch follows this list).
• A decision tree simply asks a question, and based on the answer (Yes/No), it
further splits the tree into subtrees.
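As a minimal sketch of the CART idea, the code below fits a scikit-learn DecisionTreeClassifier (assumed available) and prints the learned rules; the data set and the max_depth value are only illustrative.

# Minimal sketch: a CART-style decision tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# criterion="gini" uses the Gini index; criterion="entropy" would use information gain
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(data.data, data.target)
# Print the learned rules: root node, branches (decision rules) and leaf nodes (outcomes)
print(export_text(tree, feature_names=list(data.feature_names)))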
Decision Tree Terminologies
• Root Node: The node from which the decision tree starts. It represents the
entire dataset, which is further divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be
split further once a leaf node is reached.
• Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
• Branch/Sub-tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: A node that is split into sub-nodes is called the parent node
of those sub-nodes, and the sub-nodes are called its child nodes.
• Attribute Selection Measures
While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes. To solve this problem, we use
a technique called an attribute selection measure, or ASM. With this measure, we
can easily select the best attribute for the nodes of the tree. There are two
popular ASM techniques:
o Information Gain
o Gini Index
o Information Gain:
o Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the
decision tree.
o A decision tree algorithm always tries to maximize the value of information
gain, and a node/attribute having the highest information gain is split first. It
can be calculated using the below formula:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Here Entropy(S) measures the impurity of the sample set S. For a two-class
(yes/no) problem it is:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
o S = the total set of samples
o P(yes) = probability of yes
o P(no) = probability of no
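The sketch below is a minimal NumPy illustration of these formulas (NumPy assumed available); the helper names and the yes/no counts are hypothetical.

# Minimal sketch: entropy and information gain for one candidate split.
import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i P(i) * log2 P(i); for yes/no labels this is
    # -P(yes)*log2 P(yes) - P(no)*log2 P(no)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, children):
    # Information Gain = Entropy(S) - weighted average of the children's entropies
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

S = np.array(["yes"] * 9 + ["no"] * 5)         # parent node: 9 yes, 5 no
left = np.array(["yes"] * 6 + ["no"] * 2)       # one branch after splitting on a feature
right = np.array(["yes"] * 3 + ["no"] * 3)      # the other branch
print(round(information_gain(S, [left, right]), 3))   # ≈ 0.048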
• Gini Index:
o The Gini index is a measure of impurity or purity used while creating a
decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a
high Gini index.
o The CART algorithm creates only binary splits and uses the Gini index
to create them.
o The Gini index can be calculated using the formula below, where Pj is the
proportion of samples belonging to class j:
Gini Index = 1 - ∑j Pj²
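A minimal NumPy sketch of this formula (NumPy assumed available); the helper name and the label counts are hypothetical.

# Minimal sketch: Gini Index = 1 - sum_j (Pj)^2 for the labels in one node.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()               # Pj: proportion of each class j
    return 1.0 - (p ** 2).sum()

print(gini(np.array(["yes"] * 9 + ["no"] * 5)))   # mixed node, ≈ 0.459
print(gini(np.array(["yes"] * 10)))               # pure node, Gini = 0.0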