Refer for KNN, Decision Tree, SVM
K-Nearest Neighbors Algorithm (KNN)
• Intuition behind the KNN Algorithm
Features
• The K-NN algorithm is a versatile and widely used machine learning algorithm, valued primarily for its simplicity and ease of implementation.
• It does not require any assumptions about the underlying data distribution.
• It can also handle both numerical and categorical data, making it a flexible
choice for various types of datasets in classification and regression tasks.
• It is a non-parametric method that makes predictions based on the
similarity of data points in a given dataset.
• K-NN is less sensitive to outliers compared to other algorithms.
• The K-NN algorithm works by finding the K nearest neighbors to a
given data point based on a distance metric, such as Euclidean
distance.
• The class or value of the data point is then determined by the
majority vote or average of the K neighbors.
• This approach allows the algorithm to adapt to different patterns and
make predictions based on the local structure of the data.
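As a rough sketch of that idea, the following snippet (illustrative only; the array shapes and names are assumptions, not code from these notes) finds the K nearest training points by Euclidean distance and takes a majority vote:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Minimal K-NN classification sketch: Euclidean distance + majority vote."""
    # distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # majority vote among the k neighbours' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

For regression, the last line would return the average of the K neighbours' values instead of a majority vote.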
Distance Metrics Used in KNN Algorithm
• Euclidean Distance
• Manhattan Distance
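For reference, the standard definitions of these two metrics (standard formulas, not reproduced from the slides):

```latex
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|
```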
• The K-NN algorithm compares a new data entry to the values in a
given data set (with different classes or categories).
• Based on its closeness or similarities in a given range (K) of neighbors,
the algorithm assigns the new data to a class or category in the data
set (training data).
Steps in KNN Algorithm
KNN Example 1
• Since the value of K is 3, the algorithm will only consider the 3 nearest neighbors to the green point (the new entry), as illustrated in the accompanying graph.
KNN Example 2
• Consider the following dataset:
Assumptions
KNN Algorithm
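As a hedged end-to-end sketch of the algorithm (using scikit-learn and a synthetic dataset, not the example data from these notes):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# synthetic two-class dataset standing in for the example data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K = 3, as in Example 1 above
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```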
Decision Tree
• Decision trees, a key tool in machine learning, predict outcomes based on input data through a tree-like structure.
• They offer interpretability, versatility, and simple visualization,
making them valuable for both categorization and regression tasks.
Concept
• It is a tree-like structure where
- each internal node tests an attribute,
- each branch corresponds to attribute value and
- each leaf node represents the final decision or prediction.
• While decision trees have advantages like ease of understanding,
they may face challenges such as overfitting.
• Understanding their terminologies and formation process is essential
for effective application in diverse scenarios.
• Decision trees are drawn upside down, which means the root is at the top, and this root is then split into several nodes.
• Decision trees are nothing but a bunch of if-else statements in layman
terms.
• It checks if the condition is true and if it is then it goes to the next
node attached to that decision.
Example 1:
• Here, the tree will ask –
• What is the weather? Is it sunny, cloudy, or rainy?
• Based on the answer, it moves on to the next features, humidity and wind.
• It will again check whether the wind is strong or weak; if the wind is weak and it is rainy, then the person may go and play.
We see that if the weather is cloudy, then the person always goes to play (a sketch of these rules as code follows below).
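In that spirit, the same tree can be read as plain if-else rules. A toy sketch (the exact branches are assumptions for illustration, not a trained tree):

```python
def will_play(outlook, humidity, wind):
    """Toy decision tree for the weather example, written as if-else checks."""
    if outlook == "cloudy":
        return "play"                                   # pure node: always play
    elif outlook == "sunny":
        return "play" if humidity == "normal" else "no play"
    else:  # rainy
        return "play" if wind == "weak" else "no play"

print(will_play("rainy", "high", "weak"))   # -> play
```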
Why didn’t it split more? Why did it stop there?
• But in simple terms: the output in the training dataset is always "yes" for cloudy weather, and since there is no disorderliness here we don't need to split the node further.
• The higher the entropy, the lower the purity and the higher the impurity.
• The goal of machine learning is to decrease the uncertainty or impurity in the dataset; by using entropy we measure the impurity of a particular node (the standard formula is sketched below).
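For reference, the usual entropy formula behind these statements (the standard definition, not reproduced from these notes), where p_i is the proportion of class i in node S:

```latex
E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
```

Entropy is 0 for a pure node and is largest when the classes are evenly mixed.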
• Entropy alone, however, does not tell us whether the parent entropy, or the entropy of a particular node, has decreased after a split.
• A new metric called "information gain" tells us how much the parent entropy has decreased after splitting on some feature.
Information Gain
• Information gain measures the reduction of uncertainty given some
feature and it is also a deciding factor for which attribute should be
selected as a decision node or root node.
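Written out, the usual definition (standard formula, not reproduced from these notes), where A is the feature used for the split and S_v is the subset of S with value v for A:

```latex
IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, E(S_v)
```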
• The parent entropy was near 0.99, and the information gain for this split comes out to 0.37.
Conclusion: the entropy of the dataset will decrease by 0.37 if we make "Energy" our root node.
• Feature 2
• Conclusions:
• The "Energy" feature gives a larger reduction (0.37) than the "Motivation" feature. Hence we select the feature with the highest information gain and split the node based on that feature.
• “Energy” will be our root node and we’ll do the same for
sub-nodes. Here we can see that when the energy is “high” the
entropy is low and hence we can say a person will definitely go to
the gym if he has high energy,
• but what if the energy is low? We will again split the node based on
the new feature which is “Motivation”.
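A small runnable sketch of these calculations (the gym dataset below is made up for illustration and will not reproduce the exact 0.99 / 0.37 figures, but "Energy" still comes out with the higher information gain):

```python
import numpy as np
import pandas as pd

def entropy(labels):
    # Shannon entropy of a column of class labels
    p = labels.value_counts(normalize=True)
    return -(p * np.log2(p)).sum()

def information_gain(df, feature, target):
    # parent entropy minus the weighted entropy of each child after the split
    parent = entropy(df[target])
    weighted = sum(
        (len(sub) / len(df)) * entropy(sub[target])
        for _, sub in df.groupby(feature)
    )
    return parent - weighted

# hypothetical dataset mirroring the example (values are assumptions)
df = pd.DataFrame({
    "Energy":     ["high", "high", "low", "low", "low", "high", "low", "high"],
    "Motivation": ["high", "low", "high", "low", "high", "high", "low", "low"],
    "GoesToGym":  ["yes", "yes", "yes", "no", "no", "yes", "no", "yes"],
})
print("IG(Energy):    ", information_gain(df, "Energy", "GoesToGym"))
print("IG(Motivation):", information_gain(df, "Motivation", "GoesToGym"))
```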
Pruning
• Pruning is another method that can help us avoid overfitting. It helps
in improving the performance of the Decision tree by cutting the
nodes or sub-nodes which are not significant. Additionally, it removes
the branches which have very low importance.
• There are mainly 2 ways for pruning:
• Pre-pruning – we can stop growing the tree earlier, which means we
can prune/remove/cut a node if it has low importance while
growing the tree.
• Post-pruning – once our tree is built to its depth, we can start
pruning the nodes based on their significance.
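A hedged scikit-learn sketch of both ideas (the parameter values here are illustrative assumptions, not recommendations from these notes):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing early via depth / leaf-size limits
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the tree, then cut back weak branches
# via cost-complexity pruning (ccp_alpha > 0)
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print("pre-pruned depth: ", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())
```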
Example 3
SVM (Support Vector Machine)
⦿ Concept
⦿ Types
⦿ Linear
⦿ Non-linear
⦿ Use of Dot products
⦿ Examples
⦿ Kernel in SVM
Concept
• SVM is a powerful supervised algorithm that works best on smaller but complex datasets.
• It can be used for both regression and classification tasks, but generally it works best on classification problems.
• It is a supervised machine learning problem where we try to find a
hyperplane that best separates the two classes.
• Don’t get confused between SVM and logistic regression.
• Both the algorithms try to find the best hyperplane, but the main
difference is logistic regression is a probabilistic approach whereas
support vector machine is based on statistical approaches.
• Answers to questions like –
- which hyperplane does it select?
- There can be an infinite number of hyperplanes passing through a
point and classifying the two classes perfectly.
- So, which one is the best?
• Depending on the number of features you have you can either
choose Logistic Regression or SVM.
• SVM works best when the dataset is small and complex.
• It is advisable to first use logistic regression and see how it performs; if it fails to give good accuracy, you can go for SVM without any kernel.
• Logistic regression and SVM without any kernel have similar
performance but depending on your features, one may be more
efficient than the other.
Types of SVM
• Linear SVM: When the data is perfectly linearly separable only then
we can use Linear SVM. Perfectly linearly separable means that the
data points can be classified into 2 classes by using a single straight line (if 2D).
• Non-Linear SVM: When the data is not linearly separable then we can
use Non-Linear SVM, which means when the data points cannot be
separated into 2 classes by using a straight line (if 2D) then we use
some advanced techniques like kernel tricks to classify them.
• In most real-world applications we do not find linearly separable
datapoints hence we use kernel trick to solve them.
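A minimal sketch of the two cases with scikit-learn (the dataset is synthetic; only the kernel choice is being illustrated):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # single straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)         # kernel trick for a non-linear boundary

print("linear kernel accuracy:", linear_svm.score(X, y))
print("rbf kernel accuracy:   ", rbf_svm.score(X, y))
```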
Important Definitions
• Support Vectors: These are the points that are closest to the
hyperplane. A separating line will be defined with the help of these
data points.
• Margin: it is the distance between the hyperplane and the
observations closest to the hyperplane (support vectors). In SVM
large margin is considered a good margin. There are two types of
margins hard margin and soft margin.
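After fitting a linear SVM, the support vectors can be inspected directly; a brief sketch (synthetic data, parameter values are assumptions):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# the observations closest to the hyperplane, which define the margin
print(clf.support_vectors_)
```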
Example – Linear SVM
• We want to classify the new data point as either blue or green.
• To classify these points, we can have many decision boundaries, but
the question is which is the best and how do we find it?
• The best hyperplane is the one that has the maximum distance from both classes; finding it is the main aim of SVM.
• This is done by finding the different hyperplanes which classify the labels correctly and then choosing the one which is farthest from the data points, i.e. the one with the maximum margin.
How does it work?
⦿ Identify Cat or Dog?
⦿ Support Vectors :
⦿ Linear SVM : Hyperplane
⦿ Non-linear SVM example
Example – Non-linear SVM
⦿ Finding equation for SV :
⦿ Final Classification result
Use of Dot Product in SVM
• The dot product can be defined as the projection of one vector onto another, multiplied by the magnitude of the other vector.
• Consider a random point X and we want to know whether it lies on
the right side of the plane or the left side of the plane (positive or
negative).
• Assume this point is a vector (X), and then we make a vector (w) which is perpendicular to the hyperplane. Let's say the distance from the origin to the decision boundary along w is 'c'. Now we take the projection of the X vector onto w.
• Criteria for classification based on the dot product:
- The projection of one vector onto another is the dot product, so we take the dot product of the x and w vectors.
• If the dot product is greater than 'c', the point lies on the right side.
• If the dot product is less than 'c', then the point is on the left side.
• If the dot product is equal to ‘c’ then the point lies on the decision
boundary.
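A tiny NumPy sketch of this criterion (w, c, and the point are made-up values for illustration):

```python
import numpy as np

w = np.array([1.0, 2.0])   # vector perpendicular to the hyperplane (assumed)
c = 3.0                    # distance threshold along w (assumed)
x = np.array([2.0, 1.5])   # new point to classify

proj = np.dot(x, w)        # projection-based dot product of x with w
if proj > c:
    print("right side (positive)")
elif proj < c:
    print("left side (negative)")
else:
    print("on the decision boundary")
```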
Margin in Support Vector Machine
• To classify a point as negative or positive we need to define a decision
rule.
• The equation of a hyperplane is w·x + b = 0, where w is a vector normal to the hyperplane and b is an offset.
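Written out, the decision rule that follows from this equation (standard SVM convention, not taken from these notes):

```latex
w \cdot x + b \ge 0 \;\Rightarrow\; \text{positive class},
\qquad
w \cdot x + b < 0 \;\Rightarrow\; \text{negative class}
```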