Unit 2

The document discusses the K-Nearest Neighbor algorithm, explaining how it works and its advantages and disadvantages. It then discusses the Support Vector Machine algorithm, explaining how it finds the optimal separating hyperplane for classification problems and how it can be used for both linear and non-linear classification.

K-Nearest Neighbor (KNN) Algorithm for Machine Learning

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the
  Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases,
  and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on
  similarity. This means that when new data appears, it can be easily classified into a
  well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for Classification, but it is
  mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it makes no assumptions about the
  underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set
  immediately; instead, it stores the dataset and performs an action on it at the time of
  classification.
o In the training phase, the KNN algorithm just stores the dataset; when it gets new data,
  it classifies that data into the category most similar to it.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a
  dog, and we want to know whether it is a cat or a dog. For this identification we can use
  the KNN algorithm, since it works on a similarity measure. Our KNN model will find the
  features of the new image that are most similar to the cat and dog images and, based on
  the most similar features, put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point
x1; we want to know which of these categories the data point belongs to. To solve this type
of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the
category or class of a particular data point. Consider the below diagram:

How does K-NN work?

The working of K-NN can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance between the new data point and the existing data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category. Consider the
below image:

o Firstly, we will choose the number of neighbors; here we choose k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean
  distance is the distance between two points, which we have already studied in geometry.
  Between two points (x1, y1) and (x2, y2) it can be calculated as:

  d = √((x2 − x1)² + (y2 − y1)²)

o By calculating the Euclidean distance we get the nearest neighbors: three nearest
  neighbors in category A and two nearest neighbors in category B. Consider the below image:
o Since 3 of the 5 nearest neighbors are from category A, the new data point must belong to
  category A.
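
To make these steps concrete, below is a minimal from-scratch sketch in Python (the toy data
and the function name knn_predict are hypothetical, chosen only for illustration):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step-2: Euclidean distance from the new point to every stored point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step-3: indices of the k nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step-4 and Step-5: majority vote among the k neighbors.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: category 'A' clustered near the origin, 'B' near (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_predict(X_train, y_train, np.array([1, 1])))  # -> 'A'

With k = 5, three of the five nearest neighbors of the new point belong to category A and
two to category B, so it is assigned to category A, exactly as in the example above.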

How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the K-NN algorithm:

o There is no particular way to determine the best value of K, so we need to try several
  values and pick the best among them. A commonly preferred value is K = 5.
o A very low value of K, such as K = 1 or K = 2, can be noisy and make the model sensitive
  to outliers.
o Larger values of K smooth out noise, but a value that is too large may blur the
  distinction between categories.
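
Since trying several values is the usual approach, the sketch below uses scikit-learn (an
assumed library choice; the article names none) to compare cross-validated accuracy over a
range of K:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keep the K with the best 5-fold cross-validated accuracy.
best_k, best_score = None, 0.0
for k in range(1, 16):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))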

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective when the training data is large.

Disadvantages of KNN Algorithm:


o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high, because the distance to every training sample must be
  calculated for each prediction.

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used
for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put the new data point in the correct
category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Consider the below diagram, in which two different categories are classified using a decision
boundary or hyperplane:

Example: SVM can be understood with the example we used for the KNN classifier. Suppose we
see a strange cat that also has some features of a dog. If we want a model that can
accurately identify whether it is a cat or a dog, such a model can be created using the SVM
algorithm. We first train our model with many images of cats and dogs so that it can learn
their different features, and then we test it with this strange creature. The SVM creates a
decision boundary between the two classes (cat and dog) and chooses the extreme cases
(support vectors); on the basis of the support vectors, it will classify the creature as a
cat. Consider the below diagram:


The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified
  into two classes using a single straight line, it is termed linearly separable data, and
  the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
  cannot be classified using a straight line, it is termed non-linear data, and the
  classifier used is called a Non-linear SVM classifier.
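
In scikit-learn (an assumed implementation choice, offered only as a sketch), the two
variants differ simply in the kernel argument:

from sklearn.svm import SVC

linear_clf = SVC(kernel="linear")   # straight-line / flat hyperplane boundary
nonlinear_clf = SVC(kernel="rbf")   # curved boundary for non-linearly separable data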

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find the best decision boundary for classifying the data
points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features in the dataset: if there
are 2 features (as shown in the image), the hyperplane is a straight line, and if there are
3 features, the hyperplane is a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e., the maximum distance
between the hyperplane and the nearest data points of each class.
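
In standard textbook notation (a reconstruction not stated in the article), the hyperplane
is the set of points x with w · x + b = 0, and for labels y_i ∈ {−1, +1} the maximum-margin
problem is

\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 \ \text{for all } i,

where the resulting margin width is 2 / ||w||, so minimizing ||w|| maximizes the margin.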

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect its position
are termed support vectors. Since these vectors support the hyperplane, they are called
support vectors.

How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood using an example. Suppose we have a
dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We
want a classifier that can classify the pair (x1, x2) of coordinates as either green or
blue. Consider the below image:

Since this is a 2-D space, we can easily separate these two classes with a straight line.
But there can be multiple lines that separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points of each class
that are closest to the line. These points are called support vectors. The distance between
the support vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.
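
A minimal runnable sketch of this using scikit-learn (the toy data are hypothetical):

import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs.
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.support_vectors_)   # the points closest to the maximum-margin boundary
print(clf.predict([[3, 3]]))  # classify a new point -> array([0])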
Non-Linear SVM:

If data is linearly arranged, we can separate it with a straight line, but non-linear data
cannot be separated by a single straight line. Consider the below image:

So to separate these data points, we need to add one more dimension. For linear data we have
used the two dimensions x and y, so for non-linear data we will add a third dimension z,
calculated as:

z = x² + y²

By adding the third dimension, the sample space becomes as shown in the below image:

Now SVM will divide the datasets into classes in the following way. Consider the below
image:
Since we are in 3-D space, the separating boundary looks like a plane parallel to the
x-axis. If we convert it back to 2-D space by setting z = 1, it becomes x² + y² = 1. Hence,
for this non-linear data, we get a circular boundary: a circle of radius 1.
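
The sketch below (with hypothetical ring-shaped data) applies the z = x² + y² mapping from
the text and shows that a plain linear SVM then separates the classes:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Ring data: class 0 inside radius 1, class 1 outside.
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# Add the third dimension z = x^2 + y^2 described above.
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

# In 3-D the classes are separable by a flat plane (roughly z = 1),
# so a linear SVM now suffices.
clf = SVC(kernel="linear").fit(X3, y)
print(clf.score(X3, y))  # close to 1.0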

Introduction to Support Vector Regression (SVR)

Support Vector Regression (SVR) is a type of machine learning algorithm used for regression
analysis. The goal of SVR is to find a function that approximates the relationship between
the input variables and a continuous target variable, while minimizing the prediction error.

Unlike Support Vector Machines (SVMs) used for classification tasks, SVR seeks to find a
hyperplane that best fits the data points in a continuous space. This is achieved by mapping
the input variables to a high-dimensional feature space and finding the hyperplane that
maximizes the margin (distance) between the hyperplane and the closest data points, while
also minimizing the prediction error.

SVR can handle non-linear relationships between the input variables and the target variable
by using a kernel function to map the data to a higher-dimensional space. This makes it a
powerful tool for regression tasks where there may be complex relationships between the
input variables and the target variable.

Support Vector Regression (SVR) uses the same principle as SVM, but for regression problems.
Let's spend a few minutes understanding the idea behind SVR.

The Idea Behind Support Vector Regression

The problem of regression is to find a function that approximates the mapping from an input
domain to real numbers on the basis of a training sample. So let's now dive deep and
understand how SVR actually works.

Consider the two red lines as the decision boundary and the green line as the hyperplane.
Our objective, when working with SVR, is to consider the points that lie within the decision
boundary lines. Our best-fit line is the hyperplane that contains the maximum number of
points.

The first thing to understand is the decision boundary (the red lines above!). Consider
these lines as being at some distance, say 'a', from the hyperplane: they are the lines
drawn at distances '+a' and '-a' from the hyperplane. This 'a' is what is usually referred
to as epsilon.

Assuming that the equation of the hyperplane is:

Y = wx + b (equation of the hyperplane)

the equations of the decision boundaries become:

wx + b = +a
wx + b = -a

Thus, any hyperplane that satisfies our SVR should satisfy:

-a < Y - (wx + b) < +a
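
A short sketch with scikit-learn's SVR, in which the epsilon parameter plays the role of 'a'
above (the data are hypothetical):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# 1-D data: y = 2x + 1 plus a little noise.
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.3, size=50)

# Points inside the epsilon-tube around the fitted line contribute no error.
svr = SVR(kernel="linear", epsilon=0.5)
svr.fit(X, y)

print(svr.predict([[5.0]]))  # roughly [11.]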
