Unit 5 – Learning with Algorithm

K-Nearest Neighbour (KNN) Algorithm for Machine Learning:-

The K-Nearest Neighbors (KNN) algorithm is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It is a type of instance-based learning or lazy learning where the function is only approximated locally and all computation is deferred until function evaluation.

Key Concepts of KNN:-

1. Basic Principle

The KNN algorithm classifies a data point based on how its neighbors are classified. It works by finding the k closest data points (neighbors) to the input point and making a decision based on the majority class among those neighbors (for classification) or the average of their values (for regression).
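To make the principle concrete, here is a minimal from-scratch sketch (with assumed toy data and a hypothetical helper knn_predict) that classifies a query point by majority vote among its k nearest neighbors:

import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: class 0 clustered near the origin, class 1 further away
X_train = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([7, 6])))  # -> 1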

2. Distance Metrics

To determine the closest neighbors, KNN relies on a distance metric to measure the
similarity between data points. Common distance metrics include:

 Euclidean Distance: The straight-line distance between two points.
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
 Manhattan Distance: The sum of the absolute differences of their coordinates.
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
 Minkowski Distance: A generalized distance metric.
d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^m \right)^{1/m}
When m = 2 it becomes Euclidean distance, and when m = 1 it becomes Manhattan distance.
 Cosine Similarity: Measures the cosine of the angle between two vectors.
\text{similarity}(p, q) = \frac{p \cdot q}{\|p\| \, \|q\|}
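The following short sketch (with assumed example vectors) computes each of these metrics directly with NumPy:

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))           # straight-line distance
manhattan = np.sum(np.abs(p - q))                    # sum of absolute differences
minkowski = np.sum(np.abs(p - q) ** 3) ** (1 / 3)    # Minkowski distance with m = 3
cosine_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

print(euclidean, manhattan, minkowski, cosine_sim)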

3. Choosing k

The value of k (the number of neighbors) is crucial and can significantly affect the
performance of the algorithm:
 A small k may be sensitive to noise in the data.
 A large k may smooth out the predictions too much and lose important details.
 Common practice is to choose k via cross-validation, as sketched below.
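A hedged sketch of this practice, using scikit-learn's cross_val_score on the built-in Iris dataset (chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of k with 5-fold cross-validation
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy={scores.mean():.3f}")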

4. Classification vs Regression

 Classification: The output is a class label. The class label is determined by the
majority vote of the nearest neighbors.
 Regression: The output is a continuous value. The value is typically the mean (or
sometimes the median) of the nearest neighbors' values.
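For the regression case, scikit-learn provides KNeighborsRegressor; the sketch below (with assumed toy data) predicts the mean of the three nearest neighbors' target values:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D regression data (illustrative values only)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)

# The prediction is the mean of the 3 nearest neighbors' target values
print(reg.predict([[3.5]]))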

Implementation of KNN in Python

Here’s a simple example of KNN for a classification problem using Python’s scikit-learn
library:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample data
data = {
    'Feature1': [2, 3, 5, 7, 1, 6, 4, 8],
    'Feature2': [1, 5, 8, 3, 4, 7, 2, 6],
    'Label':    [0, 1, 1, 0, 0, 1, 0, 1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['Feature1', 'Feature2']]
y = df['Label']

# Split the data (75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and train the model (k = 3)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions
y_pred = knn.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

Explanation of the Code

1. Data Preparation:
o A sample dataset is created with two features and a binary label.
2. Feature and Target Selection:
o The features (X) and target (y) are separated.
3. Data Splitting:
o The data is split into training and test sets using a 75-25 split.
4. Model Creation and Training:
o A KNeighborsClassifier with k=3 is instantiated and trained using the
training data.
5. Predictions:
o Predictions are made on the test set.
6. Model Evaluation:
o The accuracy, confusion matrix, and classification report are calculated to
evaluate the model's performance.

Applications of KNN

 Pattern Recognition: Handwriting detection, image recognition.
 Medical Diagnosis: Classifying diseases based on symptoms.
 Recommendation Systems: Suggesting products or content based on user similarity.
 Finance: Predicting stock price movements, credit scoring.

Pros and Cons of KNN

Pros

 Simplicity: Easy to understand and implement.
 No Training Phase: The model simply stores the training data and defers all computation to prediction time, which is manageable for small datasets.
 Adaptability: Can be used for both classification and regression tasks.

Cons

 Computationally Expensive: Prediction can be slow for large datasets since it involves calculating the distance to all other points.
 Memory Intensive: Requires storing all training data.
 Sensitive to Irrelevant Features: Performance can degrade if irrelevant features are present.
 Choice of k and Distance Metric: Requires careful selection of k and the distance metric, which can be challenging.

KNN is a versatile and intuitive algorithm that can be highly effective, especially for small to
medium-sized datasets. However, its performance and efficiency can be significantly affected
by the choice of parameters and the scale of the data.
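Because KNN is distance-based, features on larger scales can dominate the distance calculation. Below is a hedged sketch (assuming the X_train/y_train/X_test/y_test split from the example above) of standardizing features before KNN with a scikit-learn pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale each feature to zero mean and unit variance before computing distances
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))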

Support Vector Machine Algorithm:-

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. It is particularly known for its effectiveness in high-dimensional spaces and its ability to create a robust decision boundary between different classes.

Key Concepts of SVM


1. Hyperplane

A hyperplane is a decision boundary that separates the data points of different classes. In a
2D space, it is a line, and in a 3D space, it is a plane. For higher-dimensional spaces, it is
called a hyperplane.

2. Support Vectors

Support vectors are the data points that are closest to the hyperplane and influence its position
and orientation. These points are critical in defining the optimal hyperplane.
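In scikit-learn, the support vectors of a fitted model can be inspected directly; a small hedged sketch with assumed toy data:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_)   # the training points closest to the hyperplane
print(clf.n_support_)         # number of support vectors per class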

3. Margin

The margin is the distance between the hyperplane and the nearest support vectors from both
classes. SVM aims to maximize this margin, ensuring that the data points are as far away
from the hyperplane as possible, leading to better generalization on new data.
4. Optimal Hyperplane

The optimal hyperplane is the one that maximizes the margin between the support vectors of
the two classes. This is also known as the maximum-margin hyperplane.

5. Soft Margin and Hard Margin

 Hard Margin: Assumes that the data is perfectly linearly separable. It tries to find a
hyperplane that completely separates the classes without any misclassification.
 Soft Margin: Allows some misclassifications to make the model more robust and
handle noisy data better. It introduces a regularization parameter (C) to control the
trade-off between maximizing the margin and minimizing the classification error.
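A hedged sketch of how the regularization parameter C trades margin width against training errors, using a synthetically generated dataset (values chosen only for illustration):

from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two overlapping blobs, so some misclassification is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# Small C -> wide, tolerant margin; large C -> fewer training misclassifications
for C in [0.01, 1.0, 100.0]:
    scores = cross_val_score(SVC(kernel='linear', C=C), X, y, cv=5)
    print(f"C={C}: mean accuracy={scores.mean():.3f}")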

6. Kernel Trick

When the data is not linearly separable, SVM uses the kernel trick to map the data into a
higher-dimensional space where it becomes linearly separable. Common kernels include:

 Linear Kernel: No transformation; used when data is linearly separable.
 Polynomial Kernel: Maps data to a higher-dimensional space using polynomial functions.
 Radial Basis Function (RBF) Kernel / Gaussian Kernel: Maps data to an infinite-
dimensional space, effective for non-linear data.
 Sigmoid Kernel: Similar to neural networks.
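As a hedged illustration, the sketch below compares these kernels on scikit-learn's make_moons dataset, which is not linearly separable (parameter values are assumptions for demonstration):

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-moons: a linear boundary cannot separate them well
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: mean accuracy={scores.mean():.3f}")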

Mathematical Formulation
For a binary classification problem, the decision function of SVM can be represented as:

f(x) = w \cdot x + b

where:

 w is the weight vector.
 x is the input feature vector.
 b is the bias term.

At the support vectors the decision function equals ±1, so the margin width is 2/\|w\|; maximizing the margin is therefore equivalent to minimizing \|w\|. The optimization objective is:

\min \frac{1}{2} \|w\|^2

subject to the constraint:

y_i (w \cdot x_i + b) \geq 1

for all training samples (x_i, y_i), where y_i \in \{-1, +1\} are the class labels.

In the soft margin formulation, the optimization objective includes a regularization term to penalize misclassifications:

\min \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i

subject to the constraints:

y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0

where \xi_i are slack variables that allow for misclassification, and C is the regularization parameter.
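For a linear kernel, scikit-learn exposes w and b on the fitted model, so the decision function can be checked by hand; a hedged sketch with assumed toy data:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [6, 6], [7, 7], [7, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

x_new = np.array([4.0, 4.0])
print(np.dot(w, x_new) + b)             # f(x) computed manually
print(clf.decision_function([x_new]))   # scikit-learn's value (should match)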
Implementation of SVM in Python
Here's an example of using SVM for classification with Python's scikit-learn library:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample data
data = {
    'Feature1': [2, 3, 5, 7, 1, 6, 4, 8],
    'Feature2': [1, 5, 8, 3, 4, 7, 2, 6],
    'Label':    [0, 1, 1, 0, 0, 1, 0, 1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['Feature1', 'Feature2']]
y = df['Label']

# Split the data (75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and train the model (linear kernel, C = 1.0)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# Predictions
y_pred = svm.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

Explanation of the Code

1. Data Preparation:
o A sample dataset is created with two features and a binary label.
2. Feature and Target Selection:
o The features (X) and target (y) are separated.
3. Data Splitting:
o The data is split into training and test sets using a 75-25 split.
4. Model Creation and Training:
o A SVC (Support Vector Classifier) with a linear kernel is instantiated and
trained using the training data.
5. Predictions:
o Predictions are made on the test set.
6. Model Evaluation:
o The accuracy, confusion matrix, and classification report are calculated to
evaluate the model's performance.

Applications of SVM

 Text Classification: Spam detection, sentiment analysis.
 Image Classification: Object detection, face recognition.
 Bioinformatics: Protein classification, cancer detection.
 Finance: Credit risk assessment, fraud detection.

Pros and Cons of SVM

Pros

 Effective in High-Dimensional Spaces: SVM performs well when the number of features is large.
 Robust to Overfitting: Especially in high-dimensional spaces, provided proper regularization.
 Versatile: Can be used for both classification and regression tasks, and with various kernel functions.

Cons

 Computationally Expensive: Especially with large datasets, the training time can be significant.
 Memory Intensive: Training can require holding a large kernel matrix in memory, which can be problematic with large datasets.
 Sensitive to Noise: Particularly when classes overlap, SVM can be sensitive to outliers.

SVM is a versatile and powerful algorithm that can be highly effective in various
classification and regression tasks, especially when the data is high-dimensional and
separable with an appropriate kernel function. However, it requires careful tuning of
parameters and is computationally intensive for large datasets.
