Unit 5 – Learning with Algorithm

K-Nearest Neighbour (KNN) Algorithm for Machine Learning:-

The K-Nearest Neighbors (KNN) algorithm is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It is a type of instance-based learning or lazy learning where the function is only approximated locally and all computation is deferred until function evaluation.

Key Concepts of KNN:-

1. Basic Principle

The KNN algorithm classifies a data point based on how its neighbors are classified. It works by finding the k closest data points (neighbors) to the input point and making a decision based on the majority class among those neighbors (for classification) or the average of their values (for regression).
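To make the principle concrete, here is a minimal from-scratch sketch (with assumed toy data and a hypothetical helper knn_predict) that classifies a query point by majority vote among its k nearest neighbors:

import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: class 0 clustered near the origin, class 1 further away
X_train = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([7, 6])))  # -> 1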

2. Distance Metrics

To determine the closest neighbors, KNN relies on a distance metric to measure the
similarity between data points. Common distance metrics include:

 Euclidean Distance: The straight-line distance between two points.
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
 Manhattan Distance: The sum of the absolute differences of their coordinates.
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
 Minkowski Distance: A generalized distance metric.
d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^m \right)^{1/m}
When m = 2 it becomes Euclidean distance, and when m = 1 it becomes Manhattan distance.
 Cosine Similarity: Measures the cosine of the angle between two vectors.
\text{similarity}(p, q) = \frac{p \cdot q}{\|p\| \, \|q\|}
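The following short sketch (with assumed example vectors) computes each of these metrics directly with NumPy:

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))           # straight-line distance
manhattan = np.sum(np.abs(p - q))                    # sum of absolute differences
minkowski = np.sum(np.abs(p - q) ** 3) ** (1 / 3)    # Minkowski distance with m = 3
cosine_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

print(euclidean, manhattan, minkowski, cosine_sim)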

3. Choosing k

The value of k (the number of neighbors) is crucial and can significantly affect the
performance of the algorithm:
 A small k may be sensitive to noise in the data.
 A large k may smooth out the predictions too much and lose important details.
 Common practice is to choose k via cross-validation, as sketched below.
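A hedged sketch of this practice, using scikit-learn's cross_val_score on the built-in Iris dataset (chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of k with 5-fold cross-validation
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy={scores.mean():.3f}")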

4. Classification vs Regression

 Classification: The output is a class label. The class label is determined by the
majority vote of the nearest neighbors.
 Regression: The output is a continuous value. The value is typically the mean (or
sometimes the median) of the nearest neighbors' values.
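For the regression case, scikit-learn provides KNeighborsRegressor; the sketch below (with assumed toy data) predicts the mean of the three nearest neighbors' target values:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D regression data (illustrative values only)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)

# The prediction is the mean of the 3 nearest neighbors' target values
print(reg.predict([[3.5]]))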

Implementation of KNN in Python

Here’s a simple example of KNN for a classification problem using Python’s scikit-learn
library:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample data
data = {
    'Feature1': [2, 3, 5, 7, 1, 6, 4, 8],
    'Feature2': [1, 5, 8, 3, 4, 7, 2, 6],
    'Label':    [0, 1, 1, 0, 0, 1, 0, 1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['Feature1', 'Feature2']]
y = df['Label']

# Split the data (75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and train the model (k = 3)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions
y_pred = knn.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

Explanation of the Code

1. Data Preparation:
o A sample dataset is created with two features and a binary label.
2. Feature and Target Selection:
o The features (X) and target (y) are separated.
3. Data Splitting:
o The data is split into training and test sets using a 75-25 split.
4. Model Creation and Training:
o A KNeighborsClassifier with k=3 is instantiated and trained using the
training data.
5. Predictions:
o Predictions are made on the test set.
6. Model Evaluation:
o The accuracy, confusion matrix, and classification report are calculated to
evaluate the model's performance.

Applications of KNN

 Pattern Recognition: Handwriting detection, image recognition.
 Medical Diagnosis: Classifying diseases based on symptoms.
 Recommendation Systems: Suggesting products or content based on user similarity.
 Finance: Predicting stock price movements, credit scoring.

Pros and Cons of KNN

Pros

 Simplicity: Easy to understand and implement.
 No Training Phase: The model simply stores the training data and defers all computation to prediction time, which is manageable for small datasets.
 Adaptability: Can be used for both classification and regression tasks.

Cons

 Computationally Expensive: Prediction can be slow for large datasets since it involves calculating the distance to all other points.
 Memory Intensive: Requires storing all training data.
 Sensitive to Irrelevant Features: Performance can degrade if irrelevant features are present.
 Choice of k and Distance Metric: Requires careful selection of k and the distance metric, which can be challenging.

KNN is a versatile and intuitive algorithm that can be highly effective, especially for small to
medium-sized datasets. However, its performance and efficiency can be significantly affected
by the choice of parameters and the scale of the data.
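Because KNN is distance-based, features on larger scales can dominate the distance calculation. Below is a hedged sketch (assuming the X_train/y_train/X_test/y_test split from the example above) of standardizing features before KNN with a scikit-learn pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale each feature to zero mean and unit variance before computing distances
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))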

Support Vector Machine Algorithm:-

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. It is particularly known for its effectiveness in high-dimensional spaces and its ability to create a robust decision boundary between different classes.

Key Concepts of SVM


1. Hyperplane

A hyperplane is a decision boundary that separates the data points of different classes. In a
2D space, it is a line, and in a 3D space, it is a plane. For higher-dimensional spaces, it is
called a hyperplane.

2. Support Vectors

Support vectors are the data points that are closest to the hyperplane and influence its position
and orientation. These points are critical in defining the optimal hyperplane.
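In scikit-learn, the support vectors of a fitted model can be inspected directly; a small hedged sketch with assumed toy data:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_)   # the training points closest to the hyperplane
print(clf.n_support_)         # number of support vectors per class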

3. Margin

The margin is the distance between the hyperplane and the nearest support vectors from both
classes. SVM aims to maximize this margin, ensuring that the data points are as far away
from the hyperplane as possible, leading to better generalization on new data.
4. Optimal Hyperplane

The optimal hyperplane is the one that maximizes the margin between the support vectors of
the two classes. This is also known as the maximum-margin hyperplane.

5. Soft Margin and Hard Margin

 Hard Margin: Assumes that the data is perfectly linearly separable. It tries to find a
hyperplane that completely separates the classes without any misclassification.
 Soft Margin: Allows some misclassifications to make the model more robust and
handle noisy data better. It introduces a regularization parameter (C) to control the
trade-off between maximizing the margin and minimizing the classification error.
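A hedged sketch of how the regularization parameter C trades margin width against training errors, using a synthetically generated dataset (values chosen only for illustration):

from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two overlapping blobs, so some misclassification is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# Small C -> wide, tolerant margin; large C -> fewer training misclassifications
for C in [0.01, 1.0, 100.0]:
    scores = cross_val_score(SVC(kernel='linear', C=C), X, y, cv=5)
    print(f"C={C}: mean accuracy={scores.mean():.3f}")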

6. Kernel Trick

When the data is not linearly separable, SVM uses the kernel trick to map the data into a
higher-dimensional space where it becomes linearly separable. Common kernels include:

 Linear Kernel: No transformation; used when data is linearly separable.
 Polynomial Kernel: Maps data to a higher-dimensional space using polynomial functions.
 Radial Basis Function (RBF) Kernel / Gaussian Kernel: Maps data to an infinite-
dimensional space, effective for non-linear data.
 Sigmoid Kernel: Similar to neural networks.
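As a hedged illustration, the sketch below compares these kernels on scikit-learn's make_moons dataset, which is not linearly separable (parameter values are assumptions for demonstration):

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-moons: a linear boundary cannot separate them well
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: mean accuracy={scores.mean():.3f}")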

Mathematical Formulation
For a binary classification problem, the decision function of SVM can be represented as:

f(x) = w \cdot x + b

where:

 w is the weight vector.
 x is the input feature vector.
 b is the bias term.

At the support vectors the decision function equals ±1, so the margin width is 2/\|w\|; maximizing the margin is therefore equivalent to minimizing \|w\|. The optimization objective is:

\min \frac{1}{2} \|w\|^2

subject to the constraint:

y_i (w \cdot x_i + b) \geq 1

for all training samples (x_i, y_i), where y_i \in \{-1, +1\} are the class labels.

In the soft margin formulation, the optimization objective includes a regularization term to penalize misclassifications:

\min \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i

subject to the constraints:

y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0

where \xi_i are slack variables that allow for misclassification, and C is the regularization parameter.
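For a linear kernel, scikit-learn exposes w and b on the fitted model, so the decision function can be checked by hand; a hedged sketch with assumed toy data:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [6, 6], [7, 7], [7, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

x_new = np.array([4.0, 4.0])
print(np.dot(w, x_new) + b)             # f(x) computed manually
print(clf.decision_function([x_new]))   # scikit-learn's value (should match)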
Implementation of SVM in Python
Here's an example of using SVM for classification with Python's scikit-learn library:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample data
data = {
    'Feature1': [2, 3, 5, 7, 1, 6, 4, 8],
    'Feature2': [1, 5, 8, 3, 4, 7, 2, 6],
    'Label':    [0, 1, 1, 0, 0, 1, 0, 1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['Feature1', 'Feature2']]
y = df['Label']

# Split the data (75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and train the model (linear kernel, C = 1.0)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# Predictions
y_pred = svm.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

Explanation of the Code

1. Data Preparation:
o A sample dataset is created with two features and a binary label.
2. Feature and Target Selection:
o The features (X) and target (y) are separated.
3. Data Splitting:
o The data is split into training and test sets using a 75-25 split.
4. Model Creation and Training:
o A SVC (Support Vector Classifier) with a linear kernel is instantiated and
trained using the training data.
5. Predictions:
o Predictions are made on the test set.
6. Model Evaluation:
o The accuracy, confusion matrix, and classification report are calculated to
evaluate the model's performance.

Applications of SVM

 Text Classification: Spam detection, sentiment analysis.
 Image Classification: Object detection, face recognition.
 Bioinformatics: Protein classification, cancer detection.
 Finance: Credit risk assessment, fraud detection.

Pros and Cons of SVM

Pros

 Effective in High-Dimensional Spaces: SVM performs well when the number of features is large.
 Robust to Overfitting: Especially in high-dimensional spaces, provided proper regularization.
 Versatile: Can be used for both classification and regression tasks, and with various kernel functions.

Cons

 Computationally Expensive: Especially with large datasets, the training time can be significant.
 Memory Intensive: Training can require holding a large kernel matrix in memory, which can be problematic with large datasets.
 Sensitive to Noise: Particularly when classes overlap, SVM can be sensitive to outliers.

SVM is a versatile and powerful algorithm that can be highly effective in various
classification and regression tasks, especially when the data is high-dimensional and
separable with an appropriate kernel function. However, it requires careful tuning of
parameters and is computationally intensive for large datasets.
