Unit 5-6
Naive Bayes:
The Naive Bayes algorithm is a simple and powerful probabilistic machine learning classifier based on
Bayes' Theorem. It is particularly well-suited for classification tasks, especially in text classification, spam
detection, sentiment analysis, and recommendation systems. "Naive" refers to the assumption that features
are independent of each other, given the class label, which rarely holds true in real-world data but often
works surprisingly well in practice.
The Naive Bayes algorithm is based on Bayes' Theorem, which describes the probability of an event occurring given prior knowledge of conditions related to that event. It provides a way of calculating the probability of a hypothesis given the observed evidence. The formula for Bayes' Theorem is:
P(A | B) = [P(B | A) P(A)] / P(B)
where P(A | B) is the posterior probability of hypothesis A given evidence B, P(B | A) is the likelihood of the evidence given the hypothesis, P(A) is the prior probability of the hypothesis, and P(B) is the probability of the evidence.
Consider the problem of deciding whether to play golf, where the only predictor is Humidity (a numeric attribute) and Play Golf? is the target. Using the formula above, we can calculate the posterior probability for each class if we model Humidity within each class as a Gaussian, i.e. if we know its mean and standard deviation for that class.
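To make this concrete, the sketch below shows how the likelihood P(Humidity | Play Golf?) could be computed under a Gaussian assumption; the mean and standard deviation values are purely illustrative, not taken from a real table.

import math

def gaussian_likelihood(x, mean, std):
    # P(x | class) under a normal distribution with the class's mean and standard deviation
    coeff = 1.0 / (math.sqrt(2 * math.pi) * std)
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# Illustrative per-class statistics for Humidity (made-up numbers)
print(gaussian_likelihood(74, mean=79.0, std=10.2))   # P(Humidity=74 | Play=Yes)
print(gaussian_likelihood(74, mean=86.0, std=9.7))    # P(Humidity=74 | Play=No)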
The Zero-Frequency Problem:
One of the disadvantages of Naïve Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate for that combination will be zero. Because all the individual probabilities are multiplied together, this single zero makes the whole posterior for that class zero.
An approach to overcoming this ‘zero-frequency problem’ in a Bayesian setting is Laplace (add-one) smoothing: add one to the count of every attribute value-class combination whenever an attribute value does not occur with every class value.
For example, if in your training data a value such as ‘Overcast’ never appears together with the class ‘No’, you should add one to every count in the frequency table before converting the counts into probabilities, so that no estimate is exactly zero.
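The short sketch below (a hand-rolled illustration, not library code) shows add-one smoothing in action: in the toy data the value 'Overcast' never occurs with class 'No', yet its smoothed probability estimate stays non-zero.

from collections import Counter

def smoothed_likelihood(value, cls, rows, alpha=1):
    # P(attribute = value | class) with add-one (Laplace) smoothing;
    # rows is a list of (attribute_value, class_label) pairs
    distinct_values = {v for v, _ in rows}
    counts = Counter(rows)
    class_total = sum(1 for _, c in rows if c == cls)
    return (counts[(value, cls)] + alpha) / (class_total + alpha * len(distinct_values))

# Toy training data: (Outlook, Play Golf?)
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"), ("Rainy", "No")]
print(smoothed_likelihood("Overcast", "No", data))   # (0 + 1) / (3 + 3) ≈ 0.17 instead of 0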
The Naive Bayes Algorithm is used for various real-world problems like those below:
• Text classification: The Naive Bayes Algorithm is used as a probabilistic learning technique for text
classification. It is one of the best-known algorithms used for document classification of one or many
classes.
• Sentiment analysis: The Naive Bayes Algorithm is used to analyse sentiments or feelings, whether
positive, neutral, or negative.
• Recommendation systems: The Naive Bayes Algorithm, combined with collaborative filtering, is used to build hybrid recommendation systems that help predict whether a user will like a given resource.
• Spam filtering: It is also similar to the text classification process. It is popular for helping you
determine if the mail you receive is spam.
• Medical diagnosis: This algorithm is used in medical diagnosis and helps you to predict the patient’s
risk level for certain diseases.
• Weather prediction: You can use this algorithm to predict whether the weather will be good.
• Face recognition: This helps you identify faces.
Naïve Bayes Example:
Consider a dataset of cars described by the features Color, Type, and Origin, with the target variable Stolen? indicating whether the car was stolen. Concerning this dataset, the assumptions made by the algorithm can be understood as:
• We assume that no pair of features are dependent. For example, the color being ‘Red’ has nothing to
do with the Type or the Origin of the car. Hence, the features are assumed to be Independent.
• Secondly, each feature is given the same influence (or importance). For example, knowing only the Color and the Type cannot predict the outcome perfectly, so no attribute is irrelevant and all of them are assumed to contribute equally to the outcome.
Note: The assumptions made by Naïve Bayes are generally not correct in real-world situations. The
independence assumption is never correct but often works well in practice.
The variable y is the class variable (Stolen?), which represents whether the car is stolen or not given the conditions. The variable X represents the parameters/features.
X is given as:
X = (x1, x2, x3, ..., xn)
Here x1, x2, ..., xn represent the features, i.e. they can be mapped to Color, Type, and Origin. By substituting for X and expanding using the chain rule, we get:
P(y | x1, ..., xn) = [P(x1 | y) P(x2 | y) ... P(xn | y) P(y)] / [P(x1) P(x2) ... P(xn)]
Now, you can obtain the value of each term by looking at the dataset and substituting it into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and proportionality introduced:
P(y | x1, ..., xn) ∝ P(y) P(x1 | y) P(x2 | y) ... P(xn | y)
In our case, the class variable y has only two outcomes, Yes or No, though in general the classification could be multiclass. Either way, we have to find the class value y with the maximum posterior probability:
y = argmax_y [ P(y) P(x1 | y) P(x2 | y) ... P(xn | y) ]
Using the above function, we can obtain the class, given the predictors/features.
Frequency and likelihood tables are then built for all three predictors (Color, Type, and Origin): for each predictor we count how often each value occurs with each class and divide by the class totals to obtain the likelihoods.
As per the equations discussed above, we can then calculate the posterior probabilities P(Yes | X) and P(No | X) for a given X by multiplying the corresponding likelihoods from these tables with the class priors P(Yes) and P(No); the class with the larger value is the prediction.
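A minimal scikit-learn sketch of this worked example is shown below; the rows are made-up and merely stand in for the Color / Type / Origin table, so the printed probabilities are only illustrative.

import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# Made-up rows in the spirit of the Color / Type / Origin dataset
X_raw = np.array([["Red", "Sports", "Domestic"],
                  ["Red", "Sports", "Domestic"],
                  ["Yellow", "Sports", "Imported"],
                  ["Yellow", "SUV", "Imported"],
                  ["Red", "SUV", "Imported"],
                  ["Yellow", "SUV", "Domestic"]])
y = np.array(["Yes", "No", "Yes", "No", "No", "Yes"])   # Stolen?

encoder = OrdinalEncoder()                 # CategoricalNB expects integer-coded categories
X = encoder.fit_transform(X_raw)
model = CategoricalNB(alpha=1.0)           # alpha=1.0 is the Laplace smoothing discussed earlier
model.fit(X, y)

query = encoder.transform([["Red", "SUV", "Domestic"]])
print(model.predict(query))                # class with the larger posterior
print(model.predict_proba(query))          # P(No | X) and P(Yes | X)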
K-Nearest Neighbors (KNN):
The K-Nearest Neighbors (KNN) algorithm is a simple, supervised machine learning technique used for both classification and regression tasks. KNN is a non-parametric, instance-based learning algorithm, which means it does not make any assumptions about the data distribution and stores all available cases. When it makes predictions, it uses the similarity (distance) between data points to classify or predict the target value.
In KNN:
• We choose a value for K, the number of neighbors to consider.
• For classification, KNN assigns a class based on the majority class among the K nearest neighbors.
• For regression, KNN assigns an average value from the K nearest neighbors.
KNN is one of the most commonly used and simplest algorithms for finding patterns in classification and regression problems. It is a supervised algorithm and is also known as a lazy learning algorithm, because it defers all computation until prediction time. It works by calculating the distance of a test observation from all the observations in the training dataset and then finding its K nearest neighbours.
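As a rough sketch of how this looks in practice (scikit-learn is used here, and the built-in iris dataset is assumed purely as a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features: KNN is distance-based, so features should be on comparable scales
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# K = 5 neighbours; the majority class among them becomes the prediction
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))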
Distance Metrics:
As we know, the KNN algorithm helps us identify the nearest points or groups for a query point. But to determine the closest groups or the nearest points for a query point, we need some metric. For this purpose, we use the distance metrics below:
Minkowski Distance –
It is a metric intended for real-valued vector spaces. We can calculate the Minkowski distance only in a normed vector space, that is, a space in which every vector has a length and lengths cannot be negative. For two points x = (x1, ..., xn) and y = (y1, ..., yn), the Minkowski distance is defined as:
D(x, y) = ( |x1 - y1|^p + |x2 - y2|^p + ... + |xn - yn|^p )^(1/p)
The p value in the formula can be changed to give us different distances, such as:
• p = 1, when p is set to 1 we get Manhattan distance
• p = 2, when p is set to 2 we get Euclidean distance
Manhattan Distance –
This distance is also known as the taxicab distance or city block distance because of the way it is calculated: the distance between two points is the sum of the absolute differences of their Cartesian coordinates.
As we know, we get the formula for Manhattan distance by substituting p = 1 in the Minkowski distance formula:
D(x, y) = |x1 - y1| + |x2 - y2| + ... + |xn - yn|
Suppose we have two points, the red point (4, 4) and the green point (1, 1).
We will get, d = |4-1| + |4-1| = 6
This distance is preferred over Euclidean distance when we have a case of high dimensionality.
Euclidean Distance –
This is the most widely used distance, as it is the default metric that the scikit-learn library in Python uses for K-Nearest Neighbours. It is a measure of the true straight-line distance between two points in Euclidean space:
D(x, y) = sqrt( (x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2 )
Now suppose we again take the red point (4, 4) and the green point (1, 1) and calculate the distance between them using the Euclidean metric.
We will get, d = sqrt((4 - 1)^2 + (4 - 1)^2) = sqrt(18) ≈ 4.24
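These formulas can be checked quickly in Python; the small helper below is only an illustration of the Minkowski formula with p = 1 and p = 2.

from math import dist   # built-in Euclidean distance (Python 3.8+)

def minkowski(p1, p2, p):
    # (sum of |xi - yi|^p) ** (1/p)
    return sum(abs(a - b) ** p for a, b in zip(p1, p2)) ** (1 / p)

red, green = (4, 4), (1, 1)
print(minkowski(red, green, p=1))   # Manhattan: |4-1| + |4-1| = 6
print(minkowski(red, green, p=2))   # Euclidean: sqrt(9 + 9) ≈ 4.24
print(dist(red, green))             # same Euclidean result from the standard library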
Cross-Validation:
Cross-validation is a technique used to assess the performance of a machine learning model by dividing the
dataset into multiple subsets, or "folds," to ensure that the model generalizes well to unseen data. It’s a
reliable way to test model accuracy and stability, particularly in cases where data is limited.
Cross-validation helps:
• Avoid overfitting by ensuring the model is evaluated on data it hasn't seen.
• Provide a more robust estimate of model performance compared to a single train-test split.
• Enable tuning hyperparameters (such as K in KNN) based on average performance across folds.
Types of Cross-Validation
1. k-Fold Cross-Validation
o The dataset is split into k equal-sized folds.
o The model is trained on k-1 folds and tested on the remaining fold.
o This process repeats k times, with each fold serving as the test set once.
o The final performance is the average of the scores across all folds.
For example, in 5-fold cross-validation, the data is split into 5 parts; each time, 4 parts are used for
training and 1 part for testing, until every part has served as the test set.
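In code, this can be sketched with scikit-learn's cross_val_score; the iris dataset below is only a stand-in, and the loop doubles as a simple way to compare candidate values of K for KNN.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset for illustration

# 5-fold cross-validation: each candidate K is scored as the average accuracy over the 5 folds
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy = {scores.mean():.3f}")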
Optimal K:
Optimal K balances model bias and variance, making the model more generalizable to new data.
Curse of Dimensionality:
The curse of dimensionality refers to the various challenges that arise when working with high-dimensional data (i.e., data with a large number of features or dimensions). This concept is particularly relevant for algorithms like K-Nearest Neighbors (KNN), which rely on distance metrics to make predictions. As the number of dimensions increases, the volume of the feature space grows exponentially and data points become sparse, so distances between points become less informative and several issues arise.
Handling Categorical Data:
Handling categorical data is an essential step in data preprocessing, especially for machine learning algorithms that require numerical input, such as K-Nearest Neighbors (KNN) and many other models. Categorical data consists of values that represent different categories or groups, which can be nominal (no order) or ordinal (with an order). Here's how you can handle categorical data effectively; a short code sketch follows the list below.
A. One-Hot Encoding
• Best for: Nominal categories with no inherent order (e.g., color: red, blue, green).
• Description: One-hot encoding creates binary columns for each category. For example, a “color”
column with values “red,” “blue,” and “green” would be converted into three binary columns:
color_red, color_blue, and color_green.
B. Label Encoding
• Best for: Ordinal categories with a meaningful order (e.g., rating: low, medium, high).
• Description: Label encoding assigns an integer to each category based on its order. For example, the
categories “low,” “medium,” and “high” could be encoded as 0, 1, and 2, respectively.
C. Target Encoding
• Best for: High-cardinality categorical features in regression problems.
• Description: Target encoding replaces categories with the mean value of the target variable for each
category. It is popular in cases where categorical variables have many unique values (e.g., zip codes).
• Implementation: Target encoding is not directly available in Scikit-Learn but can be implemented
using libraries like Category Encoders.
D. Binary Encoding
• Best for: High-cardinality categorical features in both classification and regression.
• Description: Binary encoding converts categories into binary digits, which can be efficient for
categorical variables with a large number of levels. This method reduces dimensionality more
effectively than one-hot encoding.
• Implementation: Also available in the Category Encoders library.
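A small sketch of the two most common cases (one-hot encoding for a nominal column, ordinal/label encoding for an ordered column), using pandas and scikit-learn on made-up data:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({"color": ["red", "blue", "green", "red"],        # nominal
                   "rating": ["low", "high", "medium", "medium"]})  # ordinal

# One-hot encoding: one binary column per colour category
onehot = OneHotEncoder().fit_transform(df[["color"]]).toarray()

# Ordinal encoding with the category order made explicit: low -> 0, medium -> 1, high -> 2
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]]).fit_transform(df[["rating"]])

print(onehot)
print(ordinal)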
Advantages of KNN:
• KNN is known for its simplicity, comprehensibility, and scalability. Learning and implementation are extremely simple and intuitive.
• It is easy to interpret. The mathematical computations are easy to comprehend and understand.
• The calculation time for training is small, since the algorithm simply stores the training data.
• Its predictive power is high, which makes it effective and efficient.
• KNN is very effective for large training sets.
• It is very useful for nonlinear data because the algorithm makes no assumptions about the data.
• It is a versatile algorithm, as we can use it for both classification and regression.
• It has relatively high accuracy.
Disadvantages of KNN:
• KNN can be expensive when the dataset is large, both in determining K and in making predictions, and it requires more memory than eager classifiers and many other supervised learning algorithms.
• In KNN, the prediction phase is slow for larger datasets, and the computation of accurate distances plays a big role in determining the algorithm's accuracy.
• One of the major steps in KNN is determining the parameter K. Sometimes it is also unclear which type of distance to use and which features will give the best result.
• It is very sensitive to the scale of the data and to irrelevant features. Irrelevant or correlated features have a high impact and must be eliminated.
SVM:
“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems, such as text classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the optimal hyperplane that best separates the two classes.
Support vectors are simply the coordinates of the individual observations that lie closest to the separating frontier, and the SVM classifier is that frontier, i.e. the hyperplane (a line in two dimensions) that best segregates the two classes.
• Support Vectors: Support vectors are the data points that are closest to the hyperplane. These points
are critical because they determine the position and orientation of the hyperplane. If you remove a
support vector, it can change the hyperplane’s position.
1. Linear SVM
Linear SVM is used when the data is linearly separable, which means that the classes can be separated
with a straight line (in 2D) or a flat plane (in 3D). The SVM algorithm finds the hyperplane that best
divides the data into classes.
2. Non-Linear SVM
Non-Linear SVM is used when the data is not linearly separable. In such cases, SVM employs kernel
functions to transform the data into a higher-dimensional space where a linear separation is possible. The
algorithm then finds the optimal hyperplane in this new space.
The C parameter plays a crucial role in controlling the trade-off between margin size and classification
accuracy:
• When C is large:
o The penalty for misclassification is high.
o The model will prioritize correct classification of every training point, resulting in a smaller
margin and potentially overfitting to noise in the data (especially in cases where there are
outliers or overlapping classes).
o The model becomes less tolerant to errors.
• When C is small:
o The penalty for misclassification is low.
o The model will allow more misclassifications to obtain a larger margin between classes,
which can lead to better generalization.
o The model becomes more tolerant to errors.
Thus, C is a hyperparameter that allows the user to control the complexity of the model, as the short sketch after this list illustrates:
• Small C: More regularization, wider margin, but more possible misclassifications.
• Large C: Less regularization, narrower margin, fewer misclassifications but more likely to overfit.
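A quick way to see this trade-off is to cross-validate the same SVM with different C values; the sketch below uses a synthetic, slightly noisy dataset purely for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic, slightly noisy data purely for illustration
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)

# Compare a heavily regularized model (small C) with a barely regularized one (large C)
for C in [0.01, 1, 100]:
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f}")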
Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space. SVM can create complex decision boundaries by using kernel functions: a kernel lets us find a hyperplane in the higher-dimensional space without explicitly computing the transformation, and hence without increasing the computational cost.
Types are as follows:
1. Linear Kernel
K(x, y) = x · y
This is simply the dot product of the two input vectors, used when the data is (approximately) linearly separable.
2. Polynomial Kernel
K(x, y) = (x · y + c)^d
Where c is a constant, and d is the degree of the polynomial. This kernel is useful for classifying data with polynomial relationships.
3. RBF (Gaussian) Kernel
K(x, y) = exp(-γ ||x - y||^2)
Where γ is a parameter that defines the influence of a single training example. This is one of the most popular kernels for non-linear data.
4. Sigmoid Kernel
K(x, y) = tanh(α (x · y) + c)
Where α and c are kernel parameters. It behaves like a neural network’s activation function.
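The kernels can be compared directly through scikit-learn's SVC, whose degree, gamma, and coef0 parameters correspond to the d, γ, and c/α constants above; the two-moons data below is just a toy, non-linearly-separable example.

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linearly-separable toy problem
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

kernels = {
    "linear": SVC(kernel="linear"),
    "poly (d=3)": SVC(kernel="poly", degree=3, coef0=1),    # coef0 plays the role of c
    "rbf": SVC(kernel="rbf", gamma=0.5),                    # gamma controls each point's reach
    "sigmoid": SVC(kernel="sigmoid", gamma=0.1, coef0=0.0),
}
for name, model in kernels.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())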
• Support vectors are the data points that lie closest to the decision boundary (hyperplane) and are
critical in determining the margin between classes. In a sense, these points can be considered the
"landmarks" of the SVM model because they define the decision boundary.
• The role of these points is significant because the SVM maximizes the margin (distance between the
hyperplane and the nearest data points) to improve generalization.
While in SVM the concept of choosing landmark points typically relates to identifying support vectors, there
are some strategies you could consider for selecting them or for visualizing their importance:
• In Linear SVM:
o Landmark points are easy to visualize in 2D or 3D by plotting the support vectors and the
hyperplane. These points lie at the edges of the margin and directly affect the boundary
between classes.
• In Non-linear SVM:
o Landmark points are less directly interpretable due to the transformation of the data into a
higher-dimensional feature space using kernels (e.g., RBF, polynomial). However, they still
correspond to the support vectors that are closest to the decision boundary in this transformed
space.
Similarity Function
A similarity function in SVM is a measure that quantifies how similar two data points are in terms of their
features. The main goal of using similarity functions (via kernel functions) in SVM is to implicitly map the
input data into a higher-dimensional space without explicitly calculating the transformation. This is done
using kernel tricks.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)   # the original dataset is not shown; iris is assumed as a stand-in
X_scaled = StandardScaler().fit_transform(X)   # SVMs are sensitive to feature scale
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Optional: Visualize the decision boundary for the first two features (2D visualization)
X_train_2d = X_train[:, :2]
X_test_2d = X_test[:, :2]
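Continuing the split above, a minimal sketch of fitting and evaluating an RBF-kernel classifier might look like this (the parameter values are arbitrary defaults, not tuned):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm_clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF kernel with default regularization
svm_clf.fit(X_train, y_train)                       # uses the scaled training split from above

y_pred = svm_clf.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print("Support vectors per class:", svm_clf.n_support_)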
Support Vector Machines (SVM) are widely used in machine learning for classification problems, but they
can also be applied to regression problems through Support Vector Regression (SVR). SVR uses the same
principles as SVM but focuses on predicting continuous outputs rather than classifying data points. This
section explores how SVR works, emphasizing key concepts such as the polynomial (e.g. quadratic), radial basis function, and sigmoid kernels. By leveraging these kernels, SVR can effectively handle complex, non-linear relationships in data.
The SVR model, unlike typical regression models, employs support vector machines (SVMs) principles to
transform input features into high-dimensional spaces to locate the ideal hyperplane that accurately
represents the data. This method enables support vector regression (SVR) to effectively manage both linear
and non-linear relationships, rendering it a versatile tool across different fields, such as financial forecasting
and scientific research.
The problem of regression is to find a function that approximates the mapping from an input domain to real numbers based on a training sample. So, let’s dive deep and understand how SVR actually works.
Consider a hyperplane (the green line in the usual illustration) with two boundary lines (the red lines) drawn parallel to it on either side; these boundary lines form the decision boundary. When we move on with SVR in Machine Learning, our objective is to consider the points that lie within this decision boundary. Our best-fit line is the hyperplane that has the maximum number of points within it.
The first thing to understand is the decision boundary itself. Consider the boundary lines as being at some distance, say ‘a’, from the hyperplane: these are the lines we draw at distance ‘+a’ and ‘-a’ from the hyperplane. This ‘a’ is usually referred to as epsilon.
If the equation of the hyperplane is Y = wx + b, then the two boundary lines are:
wx + b = +a
wx + b = -a
Thus, any hyperplane that satisfies our SVM for Regression model should satisfy:
-a < Y - (wx + b) < +a
that is, the points we fit should lie within an epsilon-wide tube around the hyperplane.
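A minimal SVR sketch with scikit-learn is shown below; epsilon plays the role of ‘a’ (the half-width of the tube), and the sine-shaped data is synthetic, generated only for illustration.

import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (for illustration only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon sets the width of the tube around the hyperplane; errors inside it are not penalized
svr = SVR(kernel="rbf", C=10, epsilon=0.1)
svr.fit(X, y)
print(svr.predict([[2.5]]))   # predicted value near sin(2.5)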
Pros:
• It works really well with a clear margin of separation.
• It is effective in high-dimensional spaces.
• It is effective in cases where the number of dimensions is greater than the number of samples.
• It uses a subset of the training set in the decision function (called support vectors), so it is also
memory efficient.
Cons:
• It doesn’t perform well when we have a large data set because the required training time is higher.
• It also doesn’t perform very well when the data set has more noise, i.e., target classes are
overlapping.
• The SVM algorithm doesn’t directly provide probability estimates; when they are needed, they are calculated using an expensive five-fold cross-validation, which is how the related SVC class of the Python scikit-learn library implements this feature.