UNIT-2 (SVM, KNN, and Density Estimation)
What is SVM?
• Support Vector Machine (SVM) is a supervised machine learning algorithm used
for both classification and regression. Although it can be applied to regression
problems, it is best suited for classification.
• The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional feature space that separates the data points of the different
classes.
• The hyperplane is chosen so that the margin between the closest points of
different classes is as large as possible.
• The dimension of the hyperplane depends upon the number of features. If the
number of input features is two, then the hyperplane is just a line.
• If the number of input features is three, then the hyperplane becomes a 2-D plane.
It becomes difficult to imagine when the number of features exceeds three.
Hyperplane: There can be multiple lines/decision boundaries that segregate the
classes in n-dimensional space, but we need to find the best decision
boundary that helps to classify the data points. This best boundary is known as
the hyperplane of SVM. The hyperplane equation is w·x + b = 0.
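In the standard formulation (summarized here in LaTeX; this is the usual textbook statement rather than something written out in these notes), the prediction rule and the margin-maximization objective are:

\[
\hat{y} = \operatorname{sign}(w \cdot x + b), \qquad
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i,
\]

and the resulting margin between the two classes is 2 / ||w||.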
Types of SVM:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is
called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified by using a straight line, then such
data is termed non-linear data, and the classifier used is called a
Non-linear SVM classifier.
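As a quick illustration (a minimal sketch assuming scikit-learn is available; it reuses the data points of the worked example below), the two variants differ only in the kernel argument:

import numpy as np
from sklearn.svm import SVC

# Positive points first, then negative points, as in the example below.
X = np.array([[3, 1], [3, -1], [6, 1], [6, -1],
              [1, 0], [0, 1], [0, -1], [-1, 0]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

linear_svm = SVC(kernel="linear").fit(X, y)    # Linear SVM
nonlinear_svm = SVC(kernel="rbf").fit(X, y)    # Non-linear SVM (RBF kernel)
print(linear_svm.predict([[4, 0]]))            # -> [1], right of the x1 = 2 boundary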
Example:
Let's plot the given points. From the plot, the support vectors are s1 = (1, 0)
from the negative class and s2 = (3, 1), s3 = (3, -1) from the positive class.
Each vector is augmented with a bias input of 1:
s̃1 = (1, 0, 1), s̃2 = (3, 1, 1), s̃3 = (3, -1, 1)
Find the values of a1, a2, and a3 by solving the following equations (the target
is -1 for the negative support vector and +1 for the positive ones):
a1 (s̃1 · s̃1) + a2 (s̃2 · s̃1) + a3 (s̃3 · s̃1) = -1
a1 (s̃1 · s̃2) + a2 (s̃2 · s̃2) + a3 (s̃3 · s̃2) = +1
a1 (s̃1 · s̃3) + a2 (s̃2 · s̃3) + a3 (s̃3 · s̃3) = +1
Substituting the dot products gives:
2a1 + 4a2 + 4a3 = -1
4a1 + 11a2 + 9a3 = +1
4a1 + 9a2 + 11a3 = +1
After solving these equations: a1 = -3.5, a2 = 0.75, and a3 = 0.75.
Now, calculate the weight vector:
w̃ = a1 s̃1 + a2 s̃2 + a3 s̃3 = -3.5 (1, 0, 1) + 0.75 (3, 1, 1) + 0.75 (3, -1, 1) = (1, 0, -2)
So the weight vector is w = (1, 0) with bias b = -2, and the separating
hyperplane is x1 = 2.
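The same system can be checked numerically (a minimal sketch assuming NumPy; the variable names are illustrative):

import numpy as np

# Augmented support vectors (bias input 1 appended), as in the example above.
S = np.array([[1, 0, 1],    # s1 (negative class)
              [3, 1, 1],    # s2 (positive class)
              [3, -1, 1]])  # s3 (positive class)
targets = np.array([-1, 1, 1])  # desired output for each support vector

G = S @ S.T                  # Gram matrix of pairwise dot products
a = np.linalg.solve(G, targets)
print(a)                     # [-3.5   0.75  0.75]

w_tilde = a @ S              # weight vector = sum of a_i * s~_i
print(w_tilde)               # [ 1.  0. -2.]  ->  w = (1, 0), b = -2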
TASK-1
Q 1. Positively labelled data points: (2,1), (2,-1), (5,1), (5,-1); negatively
labelled data points: (1,0), (0,1), (0,-1), (-1,0). Find the support vectors,
the weight vector, and the separating hyperplane.
KNN
• The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning
algorithm that is widely used in pattern recognition, data mining, and
intrusion detection.
• It does not require any assumptions about the underlying data distribution.
• It can also handle both numerical and categorical data, making it a flexible choice for
various types of datasets in classification and regression tasks.
• It is a non-parametric method that makes predictions based on the similarity of data
points in a given dataset. With a suitably large value of K, K-NN is also less
sensitive to outliers than many other algorithms.
• The K-NN algorithm works by finding the K nearest neighbors to a given data point
based on a distance metric, such as Euclidean distance.
• The class or value of the data point is then determined by the majority vote or average
of the K neighbors. This approach allows the algorithm to adapt to different patterns
and make predictions based on the local structure of the data.
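In practice this is available off the shelf (a minimal sketch assuming scikit-learn; the tiny dataset is illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3, Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[2, 2]]))  # -> [0], the majority class among the 3 nearest points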
How to Choose the Value of K in the K-NN Algorithm:
• There is no single best way of choosing the value of K, but here are some
common conventions to keep in mind:
• Choosing a very low value will most likely lead to inaccurate
predictions.
• The most commonly used value of K is 5.
• Use an odd number as the value of K, to avoid tied votes in binary
classification.
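A more principled alternative to these rules of thumb (a sketch assuming scikit-learn and its bundled iris dataset; this procedure is not prescribed by these notes) is to cross-validate candidate values of K:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9, 11]:
    # 5-fold cross-validated accuracy for each candidate K.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")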
Example:
Step 1: Find the distance from the new data point to every point in the dataset.
Step 2: Assign ranks to the calculated distances from 1 to n, with rank 1 for the
lowest distance.
Step 3: Assign a label to the new data point on the basis of the value of K.
• If the value of K = 1, the label for the new data point will be Normal.
• In this scenario, with K = 5, the new data point lies in the
Normal class.
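The same three steps written out directly (a minimal from-scratch sketch assuming NumPy; the toy dataset and the name knn_predict stand in for the example's data, which is not reproduced here):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    # Step 1: distance from the new data point to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: rank the distances; argsort puts rank 1 (lowest distance) first.
    nearest = np.argsort(dists)[:k]
    # Step 3: label the new data point by majority vote among the K neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [6.0, 5.0], [7.0, 7.0]])
y_train = np.array(["Normal", "Normal", "Normal", "Abnormal", "Abnormal"])
print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # -> Normal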
Advantages and Disadvantages of K-NN Algorithm:
• Advantages of K-NN Algorithm
• It is simple to implement.
• No training is required before classification.
• Disadvantages of K-NN Algorithm
• It can be computationally expensive when working with a large data set.
• A lot of memory is required when processing large data sets.
• Choosing the right value of K can be tricky.
Density estimation:
• Density estimation is a statistical technique used to estimate the
probability density function (PDF) of a random variable.
• In simple terms, it is a method to estimate how likely it is that a given
observation belongs to a certain distribution.
• Density estimation is particularly useful in fields such as machine
learning, statistics, and data analysis, where understanding the
underlying distribution of data is important.
• There are various methods for density estimation, and here are a few common
ones:
1. Histograms:
1. Divide the data into bins.
2. Count the number of data points in each bin.
3. Normalize the counts by the total number of observations and the bin width to obtain a
probability density.
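A histogram density in code (a minimal sketch assuming NumPy; the sample data is illustrative):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# density=True divides the counts by (number of observations * bin width),
# so the histogram integrates to 1 and approximates a probability density.
density, bin_edges = np.histogram(data, bins=20, density=True)
print(density.sum() * np.diff(bin_edges)[0])  # ~1.0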
2. Kernel Density Estimation (KDE):
1. Place a kernel (smooth, continuous function, such as a Gaussian) at each data point.
2. Sum the contributions from all kernels to obtain the estimated density.
3. The bandwidth of the kernel controls the smoothness of the estimate.
(Figure: 100 points drawn from a bimodal distribution, with the kernel density
estimates shown for three choices of kernel.)
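A minimal KDE sketch (assuming SciPy; the bimodal sample loosely mirrors the figure's setup and is illustrative):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 50), rng.normal(3, 1, 50)])

kde = gaussian_kde(data)                         # Gaussian kernel, automatic bandwidth
kde_narrow = gaussian_kde(data, bw_method=0.2)   # smaller bandwidth -> less smoothing
print(kde(np.array([0.0])), kde_narrow(np.array([0.0])))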
3. Parametric Methods:
1. Assume a specific parametric form for the underlying distribution (e.g., normal distribution,
exponential distribution).
2. Estimate the parameters of the distribution from the data.
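Fitting a parametric density (a sketch assuming SciPy; the normal form and sample data are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Assume a normal form, then estimate its parameters from the data (MLE).
mu, sigma = stats.norm.fit(data)
print(mu, sigma)  # close to the true values 5.0 and 2.0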
4. Non-parametric Methods:
1. Do not assume a specific parametric form for the distribution.
2. KDE is an example of a non-parametric method.
• Density estimation is often used for tasks such as anomaly detection, clustering, and
generating synthetic data. It helps in understanding the structure of the data and can be a
crucial step in exploratory data analysis.
• In machine learning, density estimation can be part of various algorithms, such as Gaussian
Mixture Models (GMMs) and certain types of neural networks, where estimating the
underlying distribution is essential for making predictions or generating new data samples.
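For instance (a sketch assuming scikit-learn; the two-component setup and data are illustrative):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, (100, 1)), rng.normal(3, 1, (100, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_.ravel())    # estimated component means, near -2 and 3
X_new, _ = gmm.sample(5)     # draw synthetic samples from the fitted density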
Parzen Window:
• The Parzen window, also known as the kernel density estimation with
a fixed kernel or the "window" method, is a non-parametric technique
used for estimating the probability density function (PDF) of a random
variable.
• It falls under the category of non-parametric density estimation
methods, where the goal is to estimate the underlying distribution of a
set of data points without assuming a specific parametric form for the
distribution.
Features of the Parzen window:
1. Kernel Function:
1. Choose a kernel function, typically a smooth, symmetric, and positive function (e.g., Gaussian
or Epanechnikov kernel).
2. The kernel function defines the shape of the "window" around each data point.
2. Window (or Bandwidth):
1. Choose a fixed window (bandwidth) size or adaptively select it based on the data.
2. The window determines the region around each data point where the kernel function
contributes to the density estimate.
3. Estimation:
1. For each data point, place a window centered at that point.
2. The contribution of each data point to the overall density estimate is given by the chosen
kernel function within its window.
3. Sum the contributions from all data points to obtain the final density estimate.
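Putting the three steps together (a minimal from-scratch sketch assuming NumPy and a Gaussian kernel; the name parzen_density is illustrative):

import numpy as np

def parzen_density(x, data, h=1.0):
    # Place a Gaussian kernel (window of bandwidth h) at each data point...
    u = (x - data) / h
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # ...and average the contributions, scaled by the bandwidth.
    return kernels.sum() / (len(data) * h)

rng = np.random.default_rng(0)
data = rng.normal(size=200)
print(parzen_density(0.0, data))  # near the true N(0,1) density at 0, about 0.399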
• The choice of the kernel and bandwidth is crucial, and it
affects the smoothness and accuracy of the density estimate.
Common kernels include the Gaussian, Epanechnikov, and
rectangular kernels.
• The Parzen window method is a flexible and intuitive
approach to estimate probability densities, especially in
situations where the underlying distribution is unknown or
complex.
Thank you!