
UNIT-2

Support Vector Machines (SVM)


Introduction to SVM
 Support Vector Machine (SVM) is a powerful supervised machine learning
model used for classification tasks.
 It finds the optimal hyperplane that separates different classes in the
dataset.
Key Concepts
1. Hyperplane:
 In one-dimensional space it is a point; in two dimensions it is a line; in
three dimensions it is a plane; and in higher dimensions it is called a
hyperplane.
 The goal of SVM is to find the hyperplane that maximizes the
margin between classes.
2. Margin:
 The margin is defined as the distance between the hyperplane and
the nearest data points from either class, known as support vectors.
 SVM is often referred to as a "maximum margin classifier" because
it seeks to maximize this margin.
3. Support Vectors:
 These are the data points closest to the hyperplane and are critical
in defining its position.
 Only support vectors influence the decision boundary; other points
do not affect it.
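The three concepts above can be made concrete in code. Below is a minimal sketch, assuming scikit-learn and NumPy are available; the data points, labels, and the large C value are purely illustrative and not taken from this unit.

```python
# Minimal sketch: fitting a linear SVM with scikit-learn (assumed available)
# and inspecting the learned hyperplane and its support vectors.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes (illustrative values).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]          # weight vector of the hyperplane w·x + b = 0
b = clf.intercept_[0]     # bias term
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
```

The fitted model exposes the support vectors directly, and the margin width follows from the learned weight vector.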
Mathematical Formulation
 To mathematically find the optimal hyperplane, we define it as:

w ⋅ x + b = 0

where w is the weight vector, x is the feature vector, and b is the bias.
 The objective is to maximize the margin, defined as:

Margin = 2 / ∥w∥

This leads to minimizing (1/2)∥w∥² subject to constraints based on the class
labels, i.e. yᵢ(w ⋅ xᵢ + b) ≥ 1 for every training point.
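As a quick numeric illustration of the formulation above, the sketch below (NumPy assumed) evaluates w ⋅ x + b and the margin width 2/∥w∥ for a hand-picked w and b; the values are made up, not learned.

```python
# Minimal sketch (NumPy assumed): evaluating w·x + b and the resulting margin
# for a hand-picked hyperplane; the numbers are illustrative, not learned.
import numpy as np

w = np.array([1.0, -1.0])    # assumed weight vector
b = -0.5                     # assumed bias

x = np.array([3.0, 1.0])     # a sample feature vector
decision = np.dot(w, x) + b  # sign gives the predicted class
margin_width = 2.0 / np.linalg.norm(w)

print("w·x + b =", decision)
print("margin width 2/||w|| =", margin_width)
```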
Lagrange Multipliers
 To solve this constrained optimization problem, we use Lagrange
multipliers.
 The Lagrangian can be expressed as:
L(w, b, α) = (1/2)∥w∥² − Σᵢ₌₁ᴺ αᵢ [ yᵢ(w ⋅ xᵢ + b) − 1 ]

where the yᵢ are the class labels and the αᵢ are the Lagrange multipliers.
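In practice, the constrained problem can also be handed to a generic numerical solver instead of being worked through by hand. The sketch below is one way to do that with SciPy's SLSQP method (assumed available); it solves the hard-margin primal, minimizing (1/2)∥w∥² subject to yᵢ(w ⋅ xᵢ + b) ≥ 1, on a tiny made-up dataset.

```python
# Minimal sketch: solving the hard-margin primal problem numerically with
# SciPy's SLSQP solver (assumed available). Variables are packed as [w1, w2, b].
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])  # toy data
y = np.array([-1, -1, 1, 1])

def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)                    # (1/2)||w||^2

def constraint(v, i):
    w, b = v[:2], v[2]
    return y[i] * (np.dot(w, X[i]) + b) - 1.0    # y_i(w·x_i + b) - 1 >= 0

cons = [{"type": "ineq", "fun": constraint, "args": (i,)} for i in range(len(X))]
res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=cons)
print("w =", res.x[:2], "b =", res.x[2])
```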
Hard Margin vs. Soft Margin
 Hard Margin SVM: Assumes data is linearly separable without any noise
or outliers.
 Soft Margin SVM: Introduces slack variables to allow some
misclassifications, accommodating non-linearly separable data and
outliers. The optimization problem becomes:

min over (w, b, ξ):   (1/2)∥w∥² + C Σᵢ₌₁ᴺ ξᵢ

where C controls the trade-off between maximizing the margin and minimizing
classification errors.
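The effect of C can be observed directly. The sketch below, assuming scikit-learn and NumPy are available, fits a linear soft-margin SVM on slightly overlapping synthetic clusters for several C values; the data and the particular C values are illustrative only.

```python
# Minimal sketch: the effect of the C parameter on a soft-margin SVM,
# using scikit-learn (assumed available) on illustrative, slightly noisy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2.5, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy={clf.score(X, y):.2f}")
```

A small C tolerates more margin violations (more support vectors, wider margin); a large C penalizes misclassifications heavily.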
Kernel Trick
 For non-linearly separable data, SVM can apply kernel functions to
transform data into higher dimensions where a linear separation is
possible.
 Common kernels include:

 Polynomial Kernel: K(xᵢ, xⱼ) = (xᵢ ⋅ xⱼ + c)^d

 Radial Basis Function (RBF) Kernel: K(xᵢ, xⱼ) = exp(−γ ∥xᵢ − xⱼ∥²)
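The two kernels above are easy to evaluate directly. The sketch below (NumPy assumed) computes them for a pair of feature vectors; the hyperparameters c, d, and γ are illustrative defaults, not prescribed values.

```python
# Minimal sketch (NumPy assumed): evaluating the polynomial and RBF kernels
# for two feature vectors; c, d, and gamma are illustrative hyperparameters.
import numpy as np

def polynomial_kernel(xi, xj, c=1.0, d=3):
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.linalg.norm(xi - xj) ** 2)

xi = np.array([1.0, 2.0])
xj = np.array([2.0, 0.5])
print("polynomial:", polynomial_kernel(xi, xj))
print("RBF:", rbf_kernel(xi, xj))
```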
Conclusion
 SVM is a robust method for classification that effectively handles both
linear and non-linear problems through its mathematical framework and
kernel trick. It emphasizes maximizing margins while being flexible enough
to accommodate noise and outliers in datasets.
Applications of SVM
1. Face Detection
 SVMs classify parts of images as either face or non-face, creating
boundaries around detected faces. This technology is commonly
used in security systems and social media platforms for automatic
tagging and recognition.
2. Text and Hypertext Categorization
 SVMs are employed to categorize documents into various classes,
such as news articles or emails. They analyze text data and assign
categories based on scores compared against predefined
thresholds, enhancing content organization and retrieval systems.

3. Image Classification
 In image processing, SVMs improve accuracy in classifying images
compared to traditional methods. They are used for object detection
and image retrieval, significantly enhancing search results in visual
databases.
4. Bioinformatics
 SVMs play a crucial role in biological data analysis, including protein
classification and cancer diagnosis. They help identify gene
expressions and classify patients based on genetic information,
aiding in personalized medicine.
5. Handwriting Recognition
 This application involves recognizing handwritten characters and is
widely used in postal services and document digitization. SVMs
analyze character features to enable accurate transcription of
handwritten text.
6. Spam Detection
 In natural language processing (NLP), SVMs are effective for filtering
spam emails by classifying messages based on their content,
improving email delivery systems like those used by Gmail.
7. Financial Forecasting
 SVMs are applied in the financial sector for stock market analysis
and fraud detection. Their ability to handle high-dimensional data
makes them suitable for predicting market trends and identifying
unusual patterns indicative of fraudulent activities.

8. Medical Diagnosis
 Beyond cancer detection, SVMs assist in diagnosing various
diseases by analyzing complex medical datasets, helping healthcare
professionals make informed decisions based on predictive analytics.
9. Remote Homology Detection
 In computational biology, SVMs are used to detect similarities
between protein structures, which is essential for understanding
biological functions and evolutionary relationships.

10. Generalized Predictive Control (GPC)


 SVM-based GPC is used to manage chaotic systems in engineering
applications, allowing more effective control over dynamic processes.
These applications demonstrate the versatility of SVMs across different domains,
highlighting their importance in both research and practical implementations.
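As a small illustration of the text categorization and spam detection applications above, the sketch below builds a generic TF-IDF plus linear SVM pipeline with scikit-learn (assumed available); the tiny corpus and labels are made up and do not reflect any real system.

```python
# Minimal sketch: text categorization with a linear SVM, assuming scikit-learn
# is available. The tiny corpus and labels below are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["win a free prize now", "meeting agenda attached",
         "cheap loans available", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["free loans, claim your prize"]))
```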
Separating data with the maximum margin
Support vector machines
Pros: Low generalization error, computationally inexpensive, easy to interpret
results
Cons: Sensitive to tuning parameters and kernel choice; natively only handles
binary classification
Works with: Numeric values, nominal values
To introduce the subject of support vector machines I need to explain a few
concepts. Consider the data in frames A–D in figure 6.1; could you draw a
straight line to put all of the circles on one side and all of the squares on another
side? Now consider the data in figure 6.2, frame A. There are two groups of data,
and the data points are separated enough that you could draw a straight line on
the figure with all the points of one class on one side of the line and all the points
of the other class on the other side of the line. If such a situation exists, we say
the data is linearly separable. Don’t worry if this assumption seems too perfect.
We’ll later make some changes where the data points can spill over the line.
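A rough, practical way to probe linear separability is sketched below, assuming scikit-learn is available: if a linear SVM with a very large C classifies every training point correctly, the data is at least empirically separable. The points are made up.

```python
# Minimal sketch: a rough empirical check for linear separability, assuming
# scikit-learn is available. Perfect training accuracy with a near-hard margin
# suggests a separating line exists.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.5]])  # toy points
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("linearly separable (empirically):", clf.score(X, y) == 1.0)
```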
Framing the optimization problem in terms of our classifier
I’ve talked about the classifier but haven’t mentioned how it works.
Understanding how the classifier works will help you to understand the
optimization problem. We’ll have a simple equation like the sigmoid where we
can enter our data values and get a class label out. We’re going to use
something like the Heaviside step function, f(wTx+b), where the
function f(u) gives us -1 if u<0, and 1 otherwise. This is different from logistic
regression in the previous chapter where the class labels were 0 or 1.
Why did we switch from class labels of 0 and 1 to -1 and 1? This makes the math
manageable, because -1 and 1 are only different by the sign. We can write a
single equation to describe the margin or how close a data point is to our
separating hyperplane and not have to worry if the data is in the -1 or +1 class.
When we’re doing this and deciding where to place the separating line, this
margin is calculated by label*(wTx+b). This is where the -1 and 1 class labels
help out. If a point is far away from the separating plane on the positive side,
then wTx+b will be a large positive number, and label*(wTx+b) will give us a
large number. If it’s far from the negative side and has a negative
label, label*(wTx+b) will also give us a large positive number.
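The classifier and the functional margin described above can be written in a few lines. The sketch below (NumPy assumed) uses made-up values for w, b, and the data point.

```python
# Minimal sketch (NumPy assumed): the step-style classifier f(w·x + b) with
# -1/+1 outputs, and the functional margin label*(wTx + b) discussed above.
import numpy as np

def f(u):
    return -1 if u < 0 else 1          # class label from the decision value

w = np.array([2.0, -1.0])              # assumed weights
b = 0.5                                # assumed bias

x, label = np.array([3.0, 1.0]), 1     # a point on the positive side
u = np.dot(w, x) + b
print("prediction:", f(u))
print("functional margin label*(wTx + b):", label * u)
```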
The goal now is to find the w and b values that will define our classifier. To do
this, we must find the points with the smallest margin. These are the support
vectors briefly mentioned earlier. Then, when we find the points with the
smallest margin, we must maximize that margin. This can be written as

arg max over (w, b) of { min_n [ label·(wTx + b) ] · 1/‖w‖ }

Solving this problem directly is pretty difficult, so we can convert it into another
form that we can solve more easily. Let’s look at the inside of the previous
equation, the part inside the curly braces. Optimizing multiplications can be
nasty, so what we do is hold one part fixed and then maximize the other part. If
we set label*(wTx+b) to be 1 for the support vectors, then we can maximize
‖w‖⁻¹ (equivalently, minimize ‖w‖) and we’ll have a solution. Not all of the
label*(wTx+b) values will be equal to 1, only those of the points closest to the
separating hyperplane. For values farther away from the hyperplane, this
product will be larger.
The optimization problem we now have is a constrained optimization problem
because we must find the best values, provided they meet some constraints.
Here, our constraint is that label*(wTx+b) will be 1.0 or greater. There’s a well-
known method for solving these types of constrained optimization problems,
using something called Lagrange multipliers. Using Lagrange multipliers, we can
write the problem in terms of our constraints. Because our constraints are our
data points, we can write the values of our hyperplane in terms of our data
points. The optimization function turns out to be

max over α of [ Σᵢ αᵢ − (1/2) Σᵢ,ⱼ labelⁱ · labelʲ · αᵢ · αⱼ · ⟨x⁽ⁱ⁾, x⁽ʲ⁾⟩ ]

subject to the constraints α ≥ 0 and Σᵢ αᵢ · labelⁱ = 0.
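For illustration, the dual objective above can be evaluated for a given set of multipliers. The sketch below (NumPy assumed) uses arbitrary, unoptimized alphas; a real solver such as SMO would choose them to maximize this quantity.

```python
# Minimal sketch (NumPy assumed): evaluating the dual objective
# sum(alpha) - 1/2 * sum_ij label_i*label_j*alpha_i*alpha_j*<x_i, x_j>
# for illustrative alphas; a real solver (e.g. SMO) would optimize them.
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
labels = np.array([-1, -1, 1, 1])
alphas = np.array([0.1, 0.0, 0.1, 0.0])   # assumed, not optimized

K = X @ X.T                                # Gram matrix of inner products
dual = alphas.sum() - 0.5 * np.sum(
    np.outer(labels * alphas, labels * alphas) * K)
print("dual objective:", dual)
```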

Instance-based learning
Machine learning systems categorized as instance-based learning are systems
that learn the training examples by heart and then generalize to new instances
based on some similarity measure. It is called instance-based because it builds
its hypotheses from the training instances themselves. It is also known as
memory-based learning or lazy learning (because processing is delayed until a
new instance must be classified). The time complexity of such an algorithm
depends on the size of the training data. Each time a new query is encountered,
the previously stored data is examined and a target function value is assigned
to the new instance.
The worst-case time complexity of this algorithm is O(n), where n is the number
of training instances. For example, if we were to create a spam filter with an
instance-based learning algorithm, instead of just flagging emails that are
already marked as spam, our spam filter would also be programmed to flag
emails that are very similar to them. This requires a measure of resemblance
between two emails; a similarity measure could be based on having the same
sender, repeated use of the same keywords, or some other feature.
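A minimal sketch of such an instance-based spam filter follows; the Jaccard word-overlap similarity and the tiny training set are illustrative choices, not a prescribed method.

```python
# Minimal sketch: an instance-based "spam filter" that keeps every training
# email in memory and classifies a new one by its most similar stored example.
# The similarity measure (shared-word overlap) and the data are illustrative.
def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)   # Jaccard overlap of words

training = [("win a free prize now", "spam"),
            ("weekly project status update", "ham"),
            ("free loans apply now", "spam")]

def classify(email):
    # lazy learning: all work happens at query time, scanning stored instances
    return max(training, key=lambda t: similarity(email, t[0]))[1]

print(classify("claim your free prize"))   # expected: spam
```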
Advantages:
1. Instead of estimating for the entire instance set, local approximations can
be made to the target function.
2. This algorithm can adapt easily to new data, which is collected as we go.
Disadvantages:
1. Classification costs are high.
2. A large amount of memory is required to store the data, and each query
involves building a local model from scratch.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Self-Organizing Map (SOM)
3. Learning Vector Quantization (LVQ)
4. Locally Weighted Learning (LWL)
5. Case-Based Reasoning
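Of these, k-Nearest Neighbors is the most common. The sketch below, assuming scikit-learn is available, shows the lazy-learning pattern: fitting only stores the instances, and the work happens at prediction time. The data is illustrative.

```python
# Minimal sketch: k-Nearest Neighbors, the most common instance-based learner,
# using scikit-learn (assumed available) on illustrative numeric data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0], [2.0], [8.0], [9.0]])   # stored training instances
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # fit() just stores X, y
print(knn.predict([[7.5]]))                           # classified at query time
```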
