
UNIT- 2

Support Vector Machines (SVM)


Introduction to SVM
 Support Vector Machine (SVM) is a powerful supervised machine learning
model used for classification tasks.
 It finds the optimal hyperplane that separates different classes in the
dataset.
Key Concepts
1. Hyperplane:
 In one-dimensional space, a hyperplane is a point; in two dimensions,
it is a line; and in three dimensions, it is a plane.
 The goal of SVM is to find the hyperplane that maximizes the
margin between classes.
2. Margin:
 The margin is defined as the distance between the hyperplane and
the nearest data points from either class, known as support vectors.
 SVM is often referred to as a "maximum margin classifier" because
it seeks to maximize this margin.
3. Support Vectors:
 These are the data points closest to the hyperplane and are critical
in defining its position.
 Only support vectors influence the decision boundary; other points
do not affect it.
Mathematical Formulation
 To mathematically find the optimal hyperplane, we define it as:

w ⋅ x + b = 0

where w is the weight vector, x is the feature vector, and b is the bias.
 The objective is to maximize the margin, defined as:

Margin = 2 / ∥w∥

This leads to minimizing (1/2)∥w∥² subject to the constraints
yᵢ(w ⋅ xᵢ + b) ≥ 1, based on the class labels yᵢ ∈ {−1, +1}.
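To make this concrete, here is a minimal Python sketch (assuming NumPy and scikit-learn are available; the toy data points are invented) that fits a linear SVM with a large C to approximate the hard-margin formulation, then reads off w, b, the margin 2/∥w∥, and the support vectors:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters with labels -1 and +1.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin problem described above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # weight vector w
b = clf.intercept_[0]     # bias b
margin = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin =", margin)
print("support vectors:\n", clf.support_vectors_)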
Lagrange Multipliers
 To solve this constrained optimization problem, we use Lagrange
multipliers.
 The Lagrangian can be expressed as:
L(w, b, α) = (1/2)∥w∥² − Σᵢ₌₁ᴺ αᵢ [yᵢ(w ⋅ xᵢ + b) − 1]

where yᵢ are the class labels and αᵢ are the Lagrange multipliers.
Kernel Trick
 For non-linearly separable data, SVM can apply kernel functions to
transform data into higher dimensions where a linear separation is
possible.
 Common kernels include:

 Polynomial Kernel: K(xᵢ, xⱼ) = (xᵢ ⋅ xⱼ + c)ᵈ

 Radial Basis Function (RBF) Kernel: K(xᵢ, xⱼ) = exp(−γ ∥xᵢ − xⱼ∥²)
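As an illustration, here is a small Python sketch (NumPy assumed; the vectors and parameter values c, d, and gamma are arbitrary choices matching the formulas above) of the two kernels written as plain functions:

import numpy as np

def polynomial_kernel(xi, xj, c=1.0, d=3):
    # K(xi, xj) = (xi . xj + c)^d
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

a, b = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(polynomial_kernel(a, b), rbf_kernel(a, b))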
Conclusion
 SVM is a robust method for classification that effectively handles both
linear and non-linear problems through its mathematical framework and
kernel trick. It emphasizes maximizing margins while being flexible enough
to accommodate noise and outliers in datasets.
Applications of SVM
1. Face Detection
 SVMs classify parts of images as either face or non-face, creating
boundaries around detected faces. This technology is commonly
used in security systems and social media platforms for automatic
tagging and recognition.
2. Text and Hypertext Categorization
 SVMs are employed to categorize documents into various classes,
such as news articles or emails. They analyze text data and assign
categories based on scores compared against predefined
thresholds, enhancing content organization and retrieval systems.

3. Image Classification
 In image processing, SVMs improve accuracy in classifying images
compared to traditional methods. They are used for object detection
and image retrieval, significantly enhancing search results in visual
databases.
4. Bioinformatics
 SVMs play a crucial role in biological data analysis, including protein
classification and cancer diagnosis. They help identify gene
expressions and classify patients based on genetic information,
aiding in personalized medicine.
5. Handwriting Recognition
 This application involves recognizing handwritten characters and is
widely used in postal services and document digitization. SVMs
analyze character features to enable accurate transcription of
handwritten text.
6. Spam Detection
 In natural language processing (NLP), SVMs are effective for filtering
spam emails by classifying messages based on their content,
improving email delivery systems like those used by Gmail.
7. Financial Forecasting
 SVMs are applied in the financial sector for stock market analysis
and fraud detection. Their ability to handle high-dimensional data
makes them suitable for predicting market trends and identifying
unusual patterns indicative of fraudulent activities.

8. Medical Diagnosis
 Beyond cancer detection, SVMs assist in diagnosing various
diseases by analyzing complex medical datasets, helping healthcare
professionals make informed decisions based on predictive analytics.
9. Remote Homology Detection
 In computational biology, SVMs are used to detect similarities
between protein structures, which is essential for understanding
biological functions and evolutionary relationships.

10. Generalized Predictive Control (GPC)
 SVM-based GPC is used to manage chaotic systems in engineering
applications, allowing for better control over dynamic processes with useful
parameters.
These applications demonstrate the versatility of SVMs across different domains,
highlighting their importance in both research and practical implementations.

Separating data with the maximum margin

Support vector machines
Pros: Low generalization error, computationally inexpensive, easy to interpret
results
Cons: Sensitive to tuning parameters and kernel choice; natively only handles
binary classification
Works with: Numeric values, nominal values
To introduce the subject of support vector machines I need to explain a few
concepts. Consider the data in frames A–D in figure 6.1; could you draw a
straight line to put all of the circles on one side and all of the squares on another
side? Now consider the data in figure 6.2, frame A. There are two groups of data,
and the data points are separated enough that you could draw a straight line on
the figure with all the points of one class on one side of the line and all the points
of the other class on the other side of the line. If such a situation exists, we say
the data is linearly separable. Don’t worry if this assumption seems too perfect.
We’ll later make some changes where the data points can spill over the line.

Framing the optimization problem in terms of our classifier


I’ve talked about the classifier but haven’t mentioned how it works.
Understanding how the classifier works will help you to understand the
optimization problem. We’ll have a simple equation like the sigmoid where we
can enter our data values and get a class label out. We’re going to use
something like the Heaviside step function, f(wᵀx + b), where the
function f(u) gives us −1 if u < 0, and 1 otherwise. This is different from logistic
regression in the previous chapter where the class labels were 0 or 1.
Why did we switch from class labels of 0 and 1 to -1 and 1? This makes the math
manageable, because -1 and 1 are only different by the sign. We can write a
single equation to describe the margin or how close a data point is to our
separating hyperplane and not have to worry if the data is in the -1 or +1 class.
When we’re doing this and deciding where to place the separating line, this
margin is calculated by label*(wᵀx + b). This is where the −1 and 1 class labels
help out. If a point is far away from the separating plane on the positive side,
then wᵀx + b will be a large positive number, and label*(wᵀx + b) will give us a
large number. If the point is far from the plane on the negative side and has a
negative label, label*(wᵀx + b) will also give us a large positive number.
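The following Python sketch (with a hand-picked w and b, purely for illustration) shows this sign behaviour and the margin term label*(wᵀx + b):

import numpy as np

w = np.array([1.0, -1.0])   # hypothetical weight vector
b = -0.5                    # hypothetical bias

def f(u):
    # Heaviside-style step: -1 if u < 0, else 1
    return -1 if u < 0 else 1

x_pos = np.array([4.0, 0.0])   # far on the positive side, label +1
x_neg = np.array([0.0, 4.0])   # far on the negative side, label -1

for x, label in [(x_pos, 1), (x_neg, -1)]:
    u = np.dot(w, x) + b
    # Both points yield a large positive margin term.
    print("prediction:", f(u), "margin term:", label * u)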
The goal now is to find the w and b values that will define our classifier. To do
this, we must find the points with the smallest margin. These are the support
vectors briefly mentioned earlier. Then, when we find the points with the
smallest margin, we must maximize that margin. This can be written as

arg max over (w, b) of { min over n of [ labelₙ * (wᵀxₙ + b) ] * (1 / ∥w∥) }
Solving this problem directly is pretty difficult, so we can convert it into another
form that we can solve more easily. Let’s look at the inside of the previous
equation, the part inside the curly braces. Optimizing multiplications can be
nasty, so what we do is hold one part fixed and then maximize the other part. If
we set label*(wᵀx + b) to be 1 for the support vectors, then we can maximize
∥w∥⁻¹ and we’ll have a solution. Not all of the label*(wᵀx + b) values will be
equal to 1; only those for the points closest to the separating hyperplane will
be. For points farther from the hyperplane, this product will be larger.
The optimization problem we now have is a constrained optimization problem
because we must find the best values, provided they meet some constraints.
Here, our constraint is that label*(wᵀx + b) will be 1.0 or greater. There’s a well-
known method for solving these types of constrained optimization problems,
using something called Lagrange multipliers. Using Lagrange multipliers, we can
write the problem in terms of our constraints. Because our constraints are our
data points, we can write the values of our hyperplane in terms of our data
points. The optimization function turns out to be

max over α of [ Σᵢ αᵢ − (1/2) Σᵢ,ⱼ labelᵢ * labelⱼ * αᵢ * αⱼ * ⟨xᵢ, xⱼ⟩ ]

subject to the constraints αᵢ ≥ 0 and Σᵢ αᵢ * labelᵢ = 0.
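As a hedged illustration of writing the hyperplane in terms of the data points, the sketch below (NumPy assumed; the α values are invented for illustration, not the result of an actual optimization) recovers w as Σᵢ αᵢ * labelᵢ * xᵢ:

import numpy as np

X = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 1.0]])   # training points
labels = np.array([-1.0, 1.0, 1.0])
alphas = np.array([0.8, 0.5, 0.3])                   # hypothetical multipliers

# Only points with alpha > 0 (the support vectors) contribute to w.
w = (alphas * labels) @ X
print("w =", w)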
Instance-based learning
Machine Learning systems categorized as instance-based learning are systems
that learn the training examples by heart and then generalize to new instances
based on some similarity measure. It is called instance-based because it builds
its hypotheses from the training instances. It is also known as memory-based
learning or lazy learning, because such systems delay processing until a new
instance must be classified. The time complexity of this approach depends on
the size of the training data. Each time a new query is encountered, the
previously stored data is examined and a target-function value is assigned to
the new instance.
The worst-case time complexity of this approach is O(n), where n is the number
of training instances. For example, if we were to create a spam filter with an
instance-based learning algorithm, instead of just flagging emails that are
already marked as spam emails, our spam filter would be programmed to also
flag emails that are very similar to them. This requires a measure of resemblance
between two emails: for example, having the same sender, repeated use of the
same keywords, or some other shared feature.
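A minimal Python sketch of such a resemblance measure, using Jaccard similarity over word sets (the emails shown are invented examples; sender matching could be added the same way):

def jaccard_similarity(email_a, email_b):
    # Ratio of shared words to total distinct words across both emails.
    words_a = set(email_a.lower().split())
    words_b = set(email_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

known_spam = "win a free prize claim your free reward now"
incoming = "claim your free prize now"
print(jaccard_similarity(known_spam, incoming))   # high score -> flag as spam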

Advantages:
1. Instead of estimating for the entire instance set, local approximations can
be made to the target function.
2. This algorithm can easily adapt to new data, which is collected as we go.
Disadvantages:
1. Classification costs are high.
2. Large amount of memory required to store the data, and each query
involves starting the identification of a local model from scratch.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Self-Organizing Map (SOM)
3. Learning Vector Quantization (LVQ)
4. Locally Weighted Learning (LWL)
5. Case-Based Reasoning
k-Nearest Neighbor Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms,
based on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and
the available cases, and puts the new case into the category that is most
similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
be easily classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification,
but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an
action on it at the time of classification.
o At the training phase, the KNN algorithm just stores the dataset; when it
gets new data, it classifies that data into the category that is most similar
to the new data.

o Example: Suppose we have an image of a creature that looks similar to
both a cat and a dog, and we want to know whether it is a cat or a dog.
For this identification, we can use the KNN algorithm, as it works on a
similarity measure. Our KNN model will compare the features of the new
image with those of known cat and dog images and, based on the most
similar features, place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a
new data point x1: which of these categories will the point lie in? To solve this
type of problem, we need a K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point.

How does K-NN work?


The K-NN working can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbors

o Step-2: Calculate the Euclidean distance from the new data point to each training point

o Step-3: Take the K nearest neighbors as per the calculated Euclidean


distance.
o Step-4: Among these k neighbors, count the number of the data points in
each category.
o Step-5: Assign the new data point to the category with the maximum
number of neighbors.
o Step-6: Our model is ready.

Suppose we have a new data point that we need to put in the required
category.

o Firstly, we will choose the number of neighbors, so we will choose k = 5.

o Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the distance between two points, which we have
already studied in geometry. For points (x₁, y₁) and (x₂, y₂) it can be
calculated as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)

o By calculating the Euclidean distance, we find the nearest neighbours:
three in category A and two in category B. Since the majority of the five
nearest neighbours belong to category A, the new data point is assigned
to category A.
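The steps above can be condensed into a short from-scratch Python sketch (NumPy assumed; the coordinates and labels are invented toy data) using k = 5, Euclidean distance, and a majority vote:

import numpy as np
from collections import Counter

X_train = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7],
                    [8, 6], [2, 1], [3, 2], [7, 5], [6, 6]])
y_train = ["A", "A", "A", "B", "B", "B", "A", "A", "B", "B"]

def knn_predict(x_new, k=5):
    # Steps 2-3: Euclidean distances, then the k nearest neighbours.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count categories among the neighbours, take the majority.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([3, 2.5])))   # expected: "A"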

Advantages of KNN Algorithm:


o It is simple to implement.

o It is robust to noisy training data.

o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


o The value of K always needs to be determined, which can be complex at
times.
o The computation cost is high because the distance from the new point to
every training sample must be calculated.
Radial Basis Functions
An RBF is a real-valued function whose output depends only on the distance from
a central point, called the centre. This distance-based dependency makes RBFs
suitable for various applications in Machine Learning, such as function
approximation, interpolation, and classification.
The core idea behind RBFs is that they respond more significantly to inputs
closer to their centre, diminishing their influence as the distance increases. This
characteristic allows RBFs to capture local patterns and nuances in the data
effectively.
Mathematical Formulation

Mathematically, an RBF is represented as φ(∥x − c∥), where x is the input vector,
c is the centre, and ∥x − c∥ denotes the Euclidean distance between x and c. The
function φ is typically chosen to be a smooth, monotonic function. One of the
most commonly used RBFs is the Gaussian function, which is defined as:

φ(r) = exp(−r² / (2σ²))

Here, r = ∥x − c∥, and σ is a parameter that determines the width of the
Gaussian curve. This formulation ensures that the RBF reaches its peak value at
the centre and decays exponentially as the distance from the centre increases.
Types of Radial Basis Functions
Several types of RBFs are utilised in Machine Learning, each with distinct
properties and applications:
Gaussian
The Gaussian RBF is widely used due to its smooth and localised nature. It is
particularly effective in capturing fine details and data variations.
Multiquadric

The Multiquadric RBF is defined as φ(r) = √(r² + β²), where β is a
constant. This function grows with the distance, making it useful for specific
interpolation tasks where long-range influences are needed.
Inverse Multiquadric

The Inverse Multiquadric RBF is given by φ(r) = 1 / √(r² + β²), where β is a
constant. Its decreasing nature provides a smooth transition from the centre
outwards, which can be beneficial for smoothing and regularisation.
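For illustration, the three RBF types can be written as short Python functions of the distance r = ∥x − c∥ (NumPy assumed; β is the constant named above and σ the Gaussian width, with arbitrary default values):

import numpy as np

def gaussian(r, sigma=1.0):
    # Peaks at r = 0 and decays exponentially with distance.
    return np.exp(-r**2 / (2 * sigma**2))

def multiquadric(r, beta=1.0):
    # Grows with distance.
    return np.sqrt(r**2 + beta**2)

def inverse_multiquadric(r, beta=1.0):
    # Decays smoothly with distance.
    return 1.0 / np.sqrt(r**2 + beta**2)

r = np.linspace(0, 3, 4)
print(gaussian(r), multiquadric(r), inverse_multiquadric(r))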
How Radial Basis Function Networks Work
Understanding how Radial Basis Function Networks work is crucial for Machine
Learning enthusiasts. Mastering RBF networks enhances one’s ability to tackle
complex data problems with efficient and effective solutions.
Input Layer
The input layer of a Radial Basis Function Network (RBFN) serves as the initial
point of interaction with the data. Here, data features are received and
processed before being forwarded to the next layer.
Each node in the input layer corresponds to a specific feature or attribute of the
input data, ensuring that all relevant information is captured and prepared for
further analysis.
Hidden Layer (RBF Layer)
The hidden layer in an RBFN is where the core processing takes place. Unlike
traditional neural networks that use activation functions like sigmoid or tanh,
RBFNs employ radial basis functions such as Gaussian or Multiquadric.
These functions transform the input data into a higher-dimensional feature space
where similarities between data points are calculated based on their distances
from predefined centres. This layer is crucial in mapping input data to a more
complex representation suitable for learning and decision-making.
Output Layer
After processing through the hidden layer, the transformed data moves to the
RBFN’s output layer. Here, the network makes predictions or classifications based
on the patterns learned during training.
The number of nodes in the output layer typically corresponds to the number of
classes or targets in a supervised learning scenario, where each node represents
a different class or outcome.
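A hedged Python sketch of this three-layer architecture (NumPy and scikit-learn assumed; choosing centres with k-means and fitting the output weights by least squares are common choices, not the only ones):

import numpy as np
from sklearn.cluster import KMeans

def rbfn_fit(X, y, n_centres=5, sigma=1.0):
    # Input layer: X arrives as raw features.
    # Hidden layer: Gaussian RBF activations around k-means centres.
    centres = KMeans(n_clusters=n_centres, n_init=10).fit(X).cluster_centers_
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    H = np.exp(-dists**2 / (2 * sigma**2))
    # Output layer: linear weights solved by least squares.
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centres, weights

def rbfn_predict(X, centres, weights, sigma=1.0):
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-dists**2 / (2 * sigma**2)) @ weights

# Toy regression: approximate a sine curve.
X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()
centres, weights = rbfn_fit(X, y)
print(rbfn_predict(X[:3], centres, weights))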
Advantages and Disadvantages of Using RBFNs
RBFNs offer distinct advantages and face specific challenges in Machine
Learning. Understanding these strengths and limitations is crucial for effectively
applying RBFNs in various applications.
Advantages of RBFNs:
 Simplicity: RBFNs are known for their straightforward architecture,
consisting of input, hidden (RBF), and output layers. This simplicity makes
them easier to implement and understand than more complex neural
networks.
 Fast Training: Due to their radial basis function activation in the hidden
layer, RBFNs often require fewer iterations during training. This results in
faster convergence and reduced computational time, making them
suitable for real-time applications.
 Universal Approximation: RBFNs can approximate any continuous
function to arbitrary accuracy. This versatility makes them powerful tools
for function approximation tasks across various domains.
Disadvantages of RBFNs:
 Sensitivity to Input Data: RBFNs rely heavily on selecting appropriate
centres and spreads (widths) for the radial basis functions. Improper selection can lead
to poor performance or overfitting, especially when dealing with noisy or
sparse datasets.
 Complexity with High-Dimensional Data: Determining radial basis
functions becomes more challenging as input data dimensionality
increases. This complexity can result in increased computational costs and
difficulty in achieving optimal network performance.
 Limited Scalability: RBFNs may face scalability issues when applied to
large datasets or complex problems. Managing many radial basis functions
and optimising network parameters can become impractical and resource-
intensive.
Case-Based Reasoning
Introduction
Case-Based Reasoning in machine learning is an AI technique that is used to
solve problems based on past experiences. The technique is derived from human
problem-solving approaches, where people often rely on their past experiences
to make decisions in new situations. CBR is a type of machine learning that
utilizes a database of previously solved problems or cases to solve new
problems. CBR is based on the idea that similar problems can have similar
solutions, and it uses this similarity to find solutions to new problems.
What is Case-Based Reasoning?
In Case-Based Reasoning in machine learning, a problem is solved by retrieving
similar past cases and adapting them to the current situation. The key terms
present in CBR are:
 Case: A case is a problem that has been previously solved and stored in
the database.
 Similarity: The similarity measure is used to determine the degree of
resemblance between past cases and the current situation.
 Adaptation: Adaptation is the process of modifying a retrieved past case to
fit the current situation.
Process in Case-Based Reasoning
The CBR process typically involves four main steps: retrieve, reuse, revise, and
retain.
 Retrieve: The first step in the CBR process is to retrieve relevant cases
from a case library. This involves searching through the library to find
cases that are similar to the current problem. The goal is to identify cases
that are as close to the current problem as possible, as these are the most
likely to provide useful information. In some cases, the retrieval step may
involve the use of keyword searches or other forms of data mining to
identify relevant cases.
 Reuse: Once relevant cases have been retrieved, the next step is to reuse
them to solve the current problem. This involves adapting the solutions
used in past cases to fit the current problem. The goal is to find a solution
that is similar enough to the past cases to be effective, but also different
enough to address the unique aspects of the current problem. This step
may involve selecting one or more past cases to use as a starting point for
the solution, or it may involve combining elements from multiple past
cases to create a new solution.
 Revise: After a solution has been developed using past cases, the next
step is to revise it to better fit the current problem. This may involve
modifying the solution based on feedback from the user or on new
information that has become available. The goal is to refine the solution to
make it as effective as possible for the current problem. In some cases,
the revision step may involve the use of machine learning algorithms to
optimize the solution.
 Retain: The final step in the CBR process is to retain the newly developed
solution for future use. This involves adding the new case to the case
library so that it can be used in the retrieval step for future problems. The
goal is to continually improve the quality of the case library and the
effectiveness of the CBR process over time. The retention step may also
involve the use of knowledge management tools to help organize and
maintain the case library.
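A minimal Python sketch of the retrieve, reuse, revise, and retain cycle described above (the dictionary-based case format, the distance-based similarity, and the placeholder revision step are all illustrative assumptions):

class CaseLibrary:
    def __init__(self):
        self.cases = []   # each case: (features, solution)

    def retain(self, features, solution):
        # Retain: add the case to the library for future retrieval.
        self.cases.append((features, solution))

    def retrieve(self, query):
        # Retrieve: nearest case by squared distance over the query's keys.
        def dist(case):
            features, _ = case
            return sum((features[k] - query[k]) ** 2 for k in query)
        return min(self.cases, key=dist)

    def solve(self, query):
        features, solution = self.retrieve(query)      # retrieve
        adapted = dict(solution)                       # reuse
        adapted["note"] = "adapted from past case"     # revise (placeholder)
        self.retain(query, adapted)                    # retain
        return adapted

lib = CaseLibrary()
lib.retain({"amount": 10, "risk": 2}, {"decision": "approve"})
lib.retain({"amount": 90, "risk": 9}, {"decision": "reject"})
print(lib.solve({"amount": 15, "risk": 3}))   # nearest: the "approve" case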
Advantages & Challenges in Case-Based Reasoning
Advantages of Case-Based Reasoning in machine learning:
 Reusability: CBR systems can reuse past solutions to similar problems,
which can save time and effort compared to developing a solution from
scratch.
 Adaptability: CBR systems can adapt to changing situations or contexts by
selecting and modifying relevant cases.
 Explanation: CBR systems can provide explanations for their solutions
based on the similar cases they retrieved.
 Learning: CBR systems can learn from new cases and refine their
knowledge base over time.
Challenges of Case-Based Reasoning in machine learning:
 Case Representation: The quality of CBR depends on the accuracy and
completeness of the cases used to solve a problem. If the cases are not
well represented, it may lead to incorrect solutions.
 Case Retrieval: The success of CBR systems depends on the ability to
retrieve relevant cases from the case base. If the retrieval process is not
efficient or effective, it may lead to a poor solution.
 Adaptation: Adapting a retrieved case to a new problem domain can be
difficult, as the retrieved case may not perfectly match the new problem.
 Scalability: As the size of the case base grows, the time required to
retrieve and adapt cases can become significant, which can impact the
efficiency of the system.
Applications of Case-Based Reasoning
It has been applied in various fields, including:
 Financial Decision Making: CBR systems can be used in financial
institutions to help make decisions on loan approvals, risk assessments,
and investment strategies by comparing past cases with current
situations.
 Legal Reasoning: Case-Based Reasoning in machine learning systems can
be used in the legal field to assist with case law research and the
preparation of legal arguments by retrieving and adapting cases with
similar legal issues.
 Transportation and Logistics: CBR systems can be used in transportation
and logistics to optimize routing, scheduling, and resource allocation by
learning from past cases.
