
UNIT III

UNIT III – Support Vector Machines and Radial Basis Function: Learning from
Examples, Statistical Learning Theory, Support Vector Machines, SVM application to
Image Classification, Radial Basis Function Regularization theory, Generalized RBF
Networks, Learning in RBFNs, RBF application to face recognition.

PART A

1. What is statistical learning method in AI?


• Statistical Learning is a set of tools for understanding data.
• These tools broadly come under two classes: supervised learning and unsupervised learning.
• Generally, supervised learning refers to predicting or estimating an output based on one or more inputs.

2. What is SVM?
• Support Vector Machines (SVM) are primarily considered a classification approach, but they can be employed for both classification and regression problems.
• SVM can easily handle multiple continuous and categorical variables. It constructs a hyperplane in multidimensional space to separate the different classes.
• SVM generates the optimal hyperplane in an iterative manner, minimizing the classification error.
• The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.

3. Discuss about Radial Basis Function Network?


• A Radial Basis Function network is a single-hidden-layer feed-forward artificial neural network that uses radial basis functions as activation functions, drawn from the field of mathematical modeling.
• The output of the RBF network is a linear combination of radial basis functions of the inputs and neuron parameters.
• This network is used in time series prediction, function approximation, system control and classification.

4. Expand the terms: a) SVM b) RBF c) GRBF

a) Support Vector Machine (SVM)
b) Radial Basis Function (RBF)
c) Generalized Radial Basis Function (GRBF)

5. What is the application of RBF in neural network?

RBF networks are used for function approximation, pattern recognition, and time series prediction problems.

SUPPORT VECTOR MACHINES
• A support vector machine (SVM) is a supervised machine learning algorithm that classifies data by finding an optimal line or hyperplane that maximizes the distance between the classes in an N-dimensional space.

• Objective:
To find the hyperplane that best separates the two classes.

• Difference between SVM and Logistic Regression

Both algorithms try to find the best hyperplane, but the main difference is that logistic regression is a probabilistic approach whereas the support vector machine is based on statistical approaches.

• The question is:
There can be an infinite number of hyperplanes passing through a point and classifying the two classes perfectly.

• Which hyperplane does SVM select?

SVM selects the hyperplane with the maximum margin, i.e., the maximum distance between the two classes.

• Depending on the number of features, choose either Logistic Regression or SVM.

• SVM works best when the dataset is small and complex.

• Types of Support Vector Machine (SVM) Algorithms

1. Linear SVM
2. Non-Linear SVM

• LINEAR SVM
Only when the data is perfectly linearly separable can we use Linear SVM; that is, the data points can be classified into 2 classes by a single straight line (if 2D).

• NON-LINEAR SVM
When the data points cannot be separated into 2 classes by a straight line (if 2D), we use advanced techniques like kernel tricks to classify them. A minimal sketch of both types follows.
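Below is a hedged illustration of the two types in scikit-learn (the concentric-circles toy dataset and all parameter choices are our own, not prescribed by this unit):

    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy data: two concentric circles -- not separable by a straight line.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Linear SVM: the decision boundary is a single straight line.
    linear_svm = SVC(kernel="linear").fit(X_train, y_train)

    # Non-linear SVM: the RBF kernel trick separates the circles.
    rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

    print("linear:", linear_svm.score(X_test, y_test))  # poor on circular data
    print("rbf:   ", rbf_svm.score(X_test, y_test))     # near-perfect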

• SUPPORT VECTORS AND MARGIN

• SUPPORT VECTORS
Support Vectors are the points that are closest to the hyperplane.
A separating line will be defined with the help of these data points.

• MARGINS
Margin is the distance between the hyperplane and the observations closest to the hyperplane (the support vectors).
In SVM a large margin is considered a good margin. There are two types of margins: hard margin and soft margin.

• Support Vector Machine Algorithm:


Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks. SVMs can be used for a
variety of tasks, such as text classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly detection.
Let’s consider two independent variables x1, x2, and one dependent variable which is either a
blue circle or a red circle.

[Figure: Linearly separable data points]

From the figure above it’s very clear that there are multiple lines (our hyperplane here is a line
because we are considering only two input features x1, x2) that segregate our data points or do
a classification between red and blue circles. So how do we choose the best line or in general
the best hyperplane that segregates our data points?

How does SVM work?

One reasonable choice as the best hyperplane is the one that represents the largest separation
or margin between the two classes.

[Figure: Multiple hyperplanes separating the data of two classes]

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane (hard margin). So from the above figure, we choose L2.

Let’s consider a scenario like shown below

[Figure: Selecting a hyperplane for data with an outlier]

Here we have one blue ball inside the boundary of the red balls. So how does SVM classify the data? It's simple! The blue ball inside the red region is an outlier of the blue class. The SVM algorithm has the characteristic of ignoring outliers and finding the best hyperplane that maximizes the margin. SVM is robust to outliers.

[Figure: The most optimized hyperplane]

So for this type of data, SVM finds the maximum margin as with the previous data sets, and in addition adds a penalty each time a point crosses the margin. The margins in these cases are called soft margins.

When there is a soft margin on the data set, the SVM tries to minimize (1/margin) + λ·Σ(penalty), trading margin width against the total penalty for violations.

Hinge loss is a commonly used penalty: if there are no violations, the hinge loss is zero; if there are violations, the hinge loss is proportional to the distance of the violation.
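As a small worked sketch in NumPy (the labels and scores below are made up): for a label y ∈ {−1, +1} and raw decision score f(x), the hinge loss is max(0, 1 − y·f(x)).

    import numpy as np

    def hinge_loss(y, fx):
        """Zero when y*f(x) >= 1 (point outside the margin, no violation);
        otherwise grows linearly with the distance of the violation."""
        return np.maximum(0.0, 1.0 - y * fx)

    y  = np.array([+1, +1, -1, -1])        # true labels (made up)
    fx = np.array([2.0, 0.5, -0.3, 1.2])   # raw decision scores (made up)
    print(hinge_loss(y, fx))               # [0.  0.5 0.7 2.2]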

What if the data are not linearly separable?

[Figure: Original 1-D dataset for classification]

Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. For a point xi on the line, we create a new variable yi as a function of its distance from the origin o. If we plot this, we get something like what is shown below.

[Figure: Mapping 1-D data to 2-D to make the two classes separable]

In this case, the new variable y is created as a function of distance from the origin. A non-
linear function that creates a new variable is referred to as a kernel.
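A minimal NumPy sketch of this idea (the 1-D points are invented for illustration): mapping each point xi to yi = xi², its squared distance from the origin o, makes the two classes separable by a horizontal line in the (x, y) plane.

    import numpy as np

    # Made-up 1-D data: class 0 sits near the origin, class 1 far from it.
    x = np.array([-3.0, -2.5, -0.5, 0.0, 0.4, 2.6, 3.1])
    labels = np.array([1, 1, 0, 0, 0, 1, 1])

    # Kernel-style new variable: squared distance from the origin o = 0.
    y = x ** 2

    # In (x, y) space the classes are now linearly separable:
    # every class-1 point has y > 1, every class-0 point has y < 1.
    for xi, yi, c in zip(x, y, labels):
        print(f"x={xi:+.1f}  y={yi:.2f}  class={c}")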

Support Vector Machine Terminology

1. Hyperplane: The hyperplane is the decision boundary used to separate the data points of different classes in a feature space. In the case of linear classification, it is a linear equation, i.e. wx + b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane, which play a critical role in deciding the hyperplane and the margin.
3. Margin: Margin is the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize the margin; a wider margin indicates better classification performance.
4. Kernel: A kernel is a mathematical function used in SVM to map the original input data points into a high-dimensional feature space, so that the hyperplane can be found even if the data points are not linearly separable in the original input space. Some common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane, or hard-margin hyperplane, is a hyperplane that properly separates the data points of the different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique. The soft-margin SVM formulation introduces a slack variable for each data point, which softens the strict margin requirement and permits certain misclassifications or violations. It finds a compromise between increasing the margin and reducing violations.
7. C: The regularisation parameter C balances margin maximisation against misclassification penalties. It decides the penalty for crossing the margin or misclassifying a data item; a greater value of C imposes a stricter penalty, which results in a smaller margin and perhaps fewer misclassifications.
8. Hinge Loss: Hinge loss is the typical loss function in SVMs. It punishes incorrect classifications and margin violations, and the SVM objective function is frequently formed by combining it with the regularisation term.
9. Dual Problem: SVM can be solved through the dual of the optimisation problem, which requires locating the Lagrange multipliers associated with the support vectors. The dual formulation enables the use of kernel tricks and more efficient computation.
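To make the kernel idea concrete, here is a small sketch comparing a hand-written RBF kernel with scikit-learn's rbf_kernel (the two vectors and gamma are arbitrary choices):

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def rbf(x1, x2, gamma=0.5):
        """RBF kernel: K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
        return np.exp(-gamma * np.sum((x1 - x2) ** 2))

    a = np.array([1.0, 2.0])
    b = np.array([2.0, 0.0])

    print(rbf(a, b))                              # hand-written value
    print(rbf_kernel([a], [b], gamma=0.5)[0, 0])  # library value, identical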

Advantages of SVM
• Effective in high-dimensional cases.
• Memory efficient, as it uses only a subset of the training points (the support vectors) in the decision function.
• Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.

SVM APPLICATIONS TO IMAGE CLASSIFICATION

The SVM algorithm works by finding the hyperplane that separates the different classes in the
feature space. The key idea behind SVMs is to find the hyperplane that maximizes the margin,
which is the distance between the closest points of the different classes. The points that are
closest to the hyperplane are called support vectors.

In machine learning, the model is trained by input and expected output data. To create a
model, it is necessary to go through the following phases:

1- Data Collection and Preprocessing

1.1 Image Acquisition:

Capture a substantial dataset of images containing, for example, ripe and unripe oranges. Consider acquiring images from different angles and distances to enhance model generalizability.

1.2 Image Preprocessing:

• Resizing
• Normalization
• Color Space Conversion
• Data Augmentation (Optional)

2- Feature Extraction:

To classify an image using an SVM, we first need to extract features from the image. These features can be the color values of the pixels, edge detections, or even the textures present in the image. Once the features are extracted, we can use them as input for the SVM algorithm. An image can contain:

• Color Features
• Texture Features
• Shape Features
• Combining Features

3- SVM Classification

Image classification can be based on the idea of a colour histogram. A colour is represented by a point in a three-dimensional colour space, HSV (hue, saturation, value), which is in direct correspondence with the RGB space.
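One possible sketch of this feature (a hedged example, not a prescribed method): a concatenated per-channel HSV histogram computed with NumPy, using matplotlib's rgb_to_hsv for the colour-space conversion. The random image is a stand-in for a real photo.

    import numpy as np
    from matplotlib.colors import rgb_to_hsv

    def hsv_histogram(rgb_image, bins=8):
        """Concatenated per-channel HSV histograms, normalized to sum to 1."""
        hsv = rgb_to_hsv(rgb_image)          # expects RGB values in [0, 1]
        feats = []
        for ch in range(3):                  # H, S, V channels
            hist, _ = np.histogram(hsv[..., ch], bins=bins, range=(0.0, 1.0))
            feats.append(hist)
        feats = np.concatenate(feats).astype(float)
        return feats / feats.sum()

    # Stand-in for a real RGB image (H x W x 3, values in [0, 1]).
    image = np.random.rand(64, 64, 3)
    feature_vector = hsv_histogram(image)
    print(feature_vector.shape)              # (24,) -> 8 bins x 3 channels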

Kernel Selection:

• Start with a linear kernel if the feature distribution appears linearly separable.
• If the data exhibits nonlinear patterns, consider kernels like RBF (Radial Basis Function), polynomial, or sigmoid.
• Experiment with different kernel types and parameters (e.g., gamma for RBF) through grid search or random search to find the combination that maximizes performance.

Hyperparameter Tuning

• Fine-tune SVM hyperparameters like the regularization parameter (C) and kernel-specific parameters using techniques like grid search or random search.
• Aim to balance training accuracy with generalization ability to avoid overfitting.

Model Training

• Split your preprocessed data into training, validation, and testing sets. Train the SVM model on the training set and evaluate its performance on the validation set to prevent overfitting. Refine hyperparameters based on the validation results.

Model Evaluation

• Evaluate the final model's performance on the unseen testing set using metrics like accuracy, precision, recall, F1-score, and the confusion matrix.
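Putting the phases together, a minimal end-to-end sketch in scikit-learn (the feature matrix X and labels y are assumed to come from the feature-extraction step; here random stand-ins are used, with the ripe/unripe oranges of the running example as hypothetical classes):

    import numpy as np
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Stand-ins: in practice X holds extracted image features
    # (e.g., colour histograms) and y the ripeness labels.
    X = np.random.rand(200, 24)
    y = np.random.randint(0, 2, size=200)    # 0 = unripe, 1 = ripe

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Scale features, then grid-search kernel and hyperparameters.
    pipe = make_pipeline(StandardScaler(), SVC())
    grid = GridSearchCV(
        pipe,
        param_grid={"svc__kernel": ["linear", "rbf"],
                    "svc__C": [0.1, 1, 10],
                    "svc__gamma": ["scale", 0.1, 1.0]},
        cv=5)
    grid.fit(X_train, y_train)

    print("best params:", grid.best_params_)
    print(classification_report(y_test, grid.predict(X_test)))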

RADIAL BASIS FUNCTION REGULARIZATION THEORY

RBF NETWORKS

The idea of Radial Basis Function (RBF) Networks derives from the theory of function
approximation. We have already seen how Multi-Layer Perceptron (MLP) networks
with a hidden layer of sigmoidal units can learn to approximate functions. RBF
Networks take a slightly different approach. Their main features are:
1. They are two-layer feed-forward networks.
2. The hidden nodes implement a set of radial basis functions (e.g. Gaussian functions).
3. The output nodes implement linear summation functions as in an MLP.
4. The network training is divided into two stages: first the weights from the input to the hidden layer are determined, and then the weights from the hidden to the output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.
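A minimal NumPy sketch of this two-layer structure (the centers, width, and output weights below are made-up values; a trained network would learn them in the two stages listed above):

    import numpy as np

    def rbf_forward(x, centers, sigma, weights, bias=0.0):
        """Forward pass of a Gaussian RBF network.
        Hidden layer: phi_j(x) = exp(-||x - c_j||^2 / (2*sigma^2)).
        Output layer: y(x) = sum_j w_j * phi_j(x) + bias (linear)."""
        dists = np.linalg.norm(x - centers, axis=1)        # ||x - c_j||
        phi = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))   # hidden activations
        return weights @ phi + bias

    centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # made-up hidden-unit centers
    weights = np.array([0.7, -0.3])               # made-up output weights
    print(rbf_forward(np.array([0.5, 0.5]), centers, sigma=1.0, weights=weights))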

Refer class notes also

What is a radial basis function network?


The radial basis function network (RBFN) is a type of artificial neural network that uses radial basis functions as activation functions. It is a powerful learning algorithm primarily used for regression, classification, time series prediction, and control applications. Unlike traditional neural networks, RBFNs have an input layer, a hidden layer with radial basis neurons, and an output layer for producing the network's output.

Background and evolution of radial basis function network


The historical evolution of the radial basis function network dates back to the 1980s when it
was formalized for function approximation and interpolation. Initially, it gained prominence in
the field of applied mathematics and data analysis, where it was utilized for solving various
approximation problems. Over time, its adaptation to the domain of artificial intelligence
materialized, leading to its widespread adoption in complex problem-solving scenarios.

Significance of radial basis function network in the AI field


The radial basis function network is of significant importance in the AI field due to its unique
capabilities in pattern recognition and function approximation. In AI model training and
implementation, RBFNs offer a distinct advantage in processing complex and non-linear data,
making it an invaluable tool for addressing real-world challenges.

Understanding how radial basis function network works


The operational principle of the radial basis function network lies in its characteristic feature of
using radial basis functions as activation functions. These functions are based on the distance
between the input and prototype vectors, capturing the input space's non-linear relationships.
RBFNs are capable of adaptively learning from training data and generalizing learned patterns to make predictions on unseen data.

Pros & cons of radial basis function network


Benefits of RBFN in AI Applications

• RBFNs are effective in modeling and approximating complex non-linear relationships in data, making them suitable for a wide range of problem-solving tasks.
• They require relatively fewer training samples compared to other neural network architectures, making them efficient in scenarios with limited training data.
• RBFNs offer transparent and interpretable models, allowing users to understand the reasoning behind the model's predictions and decisions.

Drawbacks and Limitations

• RBFNs are susceptible to overfitting when the number of basis functions is not appropriately selected, leading to reduced generalization capabilities.
• They can be computationally expensive when dealing with high-dimensional data, which may limit their scalability in certain applications.
• Designing the architecture and selecting appropriate parameters for RBFNs can be a challenging task, requiring domain expertise and rigorous experimentation.

GENERALIZED RBF NETWORKS

LEARNING IN RBFNS

⁃ A single-layer perceptron gives only linear separability, because it is composed of just input and output layers; an MLP adds hidden layers to overcome this.
⁃ For example, the AND and OR functions are linearly separable, while the XOR function is not linearly separable.

[Figure: Linear separability of the AND, OR, and XOR functions]

⁃ We need at least one hidden layer to derive a non-linear separation.


⁃ What the RBNN does is transform the input signal into another form, which can then be fed into the network to achieve linear separability.
⁃ The RBNN is structurally the same as a perceptron (MLP).

⁃ The RBNN is composed of an input, a hidden, and an output layer. The RBNN is strictly limited to exactly one hidden layer. We call this hidden layer the feature vector.
⁃ The RBNN increases the dimension of the feature vector.

[Figure: Simplest diagram of the RBNN architecture]

[Figure: Extended diagram of the RBNN architecture with hidden functions]

⁃ We apply a non-linear transfer function to the feature vector before we tackle the classification problem.
⁃ When we increase the dimension of the feature vector, the linear separability of the feature vector increases.

A pattern classification problem that is not linearly separable in a low-dimensional space is more likely to be linearly separable when cast into a high-dimensional space.
[Cover’s Theorem]
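A small sketch of Cover's theorem in action (our own illustrative example; the Gaussian "receptors" used here are defined just below): XOR is not linearly separable in 2-D, but mapping each input through two Gaussian functions centered at the class-1 corners makes it linearly separable in the feature space.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])                   # XOR labels

    # Receptors placed at the two class-1 corners (an illustrative choice).
    t1, t2 = np.array([0.0, 1.0]), np.array([1.0, 0.0])

    def gaussian(x, t, sigma=1.0):
        r = np.linalg.norm(x - t)                # radial distance r = ||x - t||
        return np.exp(-r**2 / (2 * sigma**2))

    # Map every input into the 2-D feature space (phi1, phi2).
    features = np.array([[gaussian(x, t1), gaussian(x, t2)] for x in X])
    for f, label in zip(features, y):
        print(f.round(3), label)
    # Both class-0 inputs map to (0.607, 0.607); the class-1 inputs map to
    # (1, 0.368) and (0.368, 1). The line phi1 + phi2 = 1.3 separates them.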
⁃ What is a Radial Basis Function?
⁃ We define a receptor t.
⁃ We draw contour maps around the receptor.
⁃ Gaussian functions are generally used as the radial basis function (for the contour mapping). So we define the radial distance r = ||x − t||.

[Figure: Radial distance and radial basis function with contour map]


Gaussian Radial Basis Function:
ϕ(r) = exp(−r² / 2σ²), where σ > 0

Classification only happens in the second phase, where a linear combination of the hidden functions is driven to the output layer.
Advantages of using RBNN over MLP:
1. Training in RBNN is faster than in a Multi-layer Perceptron (MLP), which takes many iterations.

2. We can easily interpret the meaning/function of each node in the hidden layer of the RBNN. This is difficult in MLP.
3. Parameterization (choosing the number of nodes in the hidden layer and the number of hidden layers) is difficult in MLP, but this problem does not arise in RBNN.
4. As a drawback, classification takes more time in RBNN than in MLP.

COMPARE RBF AND MULTILAYER PERCEPTRON

Feature       | RBF                                                                     | MLP
------------- | ----------------------------------------------------------------------- | -------------------------------------------------------------
Hidden layer  | Typically one hidden layer with radial basis functions                  | Can have one or more hidden layers with activation functions
Learning      | Two-stage learning: finding centers and widths, then fitting a linear model | Backpropagation to adjust weights and biases
Advantages    | Faster learning, less sensitive to data order, potentially better for specific tasks | Higher flexibility, potentially better accuracy
Disadvantages | Less flexible, may not be suitable for complex problems                 | Slower learning, can be sensitive to data order
Applications  | Function approximation, interpolation, classification (especially with kernel methods) | Image recognition, speech recognition, natural language processing
Summary       | Good for fast learning and specific tasks with radial basis functions   | More powerful and flexible for complex problems

Radial Basis Function Network (RBF)


A Radial Basis Function network is a single-hidden-layer feed-forward artificial neural network that uses radial basis functions as activation functions.
The output of the RBF network is a linear combination of radial basis functions of the inputs and neuron parameters. This network is used in time series prediction, function approximation, system control and classification.

Cover’s theorem on the separability of patterns:


This theorem justifies the use of a linear output layer in an RBF network. According to the theorem, when the transformation from the input space to the feature space is nonlinear and the dimensionality of the feature space is high compared to that of the input space, there is a high likelihood that a pattern classification task that is not separable in the input space is transformed into a linearly separable one in the feature space.

Interpolation problem:
The interpolation problem requires every input vector to be mapped exactly onto the corresponding target vector; it amounts to determining the real coefficients and the polynomial term. The function is called a radial basis function if the interpolation problem has a unique solution.
The learning of a neural network, viewed as a hypersurface reconstruction problem, is an ill-posed inverse problem for the following reasons:

i) Lack of sufficient information in the training data to reconstruct the input-output mapping uniquely.
ii) The presence of noise in the input data adds uncertainty.

Regularization theory:
Regularization is a technique for controlling the smoothness properties of a mapping function. It involves adding to the error function an extra term designed to penalize mappings which are not smooth. Instead of restricting the number of hidden units, regularization theory provides an alternative approach to preventing overfitting in RBF networks.
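In symbols, one standard form of this regularized error (notation follows common textbook treatments; D denotes a differential operator that measures roughness) is:

    E(F) = (1/2) Σᵢ (dᵢ − F(xᵢ))² + (λ/2) ‖DF‖²

Here dᵢ are the target values, F is the mapping being learned, and λ is the regularization parameter. Minimizing E(F) balances fitting the training data against keeping F smooth.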

Regularization network:
An RBF network can be seen as a special case of a regularization network. RBF networks have a sound theoretical foundation in regularization theory and fit naturally into the framework of regularizing interpolation/approximation tasks. For these problems, regularization means smoothing of the interpolation/approximation curve or surface. This approach to RBF networks is therefore also known as a regularization network.

Generalized Radial Basis Function networks (GRBF):

RBF networks have good generalization ability and a simple network structure that avoids unnecessary and lengthy calculation. The modified or generalized RBF network has the following characteristics:
i) The Gaussian function is modified.
ii) Hidden neuron activations are normalized.
iii) The output weights are functions of the input variables.
iv) A sequential learning algorithm is used.

Regularization parameter estimation:


The unknown weights and the error variance are estimated by regularization. The regularization parameter has the effect of reducing the variances of the network parameter estimates. Maximum penalized likelihood estimates the weight parameters in the RBF network, and the regularization parameter is given as β = θα², where α² is the error variance.

RBF networks- Approximation properties:


RBFs are embedded in a two-layer neural network, where each hidden unit implements a radially activated function and the output units implement a weighted sum of the hidden unit outputs. While the hidden-layer mapping of an RBF network is nonlinear, the output is linear. Owing to their nonlinear approximation capabilities, RBF networks are able to model complex mappings.

RBF networks and multilayer Perceptron comparison:


Similarities:
i) They are both non-linear feed-forward networks.
ii) They are both universal approximators.
iii) They are both used in similar application areas.
Dissimilarities:
An RBF network has a single hidden layer, whereas a multilayer perceptron can have any number of hidden layers.

Kernel regression and RBF networks relationship:
The theory of kernel regression provides another viewpoint on the use of RBF networks for function approximation. It gives a framework for estimating a regression function from noisy data using kernel density estimation techniques. The objective of function approximation is to find a mapping from the input space to the output space; this mapping is provided by forming the regression, or conditional average, of the target data conditioned on the input variables. The resulting regression function is known as the Nadaraya-Watson estimator.
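A minimal sketch of this estimator with a Gaussian kernel (the noisy data are synthetic, and the bandwidth h is a free smoothing parameter):

    import numpy as np

    def nadaraya_watson(x_query, x_train, y_train, h=0.3):
        """f(x) = sum_i K((x - x_i)/h) * y_i / sum_i K((x - x_i)/h):
        a kernel-weighted (conditional) average of the targets."""
        w = np.exp(-0.5 * ((x_query - x_train) / h) ** 2)  # Gaussian kernel
        return np.sum(w * y_train) / np.sum(w)

    rng = np.random.default_rng(0)
    x = np.linspace(0, 2 * np.pi, 50)
    y = np.sin(x) + 0.1 * rng.standard_normal(50)          # noisy targets

    print(nadaraya_watson(np.pi / 2, x, y))                # close to sin(pi/2) = 1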

Learning strategies:
Common learning strategies are the orthogonal least squares (OLS) method and the hybrid learning method. In OLS, the hidden neurons (the RBF centers) are selected one by one in a supervised manner. The computationally more efficient hybrid learning method combines both self-organized and supervised learning strategies.
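A minimal sketch of the hybrid strategy (our own illustration, assuming NumPy and scikit-learn): the self-organized stage places the RBF centers with k-means, and the supervised stage fits the output weights by linear least squares.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)  # noisy 1-D target

    # Stage 1 (self-organized): place RBF centers with k-means.
    k = 10
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_
    sigma = 0.5                                            # shared width (hand-picked)

    # Stage 2 (supervised): solve for the output weights by least squares.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (200, k)
    Phi = np.exp(-d**2 / (2 * sigma**2))                   # hidden activations
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    print("train MSE:", np.mean((Phi @ w - y) ** 2))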
