
MECE 694:

Assignment 3
Multi-Layer Perceptron and Support Vector Machines

Abdullah Abdullah - P.Eng

1410635

Prepared for Dr. Milad Nazarahari,


November 8th 2024

Contents

1 Introduction
2 Code 1: MLP (Multi-Layer Perceptron)
   2.1 Accuracy of Matlab's MLPs vs. our Sigmoid MLP
   2.2 Matlab's MLPs
   2.3 Potential Improvements
      2.3.1 More hidden layers
      2.3.2 Multi-Class Classification Problem
3 Code 2: KSVM (Kernel Support Vector Machine)
   3.1 Accuracy comparison of our KSVM vs. Matlab's built-in kernels
      3.1.1 Gaussian
      3.1.2 Polynomial
      3.1.3 Logistic and Hyperbolic Tangent
   3.2 Graphical Representation
   3.3 Varying Gaussian Parameters
4 Conclusions

1 Introduction

This report investigates the application of supervised machine-learning classification techniques to clustered data sets. Large datasets can contain many complex, high-dimensional data points which may require comprehensive optimization techniques to achieve appropriate classification. We explore the implementation of Kernel Support Vector Machines (KSVM) and of Multi-Layer Perceptrons (MLP), and analyze how well these algorithms perform in classifying a two-class dataset. The provided codes were partially complete and were finished as part of this assignment.

Code 1: MLP (Multi-Layer Perceptron): First, we initialize the weights randomly for both the input-to-hidden and hidden-to-output layer connections. The dataset contains 200 observations across 2 classes. A Matlab MLP was then implemented to classify the data as a reference. Our custom MLP uses sigmoid activation and is trained with backpropagation, with Matlab's fminunc used to optimize the weights and minimize the objective function. A prediction function then maps the MLP outputs to the two classes, and a confusion matrix is plotted.
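To make this sequence concrete, the sketch below shows one way such a pipeline could be wired up in MATLAB. It is only an illustration under assumed names: mlpCost, mlpPredict, and nHidden are hypothetical helpers and parameters, not the assignment's actual code.

    % Minimal sketch: one-hidden-layer sigmoid MLP trained with fminunc.
    % X is 200-by-2 (observations), y is 200-by-1 with labels in {0,1}.
    nHidden = 10;
    W1 = 0.1*randn(nHidden, size(X,2)+1);      % input -> hidden weights (with bias column)
    W2 = 0.1*randn(1, nHidden+1);              % hidden -> output weights (with bias column)
    w0 = [W1(:); W2(:)];                       % unroll all weights into one vector

    costFun = @(w) mlpCost(w, X, y, nHidden);  % hypothetical helper: cost + gradient via backprop
    opts = optimoptions('fminunc', 'Algorithm', 'quasi-newton', ...
                        'SpecifyObjectiveGradient', true, 'Display', 'off');
    wOpt = fminunc(costFun, w0, opts);         % weights that minimize the objective

    yHat = mlpPredict(wOpt, X, nHidden) > 0.5; % hypothetical helper: forward pass, then threshold
    C = confusionmat(y, double(yHat));         % confusion matrix of predicted vs. true classes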

Code 2: KSVM (Kernel Support Vector Machine): This section implements a KSVM to classify data by mapping it to a higher-dimensional space using various kernel functions, which allows the separation of data that may not be linearly separable. After the transformation, a decision boundary is constructed (and plotted) to separate the classes. Matlab's KSVMs were used as a reference against the ones we generated. First, a kernel function maps the two-class data; the resulting quadratic programming problem is then solved, and only the multipliers above a threshold are retained to identify the support vectors. The decision boundary is plotted and a confusion matrix is created to observe how well the classifier performs.
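As a rough sketch of those steps, assuming a precomputed kernel matrix K, labels y in {-1,+1}, and an illustrative box constraint and support-vector threshold, the dual problem can be handed to MATLAB's quadprog as follows; this is not the assignment's exact code.

    % Sketch: solve the kernel-SVM dual with quadprog, keep large multipliers as support vectors.
    % K is the n-by-n kernel (Gram) matrix; y is n-by-1 with labels in {-1,+1}.
    n    = numel(y);
    Cbox = 10;                                   % illustrative box constraint on the multipliers
    H    = (y*y') .* K;                          % quadratic term of the dual objective
    f    = -ones(n,1);                           % linear term
    alpha = quadprog(H, f, [], [], y', 0, zeros(n,1), Cbox*ones(n,1));

    svIdx = alpha > 1e-5;                        % keep only multipliers above a small threshold
    i = find(alpha > 1e-5 & alpha < Cbox-1e-5, 1);              % a margin support vector for the bias
    b = y(i) - sum(alpha(svIdx).*y(svIdx).*K(svIdx,i));
    pred = sign(K(:,svIdx)*(alpha(svIdx).*y(svIdx)) + b);       % predicted class for each point
    C = confusionmat(y, pred);                   % confusion matrix for the classifier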


2 Code 1: MLP (Multi-Layer Perceptron)


2.1 Accuracy of Matlab's MLPs vs. our Sigmoid MLP

The codes were run 10 times for our own sigmoid MLP and for each of the 3 different Matlab training functions to compare performance. The averages and their standard deviations are tabulated below.

Figure 1: Accuracy comparison of Matlab's MLPs vs. our Sigmoid MLP

Observation: From the table above it is evident that, in all cases, accuracy increases across the board as complexity is added to the hidden layer (e.g. more neurons), although it reaches a plateau for our sigmoid MLP. For trainlm, the accuracy continually improves, reaching a high of 97.5 percent, whereas our sigmoid MLP only reached a maximum of 90.2 percent. The standard deviation also decreases for all functions as the number of neurons is increased, indicating increased stability. However, our sigmoid MLP remains the worst performer, with the lowest accuracy and the highest standard deviation. trainlm and trainscg in MATLAB handle larger networks more effectively, likely due to optimized gradient calculations and convergence methods, whereas our custom MLP shows diminishing returns after around 15 neurons. trainlm has the best accuracy and stability and our sigmoid MLP the worst, which may be explained by the advanced optimization methods our implementation lacks.


Figure 2: Custom Sigmoid MLP vs. Matlab MLPs Accuracy Comparison

2.2 Matlab's MLPs

trainlm (Levenberg-Marquardt): This is a fast and efficient algorithm for training medium-sized neural networks, and is especially efficient for networks with fewer parameters. It combines gradient descent with a second-order (Gauss-Newton) approximation, which helps it converge quickly in networks with fewer neurons. However, it may require more memory, so it may not perform well on extremely large datasets.

trainbr (Bayesian Regularization): This uses Bayesian regularization to prevent overfitting by adjusting the effective number of parameters during training, which helps balance network complexity and error. Because of this adjustment, it is well suited to datasets which are particularly noisy, or to small datasets where overfitting may be a concern. While it can be slower than its trainlm counterpart, it often results in enhanced robustness, particularly for noisy data or where generalization is crucial.

trainscg (Scaled Conjugate Gradient): This is a memory-efficient, first-order, gradient-based method (it does not compute second-order derivatives), which makes it suitable for large datasets with many parameters. It is slower to converge than trainlm, but it uses less memory, which can make it preferable for larger datasets. It is chosen when memory is limited, or when the dataset is large and convoluted.
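For illustration, the three algorithms can be swapped simply by name when constructing the network. The sketch below compares them in a loop; the 15-neuron size and variable names are assumptions for the example, not the report's exact settings.

    % Sketch: same architecture, three built-in training algorithms.
    % X is 2-by-200 (columns are observations), T is 1-by-200 with targets in {0,1}.
    trainFcns = {'trainlm', 'trainbr', 'trainscg'};
    acc = zeros(1, numel(trainFcns));
    for k = 1:numel(trainFcns)
        net = feedforwardnet(15, trainFcns{k});   % 15 hidden neurons, chosen for illustration
        net.trainParam.showWindow = false;        % suppress the training GUI
        net = train(net, X, T);
        yHat = net(X) > 0.5;                      % threshold the network output into 2 classes
        acc(k) = mean(yHat == T);
        fprintf('%s accuracy: %.1f%%\n', trainFcns{k}, 100*acc(k));
    end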

2.3 Potential Improvements


2.3.1 More hidden layers:

To implement an MLP with any desired number of hidden layers, I would need to adjust the code to dynamically create and initialize weight matrices for each additional hidden layer. I would also modify the forward and backward propagation steps to loop through each hidden layer, applying the activation function and calculating gradients at each layer during backpropagation. Additionally, I would update the prediction function to handle multiple hidden layers by iteratively passing the outputs of each layer as inputs to the next.
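A hedged sketch of what this could look like, storing one weight matrix per layer in a cell array and looping the forward pass over the layers; the layer sizes and sigmoid helper here are illustrative assumptions.

    % Sketch: forward pass through an arbitrary number of hidden layers.
    layerSizes = [2, 10, 10, 1];                 % input, two hidden layers, output (example)
    nLayers = numel(layerSizes) - 1;
    W = cell(nLayers, 1);
    for l = 1:nLayers
        W{l} = 0.1*randn(layerSizes(l+1), layerSizes(l)+1);   % +1 column for the bias
    end

    sigmoid = @(z) 1./(1 + exp(-z));
    a = X';                                      % activations: one column per observation
    for l = 1:nLayers
        a = sigmoid(W{l} * [ones(1, size(a,2)); a]);   % prepend bias row, apply layer weights
    end
    % 'a' now holds the network output; backpropagation would walk the same cells in reverse.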

2.3.2 Multi-Class Classification Problem

To solve a multi-class classification problem, I would need to modify the output layer to have one neuron per class instead of a single output, using a softmax activation function to produce probabilities for each class. The loss function should be updated to categorical cross-entropy to handle multiple classes effectively. Additionally, during prediction, I would select the class with the highest probability as the final output for each data point.
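A small sketch of that output-layer change follows; Z and Yonehot are illustrative names for the output-layer pre-activations and one-hot targets, not variables from the original code.

    % Sketch: softmax output and categorical cross-entropy for K classes.
    % Z is K-by-n pre-activations; Yonehot is K-by-n with a single 1 per column.
    Zs = Z - max(Z, [], 1);                          % subtract column max for numerical stability
    P  = exp(Zs) ./ sum(exp(Zs), 1);                 % softmax: class probabilities per sample
    loss = -mean(sum(Yonehot .* log(P + eps), 1));   % categorical cross-entropy
    [~, predClass] = max(P, [], 1);                  % predicted class = highest probability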


3 Code 2: KSVM (Kernel Support Vector Machine)


3.1 Accuracy comparison of our KSVM vs. Matlab's built-in kernels

For the analysis, each initialized data set was passed through our KSVM (with Gaussian, logistic, and hyperbolic tangent kernel transforms) and through Matlab's Gaussian and polynomial KSVMs. This ensures that the data set remains the same, as we wish to observe the performance of each KSVM while holding the input class data constant as a control. The accuracy results are plotted below. Note that, separately, a for loop was used to cycle through various parameter settings in order to manually identify the best parameters.

Figure 3: Accuracy comparison of our KSVM vs. Matlab's built-in KSVMs

Observation: The best performance comes from the Gaussian kernel transform, whether it is our own function or Matlab's built-in function. The worst is Matlab's polynomial transform, with the logistic and hyperbolic tangent kernels in between.

3.1.1 Gaussian:

The Gaussian kernel performs best on complex non-linear datasets, as it allows the KSVM to create flexible decision boundaries that can adapt around complex data structures; we have also observed this, as will be shown in the next section. This kernel works by considering the distance between data points, mapping them into a higher-dimensional space in which separation of the classes becomes easier. This makes it particularly effective for datasets that contain clusters of varying density or irregular shapes, as it can bend and curve to fit the data distribution, which is why we have seen it outperform all the other kernels.
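For reference, a minimal Gaussian (RBF) kernel of the kind described here can be written as follows; the function name is illustrative, and sigma is the width parameter examined in Section 3.3.

    % Gaussian (RBF) kernel between row-observation matrices X1 and X2.
    function K = gaussianKernel(X1, X2, sigma)
        D2 = sum(X1.^2, 2) + sum(X2.^2, 2)' - 2*(X1*X2');   % pairwise squared distances
        K  = exp(-D2 / (2*sigma^2));                         % nearby points map close to 1
    end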

3.1.2 Polynomial:

The polynomial kernel can work well if the data has polynomial characteristics, but tuning is crucial to prevent over-fitting or under-fitting. Although it can perform well in cases where the underlying relationships follow a polynomial pattern, it may not generalize as effectively as the Gaussian kernel on datasets with diverse data distributions. This is why we observed it to be the worst performer, despite tuning for the best parameters.
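By contrast, the polynomial kernel is just the inner product raised to a power, so its flexibility is capped by the chosen degree; a one-line sketch with illustrative names:

    % Polynomial kernel of degree d with offset c; both need tuning to avoid over/under-fitting.
    polyKernel = @(X1, X2, d, c) (X1*X2' + c).^d;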

3.1.3 Logistic and Hyperbolic Tangent:

These kernels introduce non-linearity and can be effective at capturing and transforming data sets which have logistic or hyperbolic-tangent features. However, they too are sensitive to parameter settings and can present convergence issues if the data set is not in congruence with the kernel transform we intend to implement.
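These sigmoid-type kernels can be sketched the same way; gamma and c are the scale and offset parameters that make them sensitive to tuning, and the logistic variant simply swaps tanh for the logistic function.

    % Hyperbolic-tangent (sigmoid-type) kernel; the logistic variant uses 1./(1+exp(-z)) instead.
    tanhKernel = @(X1, X2, gamma, c) tanh(gamma*(X1*X2') + c);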

We will see in the following section that, apart from the Gaussian KSVM, none of the other decision boundaries are able to bend around the data as well. The polynomial, hyperbolic tangent, and logistic kernels all produce curves that attempt to separate the data, but do not do as good a job as the Gaussian.


3.2 Graphical Representation

Sample 3 is shown in the figure below, where we have plotted the sample data and the Logistic, Hyperbolic Tangent, and Gaussian Kernel SVM transforms, along with each one's individual classification accuracy. As discussed above, it is evident that the Gaussian decision boundary is the most effective at wrapping around the convoluted data and segregating the two classes, which is why it is the best performer.

Figure 4: Comparison of the different KSVMs and their decision boundaries
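Decision boundaries like those in Figure 4 can be drawn by scoring a dense grid of points with the trained KSVM and plotting the zero-level contour. The sketch below reuses the illustrative names from the earlier KSVM sketch (alpha, svIdx, b, gaussianKernel) and an assumed grid resolution; it is not the report's plotting code.

    % Sketch: draw a KSVM decision boundary as the zero contour of the decision function.
    [x1g, x2g] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), 200), ...
                          linspace(min(X(:,2)), max(X(:,2)), 200));
    gridPts = [x1g(:), x2g(:)];
    Kg = gaussianKernel(gridPts, X(svIdx,:), sigma);            % kernel against the support vectors
    scores = Kg * (alpha(svIdx).*y(svIdx)) + b;                 % decision values over the grid

    gscatter(X(:,1), X(:,2), y); hold on                        % the two classes of sample data
    contour(x1g, x2g, reshape(scores, size(x1g)), [0 0], 'k');  % zero contour = decision boundary
    hold off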


3.3 Varying Gaussian Parameters

As we can see from the figure below, as the Gaussian parameter is reduced, the decision boundary wraps around the data much more tightly, leading to higher accuracy; however, this could potentially be capturing noise and over-fitting. Conversely, as we increase the Gaussian parameter, the boundary approaches a linear KSVM solution and the accuracy plateaus.

Figure 5: Observing Decision Boundary as Gaussian Parameter is Varied
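One way to produce a figure like this is to sweep the kernel width and record the accuracy at each setting. The sketch below uses MATLAB's fitcsvm for brevity, and the sigma values are illustrative rather than the ones used in the report.

    % Sketch: sweep the Gaussian kernel width (KernelScale) and record the resulting accuracy.
    sigmas = [0.1, 0.5, 1, 2, 5, 10];
    acc = zeros(size(sigmas));
    for k = 1:numel(sigmas)
        mdl = fitcsvm(X, y, 'KernelFunction', 'gaussian', 'KernelScale', sigmas(k));
        acc(k) = mean(predict(mdl, X) == y);   % small sigma hugs the data; large sigma flattens out
    end
    plot(sigmas, 100*acc, '-o'); xlabel('Gaussian kernel width \sigma'); ylabel('Accuracy (%)');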


4 Conclusions

