Assignment 3
Multi-Layer Perceptrons and Support Vector Machines
Abdullah Abdullah
1410635
MecE 694 - Fall 2024
1 Introduction
This report investigates the application of machine learning, using supervised classification techniques, to clustered data sets. Large datasets can contain many complex, high-dimensional data points, which may require comprehensive optimization techniques to classify accurately. This report examines Kernel Support Vector Machines (KSVM) and Multi-Layer Perceptrons (MLP), and analyzes how well these algorithms perform in classifying a two-class dataset. The codes were partially provided and were then completed to carry out the analyses below.
CODE 1: MLP (Multi-Layer Perceptron): First, the weights are initialized randomly for both the input-to-hidden and hidden-to-output layer connections. The dataset contained 200 observations across 2 classes. A MATLAB MLP was then implemented to classify the data as a reference. Our customized MLP uses sigmoid activation and is trained with backpropagation, with MATLAB's fminunc used to find the MLP weights that minimize the objective function. A prediction function then maps the MLP outputs into the two classes, and a confusion matrix is plotted.
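As a minimal sketch of this workflow, the snippet below initializes random weights for a single hidden layer, optimizes them with fminunc, and computes the two-class predictions. The data generation, hidden-layer size H, and function names (mlpCost, mlpPredict) are illustrative assumptions rather than the original code; for brevity, fminunc is left to estimate gradients numerically instead of using explicit backpropagation, and confusionmat (Statistics and Machine Learning Toolbox) stands in for the plotted confusion matrix.

rng(0);                                     % illustrative two-class data
N = 200; H = 10;
X = [randn(N/2,2)+1; randn(N/2,2)-1];
y = [zeros(N/2,1); ones(N/2,1)];

sizes = [2 H 1];                            % input, hidden, output widths
nW = (sizes(1)+1)*sizes(2) + (sizes(2)+1)*sizes(3);
w0 = 0.1*randn(nW,1);                       % random initial weights

opts = optimoptions('fminunc','Algorithm','quasi-newton','Display','off');
wOpt = fminunc(@(w) mlpCost(w,X,y,sizes), w0, opts);

yHat = mlpPredict(wOpt,X,sizes) > 0.5;      % two-class prediction
confMLP = confusionmat(y, double(yHat));    % confusion matrix

function J = mlpCost(w,X,y,sizes)
    p = mlpPredict(w,X,sizes);
    J = mean((p - y).^2);                   % squared-error objective
end

function p = mlpPredict(w,X,sizes)
    sig = @(z) 1./(1+exp(-z));              % sigmoid activation
    n1 = (sizes(1)+1)*sizes(2);
    W1 = reshape(w(1:n1), sizes(1)+1, sizes(2));
    W2 = reshape(w(n1+1:end), sizes(2)+1, sizes(3));
    h  = sig([X, ones(size(X,1),1)] * W1);  % hidden layer (bias appended)
    p  = sig([h, ones(size(h,1),1)] * W2);  % output layer
end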
CODE 2: KSVM (Kernel Support Vector Machine): The KSVM classifies data by mapping it to a higher-dimensional space using various kernel functions, which allows the separation of data that may not be linearly separable. After the transformation, a decision boundary is constructed (and plotted) to separate the classes. MATLAB's KSVMs were used as a comparison against the ones we generated. First, a kernel function maps the two-class data; the resulting quadratic programming problem is then solved, and only the values above a criterion are retained to identify the support vectors. The decision boundary is then plotted, and a confusion matrix is created to observe how well the function classifies the data.
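The sketch below illustrates this procedure for the Gaussian kernel, reusing the two-class data X, y, and N from the MLP sketch above: the dual quadratic program is solved with quadprog, alphas above a small criterion are kept as support vectors, and the resulting decision values give the class predictions. The kernel width sig, box constraint C, and selection threshold are illustrative assumptions, not the report's tuned values (pdist2 and confusionmat come from the Statistics and Machine Learning Toolbox).

sig = 0.5; C = 10;                          % assumed kernel width and box constraint
yPM = 2*y - 1;                              % labels mapped to {-1,+1}

K = exp(-pdist2(X,X).^2 / (2*sig^2));       % Gaussian kernel matrix
Q = (yPM*yPM') .* K;                        % quadratic term of the dual problem
alpha = quadprog(Q, -ones(N,1), [], [], yPM', 0, zeros(N,1), C*ones(N,1));

svIdx = alpha > 1e-5;                       % keep only alphas above a criterion
b = mean(yPM(svIdx) - K(svIdx,:)*(alpha.*yPM));   % bias from the support vectors

score = K*(alpha.*yPM) + b;                 % decision values on the training data
confKSVM = confusionmat(y, double(score > 0));    % confusion matrix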
The codes were run 10 times for our own sigmoid-based MLP and for each of the 3 different MATLAB training functions to compare performance. The averages and their standard deviations are tabulated below.
Observation: From the table above, it is evident that in all cases the accuracy increases across the board as complexity is added to the hidden layer (e.g. more neurons), although it reaches a plateau for our sigmoid-based MLP. For trainlm, the accuracy continually improves, reaching a high of 97.5 percent, whereas our sigmoid-based MLP only reached a maximum of 90.2 percent. The standard deviation also reduces across the board for all functions as the number of neurons is increased, indicating increased stability. However, our sigmoid-based MLP remains the worst performer, with the lowest accuracy and the highest standard deviation.
trainlm and trainscg in MATLAB handle larger networks more effectively, likely due to optimized gradient calculations and convergence methods, whereas our custom MLP shows diminishing returns after around 15 neurons. trainlm has the best accuracy and stability and our sigmoid-based MLP has the worst, which may be explained by ours lacking the advanced optimization methods built into MATLAB's training functions.
trainlm uses the Levenberg-Marquardt algorithm, which is well suited to small and medium sized neural networks, and is especially efficient for networks with fewer parameters. It combines gradient descent with a second order approximation, which helps it converge quickly in networks with fewer neurons; however, it may require more memory than the other methods. trainbr applies Bayesian regularization, which reduces over-fitting by adjusting the effective number of parameters during training and helps balance network complexity and error. Due to this adjustment feature, it is ideal
for datasets which are particularly noisy, or for small datasets where over-fitting may be a concern. While it can be slower than its trainlm counterpart, it often results in enhanced generalization. trainscg uses a scaled conjugate gradient method, which is first order (it does not compute second order derivatives), making it suitable for large networks with many parameters. It is slower to converge than trainlm, but it uses less memory, which can make it preferable for larger datasets; it is chosen when memory is limited, or when the dataset is large and convoluted.
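A brief sketch of how these training functions can be compared on the same data is shown below, assuming the three MATLAB functions are trainlm, trainbr, and trainscg; it reuses X and y from the earlier sketch and uses patternnet from the Deep Learning Toolbox with an illustrative 15 hidden neurons.

trainFcns = {'trainlm','trainbr','trainscg'};
for k = 1:numel(trainFcns)
    net = patternnet(15, trainFcns{k});          % 15 hidden neurons
    net.trainParam.showWindow = false;           % suppress the training GUI
    net = train(net, X', [y'; 1-y']);            % one-hot target rows
    pred = vec2ind(net(X')) == 1;                % class 1 when first output wins
    fprintf('%s accuracy: %.1f%%\n', trainFcns{k}, 100*mean(pred(:) == y));
end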
To implement an MLP with any desired number of hidden layers, I need to adjust the
code to dynamically create and initialize weight matrices for each additional hidden layer. I
would also modify the forward and backward propagation steps to loop through each hidden
layer, applying the activation function and calculating gradients at each layer during back
propagation. Additionally, I would update the prediction function to handle multiple hidden
layers by iteratively passing outputs from each layer as inputs to the next.
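A sketch of the generalized forward pass is given below; layerSizes and the cell array of weight matrices W are illustrative, X is the two-class data from the earlier sketch, and the backward pass would mirror this loop to accumulate gradients layer by layer.

layerSizes = [2 10 8 1];                         % e.g. two hidden layers
L = numel(layerSizes) - 1;
W = cell(1,L);
for l = 1:L
    W{l} = 0.1*randn(layerSizes(l)+1, layerSizes(l+1));   % +1 row for the bias
end

sigm = @(z) 1./(1+exp(-z));
a = X;                                           % N x 2 inputs
for l = 1:L
    a = sigm([a, ones(size(a,1),1)] * W{l});     % output of layer l feeds layer l+1
end
yHatDeep = a > 0.5;                              % two-class prediction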
To solve a multi-class classification problem, I would need to modify the output layer to have one neuron per class instead of a single output, using a softmax activation function to produce probabilities for each class. The loss function should be updated to categorical cross-entropy, and the prediction step would then select the class with the highest probability as the final output for each data point.
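The snippet below sketches these multi-class changes on placeholder values; Z (raw output-layer scores), labels, and the number of classes K are illustrative stand-ins, not part of the original code.

K = 3;                                   % number of classes (placeholder)
Z = randn(200, K);                       % raw output-layer scores, N x K
labels = randi(K, 200, 1);               % true class of each data point

P = exp(Z - max(Z,[],2));                % softmax with numerical stabilization
P = P ./ sum(P, 2);                      % N x K class probabilities
idx = sub2ind(size(P), (1:size(P,1))', labels);
loss = -mean(log(P(idx)));               % categorical cross-entropy loss
[~, predClass] = max(P, [], 2);          % class with the highest probability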
For the analysis, each initialized set of data was passed through our KSVM (Gaussian, logistic, and hyperbolic tangent kernel transforms) and through MATLAB's Gaussian and polynomial KSVMs. This was done so that the data set remains the same, allowing the performance of each KSVM to be observed while holding the input class data constant as a control. The accuracy results are plotted below. Note that, separately, a for loop was used to cycle through various parameter settings in order to manually observe the effect of each setting and select well-performing values.
Observation: The best performance comes from the Gaussian kernel transform, whether from our own function or from MATLAB's built-in functions. The worst is MATLAB's polynomial transform, with the logistic and hyperbolic tangent kernels falling in between.
3.1.1 Gaussian:
The Gaussian kernel performs best on complex non-linear datasets, as it allows the KSVM to create flexible decision boundaries that can adapt around complex data structures; we have also observed this, as will be shown in the next section. This kernel works by considering the distance between data points, mapping them into a higher-dimensional space in
which separation of classes becomes easier. This makes it particularly effective for datasets that contain clusters of varying density or irregular shapes, as it can bend and curve to fit the data distribution. This is why we have seen it outperform all the other kernels.
3.1.2 Polynomial:
This kernel can work well if the data has polynomial characteristics; however, tuning is crucial to prevent over-fitting or under-fitting. Although the polynomial kernel can perform well where the underlying relationships follow a polynomial pattern, it may not generalize as effectively as the Gaussian kernel on datasets with diverse data distributions. This is why we have observed it to be the worst performer, despite objectively tuning its parameters.
3.1.3 Logistic and Hyperbolic Tangent:
These kernels introduce non-linearity and can be effective at capturing and transforming data sets which have logistic or hyperbolic-tangent features. However, they are again sensitive to parameter settings and can also present convergence issues if the data set is not in a form suited to these transforms.
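For reference, the kernel functions compared in 3.1.1-3.1.3 can be written as simple anonymous functions, as sketched below; the parameter values (sigma, polynomial degree d and offset c, and the tanh/logistic scales a1 and a2) are illustrative assumptions, not the tuned values used in the report.

sigma = 1; d = 3; c = 1; a1 = 1; a2 = 1;                     % assumed parameters
kGauss = @(x,z) exp(-sum((x-z).^2) / (2*sigma^2));           % Gaussian (RBF)
kPoly  = @(x,z) (x*z' + c)^d;                                % polynomial
kTanh  = @(x,z) tanh(a1*(x*z') + a2);                        % hyperbolic tangent
kLogis = @(x,z) 1 / (1 + exp(-a1*(x*z')));                   % logistic

kGauss(X(1,:), X(2,:))                                       % example evaluation on two points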
We will see in the following section that, apart from the Gaussian KSVM, none of the other decision boundaries are able to bend around the data as well. The polynomial, hyperbolic tangent, and logistic kernels all produce curves that attempt to separate the data, but do not wrap around the clusters as closely.
Sample 3 is shown in the figure below, where we have plotted the sample data along with the Logistic, Hyperbolic Tangent, and Gaussian kernel SVM transforms, together with each one's individual classification accuracy. As discussed above, it is evident that the Gaussian decision boundary is the most effective at wrapping around the convoluted data and segregating the two classes, which is why it is the best performer.
As we can see from the figure below, as the Gaussian parameter is reduced, the decision boundary wraps around the data much more tightly, leading to higher accuracy; this could, however, be capturing noise and over-fitting. Conversely, as we increase the Gaussian parameter, the boundary approaches a more linear KSVM solution and the accuracy plateaus.
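A small sketch of this parameter sweep is given below; it reuses X and y from the earlier sketches and, for brevity, uses MATLAB's fitcsvm with an RBF kernel in place of the report's own KSVM code, so the specific widths and the use of fitcsvm are assumptions.

sigmas = [0.1 0.5 1 2 5 10];                     % candidate Gaussian widths
accSweep = zeros(size(sigmas));
for k = 1:numel(sigmas)
    mdl = fitcsvm(X, y, 'KernelFunction','rbf', 'KernelScale', sigmas(k));
    accSweep(k) = mean(predict(mdl, X) == y);    % training accuracy for this width
end
disp([sigmas; accSweep])     % small widths hug the data; large widths flatten the boundary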
4 Conclusions