Summary: The document summarizes the results of applying various machine learning algorithms including polynomial curve fitting, linear regression with Gaussian basis functions, generalized RBF, MLFFNN, GMM, and Bayesian classification to different datasets. For each algorithm and dataset, it reports the optimal hyperparameters, mean squared error on test/train/validation data, and classification accuracy based on the confusion matrix. It finds that some algorithms like GMM and Bayesian classification work best for linearly separable data, while other models like polynomial fitting and MLFFNN perform better on more complex datasets.

CS6011 : Kernel Methods for Pattern Analysis

Assignment 1

Submitted by
CS08B025 CS08B036

Regression tasks:
i) Polynomial Curve fitting (Dataset I, univariate data):

The degree of polynomial for which the MSE on validation data is minimum is 7, and the corresponding MSE on test data is 4.9869e+003. From the graph, we can see a steep decrease in MSE from degree 2 to degree 3, so the data can be approximated by a 3rd-degree polynomial. Also, as the degree of the polynomial increases (from 16 to 20), the MSE on test data also increases (overfitting).
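The procedure above can be sketched as follows; this is a minimal stand-alone illustration, not the assignment code, and the synthetic noisy-cubic data, degree range, and target function below are our own assumptions chosen to mimic the "steep drop at degree 3" behaviour.

```python
import math
import random

def fit_poly(xs, ys, degree):
    # Least squares via the normal equations (Phi^T Phi) w = Phi^T y,
    # where Phi is the Vandermonde design matrix.
    n = degree + 1
    Phi = [[x ** d for d in range(n)] for x in xs]
    A = [[sum(r[i] * r[j] for r in Phi) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(Phi, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    M = [row + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def mse(xs, ys, w):
    # Mean squared error of the polynomial y(x, w) = sum_k w_k x^k
    preds = [sum(wk * x ** k for k, wk in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Hypothetical stand-in data: a noisy cubic
random.seed(0)
target = lambda x: 1.0 + 2.0 * x - 0.5 * x ** 3
train_x = [random.uniform(-2, 2) for _ in range(40)]
train_y = [target(x) + random.gauss(0, 0.1) for x in train_x]
val_x = [random.uniform(-2, 2) for _ in range(20)]
val_y = [target(x) + random.gauss(0, 0.1) for x in val_x]

# Model selection: pick the degree whose validation MSE is minimum
val_err = {d: mse(val_x, val_y, fit_poly(train_x, train_y, d)) for d in range(1, 9)}
best_degree = min(val_err, key=val_err.get)
```

Evaluating the chosen model on held-out test data, as done in the report, would then use `mse(test_x, test_y, fit_poly(train_x, train_y, best_degree))`.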

Plots of model output and target output (test, training, and validation data)

Approximating the given data by a polynomial of degree 7 (optimum): Y(x, W) = w0 + w1*x + w2*x^2 + ... + w7*x^7.

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

These plots can be approximated by straight lines of slope close to 1, which shows that the target output and the model output are approximately equal. Ideally, if the model could fit the data exactly, all the points in the plot would fall on a line of slope 1.

ii) Linear Model for Regression using Gaussian Basis Functions (Dataset II, bivariate data):

Plots of mean squared error, lambda = 1 (test, training, and validation data)

When lambda = 1, the optimum width of the Gaussian = 7.6000, number of clusters = 20, and MSE (validation data) = 338.0205.

From the graphs, the MSE is very high when c is low (close to zero), since the values of the basis functions exp(-(r*r)/(c*c)) are then very low.

Plots of mean squared error, lambda = 0.5 (test, training, and validation data)

When lambda = 0.5, the optimum width of the Gaussian = 8.4000, number of clusters = 20, and MSE (validation data) = 263.5788.

As the value of C (the width of the Gaussian) increases from 0, the MSE first decreases and, after reaching an optimum C, starts increasing. When C is near zero, outputs are predicted only from points very near the mean, and we cannot predict points farther from the mean; the reverse happens as we increase C. Also, as the number of clusters increases, the MSE decreases; however, increasing the number of clusters beyond a point leads to overlapping Gaussians, which leads to wrong predictions. For a larger variance, even a slightly smaller number of clusters may give a high error.
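The model above can be sketched as regularized least squares over Gaussian basis features; this is a hedged illustration with our own univariate stand-in data (the report's Dataset II is bivariate), evenly spaced centres standing in for cluster means, and an assumed width.

```python
import math
import random

def phi(x, centers, c):
    # Feature vector: a bias term plus one Gaussian basis function
    # exp(-r^2 / c^2) per centre, with r = |x - mu_j| and width c
    return [1.0] + [math.exp(-((x - m) ** 2) / (c * c)) for m in centers]

def ridge_fit(P, ys, lam):
    # Regularized least squares: (Phi^T Phi + lambda I) w = Phi^T y
    n = len(P[0])
    A = [[sum(r[i] * r[j] for r in P) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(P, ys)) for i in range(n)]
    M = [row + [b[i]] for i, row in enumerate(A)]
    for c in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def model_mse(xs, ys, w, centers, c):
    preds = [sum(wi * pi for wi, pi in zip(w, phi(x, centers, c))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Hypothetical stand-in data
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(80)]
ys = [math.sin(x) + random.gauss(0, 0.05) for x in xs]
centers = [i * 10.0 / 9 for i in range(10)]  # stand-in for k-means cluster means
width = 1.5
errs = {lam: model_mse(xs, ys,
                       ridge_fit([phi(x, centers, width) for x in xs], ys, lam),
                       centers, width)
        for lam in (1.0, 0.5, 0.0)}
```

Note that training error is nondecreasing in lambda, consistent with the report's observation that the lowest MSE occurs at lambda = 0; selecting lambda would still be done on validation data.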

Lambda = 0

When lambda = 0, the optimum width of the Gaussian = 9.9000, number of clusters = 18, and MSE (validation data) = 39.8534.

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

Plots of model output and target output, test data (blue: target output, green: model output)

iii) Generalized RBF Model:

For Dataset I:

Plots of mean squared error (test, training, and validation data)

When lambda = 0, the optimum width of the Gaussian = 1.4970, number of clusters = 6, and MSE (validation data) = 223.691. The MSE is approximately in the same range for different values of c and numbers of clusters.

Plots of model output and target output (test, training, and validation data)

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

For Dataset II:

Plots of mean squared error, lambda = 0.5 (test, training, and validation data)

When lambda = 0.5, the optimum width of the Gaussian = 3.4000, number of clusters = 20, and MSE (validation data) = 302.9161. For high values of the width of the Gaussian and the number of clusters, the MSE is also high.

Plots of model output and target output (test data)

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

The performance of the generalized RBF is better on Dataset II (bivariate) than on Dataset I (univariate), as seen from the scatter plots. However, the linear regression model performed better than the generalized RBF, although the reverse was expected.

For Dataset III:

Plots of mean squared error, lambda = 0.5 (test, training, and validation data)

When lambda = 0.5, the optimum width of the Gaussian = 190, number of clusters = 5, and MSE (validation data) = 410.3364. As in the previous case (Dataset II), the MSE initially decreases with c and, after reaching a minimum, starts increasing.

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

MLFFNN:

For Dataset I:

Plots of mean squared error (colour is proportional to the surface height; test, training, and validation data)

Optimum numbers of nodes in the hidden layers: hidden layer 1 = 11, hidden layer 2 = 7. We observe that choosing optimal parameters is very important in MLFFNN regression, as the error changes drastically for different hidden-layer sizes.

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

For Dataset II:

Plots of mean squared error (test, training, and validation data)

Optimum numbers of nodes in the hidden layers: hidden layer 1 = 7, hidden layer 2 = 9.

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

For Dataset III:

Plots of mean squared error (test, training, and validation data)

Scatter plots with target output on the x-axis and model output on the y-axis (test, training, and validation data)

Classification Tasks:

GMM:

Dataset Ia (Linearly Separable Data):
While running GMM on the UCI benchmark data, we did k-means clustering in the initialization phase. After k-means, when we computed the covariance matrices, we found that some rows of the covariance matrices were all zeros. We inferred that, for all the feature vectors, the values of a particular dimension were the same. To overcome this ill-conditioned covariance matrix problem, we added Gaussian noise to that dimension. We also had to scale the data appropriately; otherwise, all the components collapsed to a single component.
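A minimal sketch of the covariance fix described above; the two-dimensional toy data and the noise scale `eps` are illustrative assumptions, not the assignment's values.

```python
import random

def covariance(points):
    # Sample covariance matrix of a list of d-dimensional points
    d, n = len(points[0]), len(points)
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]

def jitter_degenerate_dims(points, eps=1e-3):
    # If every point shares the same value in some dimension, the
    # corresponding row and column of the covariance matrix are all zeros
    # and the matrix is singular (ill-conditioned). Add small Gaussian
    # noise to that dimension so the covariance becomes invertible.
    cov = covariance(points)
    out = [list(p) for p in points]
    for i in range(len(points[0])):
        if cov[i][i] == 0.0:
            for p in out:
                p[i] += random.gauss(0.0, eps)
    return out

random.seed(2)
pts = [[random.gauss(0, 1), 5.0] for _ in range(50)]  # dimension 1 is constant
fixed = jitter_degenerate_dims(pts)
```

An alternative with the same effect is to add a small epsilon to the diagonal of the covariance matrix itself; jittering the data, as described in the report, keeps the downstream EM code unchanged.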

We use a greedy strategy for finding the number of mixtures. We set the maximum number of mixtures to k_max and initialize the number of mixtures to k_max/2 for all the classes. Then we vary k from 1 to k_max for each class, keeping the ks of the other classes fixed. Validation is done to improve each class's accuracy, without taking global accuracy into consideration. Once we have an optimal k for each class, we vary the ks of each class from 1 to k_max and check whether global accuracy on the validation data increases. This is done twice.

After applying this method, we observed the following. The linearly separable data and the overlapping data are both Gaussian distributions, so k = 1 was optimal for them. For the non-linearly separable data, k greater than 17 was able to classify the data accurately for both classes; we used the same covariance matrix for all mixtures in this case, as estimating separate parameters is not possible with so little data. For the UCI benchmark data we got ks equal to 10, 19, 3, 7, 1 and were able to get accuracy greater than 98.5%. We infer that increasing k beyond a certain point does not increase accuracy, as the clusters converge to the same means. For the image dataset, GMM is comparable to MLFFNN and the Bayes classifier; all the classifiers give almost the same accuracy on this dataset. We were not able to train the GMM with full covariance matrices for all mixtures because of the scarcity of image data (only 100 training images per class, as we had only 100 images for each class).
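The greedy search described above can be sketched as coordinate ascent over per-class mixture counts. Here `accuracy` is a hypothetical callback that would train the per-class GMMs with the given counts and return validation accuracy; the toy objective below is a stand-in so the sketch runs on its own, and the report's two phases (per-class accuracy first, then global accuracy) are collapsed into one objective for brevity.

```python
def greedy_k_search(num_classes, k_max, accuracy, passes=2):
    # Start every class at k_max // 2, then repeatedly re-tune each class's
    # k over 1..k_max while holding the other classes' ks fixed.
    ks = [max(k_max // 2, 1)] * num_classes
    for _ in range(passes):
        for c in range(num_classes):
            ks[c] = max(range(1, k_max + 1),
                        key=lambda k: accuracy(ks[:c] + [k] + ks[c + 1:]))
    return ks

# Toy stand-in objective whose unique optimum is ks = [3, 1]
toy_accuracy = lambda ks: -abs(ks[0] - 3) - abs(ks[1] - 1)
best_ks = greedy_k_search(2, 8, toy_accuracy)
```

Like any coordinate-wise greedy search, this finds a local optimum of the validation objective, which is why the report repeats the sweep twice.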

Decision Region Plot:

Confusion matrix =
100   0   0   0
  0 100   0   0
  0   0 100   0
  0   0   0 100

Accuracy = 100
Optimum number of clusters (K): 1, 1, 1, 1

Dataset Ib (Non-linearly separable data):


Decision Region Plot:

Optimum number of clusters (K): 17, 17
Accuracy = 100.000000
Confusion matrix =
489   0
  0 488

Dataset Ic (Overlapping data) :


Decision Region Plot:

Confusion matrix =
80   0  14   6
 0  89   4   7
 7  16  77   0
 8  13   0  79

Accuracy = 81.2500
Optimum number of clusters (K): 1, 1, 1, 1

Dataset II (UCI Benchmark Data) :


Optimum number of clusters (K): 10, 19, 3, 7, 1
Accuracy = 98.355755
Confusion matrix =
278    1    0    0    1
  0  270    8    1    0
  0    9  271    0    0
  0    0    0  218    1
  0    1    0    0  279

Dataset III :
Optimum number of clusters (K): 3, 1, 3
Accuracy = 76.250000
Confusion matrix =
141   11   26
 19   96   43
 13   21  190

Bayes Classifier(Gaussian Model):


The Bayes classifier classifies linearly separable data accurately, but it does not work well on non-linearly separable data. On the overlapping data it does well in terms of accuracy and is comparable to other classifiers such as GMM and MLFFNN, as the data appears to be Gaussian. It gives better accuracy than GMM (77%) for scene classification, since we can estimate the full covariance matrix in this case; it even performs better than MLFFNN. However, although we got 91% accuracy on the UCI benchmark dataset, its results there are not very good compared to GMM and MLFFNN.
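The classifier can be sketched as follows; this is a hedged illustration that uses diagonal covariances for brevity (the report estimates full covariance matrices where the data permits), equal priors, and two hypothetical well-separated classes as stand-in data.

```python
import math

def fit_gaussian(points):
    # Per-class mean and per-dimension variance (diagonal-covariance
    # simplification; the floor avoids a zero variance)
    d, n = len(points[0]), len(points)
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    var = [max(sum((p[i] - mean[i]) ** 2 for p in points) / n, 1e-9)
           for i in range(d)]
    return mean, var

def log_likelihood(x, mean, var):
    # Log density of a diagonal Gaussian
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
               for xi, m, v in zip(x, mean, var))

def bayes_classify(x, models, priors):
    # Pick the class maximizing log prior + log likelihood
    return max(range(len(models)),
               key=lambda c: math.log(priors[c]) + log_likelihood(x, *models[c]))

# Two hypothetical well-separated classes
class0 = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2]]
class1 = [[5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.2, 5.1]]
models = [fit_gaussian(class0), fit_gaussian(class1)]
priors = [0.5, 0.5]
```

Note that a GMM with one component per class reduces to exactly this classifier, which is consistent with the GMM and Bayes confusion matrices coinciding on the overlapping dataset above (K = 1 for every class).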

For Dataset Ia:

Accuracy = 100.000000
Confusion matrix =
100   0   0   0
  0 100   0   0
  0   0 100   0
  0   0   0 100

For Dataset Ib:


Accuracy = 64.073695
Confusion matrix =
319  170
181  307

Decision Region Plot

For Dataset Ic: Decision Region Plot:

Accuracy = 81.250000
Confusion matrix =
80   0  14   6
 0  89   4   7
 7  16  77   0
 8  13   0  79

For Dataset II :
Accuracy = 91.704036
Confusion matrix =
273    1    0    0    6
  0  184   60   34    1
  0    6  274    0    0
  0    0    0  218    1
  0    2    0    0  278

For Dataset III:


Accuracy = 77.678571
Confusion matrix =
136   16   26
 14  100   44
  7   18  199

Perceptron for Dataset Ia :

Decision Region Plot :

Confusion matrix =
100   0   0   0
  0 100   0   0
  0   0 100   0
  0   0   0 100

Accuracy = 100. The perceptron was used to classify the linearly separable dataset, with a voting mechanism used for multi-class classification. We got 100% accuracy on the test data. However, the boundary drawn is not guaranteed to be the optimal separating hyperplane.
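A minimal binary sketch of the algorithm is below; the report's voting mechanism for the multi-class case is not reproduced, and the toy linearly separable data is a stand-in for Dataset Ia.

```python
def perceptron_train(X, y, epochs=100):
    # Classic perceptron updates: labels are +1/-1 and the bias is folded
    # into the weight vector as a last component.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            xa = list(x) + [1.0]
            if t * sum(wi * xi for wi, xi in zip(w, xa)) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:  # converged: the data is linearly separable
            break
    return w

def perceptron_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1

# Toy linearly separable data (a stand-in for Dataset Ia)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
y = [-1, -1, -1, 1, 1, 1]
w = perceptron_train(X, y)
```

Convergence is guaranteed on separable data, but, as noted above, the hyperplane found depends on the visiting order and initialization and need not maximize the margin.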

MLFFNN:
For Dataset Ia :
It gave very good results on the synthetic data and the UCI benchmark data (> 99% for all of these except the overlapping dataset). However, its performance was the poorest (75%) on the image dataset, though the difference between all the classifiers there was within 2%. Although we get very good accuracies, we need to choose the numbers of nodes in the layers correctly, otherwise MLFFNN can give very bad results. When we varied the numbers of nodes in the two hidden layers from 7 to 12, the accuracy varied from 20% to 99.4% for the UCI benchmark dataset, from 50% to 75% for the image dataset, from 92% to 100% for the non-linearly separable synthetic dataset, and from 25% to 84.5% for the overlapping data; so cross-validation plays a very important role in model selection for MLFFNN. For the other classifiers (even for GMM) the variability was not so high. We used sigmoidal transfer functions for the hidden layers and a linear transfer function for the output layer. Making the transfer functions of both hidden layers linear decreased accuracy drastically for the non-linearly separable dataset, as a linear classifier cannot solve a non-linear problem. Keeping one of the layers non-linear, we were able to get 100% accuracy for a few combinations of hidden-layer sizes, but in general the performance was not very good for most combinations. So, it is better to have non-linear transfer functions.
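The claim that linear hidden layers reduce the network to a linear model can be checked directly: with identity activations, the two-hidden-layer forward pass below is an affine map, so f(x1 + x2) = f(x1) + f(x2) - f(0) holds exactly, while with sigmoidal hidden layers it does not. The weight values are arbitrary stand-ins, not the trained network's weights.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def identity(v):
    return list(v)

def affine(W, b, v):
    return [sum(wij * xj for wij, xj in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def forward(x, act):
    # Two hidden layers with activation `act`, linear output layer
    W1, b1 = [[2.0, -1.0], [1.0, 1.5]], [0.5, -0.5]
    W2, b2 = [[1.5, -2.0], [2.0, 1.0]], [0.0, 0.0]
    W3, b3 = [[1.0, 1.0]], [0.05]
    h1 = act(affine(W1, b1, x))
    h2 = act(affine(W2, b2, h1))
    return affine(W3, b3, h2)

def additivity_gap(act, x1, x2):
    # For an affine map f, f(x1 + x2) - f(x1) - f(x2) + f(0) == 0
    s = [a + b for a, b in zip(x1, x2)]
    f = lambda v: forward(v, act)[0]
    return abs(f(s) - f(x1) - f(x2) + f([0.0, 0.0]))
```

A composition of affine maps is itself affine, which is why a network with all-linear hidden layers cannot separate a non-linear dataset no matter how many layers it has.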

Decision Region Plot:

Hidden layer 1 = 10, hidden layer 2 = 7
Accuracy = 100.000000
Confusion matrix =
100   0   0   0
  0 100   0   0
  0   0 100   0
  0   0   0 100

For Dataset Ib : Decision Region Plot:

Hidden layer 1 = 8, hidden layer 2 = 9
Accuracy = 100.000000
Confusion matrix =
489   0
  0 488

For Dataset Ic: Decision Region Plot:

Hidden layer 1 = 7, hidden layer 2 = 12
Accuracy = 84.750000
Confusion matrix =
76   0  10  14
 0  87   4   9
 2  16  82   0
 4   2   0  94

For Dataset II:


Hidden layer 1 = 10, hidden layer 2 = 12
Accuracy = 99.476831
Confusion matrix =
278    1    0    0    1
  0  278    1    0    0
  0    3  277    0    0
  0    0    0  218    1
  0    0    0    0  280

For Dataset III:


Hidden layer 1 = 9, hidden layer 2 = 12
Accuracy = 75.535714
Confusion matrix =
136   23   19
 17  118   23
 13   42  169

Inference about the datasets:
From the outputs of the GMM, MLFFNN, and Bayes classifiers, we infer that the image dataset has a lot of class overlap and that the UCI benchmark dataset is non-linearly separable with very little overlap.
