Machine Learning Assignment 2
Exercise 1: a, b)
Number of input features = 128 (the number of columns in the dataset, excluding the last column, which holds the label)
Number of outputs = 4 (the ground truth contains only four distinct labels, i.e., 0, 1, 2, 3)
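As a quick sanity check, both numbers can be read off the data itself. The following is a minimal sketch, assuming the dataset is loaded as an array whose last column holds the ground-truth label (the file name and loading details are placeholders, not the assignment's actual code):

    import numpy as np

    # Assumption: the dataset is a comma-separated file whose last column
    # is the ground-truth label (file name is a placeholder).
    data = np.loadtxt("dataset.csv", delimiter=",")

    n_features = data.shape[1] - 1            # all columns except the label column
    n_outputs = len(np.unique(data[:, -1]))   # number of distinct class labels

    print(n_features)  # expected: 128
    print(n_outputs)   # expected: 4 (labels 0, 1, 2, 3)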
Exercise 1: c)
As we increase the threshold value (theta) in fixed uncertainty sampling, the algorithm
becomes more conservative in its selection of examples: it selects examples that are more
certain, i.e., examples whose maximum prediction probability is higher. With a higher
threshold such as theta = 0.8, the algorithm selects only the most confident examples, which
are easier to learn from and contribute positively to the model's generalization ability. This
helps to filter out noise and outliers in the data, which leads to improved accuracy. However,
setting a very high threshold shrinks the pool of labeled examples, which can leave the
algorithm with too little data to learn from and lead to under-fitting.
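A minimal sketch of the selection rule as described above, assuming an incoming example is used only when the model's maximum predicted probability reaches the fixed threshold theta (the probability vector is a placeholder for whatever classifier the assignment uses):

    import numpy as np

    def fixed_uncertainty_select(proba, theta=0.8):
        # proba: predicted class probabilities for one incoming example.
        # Assumption (matching the description above): the example is
        # selected only when the model is confident enough, i.e., when its
        # maximum predicted probability is at least theta.
        return np.max(proba) >= theta

    # Example: a prediction of [0.1, 0.7, 0.1, 0.1] is rejected at theta = 0.8
    # but accepted at theta = 0.6.
    print(fixed_uncertainty_select([0.1, 0.7, 0.1, 0.1], theta=0.8))  # False
    print(fixed_uncertainty_select([0.1, 0.7, 0.1, 0.1], theta=0.6))  # True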
Exercise 1: d)
Initially, with the higher threshold of 0.8, the model is more selective in choosing samples for
training, which means it misses important samples that would improve its performance. As
the model is trained on more samples, its predictions become more accurate and the maximum
predicted probabilities become more reliable. The threshold is therefore lowered, allowing the
model to explore a wider range of samples to learn from. This is why, as time progresses (the
time step increases), the smaller threshold values (0.6 and 0.4) start to outperform the larger
initial threshold of 0.8: they let the model learn from a broader range of samples and thereby
improve its performance.
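A small sketch of the adaptive-threshold idea described here: the threshold starts at 0.8 and is lowered as more samples are seen, so later time steps admit a wider range of samples. The geometric decay schedule below is an illustrative assumption, not the assignment's exact update rule:

    def variable_threshold(theta_init=0.8, theta_min=0.4, decay=0.99):
        # Illustrative assumption: the threshold decays toward a floor as
        # more examples arrive, gradually admitting a wider range of samples.
        theta = theta_init
        while True:
            yield theta
            theta = max(theta_min, theta * decay)

    # Example: the threshold over the first few time steps.
    thresholds = variable_threshold()
    print([round(next(thresholds), 3) for _ in range(5)])  # [0.8, 0.792, 0.784, 0.776, 0.768]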
Exercise 1: e)
Over time, the OSL algorithm gets to train on more and more examples, whereas the OAL
algorithm selectively chooses only the examples it is most uncertain about. As more data is
seen, the OSL algorithm generalizes better, leading to better performance. The variable
uncertainty sampling strategy with an initial threshold of 0.8 is very selective, keeping only
the examples the model is very uncertain about, which results in a sparse and possibly biased
set of training examples. This leads to overfitting and poor generalization over time.
Exercise 2: a)
i. The distance matrix is calculated using Python (code file attached).
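Since the attached code file is not reproduced here, the following is a minimal sketch of how a pairwise Euclidean distance matrix can be computed with NumPy (the array X is a placeholder for the exercise's actual data points):

    import numpy as np

    # Placeholder data: each row is one point (replace with the exercise's points).
    X = np.array([[0.0, 0.0],
                  [3.0, 4.0],
                  [6.0, 8.0]])

    # Pairwise Euclidean distances via broadcasting: D[i, j] = ||X[i] - X[j]||.
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    print(D)  # symmetric matrix with zeros on the diagonal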
Exercise 2: b)
If the eigenvalues resulting from PCA on a two-dimensional dataset are identical, it implies
that the two dimensions are equally significant and contribute equally to the variability in the
data; neither dimension is more important than the other for describing the underlying
patterns.
Whether dimensionality reduction is a good choice depends on the specific situation. If the
dataset contains many samples, reducing the dimensionality could be beneficial for
computational efficiency and ease of visualization. However, if the dataset is relatively small
and the two dimensions are equally important, dropping either principal component discards
half of the variance, so reducing the dimensionality would cause information loss and should
be avoided.
Examples:
i. When the dataset consists of points on a line, there is effectively only one dimension
in the data; the eigenvalues resulting from PCA would be identical, indicating that this
single dimension explains all the variability in the data. In this case, pursuing
dimensionality reduction would not be beneficial, since there is only one dimension to
begin with.
ii. When the dataset varies along both dimensions (horizontal and vertical) and the two
are equally important in describing the variability in the data, performing PCA on this
dataset yields identical eigenvalues. In this case, reducing the data to a single principal
component would result in information loss and should be avoided (a numerical sketch
follows these examples).
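A small numerical sketch of example ii: drawing synthetic points that vary equally along both axes and checking that the PCA eigenvalues come out (approximately) identical. The generated data is purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 2-D data with equal, independent variance in both directions.
    X = rng.normal(loc=0.0, scale=1.0, size=(10000, 2))

    # PCA eigenvalues are the eigenvalues of the covariance matrix of the data.
    cov = np.cov(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)

    print(eigenvalues)  # both values are close to 1.0, i.e., (nearly) identical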
Exercise 2: c)
If we apply PCA on a two-dimensional dataset and get the eigenvalues 6 and 2, it means that
the first principal component explains 75% of the variability in the data, while the second
principal component explains 25% of the variability in the data. If we pursue dimensionality
reduction on this dataset, we will keep only the first principal component, which explains most
of the variability in the data, and discard the second principal component. This would result
in a one-dimensional dataset that retains most of the information in the original dataset, while
reducing its dimensionality.
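The 75% / 25% split follows directly from normalizing the eigenvalues by their sum, as the short check below shows:

    import numpy as np

    eigenvalues = np.array([6.0, 2.0])
    explained_variance_ratio = eigenvalues / eigenvalues.sum()
    print(explained_variance_ratio)  # [0.75 0.25], i.e., 75% and 25%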
For a three-dimensional dataset with eigenvalues 0, 1, and 0, two of the principal components
have zero variance, while the remaining principal component explains all the variance in the
data. This can happen when the three dimensions are linearly dependent in such a way that
the data varies along only a single direction, i.e., two of the dimensions can be expressed as
linear functions of the remaining one. In this case, reducing the dimensionality to one would
retain all the information in the data, as the other two components carry no variance and
provide no additional information.
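A small sketch of such a degenerate case: three-dimensional data that varies along a single direction, whose covariance matrix therefore has one non-zero eigenvalue and two zero eigenvalues. The particular direction and scaling are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)

    # One-dimensional latent factor ...
    t = rng.normal(size=1000)

    # ... embedded in three dimensions: each column is a multiple of t, so two
    # of the dimensions are linear functions of the remaining one.
    X = np.column_stack([t, 2.0 * t, -1.0 * t])

    cov = np.cov(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)
    print(np.round(eigenvalues, 6))  # two eigenvalues are 0; one carries all the variance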