Machine Learning
Assignment_03 - 16/02/2023
Exercise 1: a)
In this experiment, a learning rate of 0.1 gave the best validation accuracy, which suggests that at this rate the optimizer was able to find a good set of weights for the neural network quickly.
The loss curve for the learning rate of 0.1 started at a relatively high value and decreased quickly, indicating fast convergence to a good set of weights, while the accuracy curve increased steadily and reached a high value, indicating that the neural network generalized well to unseen data. A learning rate of 0.01 performed almost as well as 0.1.
For the other learning rates, the loss curve also started at a high value but decreased much more slowly, so the model needed many more epochs to approach a good solution, which resulted in poor accuracy on the validation set within the given number of epochs.
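For reference, a sweep of this kind can be reproduced with a short script along the following lines. This is only a sketch: the network architecture, the optimizer (plain SGD), the dataset (MNIST, with its test split used for validation) and the exact set of learning rates are assumptions rather than the assignment's actual setup.

from tensorflow import keras

def build_model(learning_rate):
    # small fully connected classifier; the architecture here is an assumption
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

(x_train, y_train), (x_val, y_val) = keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

for lr in [1.0, 0.1, 0.01, 0.001]:
    history = build_model(lr).fit(x_train, y_train, epochs=10,
                                  validation_data=(x_val, y_val), verbose=0)
    # history.history holds the loss and accuracy curves discussed above
    print(lr, max(history.history["val_accuracy"]))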
Exercise 1: b)
Number of Neurons: 2 in both dense layers
Validation Accuracy: 65.3%
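A sketch of this reduced-capacity configuration is given below, assuming the two hidden dense layers are the ones shrunk to 2 neurons each; the input shape, output size and optimizer settings are assumptions.

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(2, activation="relu"),    # very narrow hidden layer
    keras.layers.Dense(2, activation="relu"),    # second narrow hidden layer
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Forcing all information through a 2-unit bottleneck strongly limits what the network can represent, which is consistent with the comparatively low 65.3% validation accuracy.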
Exercise 1: c)
Dropout Rate: 0.2
Validation Accuracy: 72.1%
The loss curve for the model with a dropout rate of 0.2 started at a higher value and then decreased quickly, resulting in fast convergence. The accuracy increased steadily and then plateaued at a high value, indicating good generalization performance. The model with a dropout rate of 0.5 showed a similar pattern but reached a lower final accuracy, since more neurons are dropped during training.
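A sketch of this configuration, assuming one Dropout layer after each hidden dense layer (the hidden-layer sizes are assumptions):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),   # randomly zeroes 20% of activations, during training only
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Changing 0.2 to 0.5 in the Dropout layers reproduces the second setting discussed above.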
Advantages of one-pass learning:
• One-pass learning can be faster and more memory-efficient than traditional batch or
mini-batch gradient descent methods, as it avoids the need to store and process the
entire dataset multiple times.
• It is well suited to large-scale datasets, since it processes data in a streaming fashion and therefore scales to datasets that are too large to fit in memory.
• It can be useful in real-time or online learning scenarios, where data arrives in a
continuous stream and the model needs to be updated in real-time.
Disadvantages:
• Since one-pass learning only sees each example once, it may not converge to the
optimal solution, especially if the data is noisy or there are outliers. Additionally, the
model may not have enough opportunities to adjust its parameters to the underlying
patterns in the data.
• One-pass learning may not be able to learn complex patterns in the data, as it has
limited opportunities to adjust its parameters to fit the data. It may be less effective
for deep learning models with many layers and parameters.
• One-pass learning requires careful selection of hyperparameters such as the learning
rate and regularization to ensure that the model learns effectively and does not
overfit.
Useful Cases:
One-pass learning is most useful in scenarios where the data is large and continuous, and
where the goal is to process the data in real-time or with limited resources. Examples include
anomaly detection in streaming data, fraud detection in financial transactions, and natural
language processing in real-time chat applications. One-pass learning may also be useful as a
pre-training step to initialize the weights of a deep learning model before fine-tuning with
more iterations or epochs.
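To make the idea concrete, here is a minimal sketch of one-pass learning with scikit-learn's SGDClassifier: every example is seen exactly once and nothing is stored or revisited. The synthetic data stream and the hyperparameters are assumptions for illustration only.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def stream_batches(n_batches=100, batch_size=32, n_features=20):
    # stand-in for a real data stream (e.g. transactions or log events)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y

model = SGDClassifier(learning_rate="constant", eta0=0.01)
classes = np.array([0, 1])
first_batch = True

for X_batch, y_batch in stream_batches():
    # partial_fit updates the weights using only the current batch,
    # so memory use stays constant regardless of the stream length
    if first_batch:
        model.partial_fit(X_batch, y_batch, classes=classes)
        first_batch = False
    else:
        model.partial_fit(X_batch, y_batch)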
Incremental learning is often used in scenarios where data is constantly changing or being added, such as online learning or real-time data processing. It can also improve efficiency and reduce the time and resources needed for training models.
Advantages:
• Incremental learning can be more resource-efficient than retraining the model from
scratch each time new data is added, as it only needs to update the parameters based
on the new data. This can be especially useful in scenarios where the dataset is very
large or continuously growing.
• It allows the model to adapt to changing data distributions over time, which can be
particularly useful in scenarios where the underlying data generating process may
change over time.
• It can allow the model to adapt more quickly to changes in the data and produce faster
responses than traditional batch learning approaches.
Disadvantages:
• Incremental learning can suffer from the problem of catastrophic forgetting, where
the model forgets previous knowledge as it learns new information. This can be
particularly problematic in scenarios where the model needs to maintain its
performance on previously learned tasks.
• It requires careful selection of representative examples to avoid introducing bias into
the updated model. In addition, the order in which the examples are presented can
also impact the performance of the updated model.
• It can be susceptible to model drift, where the updated model no longer accurately
represents the underlying data distribution due to changes in the data over time.
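As a sketch of incremental updating, the following uses scikit-learn's partial_fit to update an already trained model as new chunks of labelled data arrive, with a small rehearsal buffer as one common way to soften catastrophic forgetting. The synthetic data chunks and the buffer size are illustrative assumptions.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def new_chunk(n=200, d=10):
    # stand-in for a newly arrived batch of labelled data
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)
    return X, y

model = SGDClassifier()
buffer_X, buffer_y = [], []                            # small memory of past examples

X0, y0 = new_chunk()
model.partial_fit(X0, y0, classes=np.array([0, 1]))    # initial training
buffer_X.append(X0[:20]); buffer_y.append(y0[:20])

for _ in range(5):                                     # new data keeps arriving
    X_new, y_new = new_chunk()
    # replay a few stored examples together with the new chunk (rehearsal),
    # so the update does not completely overwrite older knowledge
    X_fit = np.vstack([X_new] + buffer_X)
    y_fit = np.concatenate([y_new] + buffer_y)
    model.partial_fit(X_fit, y_fit)
    buffer_X.append(X_new[:20]); buffer_y.append(y_new[:20])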
Alternative Approaches:
Exercise 2: c) Justification:
In general, real concept drift is a more important problem than virtual concept drift.
Real concept drift refers to a change over time in the relationship between the features and the target variable, i.e. in P(y|x): the same inputs now call for different predictions, so the decision boundary the model has learned no longer holds and its performance degrades unless the model is adapted. In contrast, virtual concept drift refers to a change in the input distribution P(x) while the relationship between features and target stays the same, for example when the mix of incoming examples shifts.
Real concept drift is therefore the more important problem: it directly invalidates what the model has learned and, because detecting a change in P(y|x) requires fresh labels, it can also be harder to detect and address. Under purely virtual drift the learned decision boundary can remain valid even though the inputs look different. Nevertheless, both types of concept drift should be carefully monitored and addressed to ensure that the model's predictions remain accurate and reliable over time.
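One simple way to monitor for drift, sketched below, is to compare the model's accuracy over a recent sliding window with its long-run accuracy and to flag possible drift when the recent value drops noticeably. The window size and threshold are assumptions, not values from the assignment.

from collections import deque

class DriftMonitor:
    def __init__(self, window_size=200, max_drop=0.10):
        self.recent = deque(maxlen=window_size)   # outcomes of the most recent predictions
        self.total_correct = 0
        self.total_seen = 0
        self.max_drop = max_drop

    def update(self, y_true, y_pred):
        correct = int(y_true == y_pred)
        self.recent.append(correct)
        self.total_correct += correct
        self.total_seen += 1
        recent_acc = sum(self.recent) / len(self.recent)
        overall_acc = self.total_correct / self.total_seen
        # a sustained drop in recent accuracy signals that the relationship
        # the model learned may no longer hold
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and recent_acc < overall_acc - self.max_drop

When update returns True, the model can be retrained or incrementally updated on the most recent data.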
Exercise 2: d)
Online active learning is a machine learning approach that combines online learning and active
learning to learn from streaming data while minimizing the labeling effort. In online active
learning, uncertainty-based sampling is a common strategy used to select informative
examples for labeling. Below are two widely used methods for improving an uncertainty-based sampling strategy.
Diversity sampling: In addition to selecting samples that the model is uncertain about, diversity sampling selects samples that maximize the diversity of the labeled data. This can help improve the performance of the model by reducing redundancy in the labeled data and ensuring that the model learns from a representative sample of the data distribution.
Active search: Active search is a method that focuses on sampling examples that the model is
likely to misclassify, rather than just focusing on the most uncertain examples. This can help
the model learn more quickly from the labeled data by actively searching for the most
informative examples that will help the model better discriminate between different classes.
Active search can be particularly useful when the data distribution is skewed or when there
are imbalanced classes.
Overall, both diversity sampling and active search are methods that can be used to improve
the effectiveness of an uncertainty-based sampling strategy in online active learning. By
selecting the most informative examples for labeling, these methods can help improve the
performance of the model while minimizing the labeling effort required.
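A minimal sketch of combining least-confidence uncertainty with a simple diversity term is given below: from a pool of unlabeled examples it greedily selects points that are both uncertain and far from the points already chosen. The scoring weights and the use of Euclidean distance in feature space are assumptions.

import numpy as np

def select_queries(probs, X_pool, k=10, alpha=0.5):
    """probs: (n, n_classes) predicted class probabilities; X_pool: (n, d) features."""
    uncertainty = 1.0 - probs.max(axis=1)        # least-confidence uncertainty
    chosen = []
    for _ in range(k):
        if chosen:
            # distance of each candidate to its nearest already-chosen point
            dists = np.linalg.norm(
                X_pool[:, None, :] - X_pool[chosen][None, :, :], axis=2
            ).min(axis=1)
            diversity = dists / (dists.max() + 1e-12)
        else:
            diversity = np.ones(len(X_pool))     # first pick: uncertainty only
        score = alpha * uncertainty + (1 - alpha) * diversity
        score[chosen] = -np.inf                  # never pick the same point twice
        chosen.append(int(np.argmax(score)))
    return chosen                                # indices of examples to send for labeling

The same skeleton could accommodate an active-search-style criterion by replacing the uncertainty term with an estimate of how likely the current model is to misclassify each candidate.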