Assignment 1 - Machine Learning
The choice of loss function is critical because it fundamentally defines the training objective
and optimization procedure. Different loss functions have significant implications for model
generalization, robustness, complexity, and probabilistic interpretation. Key reasons the
choice matters:
• It determines the error surface shape for optimization algorithms, affecting training
efficiency.
• It controls model complexity tradeoffs like overfitting and underfitting.
• It affects how well the model generalizes beyond the training data.
• It determines how robust the model is to noise and outliers in the data.
• It impacts the scale invariance and feature spaces the model operates in.
• It provides meaning and calibration to the predicted probabilities.
• It affects interpretability and intuitiveness of the training objective.
Question: List and define three common loss functions for regression tasks.
• Mean Squared Error (MSE): The average of squared differences between predictions
and true values. Squaring penalizes larger errors more heavily, making MSE sensitive to outliers; its value is in squared units of the target.
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \text{actual}_i - \text{predicted}_i \right)^2
• Mean Absolute Error (MAE): The average of absolute differences between predictions
and true values. The penalty grows linearly with the error, making MAE robust to outliers; its value is in the same units as the target.
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{actual}_i - \text{predicted}_i \right|
• Huber Loss: Combines the behaviour of MSE and MAE, acting quadratically for errors below a
threshold δ and linearly for larger ones. This provides robustness to outliers while remaining smooth near zero (a short Python sketch follows the formula below).
\mathrm{Huber} = \frac{1}{n} \sum_{i=1}^{n}
\begin{cases}
\tfrac{1}{2} \left( \text{actual}_i - \text{predicted}_i \right)^2, & \left| \text{actual}_i - \text{predicted}_i \right| \le \delta \\
\delta \left( \left| \text{actual}_i - \text{predicted}_i \right| - \tfrac{1}{2}\delta \right), & \text{otherwise}
\end{cases}
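To make the piecewise behaviour of the Huber loss concrete, a minimal Python sketch is given below. The function name huber_loss, the default threshold delta=1.0, and the sample values are illustrative only; errors are averaged over examples as in the formulas above.

# Mean Huber loss: quadratic for small errors, linear for large ones.
# huber_loss, delta, and the sample values below are illustrative, not from the assignment.
def huber_loss(actual, predicted, delta=1.0):
    total = 0.0
    for a, p in zip(actual, predicted):
        err = abs(a - p)
        if err <= delta:
            total += 0.5 * err ** 2                 # MSE-like region
        else:
            total += delta * (err - 0.5 * delta)    # MAE-like region
    return total / len(actual)

# The outlier (10.0 vs. 2.0) contributes linearly, not quadratically.
print(huber_loss([3.0, 5.0, 10.0], [2.8, 5.4, 2.0]))   # ≈ 2.53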
Answer:
A detailed comparison of mean squared error (MSE), cross-entropy loss, and absolute loss is
given below:
• While mean squared error (MSE) and absolute loss are both commonly used for regression,
MSE has some notable advantages and disadvantages compared to absolute loss. The
squaring of errors in MSE leads to a smooth, convex error surface that is straightforward
to optimize, unlike the sharp corner that the absolute value introduces at zero error. However, this squaring
also makes MSE much more sensitive to outliers than the more robust absolute loss, and it
reports the error in squared units of the target, which is less directly interpretable.
Additionally, MSE corresponds to an assumption of normally distributed noise, which may not fit all
problems appropriately.
• Comparing MSE and cross-entropy loss, used for regression and classification respectively,
there are clear tradeoffs. MSE penalizes errors only according to their squared magnitude, while cross-entropy heavily
penalizes confident misclassifications, whether false positives or false negatives (a numerical
illustration follows this list). The probabilistic output and
logarithmic scale of cross-entropy also provide advantages in terms of optimizing the
likelihood and calibrating confidence. However, MSE has a simpler quadratic form that is
easier to optimize than cross-entropy for some models.
• The absolute loss function contrasts sharply with cross-entropy loss on classification tasks
in a few ways. Absolute loss can be used, but it lacks the probabilistic justification that cross-
entropy provides. Cross-entropy is also generally easier to optimize efficiently than
absolute loss, whose error surface has sharp, non-differentiable corners. However, absolute loss is
less sensitive to data scaling and outliers than cross-entropy.
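The penalty asymmetry described above can be illustrated numerically. The short Python sketch below (the function names and probability values are made up for the example) compares squared error and binary cross-entropy for a positive example (true label 1) under a confident correct, an uncertain, and a confident wrong prediction:

import math

def squared_error(p, y):
    # Squared difference between predicted probability and the 0/1 label
    return (y - p) ** 2

def binary_cross_entropy(p, y, eps=1e-12):
    # Negative log-likelihood of the label; eps guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

for p in (0.9, 0.5, 0.01):   # predicted probability of the true class
    print(f"p={p}: squared={squared_error(p, 1):.3f}  cross-entropy={binary_cross_entropy(p, 1):.3f}")

Squared error is bounded by 1 no matter how wrong the prediction is, while cross-entropy grows without bound as the probability assigned to the true class approaches zero, which is why it penalizes confident misclassifications so much harder.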
Conclusion:
While all three loss functions have their merits, certain distinctions make each better suited to
specific tasks and models. MSE's smoothness and emphasis on large errors suit regression problems, cross-
entropy's probabilistic nature suits classification, and absolute loss provides greater robustness
when outliers are present.
Question: Which loss function would you choose for the following tasks? Justify your
answer.
Answer:
• For the image classification task, I would select the cross-entropy loss function. Cross-
entropy loss is the most appropriate for multi-class image classification because it measures
the divergence between the predicted class probabilities and the true label distribution. By
optimizing cross-entropy, the model is trained to maximize the likelihood of the correct
class. Cross-entropy heavily penalizes confident incorrect classifications, forcing the model
to learn subtle features that distinguish classes. It also has a clear interpretation as a negative
log-likelihood. Furthermore, cross-entropy supports proper calibration of the predicted
probabilities from the softmax output, and it is the standard loss for multi-class deep learning
classifiers. The smoothness of cross-entropy, and its convexity with respect to the output logits,
also allow straightforward optimization with gradient-based methods (see the sketch after this list).
• For the house price regression task, I would select mean squared error (MSE) as the loss
function. MSE naturally fits continuous-value regression problems by quantifying the
squared magnitude of errors. This penalizes larger deviations more heavily, which matches the
implicit assumption of Gaussian-like noise. Taking the square root (RMSE) reports the error
back in the original price units, giving an interpretable error measure. MSE is a
standard baseline loss for regression problems that is smooth, convex, and easy to optimize
with gradient descent. Its widespread use and interpretability make it the best
choice.
• For the customer prediction regression problem, I would also choose mean squared error
as the loss function. Since this is also a continuous value regression task, MSE is again the
most suitable loss for the same reasons described above. It directly optimizes for
quantitative accuracy in the numerical predictions. The intuitiveness, optimization
properties, and ubiquity of MSE make it the optimal choice over alternatives.
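To connect the classification choice to the softmax output mentioned above, here is a minimal, illustrative sketch of multi-class cross-entropy computed from raw class scores (the scores and the true-class index are made-up example values, not data from this assignment):

import math

def softmax(scores):
    # Turn raw scores (logits) into a probability distribution
    exps = [math.exp(s - max(scores)) for s in scores]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    # Negative log of the probability assigned to the correct class
    return -math.log(softmax(scores)[true_class])

print(cross_entropy([2.0, 0.5, -1.0], 0))   # confident and correct -> small loss (~0.24)
print(cross_entropy([-1.0, 0.5, 2.0], 0))   # confident but wrong   -> large loss (~3.24)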
# Menu-driven script: computes the squared loss, MSE, RMSE, and MAE on sample data
import math
def squared_loss(actual, predicted):              # sum of squared errors
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))
def mse(actual, predicted):                       # Mean Squared Error
    return squared_loss(actual, predicted) / len(actual)
def rmse(actual, predicted):                      # Root Mean Squared Error
    return math.sqrt(mse(actual, predicted))
def mae(actual, predicted):                       # Absolute loss function (Mean Absolute Error, MAE)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
def display_table(actual, predicted):             # actual vs. predicted values, side by side
    print("____________________________")
    print(f"{'Actual':^13} | {'Predicted':^11}")
    print("------------- | -------------")
    for a, b in zip(actual, predicted):
        print(f"{a:^13} | {b:^11}")

actual_values = [3.0, 5.0, 2.5, 7.0]              # placeholder data (original values not preserved)
predicted_values = [2.8, 5.4, 2.9, 6.5]
display_table(actual_values, predicted_values)
while True:
    print('\n_________________________________________________________________')
    print("1. Display table")                     # labels for options 1 and 2 reconstructed
    print("2. Squared loss")
    print("3. MSE")
    print("4. RMSE")
    print("5. MAE")
    print("6. Exit")
    choice = int(input("Enter choice: "))
    if choice == 1:
        display_table(actual_values, predicted_values)
    elif choice == 2:
        loss = squared_loss(actual_values, predicted_values)
        print("Squared loss:", loss)
    elif choice == 3:
        print("MSE:", mse(actual_values, predicted_values))
    elif choice == 4:
        print("RMSE:", rmse(actual_values, predicted_values))
    elif choice == 5:
        print("MAE:", mae(actual_values, predicted_values))
    elif choice == 6:
        break
    else:
        print("Invalid choice, try again.")
________________The END________________