
Assignment : Machine Learning

Submitted By : Majid Khan


Submitted To : Dr. Sher Afzal
Program : MS Computer Science [Fall-23 (1st semester)]

Question: What is a loss function?


A loss function is a measure of how well a machine learning model's predictions match the
true target labels or values. It quantifies the error between predictions and ground truth to
judge model performance. Loss functions map a prediction and label to a non-negative value,
with the goal of minimizing the overall loss. The loss is minimized during training by
updating model parameters through optimization algorithms like gradient descent. The choice
of loss function greatly impacts model behaviour and performance.
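
As a tiny concrete example (a sketch of my own, separate from the assignment's implementation later in this document), a squared-error loss maps one prediction and its label to a non-negative number that shrinks as the prediction improves:

def squared_error(y_true, y_pred):
    # Non-negative by construction; zero only for a perfect prediction.
    return (y_true - y_pred) ** 2

print(squared_error(3.0, 2.5))  # 0.25 -> close prediction, small loss
print(squared_error(3.0, 8.0))  # 25.0 -> poor prediction, large loss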

Question: Why is the choice of loss function important?

The choice of loss function is critical because it fundamentally defines the training objective
and optimization procedure. Different loss functions have significant implications on model
generalization, robustness, complexity, probabilistic interpretation, and more. Key reasons the
choice matters:

• It determines the error surface shape for optimization algorithms, affecting training
efficiency.
• It controls model complexity tradeoffs like overfitting and underfitting.
• It affects how well the model generalizes beyond the training data.
• It determines how robust the model is to noise and outliers in the data.
• It affects how sensitive training is to the scale of the targets and features.
• It provides meaning and calibration to the predicted probabilities.
• It affects interpretability and intuitiveness of the training objective.
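
To make the first bullet concrete, here is a small sketch (my own illustration, not part of the assignment) comparing the gradient of squared and absolute error with respect to a single prediction. The squared error's gradient grows with the error while the absolute error's gradient has constant magnitude, which changes how an optimizer steps:

def mse_gradient(y_true, y_pred):
    # d/dy_pred of (y_true - y_pred)^2 is -2 * (y_true - y_pred).
    return -2 * (y_true - y_pred)

def mae_gradient(y_true, y_pred):
    # d/dy_pred of |y_true - y_pred| is the sign of (y_pred - y_true).
    return -1.0 if y_pred < y_true else 1.0

for error in (0.1, 1.0, 10.0):
    print(f"error={error}: MSE grad={mse_gradient(0.0, error)}, "
          f"MAE grad={mae_gradient(0.0, error)}")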

Question: List and define three common loss functions for regression tasks.

• Mean Squared Error (MSE): The average of the squared differences between predictions and true values. Penalizes larger errors more heavily due to squaring, and is therefore sensitive to outliers. Reported in squared units of the target.

\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\text{actual}_i - \text{predicted}_i\right)^2
\]

• Mean Absolute Error (MAE): The average of the absolute differences between predictions and true values. Applies a linear error penalty, making it robust to outliers, and is reported in the same units as the target.

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\text{actual}_i - \text{predicted}_i\right|
\]

• Huber Loss: Combines MSE and MAE, behaving quadratically for small errors and linearly for large ones. This provides robustness to outliers while keeping a smooth surface near zero error.

\[
\text{Huber Loss} = \frac{1}{n}\sum_{i=1}^{n}
\begin{cases}
\frac{1}{2}\left(\text{actual}_i - \text{predicted}_i\right)^2 & \text{if } \left|\text{actual}_i - \text{predicted}_i\right| \le \delta \\
\delta\left(\left|\text{actual}_i - \text{predicted}_i\right| - \frac{1}{2}\delta\right) & \text{otherwise}
\end{cases}
\]
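
Since the implementation section later in this document does not include Huber loss, here is a minimal sketch of it (my own addition; the delta value is an assumed hyperparameter):

def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic for small errors (like MSE), linear for large ones (like MAE).
    total = 0.0
    for a, p in zip(y_true, y_pred):
        err = abs(a - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

# All four errors below are exactly 10, so the linear branch applies:
# delta * (10 - delta/2) = 5 * 7.5 = 37.5 per example.
print(huber_loss([100, 90, 110, 120], [110, 80, 120, 110], delta=5.0))  # 37.5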

Question: Compare and contrast the following loss functions:

o Mean squared error (MSE)
o Cross-entropy loss
o Absolute loss

Answer:
A detailed comparison of mean squared error (MSE), cross-entropy loss, and absolute loss is given below:

• While mean squared error (MSE) and absolute loss are both commonly used for regression, MSE has notable advantages and disadvantages compared to absolute loss. The squaring of errors in MSE produces a smooth, convex error surface that is straightforward to optimize, unlike the sharp corner the absolute value introduces at zero error. However, this squaring also makes MSE far more sensitive to outliers than the more robust absolute loss. Squaring also changes the units of the error, so raw MSE values are less directly interpretable than absolute errors (RMSE is often reported to restore the original scale). Additionally, MSE corresponds to an assumption of normally distributed noise, which may not fit every problem.

• Comparing MSE and cross-entropy loss, used for regression and classification respectively, there are clear tradeoffs. MSE penalizes errors by their squared magnitude regardless of direction, while cross-entropy heavily penalizes confident misclassifications, whether false positives or false negatives. The probabilistic interpretation and logarithmic scale of cross-entropy also provide advantages for maximizing likelihood and calibrating predicted confidence. However, MSE has a simpler quadratic form that can be easier to optimize than cross-entropy for some models.

• The absolute loss function contrasts sharply with cross-entropy loss on classification tasks in a few ways. Absolute loss can be used, but it lacks the probabilistic justification that cross-entropy provides. Cross-entropy is also generally easier to optimize efficiently than absolute loss, whose error surface contains sharp, non-differentiable corners. However, absolute loss would be less sensitive to outliers and to the scaling of the data than cross-entropy.
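
To make the outlier contrast above concrete, here is a small sketch (my own illustration with made-up numbers) showing how a single bad prediction inflates MSE far more than absolute loss:

y_true  = [1.0, 2.0, 3.0, 4.0]
clean   = [1.1, 2.1, 2.9, 4.2]
outlier = [1.1, 2.1, 2.9, 14.0]  # one wildly wrong prediction

def mse(t, p):
    return sum((a - b) ** 2 for a, b in zip(t, p)) / len(t)

def mae(t, p):
    return sum(abs(a - b) for a, b in zip(t, p)) / len(t)

print(mse(y_true, clean), mae(y_true, clean))      # ~0.0175 and 0.125: both small
print(mse(y_true, outlier), mae(y_true, outlier))  # ~25.0 vs ~2.6: MSE explodes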

Conclusion:

While all three loss functions have their merits, certain distinctions make each better suited to specific tasks and models. MSE's smooth, squared penalty lends itself to regression problems, cross-entropy's probabilistic nature suits classification, and absolute loss provides greater robustness when outliers are present.

Question: Which loss function would you choose for the following tasks? Justify your
answer.

o Classifying images of cats and dogs
o Predicting the price of a house
o Predicting the number of customers who will visit a store on a given day.

Answer:

• For the image classification task, I would select the cross-entropy loss function. Cross-entropy loss is the most appropriate choice for image classification because it measures the divergence between the predicted class probabilities and the true label distribution. By optimizing cross-entropy, the model is trained to maximize the likelihood of the correct class. Cross-entropy heavily penalizes confident incorrect classifications, forcing the model to learn the subtle features that distinguish classes. It is also interpretable as a negative log-likelihood, and it enables proper calibration of the probabilities produced by the softmax output. It is the standard loss for deep learning classifiers, with strong empirical results. Cross-entropy is smooth and, as a function of the model's output scores, convex, which allows straightforward optimization with gradient descent methods. (A minimal sketch of this computation appears after this list.)

• For the house price regression task, I would select mean squared error (MSE) as the loss function. MSE naturally fits continuous-value regression problems by quantifying the squared magnitude of errors. It penalizes larger deviations more heavily, which is appropriate under the Gaussian-like noise typical of price data. Taking the square root of MSE (RMSE) also yields an error in the original price units, which aids interpretation. MSE is a standard baseline loss for regression that is smooth, convex, and easy to optimize with gradient descent. Its widespread use and well-understood behaviour make it the best choice here.
• For the customer prediction regression problem, I would also choose mean squared error
as the loss function. Since this is also a continuous value regression task, MSE is again the
most suitable loss for the same reasons described above. It directly optimizes for
quantitative accuracy in the numerical predictions. The intuitiveness, optimization
properties, and ubiquity of MSE make it the optimal choice over alternatives.
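
The implementation section below covers only regression-style losses, so here is a minimal sketch of the cross-entropy computation referenced above (my own illustration; the two-class raw scores and the softmax conversion are assumptions for the cat/dog example, not part of the assignment code):

import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    # Negative log-probability assigned to the correct class.
    probs = softmax(scores)
    return -math.log(probs[true_class])

# Raw scores for [cat, dog]; the true class is "cat" (index 0).
print(cross_entropy([2.0, 0.5], 0))  # confident and correct -> small loss (~0.20)
print(cross_entropy([0.5, 2.0], 0))  # confident and wrong -> large loss (~1.70)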

Python Implementation (IDE: Google Colab)


Raw Code:
import math

# 0/1 loss function (binary classification): fraction of predictions
# that do not match the true labels exactly.
def zero_one_loss(y_true, y_pred):
    misclassifications = sum(1 for a, b in zip(y_true, y_pred) if a != b)
    return misclassifications / len(y_true)

# Squared loss function: mean of the squared errors (identical to MSE below).
def squared_loss(y_true, y_pred):
    squared_errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sum(squared_errors) / len(y_true)

# Mean Squared Error (MSE)
def mean_squared_error(y_true, y_pred):
    squared_errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sum(squared_errors) / len(y_true)

# Root Mean Squared Error (RMSE): square root of MSE, restoring the
# original units of the target.
def root_mean_squared_error(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return math.sqrt(mse)

# Absolute loss function (Mean Absolute Error, MAE)
def absolute_loss(y_true, y_pred):
    absolute_errors = [abs(a - b) for a, b in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(y_true)

# Display the table of actual values and predictions
def display_table(actual, predicted):
    print('-Machine Learning Assignment \n-Submitted by MAJID KHAN')
    print("____________________________")
    print("Actual value | Prediction")
    print("------------- | -------------")
    for a, b in zip(actual, predicted):
        print(f"{a:^13} | {b:^11}")

actual_values = [100, 90, 110, 120]
predicted_values = [110, 80, 120, 110]

display_table(actual_values, predicted_values)

while True:
    print('\n_________________________________________________________________')
    print("\nChoose a loss function:")
    print("1. 0/1 Loss")
    print("2. Squared Loss")
    print("3. MSE")
    print("4. RMSE")
    print("5. MAE")
    print("6. Exit")
    print('_________________________________________________________________')

    try:
        choice = int(input("Enter the number of the loss function: "))
    except ValueError:
        # Guard against non-numeric input instead of crashing.
        print("Invalid choice. Please select a valid loss function.")
        continue

    if choice == 1:
        loss = zero_one_loss(actual_values, predicted_values)
        print("\n0/1 Loss Formula: (Number of Misclassifications) / (Total Samples)")
        print("0/1 Loss Input Values - Number of Misclassifications:",
              sum(1 for a, b in zip(actual_values, predicted_values) if a != b))
        print("Total Samples:", len(actual_values))
        print("0/1 Loss Result:", loss)
    elif choice == 2:
        loss = squared_loss(actual_values, predicted_values)
        print("\nSquared Loss Formula: (Σ(actual - predicted)²) / Total Samples")
        print("Squared Loss Input Values - (Σ(actual - predicted)²):",
              sum((a - b) ** 2 for a, b in zip(actual_values, predicted_values)))
        print("Total Samples:", len(actual_values))
        print("Squared Loss Result:", loss)
    elif choice == 3:
        loss = mean_squared_error(actual_values, predicted_values)
        print("\nMSE Formula: (Σ(actual - predicted)²) / Total Samples")
        print("MSE Input Values - (Σ(actual - predicted)²):",
              sum((a - b) ** 2 for a, b in zip(actual_values, predicted_values)))
        print("Total Samples:", len(actual_values))
        print("MSE Result:", loss)
    elif choice == 4:
        loss = root_mean_squared_error(actual_values, predicted_values)
        mse = mean_squared_error(actual_values, predicted_values)
        print("\nRMSE Formula: √MSE")
        print("RMSE Input Value - MSE:", mse)
        print("RMSE Result:", loss)
    elif choice == 5:
        loss = absolute_loss(actual_values, predicted_values)
        print("\nMAE Formula: (Σ|actual - predicted|) / Total Samples")
        print("MAE Input Values - (Σ|actual - predicted|):",
              sum(abs(a - b) for a, b in zip(actual_values, predicted_values)))
        print("Total Samples:", len(actual_values))
        print("MAE Result:", loss)
    elif choice == 6:
        break
    else:
        print("Invalid choice. Please select a valid loss function.")

________________The END________________
