cst414 - Deep Learning
σ(x) = 1 / (1 + e^(-x))
Where:
• σ(x) is the output of the sigmoid function.
• x is the weighted sum of inputs to a neuron.
• e is the base of the natural logarithm (~2.718).
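As a quick illustration, here is a minimal NumPy sketch of the sigmoid formula above (the function name is mine, not from the course notes):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```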
2) Tanh (Hyperbolic Tangent)
• It is similar to the Sigmoid function, but with a key difference: it maps input values to
a range between -1 and 1, rather than between 0 and 1.
Tanh Function Formula:
tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
Where,
• tanh(x) is the output of the Tanh function.
• x is the weighted sum of inputs to a neuron.
• e is the base of the natural logarithm (~2.718).
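A similar sketch for Tanh, written directly from the formula above (NumPy also provides np.tanh, which is the more numerically robust choice for large inputs):

```python
import numpy as np

def tanh(x):
    """Tanh activation: maps any real value into the range (-1, 1)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# np.tanh(x) gives the same values.
print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```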
3) ReLU (Rectified Linear Unit)
● One of the most commonly used activation functions in modern deep learning models.
● It is simple, computationally efficient, and helps address some of the limitations of older activation functions like Sigmoid and Tanh.
● ReLU sets all negative input values to zero and leaves positive input values unchanged: f(x) = max(0, x).
● This means that if the input x is greater than or equal to 0, the output is x. If the input is less than 0, the output is 0 (see the sketch below).
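A minimal sketch of ReLU, following the max(0, x) rule described above:

```python
import numpy as np

def relu(x):
    """ReLU activation: zero for negative inputs, identity for non-negative inputs."""
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.0, 0.0, 2.5]
```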
4) Softmax activation function
● It is widely used in the output layer of neural networks, particularly for multi-class classification problems.
● It converts raw network outputs (called logits) into a probability distribution across multiple classes.
● The main goal of the softmax function is to ensure that the output values are interpretable as probabilities, meaning that they are all between 0 and 1 and sum to 1.
● Given an input vector z = [z1, z2, z3, ..., zn], where n represents the number of classes and zi represents the raw score (logit) for class i, the Softmax function for each class is defined as:
softmax(zi) = e^(zi) / Σ e^(zj),  summed over j = 1, ..., n
where,
● e^(zi) is the exponential of the i-th logit.
● The denominator is the sum of the exponentials of all logits.
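A minimal NumPy sketch of this definition (subtracting the maximum logit before exponentiating is a standard numerical-stability step, not part of the formula itself):

```python
import numpy as np

def softmax(z):
    """Softmax: converts a vector of logits into probabilities that sum to 1."""
    z = z - np.max(z)           # subtract the max logit for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())       # ~[0.659, 0.242, 0.099], sums to 1.0
```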
Types of Activation Functions
(Comparison table: Activation Function | Range | Common Use Case | Key Advantage | Key Disadvantage)
Loss Functions
1) Mean Squared Error (MSE): Used for regression problems.
Formula:
MSE = (1/N) Σ (yi − ŷi)²
Where,
● yi: Actual value
● ŷi: Predicted value
● N: Number of data points
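A minimal sketch of the MSE formula, assuming y_true and y_pred hold the actual and predicted values:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean Squared Error: average squared difference between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

print(mse_loss(np.array([3.0, 5.0, 2.0]), np.array([2.5, 5.0, 4.0])))  # ~1.417
```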
2) Cross-Entropy Loss: Used for classification problems, especially binary
or multi-class classification.
Formula:
L(y, ŷ) = −[ y·log(ŷ) + (1 − y)·log(1 − ŷ) ]
Where,
● y is the true label (0 or 1) and ŷ is the predicted probability that the label is 1.
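A minimal sketch of binary cross-entropy as defined above (the eps clipping is an added guard against log(0), not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy loss, averaged over examples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep predictions away from exactly 0 or 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8])))  # ~0.145
```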
3) Hinge Loss: Used for Support Vector Machines (SVMs) and binary classification.
Formula:
L(y, ŷ) = max(0, 1 − y·ŷ)
Where,
● y is the true label (either +1 or -1), and ŷ is the predicted value.
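A minimal sketch of the hinge loss, assuming y holds +1/-1 labels and the predictions are raw model scores:

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Hinge loss: zero when y_true * y_pred >= 1, grows linearly otherwise."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -2.0, -0.5])))  # ~0.567
```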
Risk Minimization
● Risk minimization refers to the process of minimizing the expected loss or risk over all possible data points and distributions. The goal is to find the best model parameters (weights and biases) such that the model’s predictions are as accurate as possible on unseen data.
1) True Risk: The expected loss over the entire distribution of data, expressed as:
R(f) = E[L(f(x), y)] = ∫ L(f(x), y) dP(x, y)
where:
● P(x, y) is the joint probability distribution of inputs x and outputs y,
● f(x) is the predicted output, and
● L(f(x), y) is the loss function.
2) Empirical Risk: The average loss over the finite training dataset D = {(xi, yi)}, i = 1, ..., N, expressed as:
R_emp(f) = (1/N) Σ L(f(xi), yi)
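A minimal sketch of computing empirical risk over a toy dataset (the linear model, squared-error loss, and data below are illustrative assumptions, not from the notes):

```python
import numpy as np

def empirical_risk(model, loss_fn, X, y):
    """Empirical risk: average of the loss over the finite training set."""
    predictions = np.array([model(x) for x in X])
    return np.mean(loss_fn(y, predictions))

# Toy example: a linear model f(x) = 2x with squared-error loss.
model = lambda x: 2.0 * x
loss_fn = lambda y_true, y_pred: (y_true - y_pred) ** 2
X = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
print(empirical_risk(model, loss_fn, X, y))  # ~0.02
```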