Activation Functions
Moin Mostakim
October 2023
Formula: σ(x) = 1 / (1 + e^(−x))
Range: (0, 1)
First-order Derivative: σ′(x) = σ(x) · (1 − σ(x))
[Plot: σ(x) versus x for x ∈ [−5, 5]]
Output:
• Shape: S-shaped curve.
• Use Cases: Binary classification, sigmoid neurons in the output layer.
• Benefits: Smooth gradient, suitable for converting network outputs to probabilities.
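A minimal NumPy sketch of the sigmoid and its derivative (the helper names and the NumPy dependency are illustrative assumptions):

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)): squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5, 5, 5)
print(sigmoid(x))             # values in (0, 1), equal to 0.5 at x = 0
print(sigmoid_derivative(x))  # largest (0.25) at x = 0, vanishes for large |x|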
Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Range: (-1, 1)
First-order Derivative: tanh′(x) = 1 − tanh²(x)
[Plot: tanh(x) versus x for x ∈ [−2, 2]]
Output:
• Shape: S-shaped curve similar to sigmoid.
• Use Cases: Regression, classification.
• Benefits: Centered around zero, mitigates the vanishing gradient problem, and provides smooth gradients.
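A similar NumPy sketch for tanh and its derivative (helper names again illustrative; np.tanh is used as the numerically stable implementation):

import numpy as np

def tanh(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
    return np.tanh(x)

def tanh_derivative(x):
    # tanh'(x) = 1 - tanh^2(x)
    t = np.tanh(x)
    return 1.0 - t ** 2

x = np.linspace(-2, 2, 5)
print(tanh(x))             # values in (-1, 1), zero-centered
print(tanh_derivative(x))  # equals 1 at x = 0, shrinks toward 0 as |x| grows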
Formula: ReLU(x) = max(0, x)
Range: [0, ∞)
First-order Derivative: ReLU′(x) = 1 if x > 0, 0 if x < 0
Output:
• Shape: Linear for positive values, zero for negatives.
• Use Cases: Hidden layers in most neural networks.
• Benefits: Efficient, mitigates vanishing gradient, induces sparsity.
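A minimal sketch of ReLU and its (sub)gradient, assuming the usual definition ReLU(x) = max(0, x); helper names are illustrative:

import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): linear for positive inputs, zero for negative inputs
    return np.maximum(0.0, x)

def relu_derivative(x):
    # ReLU'(x) = 1 for x > 0, 0 for x < 0; this sketch uses 0 at x = 0
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]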
Formula: LeakyReLU(x, α) = x if x ≥ 0, αx if x < 0
Range: (−∞, ∞)
First-order Derivative: LeakyReLU′(x, α) = 1 if x ≥ 0, α if x < 0
Output:
• Shape: Linear for positive values, non-zero slope for negatives.
• Use Cases: Alternative to ReLU to prevent the "dying ReLU" problem.
• Benefits: Addresses the "dying ReLU" issue, retains sparsity.
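A corresponding sketch for Leaky ReLU; the default slope α = 0.01 is a common convention assumed here, not a value fixed by the definition above:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # LeakyReLU(x, alpha) = x for x >= 0, alpha * x for x < 0
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # LeakyReLU'(x, alpha) = 1 for x >= 0, alpha for x < 0
    return np.where(x >= 0, 1.0, alpha)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))             # [-0.03 -0.01  0.    1.    3.  ]
print(leaky_relu_derivative(x))  # [0.01 0.01 1.   1.   1.  ]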
Formula: ELU(x, α) = x if x ≥ 0, α(e^x − 1) if x < 0
Range: (−α, ∞)
First-order Derivative: ELU′(x, α) = 1 if x ≥ 0, αe^x if x < 0
Output:
• Shape: Linear for positive values, smooth exponential curve saturating toward −α for negative values.
• Use Cases: An alternative to ReLU with smoother gradients.
• Benefits: Smoother gradients, better handling of negative inputs during training.
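A sketch of ELU and its derivative, assuming the common default α = 1.0; the overflow guard is an implementation detail, not part of the formula:

import numpy as np

def elu(x, alpha=1.0):
    # ELU(x, alpha) = x for x >= 0, alpha * (e^x - 1) for x < 0
    # np.minimum keeps the unused np.where branch from overflowing for large positive x
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_derivative(x, alpha=1.0):
    # ELU'(x, alpha) = 1 for x >= 0, alpha * e^x for x < 0
    return np.where(x >= 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(elu(x))             # negative inputs saturate toward -alpha
print(elu_derivative(x))  # alpha * e^x for negative inputs, 1 otherwise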
Formula (for class i): Softmax(x)_i = e^(x_i) / Σ_j e^(x_j)
Range: (0, 1)
First-order Derivative: ∂Softmax(x)_j / ∂x_i = Softmax(x)_i · (δ_ij − Softmax(x)_j)
Output:
• Shape: Probability distribution over classes.
• Use Cases: Used in the output layer of multi-class classification to produce a probability distribution over classes.
• Benefits: Converts scores to class probabilities, essential for classification tasks.
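A sketch of softmax and the Jacobian implied by the derivative above; subtracting the maximum before exponentiating is a standard numerical-stability trick, and the helper names are illustrative:

import numpy as np

def softmax(x):
    # Softmax(x)_i = e^(x_i) / sum_j e^(x_j); shifting by max(x) avoids overflow
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

def softmax_jacobian(x):
    # J[i, j] = d Softmax(x)_j / d x_i = Softmax(x)_i * (delta_ij - Softmax(x)_j)
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))           # approx [0.09  0.245 0.665], sums to 1
print(softmax_jacobian(x))  # symmetric 3x3 matrix whose rows each sum to 0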