
Building a Tanh Activation Function from Scratch in Python

Without relying on high-level libraries

ANSHUMAN JHA

Table of Contents
1. Introduction to Tanh Activation Function
a. What are Activation Functions?
b. Why are Activation Functions Used?
c. Types of Activation Functions:
d. Choosing the Right Activation Function:
2. What is Tanh Activation Function?
3. Comparison of ReLU, Leaky ReLU, Sigmoid and Tanh Activation Functions
4. The Structure of a Tanh Activation Function
5. Implementing the Tanh Activation Function from Scratch in Python
a. Step 1: Import Necessary Libraries
b. Step 2: Define the Tanh Function
c. Step 3: Define the Tanh Function Derivative
d. Step 4: Plot the Tanh Function and Its Derivative
6. Conclusion

1. Introduction to Tanh Activation Function
Activation functions are a crucial component of artificial neural networks, playing a key role in enabling the
neural networks to learn complex patterns and make intelligent decisions. The Tanh (Hyperbolic Tangent)
activation function is commonly used in neural networks to introduce non-linearity into the model. It maps
input values to a range between -1 and 1.

This post will guide you through implementing the Tanh activation function from scratch in Python, exploring
the fundamental concepts and steps involved, and providing sample code with detailed explanations and
visualizations.

What are Activation Functions?

In simple terms, an activation function in a neural network decides whether a neuron should be "activated" or
not, based on the input it receives. Imagine a neuron as a light bulb – the activation function determines if the
bulb should light up or stay off.

Why are Activation Functions Used?

1. Introducing Non-linearity: Without activation functions, neural networks would essentially be performing
just linear transformations. Real-world data often exhibits non-linear relationships, meaning a straight line can't
accurately represent the patterns. Activation functions introduce non-linearity, allowing the network to model
and learn these complex relationships.

2. Decision Boundary Creation: Activation functions help neural networks create decision boundaries. For
example, in image classification, an activation function can help the network decide whether a picture is of a cat
or a dog by drawing a boundary between the two categories based on learned features.

3. Controlling Neuron Output: Activation functions control the output of a neuron, keeping it within a desired
range. This is important for stability and efficiency during training.

Types of Activation Functions:

There are various types of activation functions, each with its own characteristics and applications, including:

* Sigmoid: Outputs a value between 0 and 1, historically used for binary classification.
* ReLU (Rectified Linear Unit): Outputs the input directly if positive, otherwise 0. Very popular due to its
computational efficiency.
* Tanh (Hyperbolic Tangent): Similar to Sigmoid but outputs values between -1 and 1.
* Softmax: Used in the output layer for multi-class classification, providing probabilities for each class.

Choosing the Right Activation Function:

The choice of activation function depends on the specific task, network architecture, and other factors.
Experimentation and research are often needed to find the most suitable one for a particular problem.

In essence, activation functions are the "brain" behind a neural network's decision-making process, enabling it
to learn intricate patterns and make accurate predictions.

2. What is Tanh Activation Function?
The Tanh activation function, short for "hyperbolic tangent," is a popular choice in neural networks. It takes an
input (representing the weighted sum of inputs to a neuron) and squashes it to an output value between -1 and 1.

The Tanh function is calculated as follows:

tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

where 'x' is the input value, and 'exp(x)' is the exponential function (e to the power of x).
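As a quick sanity check, the formula can be evaluated directly with NumPy and compared against the built-in np.tanh (a minimal sketch; the sample values are illustrative):

import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Evaluate the definition directly: (exp(x) - exp(-x)) / (exp(x) + exp(-x))
tanh_manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# Compare against NumPy's built-in implementation
print(np.allclose(tanh_manual, np.tanh(x)))  # True
print(np.round(tanh_manual, 3))              # approx [-0.964 -0.762  0.  0.762  0.964]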

Key Characteristics:

Output Range: The output of the Tanh function ranges from -1 to 1, making it zero-centered (unlike Sigmoid,
which ranges from 0 to 1).
Smoothness and Differentiability: Like Sigmoid, Tanh is smooth and differentiable, which is essential for
gradient-based optimization algorithms used to train neural networks.
Non-Linearity: Tanh introduces non-linearity to the model, allowing neural networks to learn complex
patterns and relationships.

Advantages of Tanh:

Zero-Centered Output: The zero-centered property of Tanh often helps in faster convergence during training
compared to Sigmoid. This is because the gradients are less likely to get stuck in one direction.
Handles Negative Inputs Better: Unlike ReLU, which "dies" for negative inputs, Tanh gracefully handles
them, providing non-zero gradients.

Disadvantages:

Vanishing Gradients: Similar to Sigmoid, Tanh can also suffer from the vanishing gradient problem for very
large or very small input values. This can slow down training, especially in deep networks.
Computational Cost: Tanh is computationally more expensive than ReLU and its variants due to the use of
exponential functions in its calculation.

Common Use Cases:

Hidden Layers: Tanh is frequently used in the hidden layers of neural networks, especially in recurrent neural
networks (RNNs).
Tasks Requiring Zero-Centered Outputs: Its zero-centered output makes it suitable for tasks where having
activations centered around zero is beneficial.

Tanh is a versatile activation function that addresses some of the limitations of Sigmoid while introducing its
own trade-offs. Its zero-centered output and ability to handle negative inputs make it a valuable tool in the deep
learning toolbox.
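Since Tanh is so often contrasted with Sigmoid, it is worth noting the exact relationship between the two: tanh(x) = 2 * sigmoid(2x) - 1, i.e. Tanh is a Sigmoid rescaled to (-1, 1) and centered at zero. A small check (the sigmoid helper below is defined only for this illustration):

import numpy as np

def sigmoid(x):
    # Standard logistic function, used here only for the comparison
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True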

3. Comparison of ReLU, Leaky ReLU, Sigmoid and Tanh Activation Functions

ReLU
o Formula: f(x) = max(0, x)
o Range: [0, ∞)
o Derivative: 1 for x > 0, 0 for x <= 0
o Advantages: Simple and computationally efficient; solves the vanishing gradient problem for positive values.
o Disadvantages: "Dying ReLU" problem: neurons can get stuck for negative inputs.
o Common Use Cases: Widely used in various deep learning tasks; often preferred as the default activation function.

Leaky ReLU
o Formula: f(x) = 0.01x for x < 0; x for x >= 0
o Range: (-∞, ∞)
o Derivative: 1 for x > 0, 0.01 for x <= 0
o Advantages: Solves the vanishing gradient problem for all values; often performs better than ReLU in practice.
o Disadvantages: Performance can be sensitive to the leak coefficient (usually 0.01).
o Common Use Cases: Used when dealing with sparse gradients or when the vanishing gradient problem is a concern.

Sigmoid
o Formula: f(x) = 1 / (1 + exp(-x))
o Range: (0, 1)
o Derivative: f(x) * (1 - f(x))
o Advantages: Smooth and differentiable; outputs probabilities.
o Disadvantages: Vanishing gradient problem for very large/small inputs; computationally more expensive than ReLU and Leaky ReLU.
o Common Use Cases: Binary classification (output layer); cases where probability outputs are desired.

Tanh
o Formula: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
o Range: (-1, 1)
o Derivative: 1 - tanh(x)^2
o Advantages: Smooth and differentiable; output centered around 0.
o Disadvantages: Vanishing gradient problem for very large/small inputs; computationally more expensive than ReLU and Leaky ReLU.
o Common Use Cases: Hidden layers in various deep learning tasks; often preferred over Sigmoid in hidden layers.
Key Considerations:
o Vanishing Gradient Problem: Refers to the issue where gradients become very small during
backpropagation, hindering learning. ReLU can suffer from this for negative values, while Sigmoid and
Tanh suffer for very large/small values. Leaky ReLU attempts to mitigate this.
o Computational Cost: ReLU and Leaky ReLU are computationally cheaper than Sigmoid and Tanh due
to their simpler formulas.
o Output Interpretation:
   o Sigmoid outputs probabilities, making it suitable for binary classification.
   o Tanh's output is centered around 0, which can be beneficial for optimization in some cases.
   o ReLU and Leaky ReLU are not bound to a specific range.
o Zero-Centered Output: Tanh's output being centered around 0 can sometimes lead to faster
convergence during training compared to Sigmoid.
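To make the comparison concrete, the four activations can be evaluated on the same inputs with a few lines of NumPy (a minimal sketch; the helper names and sample values are illustrative):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# Each activation squashes (or passes through) the same inputs differently
print("ReLU:      ", relu(x))
print("Leaky ReLU:", leaky_relu(x))
print("Sigmoid:   ", sigmoid(x))
print("Tanh:      ", np.tanh(x))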

4. The Structure of a Tanh Activation Function
This structure outlines the steps and sub-steps, with appropriate labels and connections. Each step corresponds to a
function or a key part of the process described in the implementation below.

5. Implementing the Tanh Activation Function from Scratch in Python
Let's implement a simple Tanh activation function in Python.

Step 1: Import Necessary Libraries


We'll start by importing the necessary libraries. For this implementation, we'll only need numpy for numerical
computations and matplotlib for visualizations.
• numpy is used for efficient numerical computations.
• matplotlib.pyplot is used for plotting graphs.
import numpy as np
import matplotlib.pyplot as plt

Step 2: Define the Tanh Function


Next, we'll define the Tanh function using the mathematical formula mentioned above.
• The tanh function takes an array-like input x and returns the Tanh of each element using np.tanh(x), which
leverages the optimized Tanh implementation in NumPy.

def tanh(x):
    """
    Compute the hyperbolic tangent of x.

    Parameters:
    x (array-like): Input values.

    Returns:
    array-like: Tanh of the input values.
    """
    return np.tanh(x)
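A quick usage check of the function defined above (the sample values are illustrative):

sample = np.array([-2.0, 0.0, 2.0])
print(tanh(sample))  # approx [-0.964  0.  0.964]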

Step 3: Derivative of the Tanh Function


Next, we define the derivative of the Tanh function using its mathematical formula.
• The tanh_derivative function computes the derivative of the Tanh function for each element in x using the
formula 1 - np.tanh(x) ** 2.

def tanh_derivative(x):
    """
    Compute the derivative of the hyperbolic tangent of x.

    Parameters:
    x (array-like): Input values.

    Returns:
    array-like: Derivative of the tanh of the input values.
    """
    return 1 - np.tanh(x) ** 2
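As an optional sanity check (a small sketch that assumes the tanh_derivative function above), the analytical derivative can be compared against a central finite-difference approximation; evaluating it at larger |x| also illustrates the vanishing gradient behaviour discussed earlier:

# Central finite-difference check of the analytical derivative
x_check = np.linspace(-3, 3, 7)
eps = 1e-6
numerical = (np.tanh(x_check + eps) - np.tanh(x_check - eps)) / (2 * eps)
print(np.allclose(tanh_derivative(x_check), numerical))  # True

# The derivative shrinks towards 0 as |x| grows (vanishing gradients)
print(tanh_derivative(np.array([0.0, 2.0, 5.0])))  # approx [1.0, 0.071, 0.00018]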

Step 4: Visualization
To understand the behavior of the Tanh function and its derivative, let's plot them.

# Generate input values
x = np.linspace(-10, 10, 400)

# Compute Tanh and its derivative
tanh_values = tanh(x)
tanh_derivative_values = tanh_derivative(x)

# Plot Tanh function
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(x, tanh_values, label='tanh(x)')
plt.title('Tanh Function')
plt.xlabel('x')
plt.ylabel('tanh(x)')
plt.legend()

# Plot Tanh derivative
plt.subplot(1, 2, 2)
plt.plot(x, tanh_derivative_values, label='tanh\'(x)', color='orange')
plt.title('Tanh Derivative')
plt.xlabel('x')
plt.ylabel('tanh\'(x)')
plt.legend()

plt.tight_layout()
plt.show()


6. Conclusion
In this post, we have implemented the Tanh activation function and its derivative from scratch in Python. We
also visualized the function and its derivative to understand their behaviors. This implementation provides a
fundamental understanding of how the Tanh activation function works and can be used as a building block in
developing neural networks.
Understanding and implementing activation functions like the Tanh from scratch helps in building a strong
foundation in machine learning and deep learning.
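As one possible next step, the tanh function defined above can serve as the activation in a single dense layer's forward pass. The sketch below uses made-up layer sizes purely for illustration:

# Forward pass of one dense layer with a Tanh activation (illustrative sizes)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))        # weights: 3 input features -> 4 hidden units
b = np.zeros(4)                    # biases
x_input = rng.normal(size=(1, 3))  # one example with 3 features

hidden = tanh(x_input @ W + b)     # all activations lie in (-1, 1)
print(hidden.shape)                # (1, 4)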

Constructive comments and feedback are welcome.

