Sigmoid is a mathematical function that maps any real-valued number into a value between 0 and 1. Its characteristic "S"-shaped curve makes it particularly useful in scenarios where we need to convert outputs into probabilities. This function is often called the logistic function.
Mathematically, the sigmoid function is defined as:
\sigma(x) = \frac{1}{1 + e^{-x}}
where,
- x is the input value,
- e is Euler's number (\approx 2.718 )
The sigmoid function is used as an activation function in machine learning and neural networks to model binary classification problems, smooth outputs, and introduce non-linearity into models.
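The formula above translates directly into code. A minimal sketch in Python (using only the standard library):

```python
import math

def sigmoid(x: float) -> float:
    """Map any real number into (0, 1) via 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5, the midpoint of the S-curve
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

Note that `math.exp(-x)` can overflow for very large negative inputs; production libraries use numerically stable variants.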
Graph of Sigmoid Activation Function
In the graph of the sigmoid function, the x-axis represents the input values, which range from - \infty \ to \ +\infty, and the y-axis represents the output values, which always lie in (0, 1).
In machine learning, x could be a weighted sum of inputs in a neural network neuron or a raw score in logistic regression. If the output value is close to 1, it indicates high confidence in one class and if the value is close to 0, it indicates high confidence in the other class.
Properties of the Sigmoid Function
The sigmoid function has several key properties that make it a popular choice in machine learning and neural networks:
- Domain: The domain of the sigmoid function is all real numbers. This means that you can input any real number into the sigmoid function, and it will produce a valid output.
- Asymptotes: As x approaches positive infinity, σ(x) approaches 1. Conversely, as x approaches negative infinity, σ(x) approaches 0. This property ensures that the function never actually reaches 0 or 1, but gets arbitrarily close.
- Monotonicity: The sigmoid function is monotonically increasing, meaning that as the input increases, the output also increases.
- Differentiability: The sigmoid function is differentiable, which allows for the calculation of gradients during the training of machine learning models.
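The asymptote and monotonicity properties listed above can be verified numerically; a small sketch:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Asymptotes: outputs get arbitrarily close to 0 and 1 but never reach them
print(sigmoid(10))    # roughly 0.99995
print(sigmoid(-10))   # roughly 0.00005

# Monotonicity: larger inputs always produce larger outputs
xs = [-3, -1, 0, 1, 3]
ys = [sigmoid(x) for x in xs]
assert all(a < b for a, b in zip(ys, ys[1:]))
```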
Sigmoid Function in Backpropagation
If we use a linear activation function in a neural network, the model will only be able to separate data linearly, which results in poor performance on non-linear datasets. However, by adding a hidden layer with a sigmoid activation function, the model gains the ability to handle non-linearity, thereby improving performance.
During the backpropagation, the model calculates and updates weights and biases by computing the derivative of the activation function. The sigmoid function is useful because:
- Its derivative can be expressed in terms of the function itself, \sigma'(x) = \sigma(x)(1 - \sigma(x)), which makes the gradient cheap to compute from the forward-pass output.
- It is differentiable at every point, which helps in the effective computation of gradients during backpropagation.
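To make this concrete, here is a hedged sketch of one gradient-descent step for a single sigmoid neuron. The weight, bias, training example, and learning rate values are illustrative assumptions, and the loss is squared error L = (y_hat - y)^2 / 2:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

w, b = 0.5, 0.0    # current weight and bias (assumed values)
x, y = 2.0, 1.0    # one training example (assumed values)
lr = 0.1           # learning rate

z = w * x + b      # pre-activation
y_hat = sigmoid(z) # forward pass

# Chain rule: dL/dz = (y_hat - y) * sigmoid'(z),
# where sigmoid'(z) = y_hat * (1 - y_hat) reuses the forward output
grad_z = (y_hat - y) * y_hat * (1 - y_hat)

w -= lr * grad_z * x   # dL/dw = dL/dz * x
b -= lr * grad_z       # dL/db = dL/dz
```

Because the derivative reuses `y_hat`, no extra exponentials are needed in the backward pass.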
Derivative of Sigmoid Function
The derivative of the sigmoid function, denoted as σ'(x), is given by σ'(x)=σ(x)⋅(1−σ(x)).
Let's see how the derivative of sigmoid function is computed.
We know that, sigmoid function is defined as:
y = \sigma(x) = \frac{1}{1 + e^{-x}}
Define:
u = 1 + e^{-x}
Rewriting the sigmoid function:
y = \frac{1}{u}
Differentiating u with respect to x:
\frac{du}{dx} = -e^{-x}
Differentiating y with respect to u:
\frac{dy}{du} = -\frac{1}{u^2}
Using the chain rule:
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
\frac{dy}{dx} = \left(- \frac{1}{u^2}\right) \cdot (-e^{-x})
\frac{dy}{dx} = \frac{e^{-x}}{u^2}
Since u = 1 + e^{-x}, substituting:
\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}
Since:
\sigma(x) = \frac{1}{1 + e^{-x}}
Rewriting:
1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}
Substituting:
\frac{dy}{dx} = \sigma(x) \cdot (1 - \sigma(x))
Final Result
\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))
The above equation expresses the derivative of the sigmoid function in terms of the function itself. The image below shows the derivative of the sigmoid function graphically.
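The closed-form derivative derived above can be checked against a numerical finite difference; a small sketch:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x: float) -> float:
    """Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1 - s)

# Compare against a central finite difference at a few points
h = 1e-6
for x in [-2.0, 0.0, 1.5]:
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(x)) < 1e-8
```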
Issue with Sigmoid Function in Backpropagation
One key issue with using the sigmoid function is the vanishing gradient problem. When updating weights and biases using gradient descent, if the gradients are too small, the updates to weights and biases become insignificant, slowing down or even stopping learning.
The shaded red region highlights the areas where the derivative \sigma^{'}(x) is very small (close to 0). In these regions, the gradients used to update weights and biases during backpropagation become extremely small. As a result, the model learns very slowly or stops learning altogether, which is a major issue in deep neural networks.
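The vanishing-gradient effect is easy to see numerically: the derivative peaks at 0.25 when x = 0 and shrinks rapidly as |x| grows.

```python
import math

def sigmoid_deriv(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1 - s)

# The derivative is largest at x = 0 and decays quickly for large |x|
for x in [0, 2, 5, 10]:
    print(x, sigmoid_deriv(x))
# At x = 10 the gradient is about 4.5e-5, so weight updates
# scaled by it become negligible
```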
Practice Problems
Problem 1: Calculate the derivative of the sigmoid function at x = 0.
\sigma(0) = \frac{1}{1 + e^0} = \frac{1}{2}
\sigma'(0) = \sigma(0) \cdot (1 - \sigma(0))
= \frac{1}{2} \times \left(1 - \frac{1}{2}\right) = \frac{1}{4}
Problem 2: Find the Value of \sigma'(2)
\sigma(2) = \frac{1}{1 + e^{-2}} \approx 0.88
\sigma'(2) = \sigma(2) \cdot (1 - \sigma(2))
\approx 0.88 \times (1 - 0.88) \approx 0.1056
Problem 3: Compute \sigma'(-1)
\sigma(-1) = \frac{1}{1 + e^1} \approx 0.2689
\sigma'(-1) = \sigma(-1) \cdot (1 - \sigma(-1))
\approx 0.2689 \times (1 - 0.2689) \approx 0.1966
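The three practice problems above can be checked with a few lines of Python:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x: float) -> float:
    s = sigmoid(x)
    return s * (1 - s)

print(round(sigmoid_deriv(0), 4))    # 0.25   (Problem 1)
print(round(sigmoid_deriv(2), 4))    # 0.105  (Problem 2)
print(round(sigmoid_deriv(-1), 4))   # 0.1966 (Problem 3)
```

Note that the full-precision value of \sigma'(2) is about 0.1050; the 0.1056 worked above comes from rounding \sigma(2) to 0.88 before multiplying.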