Sigmoid Function
All sigmoid functions have the property that they map the entire number line into a bounded interval, such as (0, 1) or (-1, 1), so one use of a sigmoid function is to convert a real value into one that can be interpreted as a probability.
One of the most widely used sigmoid functions is the logistic function, which maps any real value to the range (0, 1). Note the characteristic S-shape which gave sigmoid functions their name (from the Greek letter sigma).
Sigmoid functions have become popular in deep learning because they can be used as an activation function in an artificial neural network. They were inspired by the activation potential in biological neural networks.
Sigmoid functions are also useful for many machine learning applications where a real number needs to be converted to a probability. A sigmoid function placed as the last layer of a machine learning model can serve to convert the model's output into a probability score, which can be easier to work with and interpret.
Sigmoid functions are an important part of a logistic regression model. Logistic regression is a modification of linear regression for two-class classification, and converts one or more real-valued inputs into a probability, such as the probability that a customer will purchase a product. The final stage of a logistic regression model is often set to the logistic function, which allows the model to output a probability.
All sigmoid functions are monotonic and have a bell-shaped first derivative. There are several sigmoid functions, and some of the best known are presented below.
Three of the most common sigmoid functions: the logistic function, the hyperbolic tangent, and the arctangent. All share the same basic S shape.
One of the most common sigmoid functions is the logistic sigmoid function. This is often referred to simply as "the sigmoid function" in the field of machine learning. The logistic sigmoid function is defined as follows:

σ(x) = 1 / (1 + e^(−x))
Another common sigmoid function is the hyperbolic tangent function (tanh). This maps any real-valued input to the range between -1 and 1:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
A third common sigmoid function is the arctangent, which is the inverse of the tangent function. The arctangent function maps any real-valued input to the range −π/2 to π/2.
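To make the three curves concrete, here is a minimal sketch in Python (assuming NumPy; the function names are our own, not from any particular library) that defines each of them:

```python
import numpy as np

def logistic(x):
    # Logistic sigmoid: maps any real x into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh_sigmoid(x):
    # Hyperbolic tangent: maps any real x into (-1, 1).
    return np.tanh(x)

def arctan_sigmoid(x):
    # Arctangent: maps any real x into (-pi/2, pi/2).
    return np.arctan(x)
```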
In the graphs below we can see both the tangent curve, a well-known trigonometric function, and the arctangent, its inverse:
Taking the logistic sigmoid function, we can evaluate it at several key points to understand its form.
At x = 0, the logistic sigmoid takes the value σ(0) = 1 / (1 + e^0) = 0.5. This is useful for the interpretation of the sigmoid as a probability in a logistic regression model, because it shows that a zero input results in an output of 0.5, indicating equal probabilities of both classes. By x = 5, the value of the sigmoid function is σ(5) ≈ 0.9933, already very close to 1.
In fact, in the limit of x tending towards infinity, the logistic sigmoid function converges to 1, and towards 0 in the case of negative infinity, but the derivative of the function never quite reaches zero. These are very useful properties of the sigmoid function, as it tends towards a limit but always has a nonzero gradient.
At x = 1, the tanh function has increased relatively much more rapidly than the logistic function: tanh(1) ≈ 0.7616, compared to σ(1) ≈ 0.7311. And finally, by x = 5, the tanh function has converged much more closely to 1, to within about 10⁻⁴: tanh(5) ≈ 0.99991.
In fact, the hyperbolic tangent converges towards its limit much more rapidly than the logistic sigmoid function.
We can evaluate the arctangent function at the same points to see where it converges: arctan(0) = 0, arctan(1) ≈ 0.7854, and arctan(5) ≈ 1.3734.

Note that in contrast to the other two sigmoid functions shown above, the arctangent converges to π/2 ≈ 1.5708 rather than 1. Furthermore, the arctangent converges more slowly: at x = 5 it is not even close to its final value. Only at quite large inputs, such as x = 5000, where arctan(5000) ≈ 1.5706, does the arctangent get very close to π/2.
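The values quoted above are easy to reproduce. A short sketch (again assuming NumPy) that evaluates all three functions at the key points discussed:

```python
import numpy as np

logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

# Key points: 0 (midpoint), 1 and 5 (convergence), 5000 (arctan's slow tail).
for x in [0.0, 1.0, 5.0, 5000.0]:
    print(f"x = {x:6.0f}   logistic = {logistic(x):.5f}   "
          f"tanh = {np.tanh(x):.5f}   arctan = {np.arctan(x):.5f}")

# logistic -> 1 and tanh -> 1 (tanh faster); arctan -> pi/2 ~ 1.57080,
# and only at x = 5000 is it very close to that limit.
```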
We can compare the key properties of the three sigmoid functions in a table:

Function             Formula                                      Range
Logistic sigmoid     σ(x) = 1 / (1 + e^(−x))                      (0, 1)
Hyperbolic tangent   tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))    (−1, 1)
Arctangent           arctan(x)                                    (−π/2, π/2)
The ReLU (rectified linear unit) function, defined as ReLU(x) = max(0, x), has several advantages over a sigmoid function in a neural network. The main advantage is that the ReLU function is very fast to calculate. In addition, an activation potential in a biological neural network does not continue to change for negative inputs, so the ReLU seems closer to the biological reality if a goal is to mimic biological systems.
Furthermore, for positive inputs the ReLU function has a constant gradient of 1, whereas a sigmoid function has a gradient that rapidly converges towards 0. This property makes neural networks with sigmoid activation functions slow to train, a phenomenon known as the vanishing gradient problem. The choice of ReLU as an activation function alleviates this problem because the gradient of the ReLU is always 1 for positive x, and so the learning process will not be slowed down by the gradient becoming small.

However, the zero gradient for negative x can pose a similar problem, known as the zero gradient problem (often called the dying ReLU problem), but it is possible to compensate for this by adding a small linear term in x to give the ReLU function a nonzero slope at all points, as the leaky ReLU does.
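As a sketch of the comparison (our own definitions, not any particular library's), the activations and their gradients can be written directly; note how the sigmoid's gradient decays while the ReLU's stays at 1 for positive inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # The logistic sigmoid's derivative is sigma(x) * (1 - sigma(x)):
    # it peaks at 0.25 and decays towards 0 for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU(x) = max(0, x), so the gradient is 1 for x > 0 and 0 for x < 0.
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # The small linear term alpha * x keeps the slope nonzero for x < 0.
    return np.where(x > 0, x, alpha * x)

x = np.array([-5.0, -1.0, 1.0, 5.0])
print(sigmoid_grad(x))  # [0.0066 0.1966 0.1966 0.0066] -- vanishing at the tails
print(relu_grad(x))     # [0. 0. 1. 1.]
```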
In logistic regression, a logistic sigmoid function is fit to a set of data where the independent variable(s) can take any real value, and the dependent variable is either 0 or 1.
For example, let us imagine a dataset of tumor measurements and diagnoses. Our aim is to
predict the probability of a tumor spreading, given its size in centimeters.
Plotting the entire dataset, we see a general trend: the larger the tumor, the more likely it is to have spread, although there is a clear overlap of the two classes in the range 2.5 cm to 3.5 cm:
Using logistic regression, we can model the tumor status y (0 or 1) as a function of the tumor size x using the logistic sigmoid formula:

P(y = 1 | x) = σ(mx + b) = 1 / (1 + e^(−(mx + b)))

where we need to find the optimal values of m and b, which allow us to shift and stretch the sigmoid curve to match the data.
In this case, fitting the sigmoid curve gives us optimal values for m and b. We can put these values back into the sigmoid formula and plot the resulting curve:
This means that, for example, given a tumor of size 3 cm, our logistic regression model would predict a probability of roughly 0.5 that the tumor has spread. Intuitively, this makes sense: in the original data, we can see that the tumors around 3 cm are more or less evenly distributed between both classes.
Now let us consider a tumor of size 6 cm. All tumors in the original dataset of size 4 cm or greater had spread, so we would expect our model to return a high likelihood of the tumor spreading. Indeed, the model returns a probability very close to 1, indicating near certainty that y = 1.
This shows how sigmoid functions, and the logistic function in particular, are extremely powerful
for probability modeling.
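To make the worked example reproducible, here is a sketch using SciPy's curve_fit; the tumor sizes and labels below are made up to mirror the overlap described in the text, since the article's actual dataset is not given:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, m, b):
    # P(y = 1 | x) = 1 / (1 + exp(-(m*x + b)))
    return 1.0 / (1.0 + np.exp(-(m * x + b)))

# Hypothetical tumor sizes (cm) and spread labels (0 or 1), with
# overlapping classes between 2.5 cm and 3.5 cm as in the text.
sizes  = np.array([1.0, 1.5, 2.0, 2.5, 2.8, 3.0, 3.2, 3.5, 4.0, 4.5, 5.0, 6.0])
spread = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1,   1,   1  ])

(m, b), _ = curve_fit(logistic, sizes, spread, p0=[1.0, -3.0])

print(f"fitted m = {m:.2f}, b = {b:.2f}")
print(f"P(spread | 3 cm) ~ {logistic(3.0, m, b):.2f}")  # close to 0.5
print(f"P(spread | 6 cm) ~ {logistic(6.0, m, b):.2f}")  # close to 1
```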
Why is the logistic function used in logistic regression, and not another sigmoid
function?
The reason that the logistic function, rather than any of the other sigmoid variants, is used in logistic regression is not just that it conveniently returns values between 0 and 1. Logistic regression can be derived from the assumption that the data in both classes is normally distributed.
Let us imagine that non-spreading tumors and spreading tumors each follow a normal distribution. The non-spreading tumors are normally distributed with mean 1.84 cm and standard deviation 1 cm, and the spreading tumors are normally distributed with mean 4.3 cm, also with standard deviation 1 cm. We can plot the probability density functions of both these normal distributions:
At each point we can calculate the posterior probability that a tumor of that size has spread: the probability density function of the spreading tumors divided by the sum of both probability density functions (non-spreading + spreading tumors):

Plotting this probability as a function of x, we can see that the result is exactly the original logistic sigmoid curve.
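This can be checked numerically. A sketch using SciPy's normal density, with the means and standard deviation quoted above; the closed-form m and b follow from expanding the ratio of the two Gaussian densities:

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sd = 1.84, 4.3, 1.0        # means and shared standard deviation

x = np.linspace(0.0, 7.0, 50)
p0 = norm.pdf(x, mu0, sd)            # density of non-spreading tumors
p1 = norm.pdf(x, mu1, sd)            # density of spreading tumors

posterior = p1 / (p0 + p1)           # probability a tumor of size x has spread

# For equal-variance Gaussians this equals 1 / (1 + exp(-(m*x + b))) with:
m = (mu1 - mu0) / sd**2
b = (mu0**2 - mu1**2) / (2 * sd**2)
logistic = 1.0 / (1.0 + np.exp(-(m * x + b)))

print(np.allclose(posterior, logistic))  # True
```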
The reason that the logistic function is chosen for logistic regression is the assumption that we are modeling two classes which are both normally distributed with equal variance; the logistic function arises naturally from the ratio of normal probability density functions.
An artificial neural network consists of several layers of functions, layered on top of each other. Each layer typically contains some weights and biases and functions like a small linear regression. A crucial part of the layers is also the activation function.
The formula for the first hidden layer of a feedforward neural network, with weights denoted by W, biases by b, and activation function g, is:

a = g(Wx + b)
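In code, such a layer might look like the following sketch (the shapes here are illustrative assumptions, not from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer(x, W, b):
    # a = g(Wx + b), with the logistic sigmoid as the activation g.
    return sigmoid(W @ x + b)

# Illustrative shapes: 3 inputs feeding 4 hidden units.
rng = np.random.default_rng(0)
a = hidden_layer(rng.normal(size=3), rng.normal(size=(4, 3)), rng.normal(size=4))
print(a)  # four activations, each squashed into (0, 1)
```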
However, if every layer in the neural network were to contain only weights and biases, but no activation function, the entire network would be equivalent to a single linear combination of weights and biases. In other words, the formula for the neural network could be factorized and simplified down to a simple linear regression model. Such a model would be able to pick up very simple linear dependencies but would be unable to perform the impressive tasks that neural networks are renowned for, such as image and voice recognition.
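This collapse is easy to demonstrate. A small sketch (random weights, NumPy assumed) showing that two stacked linear layers with no activation in between reduce to a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers of weights and biases, with no activation between them.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same map as one linear layer: W = W2 W1 and b = W2 b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))  # True
```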
Activation functions were introduced between layers in neural networks in order to introduce a
non-linearity. Originally sigmoid functions such as the logistic function, arctangent, and
hyperbolic tangent were used, and today ReLU and its variants are very popular. All activation
functions serve the same purpose: to introduce a non-linearity into the network. Sigmoid
functions were chosen as some of the first activation functions thanks to their perceived
similarity with the activation potential in biological neural networks.
Thanks to the use of a sigmoid function at various points within a multi-layer neural network,
neural networks can be built to have successive layers pick up on ever more sophisticated
features of an input example.
In the late 1830s, the Belgian mathematician Pierre François Verhulst was experimenting with different ways of modeling population growth, and wanted to account for the fact that a population's growth is ultimately self-limiting, and does not increase exponentially forever.
Verhulst chose the logistic function as a logical adjustment to the simple exponential model, in
order to model the slowing down of a population's growth which occurs when a population
begins to exhaust its resources.
Over the next century, biologists and other scientists began to use the sigmoid function as a
standard tool for modeling population growth, from bacterial colonies to human civilizations.
In 1943, Warren McCulloch and Walter Pitts developed an artificial neural network model using a hard cutoff as an activation function, where a neuron outputs 1 or 0 depending on whether its input is above or below a threshold.
In 1972, the biologists Hugh Wilson and Jack Cowan at the University of Chicago were
attempting to model biological neurons computationally and published the Wilson–Cowan
model, where a neuron sends a signal to another neuron if it receives a signal greater than an
activation potential. Wilson and Cowan chose the logistic sigmoid function to model the
activation of a neuron as a function of a stimulus.
From the 1970s and 1980s onwards, a number of researchers began to use sigmoid functions in formulations of artificial neural networks, taking inspiration from biological neural networks. In 1998, Yann LeCun chose the hyperbolic tangent as an activation function in his groundbreaking convolutional neural network LeNet, which was the first to be able to recognize handwritten digits.
In recent years, artificial neural networks have moved away from sigmoid functions in favor of the ReLU function, since all the variants of the sigmoid function are computationally intensive to calculate, and the ReLU provides the necessary nonlinearity to take advantage of the depth of the network, while also being very fast to compute.
References
Malthus, An Essay on the Principle of Population (1798)
Verhulst, Notice sur la loi que la population suit dans son accroissement (1838)
McCulloch and Pitts, A logical calculus of the ideas immanent in nervous activity (1943)
Wilson and Cowan, Excitatory and inhibitory interactions in localized populations of model
neurons (1972)