
Sigmoid Function Definition - DeepAI

The sigmoid function is a mathematical function with an S-shaped curve that is used in machine learning and neural networks. It maps real values into a bounded range such as 0 to 1, acting as a smooth threshold or converting values into probabilities. Common sigmoid functions include the logistic function, which outputs values between 0 and 1, and the hyperbolic tangent, which outputs values between -1 and 1. Sigmoid functions are used as activation functions in neural networks and in the output layer of models to convert results into probabilities.

Uploaded by

sushilnamoijam

10/9/2020 Sigmoid Function Definition | DeepAI


Sigmoid Function

Thomas Wood

What is the Sigmoid Function?

A sigmoid function is a mathematical function which has a characteristic S-shaped curve. There
are a number of common sigmoid functions, such as the logistic function, the hyperbolic
tangent, and the arctangent. In machine learning, the term sigmoid function is normally used to
refer specifically to the logistic function, also called the logistic sigmoid function.

All sigmoid functions have the property that they map the entire number line into a bounded
range such as between 0 and 1, or -1 and 1, so one use of a sigmoid function is to convert a
real value into one that can be interpreted as a probability.

https://deepai.org/machine-learning-glossary-and-terms/sigmoid-function 1/15

One of the most widely used sigmoid functions is the logistic function, which maps any real
value to the range (0, 1). Note the characteristic S-shape which gave sigmoid functions their
name (from the Greek letter sigma).

Sigmoid functions have become popular in deep learning because they can be used as an
activation function in an artificial neural network. They were inspired by the activation
potential in biological neural networks.

Sigmoid functions are also useful for many machine learning applications where a real number
needs to be converted to a probability. A sigmoid function placed as the last layer of a machine
learning model can serve to convert the model's output into a probability score, which can be
easier to work with and interpret.

Sigmoid functions are an important part of a logistic regression model. Logistic regression is a
modification of linear regression for two-class classification, and converts one or more
real-valued inputs into a probability, such as the probability that a customer will purchase a
product. The final stage of a logistic regression model is often set to the logistic function,
which allows the model to output a probability.

Sigmoid Function Formula


All sigmoid functions are monotonic and have a bell-shaped first derivative. There are several
sigmoid functions and some of the best-known are presented below.


Three of the most common sigmoid functions: the logistic function, the hyperbolic tangent, and
the arctangent. All share the same basic S shape.

Logistic Sigmoid Function Formula

One of the most common sigmoid functions is the logistic sigmoid function. This is often
referred to as the Sigmoid Function in the field of machine learning. The logistic sigmoid
function is defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Mathematical definition of the logistic sigmoid function, a common sigmoid function


The logistic function takes any real-valued input, and outputs a value between zero and one.
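As an illustration of the definition above, the logistic sigmoid can be written in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `logistic` and the two-branch form (chosen so that very negative inputs do not overflow `math.exp`) are assumptions of this example, not something taken from the article.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), arranged to avoid overflow."""
    if x >= 0:
        # Here e^(-x) <= 1, so the textbook formula is safe to evaluate.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^(-x) could overflow, so use the equivalent
    # form e^x / (1 + e^x), in which the exponential stays below 1.
    z = math.exp(x)
    return z / (1.0 + z)


# Any real input, however extreme, lands between zero and one.
for x in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
    print(x, logistic(x))
```

Both branches compute the same mathematical function; the split only matters for inputs of large magnitude.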

Hyperbolic Tangent Function Formula

Another common sigmoid function is the hyperbolic tangent function. This maps any real-valued
input to the range between -1 and 1.

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Mathematical definition of the hyperbolic tangent
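The hyperbolic tangent and the logistic function are closely related: tanh(x) equals 2σ(2x) − 1, where σ is the logistic function, so tanh is a logistic curve stretched to the range -1 to 1 and compressed horizontally. The short check below is a sketch of that identity; the helper name `tanh_via_logistic` is hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid; adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_logistic(x: float) -> float:
    """Hyperbolic tangent expressed through the logistic function."""
    return 2.0 * logistic(2.0 * x) - 1.0


# Compare against the standard library's tanh at a few points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, math.tanh(x), tanh_via_logistic(x))
```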

Arctangent Function Formula

A third alternative sigmoid function is the arctangent, which is the inverse of the tangent
function.

The arctangent function

The arctangent function maps any real-valued input to the range −π/2 to π/2. 
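The claim that every sigmoid function has a bell-shaped first derivative can be checked numerically for the three functions introduced so far. The sketch below uses the standard closed-form derivatives (σ(x)(1 − σ(x)) for the logistic function, 1 − tanh²(x) for the hyperbolic tangent, and 1/(1 + x²) for the arctangent) and compares them with a finite-difference estimate. The dictionary `SIGMOIDS` and the helper `numeric_derivative` are names invented for this example.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Each entry pairs a sigmoid function with its analytic derivative.
SIGMOIDS = {
    "logistic": (logistic, lambda x: logistic(x) * (1.0 - logistic(x))),
    "tanh": (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    "arctan": (math.atan, lambda x: 1.0 / (1.0 + x * x)),
}


def numeric_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Each derivative peaks at x = 0 and falls away symmetrically on both sides.
for name, (f, df) in SIGMOIDS.items():
    print(name, [round(df(x), 4) for x in (-4.0, -1.0, 0.0, 1.0, 4.0)])
```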

In the graphs below we can see both the tangent curve, a well-known trigonometric function,
and the arctangent, its inverse:

Calculating the Sigmoid Function


Example Calculation of Logistic Sigmoid Function

Taking the logistic sigmoid function, we can evaluate the value of the function at several key
points to understand the function's form.

At x = 0, the logistic sigmoid function evaluates to:

$$\sigma(0) = \frac{1}{1 + e^{0}} = \frac{1}{2} = 0.5$$

This is useful for the interpretation of the sigmoid as a probability in a logistic regression model,
because it shows that a zero input results in an output of 0.5, indicating equal probabilities of
both classes.

At x = 1, we find a slightly larger value:

$$\sigma(1) = \frac{1}{1 + e^{-1}} \approx 0.731$$

and by x = 5, the value of the sigmoid function becomes very close to 1:

$$\sigma(5) = \frac{1}{1 + e^{-5}} \approx 0.993$$

In fact, in the limit of x tending towards infinity, the sigmoid function converges to 1, and
towards 0 in the case of negative infinity, but the derivative of the function never reaches zero.
These are very useful properties of the sigmoid function, as it tends towards a limit but always
has a nonzero gradient.
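The key points discussed above can be reproduced with a short script. This is a sketch under the assumption that the gradient is computed from the standard identity σ'(x) = σ(x)(1 − σ(x)); the name `logistic_gradient` is hypothetical.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_gradient(x: float) -> float:
    """Derivative of the logistic function, sigma(x) * (1 - sigma(x))."""
    s = logistic(x)
    return s * (1.0 - s)


# Value and gradient at the key points, plus one larger input.
for x in (0.0, 1.0, 5.0, 20.0):
    print(f"x={x:5.1f}  sigmoid={logistic(x):.6f}  gradient={logistic_gradient(x):.3e}")
```

One practical caveat: the gradient is nonzero for every real input mathematically, but in double-precision arithmetic the computed value can round to exactly zero for large inputs (with this code, `logistic_gradient(40.0)` is `0.0`).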

Example Calculation of Hyperbolic Tangent Function


Similarly, we can calculate the value of the tanh function at these key points. Rather than being
centered around 0.5, the tanh function is centered at 0:

$$\tanh(0) = 0$$

At 1, the tanh function has increased relatively much more rapidly than the logistic function:

$$\tanh(1) \approx 0.762$$

And finally, by 5, the tanh function has converged much more closely to 1, differing from it by
less than 0.0001:

$$\tanh(5) \approx 0.99991$$

In fact, the hyperbolic tangent converges much more rapidly than the logistic sigmoid function,
whereas the arctangent converges far more slowly than either.

Example calculation of the Arctangent Function

We can evaluate the arctangent function at the same points to see where it converges:

$$\arctan(0) = 0, \qquad \arctan(1) = \frac{\pi}{4} \approx 0.785, \qquad \arctan(5) \approx 1.373$$

Note that in contrast to the other two sigmoid functions shown above, the arctangent
converges to π/2 ≈ 1.571 rather than 1. Furthermore, the arctangent converges more slowly, as at
x = 5 it is not even close to its final value. Only by quite large numbers, such as x = 5000, does
the arctangent get very close to π/2.
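To make the difference in convergence speed concrete, the sketch below measures how far each function still is from its upper limit at a few inputs. The helper name `gap_to_limit` and the sample points are choices made for this illustration only.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gap_to_limit(x: float) -> dict:
    """Distance between each function's value at x and its limit as x grows."""
    return {
        "logistic": 1.0 - logistic(x),
        "tanh": 1.0 - math.tanh(x),
        "arctan": math.pi / 2 - math.atan(x),
    }


for x in (1.0, 5.0, 5000.0):
    gaps = gap_to_limit(x)
    print(x, {name: f"{gap:.2e}" for name, gap in gaps.items()})
```

The logistic and tanh gaps shrink exponentially in x, while the arctangent gap shrinks only like 1/x, which is why it is still about 0.0002 at x = 5000.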

Summary of three sigmoid functions



We can compare the key properties of the three sigmoid functions shown above in a table:

Sigmoid function             Logistic function   tanh        arctan
Value in the limit x → -∞    0                   -1          -π/2
Value at x = 0               0.5                 0           0
Value in the limit x → ∞     1                   1           π/2
Converges                    Fast                Very fast   Very slow

Sigmoid Function vs. ReLU


In modern artificial neural networks, it is common to see the rectifier, also known as the
rectified linear unit or ReLU, used as the activation function in place of the sigmoid function.
The ReLU is defined as:

$$f(x) = \max(0, x)$$

Definition of the rectifier activation function

Graph of the ReLU function

The ReLU function has several advantages over a sigmoid function in a neural network. The
main advantage is that the ReLU function is very fast to calculate. In addition, an activation
potential in a biological neural network does not continue to change for negative inputs, so the
ReLU seems closer to the biological reality if a goal is to mimic biological systems.

In addition, for positive x the ReLU function has a constant gradient of 1, whereas a sigmoid
function has a gradient that rapidly converges towards 0 as the magnitude of x grows. This
property makes neural networks with sigmoid activation functions slow to train. This phenomenon
is known as the vanishing gradient problem. The choice of ReLU as an activation function
alleviates this problem because the gradient of the ReLU is always 1 for positive x and so the
learning process will not be slowed down by the gradient becoming small.
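The vanishing gradient effect can be illustrated with a deliberately simplified model: a stack of activations applied one after another, with no weights or biases in between (an assumption made purely to keep the example short). By the chain rule, the gradient through the stack is the product of the per-layer derivatives, and since the logistic derivative never exceeds 0.25, the product shrinks quickly with depth. The function name `chain_gradient` is hypothetical.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_grad(x: float) -> float:
    s = logistic(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


def chain_gradient(activation, derivative, x: float, depth: int) -> float:
    """Derivative of `activation` composed with itself `depth` times at x."""
    grad = 1.0
    for _ in range(depth):
        grad *= derivative(x)  # chain rule: multiply by this layer's slope
        x = activation(x)      # then move on to the next layer's input
    return grad


for depth in (1, 5, 10, 20):
    print(
        depth,
        chain_gradient(logistic, logistic_grad, 2.0, depth),
        chain_gradient(relu, relu_grad, 2.0, depth),
    )
```

In a real network the weights also enter the product, so this sketch shows only the contribution of the activation functions.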

However, the zero gradient for negative x can pose a similar problem, known as the zero
gradient problem, but it is possible to compensate for this by adding a small linear term in x to
give the ReLU function a nonzero slope at all points.
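One common way to give the rectifier a nonzero slope everywhere is the leaky ReLU, which keeps the identity for positive inputs and uses a small slope for negative ones. Whether this is exactly the variant the text has in mind is an assumption; the slope value 0.01 below is also just an illustrative choice.

```python
def relu(x: float) -> float:
    """Plain rectifier: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Rectifier with a small slope for negative inputs (slope is illustrative)."""
    return x if x > 0 else slope * x


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    """Gradient of the leaky ReLU: 1 for positive inputs, `slope` otherwise."""
    return 1.0 if x > 0 else slope


for x in (-5.0, -1.0, 0.5, 3.0):
    print(x, relu(x), leaky_relu(x), leaky_relu_grad(x))
```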

Applications of Sigmoid Function


Logistic sigmoid function in logistic regression
A key area of machine learning where the sigmoid function is essential is a logistic regression
model. A logistic regression model is used to estimate the probability of a binary event, such as
dead vs alive, sick vs well, fraudulent vs honest transaction, etc. It outputs a probability value
between 0 and 1.

In logistic regression, a logistic sigmoid function is fit to a set of data where the independent
variable(s) can take any real value, and the dependent variable is either 0 or 1.

For example, let us imagine a dataset of tumor measurements and diagnoses. Our aim is to
predict the probability of a tumor spreading, given its size in centimeters.

Some measurements of tumor dimensions and outcomes

Plotting the entire dataset, we have a general trend that, the larger the tumor, the more likely it
is to have spread, although there is a clear overlap of both classes in the range 2.5 cm to 3.5 cm:


A plot of tumor outcomes versus tumor dimensions

Using logistic regression, we can model the tumor status y (0 or 1) as a function of tumor
size x using the logistic sigmoid formula:

$$y = \frac{1}{1 + e^{-(mx + b)}}$$

where we need to find the optimal values m and b, which allow us to shift and stretch the
sigmoid curve to match the data.

In this case, fitting the sigmoid curve gives us the following values:

We can put these values back into the sigmoid formula and plot the curve:


This means that, for example, given a tumor of size 3 cm, our logistic regression model would
predict the probability of this tumor spreading as:

Intuitively, this makes sense. In the original data, we can see that the tumors around 3 cm are
more or less evenly distributed between both classes.

Let us consider a tumor of size 6 cm. All tumors in the original dataset of size 4 cm or greater
had spread, so we would expect that our model would return a high likelihood of the tumor
spreading:

The model has returned a probability very close to 1, indicating the near certainty that y = 1.

This shows how sigmoid functions, and the logistic function in particular, are extremely powerful
for probability modeling.
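The fitting step described above can be sketched with plain gradient descent on the log loss. The dataset below is hypothetical: the article's own measurements and fitted values are not reproduced here, so the sizes, outcomes, learning rate, and step count are all invented for illustration, and the resulting m and b should not be read as the article's values.

```python
import math

# Hypothetical tumor sizes (cm) and outcomes (1 = spread); not the article's data.
HYPOTHETICAL_SIZES_CM = [1.0, 1.5, 2.0, 2.5, 2.8, 3.0, 3.2, 3.5, 4.0, 4.5, 5.0, 5.5]
HYPOTHETICAL_SPREAD = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def logistic(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Fit y ~ logistic(m * x + b) by gradient descent on the mean log loss."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_m = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = logistic(m * x + b) - y  # log-loss derivative w.r.t. (m*x + b)
            grad_m += error * x
            grad_b += error
        m -= learning_rate * grad_m / n
        b -= learning_rate * grad_b / n
    return m, b


def predict(size_cm: float, m: float, b: float) -> float:
    """Modeled probability that a tumor of the given size has spread."""
    return logistic(m * size_cm + b)


m, b = fit_logistic(HYPOTHETICAL_SIZES_CM, HYPOTHETICAL_SPREAD)
for size in (1.0, 3.0, 6.0):
    print(size, round(predict(size, m, b), 3))
```

In practice a library routine would normally be used instead of a hand-written loop; the loop is shown only to make the roles of m and b explicit.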

Why is the logistic function used in logistic regression, and not another sigmoid function?

The reason that the logistic function is used in logistic regression, and none of the other sigmoid
variants, is not just due to the fact that it conveniently returns values between 0 and 1. The
logistic function also arises naturally from the assumption that data in both classes is normally
distributed with the same standard deviation.

Let us imagine that non-spreading tumors and spreading tumors each follow a normal
distribution. The non-spreading tumors are normally distributed with mean 1.84 cm and
standard deviation 1 cm, and the spreading tumors are normally distributed with mean 4.3 cm,
also with standard deviation 1 cm. We can plot the probability density functions of both
these normal distributions:

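At each point we can calculate the share of the total probability density that belongs to the
spread tumors, which is the probability density function of the spread tumors divided by the sum
of both probability density functions (non-spreading + spread tumors). This is loosely referred
to here as the odds ratio; if both classes are equally common, it is the probability that a tumor
of size x has spread.

Plotting this ratio as a function of x, we can see that the result is the original logistic
sigmoid curve.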


The logistic function is chosen for logistic regression because we assume that we are modeling
two classes which are both normally distributed with the same standard deviation, and the
logistic function naturally arises from the ratio of normal probability density functions.
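This relationship can be verified numerically with the means and standard deviation given above. Dividing numerator and denominator by the spread-tumor density shows that the ratio equals a logistic function of m·x + b, with m = (μ₁ − μ₀)/σ² and b = (μ₀² − μ₁²)/(2σ²), where μ₀ and μ₁ are the non-spreading and spreading means. The sketch below checks that identity; the function and constant names are invented for this example, and equal class sizes are assumed.

```python
import math

MEAN_NOT_SPREAD = 1.84  # cm, as stated in the text
MEAN_SPREAD = 4.3       # cm, as stated in the text
STD = 1.0               # cm, shared by both classes


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def density_share(x: float) -> float:
    """Spread-tumor density divided by the sum of both densities."""
    p_spread = normal_pdf(x, MEAN_SPREAD, STD)
    p_not_spread = normal_pdf(x, MEAN_NOT_SPREAD, STD)
    return p_spread / (p_spread + p_not_spread)


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Slope and intercept implied by the two normal distributions.
m = (MEAN_SPREAD - MEAN_NOT_SPREAD) / STD ** 2
b = (MEAN_NOT_SPREAD ** 2 - MEAN_SPREAD ** 2) / (2.0 * STD ** 2)

for x in (0.0, 1.84, 3.07, 4.3, 6.0):
    print(x, density_share(x), logistic(m * x + b))
```

The two printed columns agree to floating-point precision, and the curve crosses 0.5 at the midpoint between the two means, 3.07 cm.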

Sigmoid function as activation function in artificial neural networks

An artificial neural network consists of several layers of functions, layered on top of each other:

A feedforward neural network with two hidden layers

Each layer typically contains some weights and biases and functions like a small linear
regression. A crucial part of the layers is also the activation function.

Formula for the first hidden layer of a feedforward neural network, with weights denoted by
W and biases by b, and activation function g.
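A single layer of this kind is conventionally computed as g(Wx + b): a weighted sum of the inputs plus a bias for each unit, passed through the activation function. The exact notation of the article's formula is not reproduced here, so the form below is an assumption based on the caption, and the weights, biases, and inputs are made-up numbers.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """Compute g(W x + b) for one layer; `weights` holds one row per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


# Hypothetical values: three inputs feeding a hidden layer of two units.
x = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3], [-0.4, 0.5, -0.6]]
b = [0.0, 0.1]

hidden = dense_layer(x, W, b, logistic)
print(hidden)
```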

However, if every layer in the neural network were to contain only weights and biases, but no
activation function, the entire network would be equivalent to a single linear combination of
weights and biases. In other words, the formula for the neural network could be factorized and
simplified down to a simple linear regression model. Such a model would be able to pick up very
simple linear dependencies but unable to perform the impressive tasks that neural networks are
renowned for, such as image and voice recognition.
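The collapse of stacked linear layers into one can be shown directly: W₂(W₁x + b₁) + b₂ equals (W₂W₁)x + (W₂b₁ + b₂). The sketch below checks this with small hand-picked matrices; all numbers and helper names are hypothetical.

```python
def matvec(matrix, vector):
    """Matrix-vector product for lists of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix-matrix product for lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(u, v):
    return [p + q for p, q in zip(u, v)]


# Two layers without activation functions (hypothetical weights and biases).
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 3 x 2
b1 = [0.5, -0.5, 1.0]
W2 = [[2.0, 0.0, 1.0], [-1.0, 1.0, 0.0]]    # 2 x 3
b2 = [0.0, 2.0]


def two_linear_layers(x):
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)


# The same mapping expressed as a single linear layer.
W_combined = matmul(W2, W1)
b_combined = add(matvec(W2, b1), b2)


def single_linear_layer(x):
    return add(matvec(W_combined, x), b_combined)


print(two_linear_layers([1.0, -2.0]), single_linear_layer([1.0, -2.0]))
```

Inserting a nonlinear activation between the two layers breaks this equivalence, which is what lets additional layers add expressive power.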

Activation functions were introduced between layers in neural networks in order to introduce a
non-linearity. Originally sigmoid functions such as the logistic function, arctangent, and
hyperbolic tangent were used, and today ReLU and its variants are very popular. All activation
functions serve the same purpose: to introduce a non-linearity into the network. Sigmoid
functions were chosen as some of the first activation functions thanks to their perceived
similarity with the activation potential in biological neural networks.

Thanks to the use of a sigmoid function at various points within a multi-layer neural network,
neural networks can be built to have successive layers pick up on ever more sophisticated
features of an input example.

Sigmoid Function History


In 1798, the English cleric and economist Thomas Robert Malthus published, under a pseudonym,
a book called An Essay on the Principle of Population, asserting that the population was
increasing in a geometric progression (doubling every 25 years) while food supplies were
increasing arithmetically, and that the difference between the two was bound to cause
widespread famine.

In the late 1830s, the Belgian mathematician Pierre François Verhulst was experimenting with
different ways of modeling population growth, and wanted to account for the fact that a
population's growth is ultimately self-limiting, and does not increase exponentially forever.
Verhulst chose the logistic function as a logical adjustment to the simple exponential model, in
order to model the slowing down of a population's growth which occurs when a population
begins to exhaust its resources.

Over the next century, biologists and other scientists began to use the sigmoid function as a
standard tool for modeling population growth, from bacterial colonies to human civilizations.

In 1943, Warren McCulloch and Walter Pitts developed an artificial neural network model using a
hard cutoff as an activation function, where a neuron outputs 1 or 0 depending on whether its
input is above or below a threshold.

In 1972, the biologists Hugh Wilson and Jack Cowan at the University of Chicago were
attempting to model biological neurons computationally and published the Wilson–Cowan
model, where a neuron sends a signal to another neuron if it receives a signal greater than an
activation potential. Wilson and Cowan chose the logistic sigmoid function to model the
activation of a neuron as a function of a stimulus.

From the 1970s and 1980s onwards, a number of researchers began to use sigmoid functions in
formulations of artificial neural networks, taking inspiration from biological neural networks. In
1998, Yann LeCun chose the hyperbolic tangent as an activation function in his groundbreaking
convolutional neural network LeNet, which was the first to be able to recognize handwritten
digits to a practical level of accuracy.

In recent years, artificial neural networks have moved away from sigmoid functions in favor of
the ReLU function, since the variants of the sigmoid function are comparatively expensive to
compute, whereas the ReLU provides the necessary nonlinearity to take advantage of the depth of
the network while also being very fast to compute.

References
Malthus, An Essay on the Principle of Population (1798)

Verhulst, Notice sur la loi que la population suit dans son accroissement (1838)

McCulloch and Pitts, A logical calculus of the ideas immanent in nervous activity (1943)

Wilson and Cowan, Excitatory and inhibitory interactions in localized populations of model
neurons (1972)
