Normalizing Constant
In probability theory, a normalizing constant or normalizing factor is used to reduce any probability
function to a probability density function with total probability of one.
For example, a Gaussian function can be normalized into a probability density function, which gives the
standard normal distribution. In Bayes' theorem, a normalizing constant is used to ensure that the posterior probabilities of all possible hypotheses sum to 1. Other uses of normalizing constants include making the value of a Legendre polynomial at 1 equal to 1, and making a family of orthogonal functions orthonormal.
A similar concept has been used in areas other than probability, such as for polynomials.
Definition
In probability theory, a normalizing constant is a constant by which an everywhere non-negative function
must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a
probability mass function.[1][2]
Examples
If we start from the simple Gaussian function

$$p(x) = e^{-x^2/2}, \quad x \in (-\infty, \infty),$$

we have the corresponding Gaussian integral

$$\int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}.$$

Now if we use the latter's reciprocal value as a normalizing constant for the former, defining a function $\varphi(x)$ as

$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$

so that its integral is unit,

$$\int_{-\infty}^{\infty} \varphi(x) \, dx = 1,$$

then the function $\varphi(x)$ is a probability density function.[3] This is the density of the standard normal distribution. (Standard, in this case, means the expected value is 0 and the variance is 1.) The constant $1/\sqrt{2\pi}$ is the normalizing constant of the function $p(x)$.
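The constant can be checked numerically. The following sketch (assuming SciPy is available; all names are illustrative) integrates the unnormalized Gaussian and verifies that dividing by the result yields a density with total mass 1.

```python
# A minimal numerical check: recover the Gaussian normalizing constant by
# integration, then confirm the normalized density integrates to 1.
import math
from scipy.integrate import quad

p = lambda x: math.exp(-x**2 / 2)         # unnormalized Gaussian
area, _ = quad(p, -math.inf, math.inf)    # Gaussian integral, = sqrt(2*pi)
print(area, math.sqrt(2 * math.pi))       # 2.5066..., 2.5066...

phi = lambda x: p(x) / area               # normalized density
print(quad(phi, -math.inf, math.inf)[0])  # ≈ 1.0
```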
Similarly,

$$\sum_{n=0}^{\infty} \frac{\lambda^n}{n!} = e^{\lambda},$$

and consequently

$$f(n) = \frac{\lambda^n e^{-\lambda}}{n!}$$

is a probability mass function on the set of all nonnegative integers.[4] This is the probability mass function of the Poisson distribution with expected value λ; here the normalizing constant is $e^{-\lambda}$.
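The same check works in the discrete case; the sketch below (λ = 3 is an arbitrary choice) truncates the infinite sum at a point where the remaining terms are negligible.

```python
# A minimal sketch: e^{-lambda} is the constant that normalizes lambda^n / n!
# into the Poisson probability mass function.
import math

lam = 3.0
terms = [lam**n / math.factorial(n) for n in range(200)]
print(sum(terms), math.exp(lam))              # both ≈ 20.0855

pmf = [math.exp(-lam) * t for t in terms]
print(sum(pmf))                               # ≈ 1.0
print(sum(n * p for n, p in enumerate(pmf)))  # expected value ≈ lam
```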
Note that if the probability density function is a function of various parameters, so too will be its
normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central
role in statistical mechanics. In that context, the normalizing constant is called the partition function.
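As an illustration of a parametrized normalizing constant, the sketch below (with made-up energy levels) computes Boltzmann probabilities: the partition function Z plays the role of the normalizing constant and changes with the temperature parameter kT.

```python
# A minimal sketch: the partition function Z normalizes the Boltzmann weights
# exp(-E_i / kT) into probabilities; Z depends on the parameter kT.
import math

def boltzmann(energies, kT):
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)                  # partition function (normalizing constant)
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0]            # hypothetical energy levels
for kT in (0.5, 1.0, 2.0):
    probs = boltzmann(energies, kT)
    print(kT, [round(p, 3) for p in probs], round(sum(probs), 6))
```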
Bayes' theorem
Bayes' theorem says that the posterior probability measure is proportional to the product of the prior
probability measure and the likelihood function. "Proportional to" implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have

$$P(H_0 \mid D) = \frac{P(D \mid H_0)\, P(H_0)}{P(D)},$$
where $P(H_0)$ is the prior probability that the hypothesis is true; $P(D \mid H_0)$ is the conditional probability of the data given that the hypothesis is true, but given that the data are known it is the likelihood of the hypothesis (or its parameters) given the data; and $P(H_0 \mid D)$ is the posterior probability that the hypothesis is true given the data. $P(D)$ should be the probability of producing the data, but on its own it is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality:

$$P(H_0 \mid D) \propto P(D \mid H_0)\, P(H_0).$$
Since $P(H \mid D)$ is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that

$$P(H_0 \mid D) = \frac{P(D \mid H_0)\, P(H_0)}{\sum_i P(D \mid H_i)\, P(H_i)}.$$

In this case, the reciprocal of the value

$$P(D) = \sum_i P(D \mid H_i)\, P(H_i)$$

is the normalizing constant.[5] It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
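In the discrete case this is a one-line computation; the sketch below (with made-up priors and likelihoods) computes P(D) as the normalizing constant of the posterior.

```python
# A minimal sketch of the discrete case: P(D) = sum_i P(D|H_i) P(H_i) is the
# quantity whose reciprocal normalizes prior-times-likelihood into a posterior.
priors      = [0.5, 0.3, 0.2]   # P(H_i); illustrative values summing to 1
likelihoods = [0.9, 0.4, 0.1]   # P(D | H_i); illustrative values

unnormalized = [p * l for p, l in zip(priors, likelihoods)]
p_data = sum(unnormalized)                      # P(D) = 0.59
posterior = [u / p_data for u in unnormalized]
print(p_data, posterior, sum(posterior))        # posterior sums to 1
```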
In practice, there are many methods of estimating the normalizing constant, including the bridge sampling technique, the naive Monte Carlo estimator, the generalized harmonic mean estimator, and importance sampling.[6]
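As one example, the naive Monte Carlo estimator approximates the normalizing constant $P(D) = \mathbb{E}_{\text{prior}}[P(D \mid \theta)]$ by averaging the likelihood over draws from the prior. The sketch below uses an assumed toy model (θ ~ N(0, 1) prior, one observation D with D | θ ~ N(θ, 1)) for which the exact answer is known; it is not taken from the cited paper.

```python
# A minimal sketch of the naive Monte Carlo estimator of a normalizing
# constant (toy model; all values are illustrative).
import math
import random

random.seed(0)
D, N = 0.5, 100_000

def likelihood(theta):                  # N(theta, 1) density at D
    return math.exp(-(D - theta)**2 / 2) / math.sqrt(2 * math.pi)

samples = (random.gauss(0.0, 1.0) for _ in range(N))   # prior draws
estimate = sum(likelihood(t) for t in samples) / N
exact = math.exp(-D**2 / 4) / math.sqrt(4 * math.pi)   # marginal is N(0, 2)
print(estimate, exact)                  # both ≈ 0.265
```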
Non-probabilistic uses
The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [−1, 1] and by the fact that they are normalized so that their value at 1 is 1. The constant by which one multiplies a polynomial so that its value at 1 will be 1 is a normalizing constant.
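For instance, the polynomial 3x² − 1 is orthogonal to 1 and x on [−1, 1]; dividing by its value at 1 (the normalizing constant here is 1/2) yields the Legendre polynomial P₂, as the sketch below checks against SciPy.

```python
# A minimal sketch: normalizing an orthogonal polynomial so its value at 1 is 1.
from scipy.special import eval_legendre

q = lambda x: 3 * x**2 - 1      # orthogonal to 1 and x on [-1, 1]
c = 1 / q(1.0)                  # normalizing constant: 1/2
P2 = lambda x: c * q(x)         # Legendre polynomial P_2
print(P2(1.0))                  # 1.0
print(P2(0.3), eval_legendre(2, 0.3))   # both -0.365
```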
The constant 1/√2 is used to establish the hyperbolic functions cosh and sinh from the lengths of the
adjacent and opposite sides of a hyperbolic triangle.
See also
Normalization (statistics)
Notes
1. Continuous Distributions at University of Alabama.
2. Feller, 1968, p. 22.
3. Feller, 1968, p. 174.
4. Feller, 1968, p. 156.
5. Feller, 1968, p. 124.
6. Gronau, Quentin (2020). "bridgesampling: An R Package for Estimating Normalizing Constants" (https://cran.r-project.org/web/packages/bridgesampling/vignettes/bridgesampling_paper.pdf) (PDF). The Comprehensive R Archive Network. Retrieved September 11, 2021.
References
Continuous Distributions (http://www.math.uah.edu/stat/dist/Continuous.xhtml) at
Department of Mathematical Sciences: University of Alabama in Huntsville
Feller, William (1968). An Introduction to Probability Theory and its Applications (volume I).
John Wiley & Sons. ISBN 0-471-25708-7.