
Statistical distance

In statistics, probability theory, and information theory, a statistical distance quantifies the distance
between two statistical objects, which can be two random variables, two probability distributions or
samples, or an individual sample point and a population or a wider sample of points.

A distance between populations can be interpreted as measuring the distance between two probability
distributions, so such distances are essentially measures of distance between probability measures. Where
statistical distance measures relate to differences between random variables, the variables may have statistical
dependence,[1] and hence these distances are not directly related to measures of distance between
probability measures. Rather, a measure of distance between random variables may reflect the extent of
dependence between them rather than their individual values.

Statistical distance measures are not typically metrics, and they need not be symmetric. Some types of
distance measures, which generalize squared distance, are referred to as (statistical) divergences.

Terminology
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be
used inconsistently between authors and over time, either loosely or with precise technical meaning. In
addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and
divergence, as well as others such as contrast function and metric. Terms from information theory include
cross entropy, relative entropy, discrimination information, and information gain.

Distances as metrics

Metrics

A metric on a set X is a function (called the distance function or simply distance) d : X × X → R+ (where
R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the
following conditions:

1. d(x, y) ≥ 0     (non-negativity)
2. d(x, y) = 0   if and only if   x = y     (identity of indiscernibles; conditions 1 and 2 together produce positive definiteness)
3. d(x, y) = d(y, x)     (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z)     (subadditivity / triangle inequality).
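
As a rough illustration (not part of the original article), these four conditions can be spot-checked numerically for a candidate distance function. The Python sketch below uses the ordinary Euclidean distance on a few points in the plane and a hypothetical helper check_metric_axioms; a statistical distance that fails one of the asserts is not a metric.

import itertools
import math

def euclidean(x, y):
    # Ordinary Euclidean distance between two points in the plane
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def check_metric_axioms(d, points, tol=1e-12):
    # Spot-check the four metric conditions for d on a finite set of points
    for x, y, z in itertools.product(points, repeat=3):
        assert d(x, y) >= -tol                       # (1) non-negativity
        assert (d(x, y) <= tol) == (x == y)          # (2) identity of indiscernibles
        assert abs(d(x, y) - d(y, x)) <= tol         # (3) symmetry
        assert d(x, z) <= d(x, y) + d(y, z) + tol    # (4) triangle inequality
    return True

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
print(check_metric_axioms(euclidean, points))   # prints True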

Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For
example, pseudometrics violate property (2), identity of indiscernibles; quasimetrics violate property (3),
symmetry; and semimetrics violate property (4), the triangle inequality. Statistical distances that satisfy (1)
and (2) are referred to as divergences.
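
As an illustrative sketch (not part of the original article), the Kullback–Leibler divergence satisfies (1) and (2) but violates (3): computing it in both directions for two simple discrete distributions gives different values.

import numpy as np

def kl_divergence(p, q):
    # Kullback-Leibler divergence D(p || q) for strictly positive discrete distributions
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]

# Both values are non-negative and zero only when p == q (conditions 1 and 2),
# but the two directions disagree, so symmetry (condition 3) fails.
print(kl_divergence(p, q))   # D(p || q), about 0.40
print(kl_divergence(q, p))   # D(q || p), about 0.37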

Statistically close
The variation distance of two distributions X and Y over a finite domain D (often referred to as statistical
difference[2] or statistical distance[3] in cryptography) is defined as

Δ(X, Y) = (1/2) Σ_{α ∈ D} | Pr[X = α] − Pr[Y = α] |.

We say that two probability ensembles {X_k} and {Y_k} are statistically close if Δ(X_k, Y_k) is a
negligible function in k.
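
A minimal Python sketch of this definition (the example distributions are hypothetical, not from the original article): the variation distance is half the sum of the absolute differences of the point probabilities over the domain.

def variation_distance(px, py):
    # Delta(X, Y) = 1/2 * sum over a in D of |Pr[X = a] - Pr[Y = a]|
    # px and py map each element of the finite domain D to its probability
    domain = set(px) | set(py)
    return 0.5 * sum(abs(px.get(a, 0.0) - py.get(a, 0.0)) for a in domain)

px = {"a": 0.5, "b": 0.3, "c": 0.2}
py = {"a": 0.4, "b": 0.4, "c": 0.2}
print(variation_distance(px, py))   # about 0.1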

Examples

Metrics
Total variation distance (sometimes just called "the" statistical distance)
Hellinger distance
Lévy–Prokhorov metric
Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
Mahalanobis distance
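
As a small illustration of two of the metrics listed above (a sketch, not from the original article, assuming NumPy and SciPy are available), the Hellinger distance can be computed directly from its definition on probability vectors, and the one-dimensional Wasserstein metric via scipy.stats.wasserstein_distance.

import numpy as np
from scipy.stats import wasserstein_distance

def hellinger(p, q):
    # Hellinger distance between two discrete distributions given as probability vectors
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

print(hellinger([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))

# 1-D Wasserstein (earth mover's) distance between two empirical samples
print(wasserstein_distance([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))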

Divergences
Kullback–Leibler divergence
Rényi divergence
Jensen–Shannon divergence
Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle
inequality)
f-divergence: generalizes several distances and divergences
Discriminability index, specifically the Bayes discriminability index, is a positive-definite
symmetric measure of the overlap of two distributions.

See also
Probabilistic metric space
Randomness extractor
Similarity measure
Zero-knowledge proof

Notes
1. Dodge, Y. (2003)—entry for distance
2. Goldreich, Oded (2001). Foundations of Cryptography: Basic Tools (1st ed.). Cambridge University Press. p. 106. ISBN 0-521-79172-3.
3. Reyzin, Leo. (Lecture Notes) Extractors and the Leftover Hash Lemma (http://www.cs.bu.edu/~reyzin/teaching/s11cs937/notes-leo-1.pdf)

External links
Distance and Similarity Measures (Wolfram Alpha) (http://reference.wolfram.com/mathematica/guide/DistanceAndSimilarityMeasures.html)

References
Dodge, Y. (2003) Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
