Cramér-Rao Lower Bound and Information Geometry∗
Frank Nielsen
Sony Computer Science Laboratories Inc., Japan
École Polytechnique, LIX, France
Bhatia and C.S. Rajan, Eds.), special volume of Texts and Readings In Mathematics (TRIM), Hindustan Book Agency, 2013. http://www.hindbook.com/trims.php
1 Introduction and Historical Background
C. R. Rao's 1945 contribution has led to the birth of a flourishing field of Information Geometry [6].
2.1 Rao’s lower bound for statistical estimators
For a fixed integer n ≥ 2, let {X1 , ..., Xn } be a random sample of size n
on a random variable X which has a probability density function (pdf) (or,
probability mass function (pmf)) p(x). Suppose the unknown distribution
p(x) belongs to a parameterized family F = {pθ(x) | θ ∈ Θ} of distributions indexed by a parameter θ. The likelihood function is

$$L(\theta; x_1, \ldots, x_n) = p_\theta(x_1, \ldots, x_n),$$

the joint density of the sample viewed as a function of θ. When the observations are independently drawn from a common pdf (or, pmf) pθ(x) (for instance, if X1, . . . , Xn is a random sample from pθ(x)), the likelihood function is

$$L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} p_\theta(x_i).$$
An estimator T = T(X1, . . . , Xn) is said to be unbiased when

$$E_\theta(T) = \theta, \quad \text{for all } \theta \in \Theta.$$
Consider probability distributions with pdf (or, pmf) satisfying the following regularity conditions:

• The support {x | pθ(x) > 0} is identical for all distributions (and thus does not depend on θ),

• ∫ pθ(x) dx can be differentiated under the integral sign with respect to θ.
Let us first consider the case of uni-parameter distributions like Poisson dis-
tributions with mean parameter λ. These families are also called order-1
families of probabilities. The C. R. Rao lower bound in the case of uni-
parameter distributions can be stated now.
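In its standard form (restated here; the multi-parameter version appears as Theorem 2 below), the bound says that any unbiased estimator θ̂ of θ built from an IID sample of size n satisfies

$$V[\hat{\theta}] \geq \frac{1}{n\, I(\theta)}, \qquad \text{where } I(\theta) = E_\theta\!\left[\left(\frac{\mathrm{d}}{\mathrm{d}\theta} \log p_\theta(x)\right)^{\!2}\right]$$

is the Fisher information. For the Poisson example with mean parameter λ, the log-density is l(x; λ) = log pλ(x) = x log λ − λ − log x!, whose first two derivatives with respect to λ are computed next.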
$$l'(x; \lambda) = -1 + \frac{x}{\lambda}, \qquad l''(x; \lambda) = -\frac{x}{\lambda^2}.$$
The first derivative is technically called the score function. It follows that
$$I(\lambda) = -E_\lambda\!\left[\frac{\mathrm{d}^2 l(x; \lambda)}{\mathrm{d}\lambda^2}\right] = \frac{1}{\lambda^2}\, E_\lambda[x] = \frac{1}{\lambda}$$
since E[X] = λ for a random variable X following a Poisson distribution
with parameter λ: X ∼ Poisson(λ). What the RLB theorem states in plain
words is that for any unbiased estimator λ̂ based on an IID sample of size n
of a Poisson distribution with parameter θ∗ = λ∗ , the variance of λ̂ cannot
go below 1/(nI(λ∗)) = λ∗/n.
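A quick Monte Carlo experiment makes this concrete. The sketch below (assuming NumPy is available; λ*, n and the number of trials are arbitrary choices) uses the sample mean, which is unbiased and efficient for the Poisson family, so its empirical variance should essentially match λ*/n.

```python
import numpy as np

# Monte Carlo sanity check of the Rao lower bound for the Poisson family
# (an illustrative sketch; parameter values and sample sizes are arbitrary).
rng = np.random.default_rng(0)

lam_star = 3.0   # true parameter lambda*
n = 50           # sample size
trials = 20_000  # number of simulated samples

# The sample mean is an unbiased estimator of lambda.
estimates = rng.poisson(lam_star, size=(trials, n)).mean(axis=1)

empirical_var = estimates.var()
crlb = lam_star / n   # 1 / (n I(lambda*)) with I(lambda) = 1/lambda

print(f"empirical variance of the sample mean : {empirical_var:.5f}")
print(f"Cramer-Rao lower bound lambda*/n      : {crlb:.5f}")
# The two numbers agree here because the sample mean is an efficient estimator.
```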
The Fisher information, defined as the variance of the score, can be geo-
metrically interpreted as the curvature of the log-likelihood function. When
the curvature is low (log-likelihood curve is almost flat), we may expect some
large amount of deviation from the optimal θ∗ . But when the curvature is
high (peaky log-likelihood), we rather expect a small amount of deviation
from θ∗ .
For multi-parameter distributions, the Fisher information matrix I(θ) is defined by its entries:

$$[I(\theta)]_{ij} = E_\theta\!\left[\frac{\partial}{\partial\theta_i} \log p_\theta(x)\, \frac{\partial}{\partial\theta_j} \log p_\theta(x)\right], \qquad (1)$$

$$[I(\theta)]_{ij} = \int \frac{\partial}{\partial\theta_i} \log p_\theta(x)\, \frac{\partial}{\partial\theta_j} \log p_\theta(x)\; p_\theta(x)\, \mathrm{d}x. \qquad (2)$$
Provided certain regularity conditions are met (see [6], section 2.2), the
Fisher information matrix can be written equivalently as:
$$[I(\theta)]_{ij} = -E_\theta\!\left[\frac{\partial^2}{\partial\theta_i \partial\theta_j} \log p_\theta(x)\right],$$
or as:
$$[I(\theta)]_{ij} = 4 \int_{x \in \mathcal{X}} \frac{\partial}{\partial\theta_i} \sqrt{p_\theta(x)}\; \frac{\partial}{\partial\theta_j} \sqrt{p_\theta(x)}\, \mathrm{d}x.$$
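The agreement between these expressions can be checked numerically. The sketch below (assuming NumPy, and using the uni-parameter Poisson family so that the matrix reduces to a scalar) compares the Monte Carlo average of the squared score with the negative expected second derivative and with the closed form 1/λ.

```python
import numpy as np

# Monte Carlo check that two of the equivalent expectations defining the
# Fisher information coincide, on the uni-parameter Poisson family (a sketch).
rng = np.random.default_rng(1)
lam = 2.5
x = rng.poisson(lam, size=200_000)

score = x / lam - 1.0              # d/dlambda log p_lambda(x)
second_derivative = -x / lam**2    # d^2/dlambda^2 log p_lambda(x)

print("E[score^2]            :", np.mean(score**2))            # ~ 1/lambda
print("-E[second derivative] :", np.mean(-second_derivative))  # ~ 1/lambda
print("closed form 1/lambda  :", 1.0 / lam)
```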
In the case of multi-parameter distributions, the lower bound on the accuracy of unbiased estimators can be extended using the Löwner partial ordering on matrices defined by A ⪰ B ⇔ A − B ⪰ 0, where M ⪰ 0 means M is positive semidefinite [11] (we similarly write M ≻ 0 to indicate that M is positive definite).
The Fisher information matrix is always positive semi-definite [33]. It
can be shown that the Fisher information matrix of regular probability dis-
tributions is positive definite, and therefore always invertible. Theorem 1 on
the lower bound on the inaccuracy extends to the multi-parameter setting as
follows:
(Footnote: Multi-parameter distributions can be univariate, like the 1D Gaussians N(µ, σ), or multivariate, like the Dirichlet distributions or d-dimensional Gaussians.)
Theorem 2 (Multi-parameter Rao lower bound (RLB)) Let θ be a vector-valued parameter. Then for an unbiased estimator θ̂ of θ∗ based on an IID random sample of n observations, one has V[θ̂] ⪰ n⁻¹I⁻¹(θ∗), where V[θ̂] now denotes the variance-covariance matrix of θ̂ and I⁻¹(θ∗) denotes the inverse of the Fisher information matrix evaluated at the optimal parameter θ∗.
Therefore,

$$V[\hat{\theta}] \succeq n^{-1} I(\theta^*)^{-1} = \begin{bmatrix} n^{-1}\sigma^{*2} & 0 \\ 0 & 2 n^{-1} \sigma^{*4} \end{bmatrix}.$$
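This matrix bound can be probed numerically. The sketch below (assuming NumPy; the true parameters, sample size and trial count are arbitrary, and the θ = (µ, σ²) parameterization implicit in the bound above is used) compares the empirical covariance of the unbiased estimators (sample mean, unbiased sample variance) with n⁻¹I⁻¹(θ*).

```python
import numpy as np

# Monte Carlo illustration of the multi-parameter Rao lower bound for the
# Gaussian family with theta = (mu, sigma^2) (a sketch; values are arbitrary).
rng = np.random.default_rng(2)
mu, sigma2, n, trials = 1.0, 4.0, 30, 50_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
mu_hat = samples.mean(axis=1)            # unbiased estimator of mu
var_hat = samples.var(axis=1, ddof=1)    # unbiased estimator of sigma^2

empirical_cov = np.cov(np.vstack([mu_hat, var_hat]))
bound = np.diag([sigma2 / n, 2 * sigma2**2 / n])   # n^{-1} I(theta*)^{-1}

print("empirical covariance of (mu_hat, var_hat):\n", empirical_cov)
print("lower-bound matrix n^{-1} I^{-1}(theta*):\n", bound)
# The sample mean attains the bound; the unbiased variance estimator has
# variance 2 sigma^4 / (n - 1), strictly above the bound 2 sigma^4 / n.
```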
There has been a continuous flow of research along the lines of the CRLB,
including the case where the Fisher information matrix is singular (positive
semidefinite, e.g. in statistical mixture models). We refer the reader to the
book of Watanabe [47] for a modern algebraic treatment of degeneracies in
statistical learning theory.
information geometry [6]. Although there was already precursor geometric work [35, 12, 36] linking geometry to statistics by the Indian community (Professors Mahalanobis and Bhattacharyya), none of these works studied the differential concepts or made the connection with the Fisher information matrix. C. R. Rao is again a pioneer in offering Statisticians the geometric lens.
$$\mathrm{d}s^2 = \sum_{i,j} g_{ij}(\theta)\, \mathrm{d}\theta_i\, \mathrm{d}\theta_j = (\mathrm{d}\theta)^\top G(\theta)\, \mathrm{d}\theta,$$
The elements gij (θ) form the quadratic differential form defining the el-
ementary length of Riemannian geometry. The matrix G(θ) = [gij (θ)] ≻ 0
is positive definite and turns out to be equivalent to the Fisher information
matrix: G(θ) = I(θ). The information matrix is invariant under monotone transformations of the parameter space [43], which makes it a good candidate for a Riemannian metric.
We shall discuss the concepts of invariance in statistical manifolds in more detail later [18, 38].
In [43], Rao proposed a novel versatile notion of statistical distance in-
duced by the Riemannian geometry beyond the traditional Mahalanobis D-
squared distance [35] and the Bhattacharyya distance [12]. The Mahalanobis
D-squared distance [35] of a vector x to a group of vectors with covariance
matrix Σ and mean µ is defined originally as
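$$D^2(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$$

(the standard quadratic form, restated here). The Bhattacharyya distance [12] between two parametric densities, which appears in Eq. (3) below, is likewise the standard quantity

$$B(\theta_1, \theta_2) = -\log \int \sqrt{p_{\theta_1}(x)\, p_{\theta_2}(x)}\, \mathrm{d}x.$$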
$$H^2(\theta_1, \theta_2) = \frac{1}{2} \int \left(\sqrt{p_{\theta_1}(x)} - \sqrt{p_{\theta_2}(x)}\right)^2 \mathrm{d}x = 1 - e^{-B(\theta_1, \theta_2)} \leq 1 \qquad (3)$$
2.2.2 Rao's distance: Riemannian distance between two populations
Therefore we need to calculate explicitly the geodesic linking pθ1 (x) to pθ2 (x)
to compute Rao’s distance. This is done by solving the following second
order ordinary differential equation (ODE) [6]:
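(restating the standard geodesic equation in terms of the Christoffel symbols of the first kind defined below)

$$g_{ki}(\theta)\,\ddot{\theta}^i(t) + \Gamma_{k,ij}(\theta)\,\dot{\theta}^i(t)\,\dot{\theta}^j(t) = 0,$$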
where Einstein summation [6] convention has been used to simplify the math-
ematical writing by removing the leading sum symbols. The coefficients Γk,ij
are the Christoffel symbols of the first kind defined by:
$$\Gamma_{k,ij} = \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial \theta_j} + \frac{\partial g_{kj}}{\partial \theta_i} - \frac{\partial g_{ij}}{\partial \theta_k}\right).$$
To give an example of the Rao distance, consider the smooth manifold of
univariate normal distributions, indexed by the θ = (µ, σ) coordinate system.
The Fisher information matrix is

$$I(\theta) = \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{bmatrix} \succ 0. \qquad (4)$$
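As a quick sanity check of Eq. (4), the following sketch (assuming NumPy; µ and σ are arbitrary) estimates I(θ) by the Monte Carlo average of the outer product of the score vector in the (µ, σ) coordinates, which should approach diag(1/σ², 2/σ²).

```python
import numpy as np

# Monte Carlo check of the Fisher information matrix (4) for theta = (mu, sigma),
# using the outer product of the score vector (an illustrative sketch).
rng = np.random.default_rng(3)
mu, sigma = 0.5, 2.0
x = rng.normal(mu, sigma, size=500_000)

# Score components of log N(x; mu, sigma) with respect to mu and sigma.
score_mu = (x - mu) / sigma**2
score_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3

scores = np.vstack([score_mu, score_sigma])   # shape (2, N)
fisher_mc = scores @ scores.T / x.size        # E[score score^T]

print("Monte Carlo estimate:\n", fisher_mc)
print("closed form diag(1/sigma^2, 2/sigma^2):\n",
      np.diag([1 / sigma**2, 2 / sigma**2]))
```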
3 A brief overview of information geometry
Since the seminal work of Rao [43] in 1945, the interplay of differential ge-
ometry with statistics has further strengthened and developed into a new
discipline called information geometry with a few dedicated monographs
[5, 40, 30, 6, 46, 7]. It has been proved by Chentsov and published in his Rus-
sian monograph in 1972 (translated into English in 1982 by the AMS [18]) that
the Fisher information matrix is the only invariant Riemannian metric for
statistical manifolds (up to some scalar factor). Furthermore, Chentsov [18]
proved that there exists a family of connections, termed the α-connections,
that ensures statistical invariance.
Under a smooth invertible mapping of the sample space

$$m: \mathcal{X} \to \mathcal{Y}, \qquad x \mapsto y = m(x),$$

a probability density p(x) is converted into another density q(y) such that

$$p(x)\,\mathrm{d}x = q(y)\,\mathrm{d}y, \qquad \mathrm{d}y = |M(x)|\,\mathrm{d}x,$$
where |M(x)| denotes the determinant of the Jacobian matrix [6] of the
transformation m (i.e., the partial derivatives):
$$M(x) = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_d}{\partial x_1} & \cdots & \frac{\partial y_d}{\partial x_d} \end{bmatrix}.$$
It follows that

$$q(y) = q(m(x)) = p(x)\,|M(x)|^{-1}.$$
For any two densities p1 and p2, the f-divergence on the transformed densities q1 and q2 can be rewritten as

$$D_f(q_1 : q_2) = \int_{y \in \mathcal{Y}} q_1(y)\, f\!\left(\frac{q_2(y)}{q_1(y)}\right) \mathrm{d}y = \int_{x \in \mathcal{X}} p_1(x)\,|M(x)|^{-1} f\!\left(\frac{p_2(x)}{p_1(x)}\right) |M(x)|\, \mathrm{d}x = D_f(p_1 : p_2).$$
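As a small numerical illustration of this invariance, the sketch below applies an affine map y = ax + b (a hypothetical choice of m) to two univariate Gaussians and evaluates the Kullback-Leibler divergence, an f-divergence, through its standard closed form before and after the transformation.

```python
import math

# Invariance D_f(q1:q2) = D_f(p1:p2) illustrated with the Kullback-Leibler
# divergence under an affine map y = a*x + b applied to two Gaussians (a sketch).

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) : N(m2, s2^2)) in closed form."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

a, b = 3.0, -1.0                       # the transformation y = a*x + b
p1, p2 = (0.0, 1.0), (2.0, 1.5)        # (mean, std) of the two source densities
q1 = (a * p1[0] + b, abs(a) * p1[1])   # transformed densities are again Gaussian
q2 = (a * p2[0] + b, abs(a) * p2[1])

print("KL(p1:p2) =", kl_gauss(*p1, *p2))
print("KL(q1:q2) =", kl_gauss(*q1, *q2))   # identical, as the invariance predicts
```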
Furthermore, the f -divergences are the only divergences satisfying the re-
markable data-processing theorem [24] that characterizes the property of
information monotonicity [4]. Consider discrete distributions on an alphabet
X of d letters. For any partition B = X1 ∪ . . . ∪ Xb of X that merges the alphabet letters into b ≤ d bins, we have

$$D_f(\bar{p}_1 : \bar{p}_2) \leq D_f(p_1 : p_2),$$

where p̄1 and p̄2 are the discrete distributions induced by the partition B on X. That is, we lose discrimination power by coarse-graining the support of the distributions.
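A minimal numerical illustration of this monotonicity (with an arbitrary 4-letter alphabet, a 2-bin partition, and the Kullback-Leibler divergence as the f-divergence) is sketched below.

```python
import numpy as np

# Information monotonicity: merging alphabet letters (coarse-graining)
# can only decrease the Kullback-Leibler divergence. The alphabet, the
# distributions and the partition below are arbitrary illustrative choices.

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def coarse_grain(p):
    # Partition B: merge letters {0, 1} into one bin and {2, 3} into another.
    return np.array([p[0] + p[1], p[2] + p[3]])

p1 = np.array([0.1, 0.4, 0.2, 0.3])
p2 = np.array([0.3, 0.1, 0.4, 0.2])

print("KL(p1 : p2)               =", kl(p1, p2))
print("KL(coarse p1 : coarse p2) =", kl(coarse_grain(p1), coarse_grain(p2)))
```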
The most fundamental f -divergence is the Kullback-Leibler divergence
[19] obtained for the generator f (x) = x log x:
$$\mathrm{KL}(p : q) = \int p(x) \log \frac{p(x)}{q(x)}\, \mathrm{d}x.$$
The Kullback-Leibler divergence between two distributions p(x) and q(x) is
equal to the cross-entropy H × (p : q) minus the Shannon entropy H(p):
$$\mathrm{KL}(p : q) = \int p(x) \log \frac{p(x)}{q(x)}\, \mathrm{d}x = H^{\times}(p : q) - H(p),$$

with

$$H^{\times}(p : q) = -\int p(x) \log q(x)\, \mathrm{d}x, \qquad H(p) = -\int p(x) \log p(x)\, \mathrm{d}x = H^{\times}(p : p).$$
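This identity is easy to check numerically; a minimal sketch on discrete distributions (arbitrary p and q with common support) follows.

```python
import numpy as np

# Numerical check of KL(p:q) = cross-entropy(p:q) - entropy(p) on
# discrete distributions (a sketch; any strictly positive p and q will do).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))
cross_entropy = -np.sum(p * np.log(q))
entropy = -np.sum(p * np.log(p))

print(kl, cross_entropy - entropy)   # the two values coincide
```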
for estimating the parameter µ of Gaussians). In statistics, the concept
of sufficiency was introduced by Fisher [26]:
“... the statistic chosen should summarize the whole of the relevant
information supplied by the sample. ”
Mathematically, the fact that all information should be aggregated in-
side the sufficient statistic is written as
Pr(x|t, θ) = Pr(x|t).
• Poisson distributions are univariate exponential distributions of order 1 (with X = {0, 1, 2, 3, . . .} and dim Θ = 1) with associated probability mass function

$$\frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$
The canonical exponential family decomposition yields
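(in its standard form, restated here, with t(x) = x the sufficient statistic, θ = log λ the natural parameter, F(θ) = exp(θ) the cumulant function, and k(x) = −log x! the carrier term)

$$p_\lambda(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \exp\bigl(x \log\lambda - \lambda - \log x!\bigr) = \exp\bigl(\theta\, t(x) - F(\theta) + k(x)\bigr).$$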
Here θ is called the natural parameter. The cumulant function F is obtained as the log-Laplace transform

$$F(\theta) = \log \int_{x \in \mathcal{X}} \exp\bigl(\theta^\top t(x) + k(x)\bigr)\, \mathrm{d}x.$$
To illustrate the generic behavior of exponential families in Statistics [14],
let us consider the maximum likelihood estimator for a distribution belonging
to the exponential family. We have the MLE θ̂:
$$\hat{\theta} = (\nabla F)^{-1}\!\left(\frac{1}{n} \sum_{i=1}^{n} t(x_i)\right),$$
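For the Poisson family of the earlier example, t(x) = x and F(θ) = e^θ, so (∇F)⁻¹ is simply the logarithm and the formula reduces to the familiar MLE λ̂ = x̄; a minimal sketch (assuming NumPy):

```python
import numpy as np

# Sketch of the exponential-family MLE formula for the Poisson family,
# where t(x) = x and F(theta) = exp(theta), so (grad F)^{-1}(eta) = log(eta).
rng = np.random.default_rng(4)
x = rng.poisson(3.0, size=10_000)

eta_hat = x.mean()              # (1/n) sum_i t(x_i)
theta_hat = np.log(eta_hat)     # (grad F)^{-1} applied to the average statistic
lambda_hat = np.exp(theta_hat)  # back to the usual mean parameterization

print("theta_hat  =", theta_hat)
print("lambda_hat =", lambda_hat, "(the sample mean, the familiar Poisson MLE)")
```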
The Fisher information matrix of an exponential family is the Hessian of the cumulant function:

$$I(\theta) = \nabla^2 F(\theta) \succ 0.$$
As mentioned earlier, the “:” notation emphasizes that the distance is not a
metric: it satisfies neither the symmetry nor the triangle inequality in general. The divergence BF is called a Bregman divergence [13], and is the canonical distance of dually flat spaces [6]. This Kullback-Leibler divergence on den-
sities ↔ divergence on parameters relies on the dual canonical parameteri-
zation of exponential families [14]. A random variable X ∼ pF,θ (x), whose
distribution belongs to an exponential family, can be dually indexed by its
expectation parameter η such that
$$\eta = E[t(X)] = \int_{x \in \mathcal{X}} t(x)\, e^{\theta^\top t(x) - F(\theta) + k(x)}\, \mathrm{d}x = \nabla F(\theta).$$
The maximum of θ⊤η − F(θ), which defines the Legendre transform F∗(η), is attained for η = ∇F(θ) and is unique since F(θ) is strictly convex (∇²F(θ) ≻ 0). It follows that θ = (∇F)⁻¹(η), where (∇F)⁻¹ denotes the functional inverse of the gradient map. This implies that:
$$\eta = \nabla F(\theta), \qquad \theta = \nabla F^*(\eta).$$
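As a concrete instance of this duality, consider again the Poisson family, for which F(θ) = e^θ (a standard worked example, sketched here):

$$\eta = \nabla F(\theta) = e^{\theta} = \lambda, \qquad F^*(\eta) = \sup_{\theta}\{\theta \eta - F(\theta)\} = \eta \log \eta - \eta, \qquad \theta = \nabla F^*(\eta) = \log \eta.$$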
The Bregman divergence can also be rewritten in a canonical mixed coordinate form CF or in the θ- or η-coordinate systems.

$$D_{-1}(p : q) = \mathrm{KL}(p : q) = \int p(x) \log \frac{p(x)}{q(x)}\, \mathrm{d}x,$$

$$D_0(p : q) = D_0(q : p) = 4\left(1 - \int \sqrt{p(x)\, q(x)}\, \mathrm{d}x\right) = 4 H^2(p, q).$$
3.5 Exponential geodesics and mixture geodesics
Information geometry as further pioneered by Amari [6] considers dual affine
geometries introduced by a pair of connections: the α-connection and −α-
connection instead of taking the Levi-Civita connection induced by the Fisher
information Riemannian metric of Rao. The ±1-connections give rise to
dually flat spaces [6] equipped with the Kullback-Leibler divergence [19].
The case of α = −1 denotes the mixture family, and the exponential family
is obtained for α = 1. We omit technical details in this expository paper,
but refer the reader to the monograph [6] for details.
For our purpose, let us say that the geodesics are no longer defined as shortest paths (as in the metric case of the Fisher-Rao geometry) but rather as curves that ensure the parallel transport of vectors [6]. This defines the notion of "straightness" of lines. Riemannian geodesics satisfy both the straightness property and the minimum length requirement. When introducing dual connections, distances are no longer interpreted as curve lengths, and the geodesics are defined by the notion of straightness only.
In information geometry, we have dual geodesics that are expressed for
the exponential family (induced by a convex function F ) in the dual affine
coordinate systems θ/η for α = ±1 as:
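$$\gamma_{pq}: \ \theta(t) = (1-t)\,\theta_p + t\,\theta_q, \qquad \gamma^{*}_{pq}: \ \eta(t) = (1-t)\,\eta_p + t\,\eta_q, \qquad t \in [0, 1],$$

that is (a standard statement restated here in the notation used below, see [6]), the exponential geodesic (α = +1) is affine in the natural θ-coordinates and the mixture geodesic (α = −1) is affine in the expectation η-coordinates.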
In fact, a more general triangle relation (extending the law of cosines) exists:
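(in the Bregman form, a standard three-point identity that follows directly from the definition of BF)

$$B_F(\theta_p : \theta_q) + B_F(\theta_q : \theta_r) - B_F(\theta_p : \theta_r) = (\theta_p - \theta_q)^\top (\eta_r - \eta_q).$$

When the right-hand side vanishes, which is precisely the orthogonality situation described next, we recover the dually flat Pythagorean theorem BF(θp : θr) = BF(θp : θq) + BF(θq : θr).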
Note that the θ-geodesic γpq and the η-geodesic γ∗qr are orthogonal with respect
to the inner product G(q) defined at q (with G(q) = I(q) being the Fisher
information matrix at q). Two vectors u and v in the tangent plane Tq at q
are said to be orthogonal if and only if their inner product equals zero:
u ⊥q v ⇔ u⊤ I(q)v = 0.
Observe that in any tangent plane Tx of the manifold, the inner product
induces a squared Mahalanobis distance:
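$$D^2_x(u, v) = (u - v)^\top I(x)\, (u - v), \qquad u, v \in T_x$$

(written here in the notation of the inner product above).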
the Cramér-Rao lower bound (CRLB) and the Fisher-Rao geometry. Both
contributions are related to the Fisher information, a concept due to Sir R. A. Fisher, the father of mathematical statistics [26], who introduced the concepts of consistency, efficiency and sufficiency of estimators. Rao's 1945 paper is undoubtedly recognized as the cornerstone for introducing differen-
tial geometric methods in Statistics. This seminal work has inspired many
researchers and has evolved into the field of information geometry [6]. Ge-
ometry is originally the science of Earth measurements. But geometry is
also the science of invariance as advocated by Felix Klein's Erlangen program,
the science of intrinsic measurement analysis. This expository paper has
presented the two key contributions of C. R. Rao in his 1945 foundational
paper, and briefly presented information geometry without the burden of
differential geometry (e.g., vector fields, tensors, and connections). Informa-
tion geometry has now ramified far beyond its initial statistical scope, and is
further expanding prolifically in many different new horizons. To illustrate
the versatility of information geometry, let us mention a few research areas:
Geometry with its own specialized language, where words like distances,
balls, geodesics, angles, orthogonal projections, etc., provides “thinking
tools” (affordances) to manipulate non-trivial mathematical objects and no-
tions. The richness of geometric concepts in information geometry helps one
to reinterpret, extend or design novel algorithms and data-structures by en-
hancing creativity. For example, the traditional expectation-maximization
(EM) algorithm [25] often used in Statistics has been reinterpreted and fur-
ther extended using the framework of information-theoretic alternative pro-
jections [3]. In machine learning, the famous boosting technique that learns
a strong classifier by combining linearly weak weighted classifiers has been
revisited [39] under the framework of information geometry. Another strik-
ing example is the study of the geometry of dependence and Gaussianity for
Independent Component Analysis [15].
References
[1] Ali, S.M. and Silvey, S. D. (1966). A general class of coefficients of
divergence of one distribution from another. J. Roy. Statist. Soc. Series
B 28, 131–142.
[5] Amari, S., Barndorff-Nielsen, O. E., Kass, R. E., Lauritzen, S. L. and
Rao, C. R. (1987). Differential Geometry in Statistical Inference. Lecture
Notes-Monograph Series. Institute of Mathematical Statistics.
[6] Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry.
Oxford University Press.
[9] Banerjee, A., Merugu, S., Dhillon, I. S. and Ghosh, J. (2005). Clustering
with Bregman divergences. J. Machine Learning Res. 6, 1705–1749.
[15] Cardoso, J. F. (2003). Dependence, correlation and Gaussianity in inde-
pendent component analysis. J. Machine Learning Res. 4, 1177–1203.
[23] Dawid, A. P. (2007). The geometry of proper scoring rules. Ann. Instt.
Statist. Math. 59, 77–93.
[24] del Carmen Pardo, M. C. and Vajda, I. (1997). About distances of dis-
crete distributions satisfying the data processing theorem of information
theory. IEEE Trans. Inf. Theory 43, 1288–1293.
[26] Fisher, R. A. (1922). On the mathematical foundations of theoretical
statistics. Phil. Trans. Roy. Soc. London, A 222, 309–368.
[38] Morozova, E. A. and Chentsov, N. N. (1991). Markov invariant geometry
on manifolds of states. J. Math. Sci. 56, 2648–2669.
[39] Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. (2004). Infor-
mation geometry of U-boost and Bregman divergence. Neural Comput.
16, 1437–1481.
[43] Rao, C. R. (1945). Information and the accuracy attainable in the esti-
mation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81–89.