A Modern Gauss-Markov Theorem
Bruce E. Hansen*
University of Wisconsin†
December, 2020
Revised: December 2021
Abstract
This paper presents finite sample efficiency bounds for the core econometric prob-
lem of estimation of linear regression coefficients. We show that the classical Gauss-
Markov Theorem can be restated omitting the unnatural restriction to linear estima-
tors, without adding any extra conditions. Our results are lower bounds on the vari-
ances of unbiased estimators. These lower bounds correspond to the variances of the
least squares estimator and the generalized least squares estimator, depending on
the assumption on the error covariances. These results show that we can drop the label
“linear estimator” from the pedagogy of the Gauss-Markov Theorem. Instead of refer-
ring to these estimators as BLUE, they can legitimately be called BUE (best unbiased
estimators).
* Research support from the NSF and the Phipps Chair are gratefully acknowledged. I posthumously
thank Gary Chamberlain for encouraging me to study finite sample semi-parametric efficiency, Jack
Porter for persuading me to write these results into a paper, and Yuzo Maruyama for catching an error
in the proof. I also thank Roger Koenker, Stephen Portnoy, and three referees for thoughtful comments
and suggestions.
† Department of Economics, 1180 Observatory Drive, University of Wisconsin, Madison WI 53706.
1 Introduction
Three central results in core econometric theory are BLUE, Gauss-Markov, and Aitken’s.
The BLUE theorem states that the best (minimum variance) linear unbiased estimator
of a population expectation is the sample mean. The Gauss-Markov theorem states that
in a linear homoskedastic regression model the minimum variance linear unbiased esti-
mator of the regression coefficient is the least squares estimator. Aitken’s generalization
states that in a linear regression model with a general covariance matrix structure the
minimum variance linear unbiased estimator is the generalized least squares estimator.
These results are straightforward to prove and interpret, and thus are taught in intro-
ductory through advanced courses. The theory, however, has a gaping weakness. The
restriction to linear estimators is unnatural. There is no justifiable reason for modern
econometrics to restrict estimation to linear methods. This leaves open the question of
whether nonlinear estimators could do better than least squares.
One possible answer lies in the theory of uniform minimum variance unbiased (UMVU)
estimation (see, e.g., Chapter 2 of Lehmann and Casella (1998)). Lehmann and Casella
(1998, Example 4.2) demonstrate that the sample mean is UMVU for the class of distri-
butions having a density. The latter restriction is critical for their demonstration: the
argument does not generalize to distributions without densities, and it is unclear whether
the approach applies to regression models.
A second possible answer is provided by the Cramér-Rao theorem. In the normal re-
gression model the minimum variance unbiased estimator of the regression coefficient
is least squares. This result removes the restriction to linearity. But the result is limited
to normal regression and so does not provide a complete answer.
A third possible answer is provided by the local asymptotic minimax theorem (see
Hajek (1972) and van der Vaart (1998, Chapter 8)) which states that in parametric mod-
els, estimation mean squared error cannot be asymptotically smaller than the Cramér-
Rao lower bound. This removes the restriction to linear and unbiased estimators, but is
focused on a parametric asymptotic framework.
A fourth approach to the problem is semi-parametric asymptotic efficiency, which
includes Stein (1956), Levit (1975), Begun, Hall, Huang, and Wellner (1983), Chamber-
lain (1987), Ritov and Bickel (1990), Newey (1990), Bickel, Klaassen, Ritov, and Wellner
(1993), and van der Vaart (1998, Chapter 25). This literature develops asymptotic ef-
ficiency bounds for estimation in semi-parametric models including linear regression.
This theory removes the restriction to linear unbiased estimators and parametric mod-
els, but only provides asymptotic efficiency bounds, not finite sample bounds. This lit-
erature leaves open the possibility that reduced estimation variance might be achieved
in finite samples by alternative estimators.
A fifth approach is adaptive efficiency under an independence or symmetry assump-
tion. If the regression error is independent of the regressors and/or symmetrically dis-
tributed about zero, efficiency improvements may be possible. If the regression er-
ror is fat-tailed, these improvements can be substantial. This literature includes the
quantile regression estimator of Koenker and Bassett (1978), the adaptive regression
estimator of Bickel (1982), and the generalized t estimator of McDonald and Newey
(1988). These improvements are only obtained under the validity of the imposed in-
dependence/symmetry assumptions; otherwise the estimators are inconsistent.
Our paper extends the above literatures by providing finite sample variance lower
bounds for unbiased estimation of linear regression coefficients without the restriction
to linear estimators and without the restriction to parametric models. Our results are
semi-parametric, imposing no restrictions on distributions beyond the existence of the
first two moments and no restriction on estimators beyond unbiasedness. Our lower
bounds generalize the classical BLUE and Gauss-Markov lower bounds, as we show that
the same bounds hold in finite samples without the restriction to linear estimators. Our
lower bounds also update the asymptotic semi-parametric lower bounds of Chamber-
lain (1987), as we show that the same bounds hold in finite samples for unbiased esti-
mators.
The results in this paper are a finite-sample version of the insight by Stein (1956)
that the supremum of Cramér-Rao bounds over all regular parametric submodels is a
lower bound on the asymptotic estimation variance. Our twist turns Stein’s insight into
a finite-sample argument, thereby constructing a lower bound on the finite-sample vari-
ance. Stein’s insight lies at the core of semi-parametric efficiency theory. Thus, our result
provides a bridge between finite-sample and semi-parametric efficiency theory.
Our primary purpose is to generalize the Gauss-Markov Theorem, providing a finite-
sample yet semi-parametric efficiency justification for least squares estimation. A by-
product of our result is the observation that it is impossible to achieve lower variance
than least squares without incurring estimation bias. Consequently, the simultaneous
goals of unbiasedness and low variance are incompatible. If estimators are low variance
(relative to least squares) they must be biased. This is not an argument against non-
parametric, shrinkage, or machine learning estimation, but rather is a statement that
these estimation methods should be acknowledged as biased, and that this bias is necessary
to achieve variance reductions.
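To see the trade-off concretely, the following sketch (an illustration added for this discussion, not an example from the paper) compares least squares with a ridge-type shrinkage estimator; the design matrix, penalty λ, error law, and seed are arbitrary choices. The shrinkage estimator has visibly smaller variance for the slope coefficient, but at the cost of bias.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])     # fixed design (arbitrary)
beta = np.array([1.0, 0.8])
lam = 5.0                                                  # ridge penalty (arbitrary)

A_ols = np.linalg.solve(X.T @ X, X.T)                      # OLS weights
A_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T)  # ridge (shrinkage) weights

reps = 200_000
E = rng.normal(size=(reps, n))                             # homoskedastic errors, sigma^2 = 1
Y = E + X @ beta
b_ols = Y @ A_ols.T
b_ridge = Y @ A_ridge.T

# Ridge trades bias for variance: its slope estimate is biased toward zero but less variable.
print("bias (slope)  ols / ridge:", b_ols[:, 1].mean() - beta[1], b_ridge[:, 1].mean() - beta[1])
print("var  (slope)  ols / ridge:", b_ols[:, 1].var(), b_ridge[:, 1].var())
```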
Our results (similarly to BLUE, Gauss-Markov, Aitken, and Cramér-Rao) focus on un-
biased estimators, and thereby are restricted to the special context where unbiased esti-
mators exist. Indeed, the existence of an unbiased estimator is a necessary condition for
a finite variance bound. Doss and Sethuraman (1989) showed that when no unbiased
estimator exists, then any sequence of estimators with bias tending to zero will have
variance tending to infinity. A related literature (Zyskind and Martin (1969), Harville
(1981)) concerns conditions for linear estimators to be unbiased when allowing for gen-
eral covariance matrices.
A caveat is that the class of nonlinear unbiased estimators is small. As shown by
Koopmann (1982) and discussed in Gnot, Knautz, Trenkler, and Zmyslony (1992), any
unbiased estimator of the regression coefficient can be written as a linear-quadratic
function of the dependent variable Y . Koopmann’s result shows that while nonlinear
unbiased estimators exist, they constitute a narrow class.
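To make Koopmann's characterization concrete, the sketch below (constructed for this discussion, not an example from the paper; the design matrix, the t-distributed errors, the quadratic form B, and the seed are arbitrary choices) builds a linear-plus-quadratic estimator of a single coefficient. It is unbiased over F02 because the quadratic form satisfies X′BX = 0 and tr(B) = 0, and the simulation shows its variance is no smaller than that of least squares, consistent with the finite-sample bounds established below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # fixed regressors (arbitrary design)
beta = np.array([1.0, 2.0])

# Build a symmetric B with X'BX = 0 and tr(B) = 0, so E[Y'BY] = 0 for every F in F02.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)           # annihilator: MX = 0
D = rng.normal(size=(n, n)) * 0.01
B = M @ (D + D.T) @ M
B = B - (np.trace(B) / np.trace(M)) * M                     # enforce tr(B) = 0, keeps X'BX = 0
assert np.allclose(X.T @ B @ X, 0.0) and abs(np.trace(B)) < 1e-10

A = np.linalg.solve(X.T @ X, X.T)                           # OLS weights
reps = 200_000
E = rng.standard_t(df=5, size=(reps, n))                    # symmetric, heavy-tailed errors
Y = E + X @ beta
b_ols = (Y @ A.T)[:, 1]                                     # OLS estimate of the slope
b_lq = b_ols + np.einsum("ri,ij,rj->r", Y, B, Y)            # linear-quadratic unbiased estimator

# With symmetric errors the quadratic part is uncorrelated with OLS and only adds variance.
print("mean  ols / linear-quadratic:", b_ols.mean(), b_lq.mean())   # both ~ beta[1] = 2
print("var   ols / linear-quadratic:", b_ols.var(), b_lq.var())
```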
The literature contains papers which generalize the Gauss-Markov theorem to allow
nonlinear estimators, but all are restrictive on the class of allowed nonlinearity, and all
are restrictive on the class of allowed error distributions. For example, Kariya (1985) al-
lows for estimators where the nonlinearity can be written in terms of the least squares
residuals. Berk and Hwang (1989) and Kariya and Kurata (2002) allow for nonlinear es-
timators which fall within certain equivariant classes. Each of these papers restricts the
error distributions to satisfy a form of spherical symmetry. In contrast, the results pre-
sented in this paper do not impose any restrictions on the estimators other than unbi-
asedness, and do not impose any restrictions on the error distributions.
The proof of our main result (presented in Section 6) is not inherently difficult, but
is not elementary either. It might be described as nuanced. It is based on a trick used
by Newey (1990, Appendix B) in his development of an asymptotic semi-parametric ef-
ficiency bound for estimation of a population expectation.
2 Gauss-Markov Theorem
Let Y be an n × 1 random vector and X an n × m full-rank regressor matrix with
m < n. We will treat X as fixed, though all the results apply to random regressors by
conditioning on X .
The linear regression model is
$$Y = X\beta + e \qquad (1)$$
$$\mathrm{E}[e] = 0 \qquad (2)$$
$$\mathrm{var}[e] = \mathrm{E}\left[ee'\right] = \sigma^2\Sigma < \infty \qquad (3)$$
where e is the n × 1 vector of regression errors. It is assumed that the n × n matrix Σ > 0
is known while the scalar σ2 > 0 is unknown.
Let F2 be the set of joint distributions F of random vectors Y satisfying (1)-(3). This
is the set of random vectors whose expectation is a linear function of X and has a finite
covariance matrix. Equivalently, F2 consists of all distributions which satisfy a linear
regression.
The homoskedastic and serially uncorrelated linear regression model adds the as-
sumption
$$\Sigma = I_n. \qquad (4)$$
Let F02 ⊂ F2 be the set of joint distributions satisfying (1)-(4). The standard estimator of
β in model F02 is least squares
$$\hat\beta_{\mathrm{ols}} = \left(X'X\right)^{-1}\left(X'Y\right).$$
For all F ∈ F2, β̂_ols is unbiased for β, and for all F ∈ F02, β̂_ols has variance var[β̂_ols] = σ²(X′X)⁻¹. The question of efficiency is whether there is an alternative unbiased estimator with reduced variance.
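As a quick numerical check of these two claims (a sketch added here, not from the paper; the design matrix, the Laplace error law, σ², and the seed are arbitrary choices), the simulation below verifies that β̂_ols is unbiased and that its covariance matrix matches σ²(X′X)⁻¹ even though the errors are far from normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 50, 2.0
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])    # fixed regressor matrix
beta = np.array([0.5, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 100_000
E = rng.laplace(scale=np.sqrt(sigma2 / 2), size=(reps, n))       # non-normal errors, variance sigma2
B_hat = (X @ beta + E) @ (XtX_inv @ X.T).T                       # OLS estimate for each replication

print("mean of estimates:", B_hat.mean(axis=0))                  # ~ beta (unbiased)
print("simulated covariance:\n", np.cov(B_hat.T))
print("sigma^2 (X'X)^{-1}:\n", sigma2 * XtX_inv)                 # matches the simulated covariance
```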
The classical Gauss-Markov Theorem applies to linear estimators of β, which are estimators that can be written as β̂ = A(X)Y, where A(X) is an m × n function of X. Linearity in this context means “linear in Y”.
Theorem 1 (Gauss-Markov). If β̂ = A(X)Y is unbiased for all F ∈ F2, then
$$\mathrm{var}[\hat\beta] \ge \sigma^2\left(X'X\right)^{-1}$$
for all F ∈ F02.
In words, no unbiased linear estimator has a finite sample covariance matrix smaller
than the least squares estimator. As this is the exact variance of the least squares esti-
mator, it follows that in the homoskedastic linear regression model, least squares is the
minimum variance linear unbiased estimator.
Part of the beauty of the Gauss-Markov Theorem is its simplicity. The only assump-
tions on the distribution concern the first and second moments of Y . The only assump-
tions on the estimator are linearity and unbiasedness. The statement in the theorem
that β̂ “is unbiased for all F ∈ F2” clarifies the context under which the estimator is required to be unbiased. The requirement that β̂ must be unbiased for any distribution means that we are excluding estimators such as β̂ = 0, which is “unbiased” when the true value satisfies β = 0. The estimator β̂ = 0 is not unbiased in the general set of linear regression models F2 so is not unbiased in the sense of the theorem.
An unsatisfying feature of the Gauss-Markov Theorem is that it restricts attention to
linear estimators. This is unnatural as there is no reason to exclude nonlinear estimators.
Consequently, when the Gauss-Markov Theorem is taught it is typically followed by the
Cramér-Rao Theorem.
Let Fφ2 ⊂ F02 be the set of joint distributions satisfying (1)-(4) plus e ∼ N(0, Inσ²).
Theorem 2 (Cramér-Rao). If β̂ is unbiased for all F ∈ Fφ2, then
$$\mathrm{var}[\hat\beta] \ge \sigma^2\left(X'X\right)^{-1}$$
for all F ∈ Fφ2.
The Cramér-Rao Theorem shows that the restriction to linear estimators is unnec-
essary in the class of normal regression models. To obtain this result, in addition to
the Gauss-Markov assumptions, the Cramér-Rao Theorem adds the assumption that
the observations are independent and normally distributed. The normality assumption
is restrictive, however, so neither the Gauss-Markov nor Cramér-Rao Theorem is fully
satisfactory. Consequently, the two are typically taught as a pair with the joint goal of
justifying the variance lower bound σ²(X′X)⁻¹ and hence least squares estimation.
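The calculation behind Theorem 2 can also be checked numerically. The sketch below (added for illustration; the design matrix, σ², and seed are arbitrary choices) simulates the normal-regression score X′(Y − Xβ)/σ² and confirms that its second-moment matrix is X′X/σ², whose inverse is the Cramér-Rao bound σ²(X′X)⁻¹; setting this score to zero is also what makes the maximum likelihood estimator equal to least squares.

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2 = 30, 0.8
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # fixed design (arbitrary)
XtX = X.T @ X

# Score of the normal log-likelihood at the true beta: (1/sigma^2) X'(Y - X beta) = X'e / sigma^2.
reps = 300_000
E = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
scores = (E @ X) / sigma2                                  # reps x m draws of the score

info_sim = scores.T @ scores / reps                        # ~ Fisher information E[score score']
print("simulated information:\n", info_sim)
print("X'X / sigma^2        :\n", XtX / sigma2)            # population Fisher information
print("CR bound sigma^2 (X'X)^{-1}:\n", sigma2 * np.linalg.inv(XtX))
```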
Closely related to the Gauss-Markov Theorem is the generalization by Aitken (1935)
to the context of general covariance matrices. In the linear regression model with non-
scalar covariance matrix Σ, Aitken’s generalized least squares (GLS) estimator is
$$\hat\beta_{\mathrm{gls}} = \left(X'\Sigma^{-1}X\right)^{-1}\left(X'\Sigma^{-1}Y\right).$$
¢−1
For all F ∈ F2 , βbgls is unbiased for β and has variance var βbgls = σ2 X 0 Σ−1 X
£ ¤ ¡
. The
question of efficiency is whether there is an alternative unbiased estimator with smaller
variance. Aitken’s Theorem follows Gauss-Markov in restricting attention to linear esti-
mators.
¢−1
var βb ≥ σ2 X 0 Σ−1 X
£ ¤ ¡
for all F ∈ F2 .
Aitken’s Theorem is less celebrated than the traditional Gauss-Markov Theorem, but
perhaps is more illuminating. It shows that, in general, the variance lower bound equals
the covariance matrix of the GLS estimator. Thus, in the general linear regression model,
generalized least squares is the minimum variance linear unbiased estimator. Aitken’s
theorem, however, rests on the restriction to linear estimators just as the Gauss-Markov
Theorem. In the context of independent observations, Aitken’s bound corresponds to
the asymptotic semi-parametric efficiency bound established by Chamberlain (1987).
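The comparison between the two linear unbiased estimators can be computed exactly from the formulas above. The sketch below (an illustration with an arbitrary AR(1)-style Σ, design, and σ², not an example from the paper) evaluates var[β̂_gls] = σ²(X′Σ⁻¹X)⁻¹ and the exact OLS covariance σ²(X′X)⁻¹X′ΣX(X′X)⁻¹, and confirms that their difference is positive semi-definite.

```python
import numpy as np

n = 40
X = np.column_stack([np.ones(n), np.arange(n) / n])        # fixed regressors (arbitrary)
rho = 0.7                                                  # AR(1)-style correlation (arbitrary)
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
sigma2 = 1.5

Si = np.linalg.inv(Sigma)
XtX_inv = np.linalg.inv(X.T @ X)
V_gls = sigma2 * np.linalg.inv(X.T @ Si @ X)                       # Aitken bound, exact GLS covariance
V_ols = sigma2 * XtX_inv @ X.T @ Sigma @ X @ XtX_inv               # exact OLS covariance under Sigma

# Eigenvalues of the difference are nonnegative (up to rounding): OLS is unbiased but not efficient here.
print("V_ols - V_gls eigenvalues:", np.linalg.eigvalsh(V_ols - V_gls))
```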
The development of least squares and the Gauss-Markov Theorem involved a series
of contributions from some of the most influential probabilists of the nineteenth through
early twentieth centuries. The method of least squares was introduced by Adrien Marie
Legendre (1805) as essentially an algorithmic solution to the problem of fitting coeffi-
cients when there are more equations than unknowns. This was quickly followed by
Carl Friedrich Gauss (1809), who provided a probabilistic foundation. Gauss proposed
that the equation errors be treated as random variables, and showed that if their den-
sity takes the form we now call “normal” or “Gaussian” then the maximum likelihood
estimator of the coefficient equals the least squares estimator. Shortly afterward, Pierre
Simon Laplace (1811) justified this choice of density function by showing that his central
limit theorem implied that linear estimators are approximately normally distributed in
large samples, and that in this context the lowest variance estimator is the least squares
estimator. Gauss (1823) synthesized these results and showed that the core result only
relies on the first and second moments of the observations and holds in finite samples.
Andrei Andreevich Markov (1912) provided a textbook treatment of the theorem, and
clarified the central role of unbiasedness, which Gauss had only assumed implicitly. Fi-
nally, Alexander Aitken (1935) generalized the theorem to cover the case of arbitrary but
known covariance matrices. This history, and other details, are documented in Plackett
(1949) and Stigler (1986).
3 Modern Gauss-Markov
We now present our main result. We are interested in whether Aitken's version of the Gauss-Markov Theorem holds without the restriction to linear estimators.
Theorem 4. If β̂ is unbiased for all F ∈ F2, then
$$\mathrm{var}[\hat\beta] \ge \sigma^2\left(X'\Sigma^{-1}X\right)^{-1}$$
for all F ∈ F2.
Theorem 4 shows that Aitken's lower bound holds without the restriction to linear estimators: no unbiased estimator, linear or nonlinear, has a smaller covariance matrix than generalized least squares. Specializing to the homoskedastic model (4) gives the analogous extension of the Gauss-Markov Theorem.
Theorem 5. If β̂ is unbiased for all F ∈ F2, then
$$\mathrm{var}[\hat\beta] \ge \sigma^2\left(X'X\right)^{-1}$$
for all F ∈ F02.
Finally, consider estimation of a population expectation. Assume that the elements of Y have a common expectation µ with covariance matrix Σσ². Equivalently, assume E[Y] = 1nµ and var[Y] = Σσ², where 1n is a vector of ones.
Let G2 be the set of joint distributions F of random vectors Y satisfying these conditions,
and let G02 be the subset with Σ = I n . G02 is the set of uncorrelated random variables
with a common variance. The standard estimator of µ is the sample mean Ȳ, which is unbiased and has variance var[Ȳ] = σ²/n for F ∈ G02.
Theorem 6. If µ̂ is unbiased for all F ∈ G2, then var[µ̂] ≥ σ²/n for all F ∈ G02.
As the lower bound σ²/n equals var[Ȳ], we deduce that the sample mean is the MVUE of µ. Equivalently, the sample mean is the best unbiased estimator (BUE); there is no need for the classical “linear” modifier.
Essentially, Theorems 4, 5, and 6 show that we can drop the label “linear estima-
tor” from the pedagogy of the Gauss-Markov Theorem. Instead, GLS, OLS, and sample
means are the best unbiased estimators of their population counterparts.
4 The Idea of the Proof
To explain the idea behind these results, adopt the normalizations β0 = 0 and σ² = 1, suppose that Y has a density f(y) with support Y, and let B be a set of coefficients β satisfying |y′Σ⁻¹Xβ| ≤ 1 for all β ∈ B and y ∈ Y. For such values of β, define the auxiliary density function
$$f_\beta(y) = f(y)\left(1 + y'\Sigma^{-1}X\beta\right). \qquad (5)$$
Under the assumptions, 0 ≤ f_β(y) ≤ 2f(y), f_β(y) has support Y, and ∫_Y f_β(y)dy = 1. To see the latter, observe that ∫_Y y f(y)dy = Xβ0 = 0 under the normalization β0 = 0, and thus
$$\int_{\mathcal{Y}} f_\beta(y)\,dy = \int_{\mathcal{Y}} f(y)\,dy + \int_{\mathcal{Y}} f(y)\,y'\,dy\;\Sigma^{-1}X\beta = 1$$
because ∫_Y f(y)dy = 1. Thus f_β is a parametric family of density functions with an associated distribution function F_β. Evaluated at β0 we see that f0 = f, which means that
F β is a correctly-specified parametric family with true parameter value β0 = 0.
To illustrate, take the case of a single observation with X = 1. Figure 1(a) displays an example density f(y) = (3/4)(1 − y²) on [−1, 1] with auxiliary density f_β(y) = f(y)(1 + y). We can see how the auxiliary density is a tilted version of the original density f(y).
Figure 1: Illustrations. Panel (a) plots the density f(y) and the tilted auxiliary density f_β(y); panel (b) depicts the parametric family F_β, which passes through the true distribution F0 at β = β0, within the set F2.
The expectation of Y under the auxiliary density is
$$\mathrm{E}_\beta[Y] = \int_{\mathcal{Y}} y\, f_\beta(y)\,dy = \int_{\mathcal{Y}} y\, f(y)\,dy + \int_{\mathcal{Y}} y y' f(y)\,dy\;\Sigma^{-1}X\beta = X\beta.$$
Thus the auxiliary family F_β satisfies the linear regression model with coefficient β.
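The single-observation example can be verified by direct numerical integration. The sketch below (added for illustration) uses the density f(y) = (3/4)(1 − y²) from Figure 1, for which Σ = E[Y²] = 1/5, and checks that the tilted density f_β(y) = f(y)(1 + yΣ⁻¹β) integrates to one and has expectation β; the grid size and the particular value β = 0.1 are arbitrary choices.

```python
import numpy as np

def integrate(g, a=-1.0, b=1.0, k=200_000):
    """Simple midpoint-rule integral of g on [a, b]."""
    y = np.linspace(a, b, k, endpoint=False) + (b - a) / (2 * k)
    return (b - a) / k * g(y).sum()

f = lambda y: 0.75 * (1.0 - y**2)                       # base density on [-1, 1]
Sigma = integrate(lambda y: y**2 * f(y))                # E[Y^2] = 1/5

beta = 0.1                                              # small enough that the tilt stays nonnegative
f_beta = lambda y: f(y) * (1.0 + y * beta / Sigma)      # f_beta(y) = f(y)(1 + y Sigma^{-1} X beta), X = 1

print("integral of f_beta:", integrate(f_beta))                    # ~ 1
print("mean under f_beta :", integrate(lambda y: y * f_beta(y)))   # ~ beta
```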
The likelihood score of the auxiliary density function is
$$S = \frac{\partial}{\partial\beta}\log f_\beta(Y)\bigg|_{\beta=0} = \frac{\partial}{\partial\beta}\left(\log f(Y) + \log\left(1 + Y'\Sigma^{-1}X\beta\right)\right)\bigg|_{\beta=0} = X'\Sigma^{-1}Y. \qquad (6)$$
The Fisher information for β in this family is I = E[SS′] = X′Σ⁻¹E[YY′]Σ⁻¹X = X′Σ⁻¹X (recalling the normalization σ² = 1), so the Cramér-Rao bound for unbiased estimation of β is
$$\mathrm{var}[\hat\beta] \ge \mathcal{I}^{-1} = \left(X'\Sigma^{-1}X\right)^{-1}.$$
This is the same as the Cramér-Rao bound in the normal regression model with covariance matrix Σ. This was achieved by constructing (5) to be proportional to the normal regression score.
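The claim that the auxiliary family reproduces the normal-regression information can be checked by simulation: for any Y with E[Y] = 0 and var[Y] = Σ (under σ² = 1), the score S = X′Σ⁻¹Y satisfies E[SS′] = X′Σ⁻¹X whatever the shape of the distribution. The sketch below (added for illustration; the design, the triangular factor L, the exponential errors, and the seed are arbitrary choices) verifies this numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 2
X = rng.normal(size=(n, m))                               # fixed design (arbitrary)
L = np.tril(rng.uniform(0.2, 1.0, size=(n, n)))           # arbitrary Sigma = L L' > 0
Sigma = L @ L.T
Si = np.linalg.inv(Sigma)

reps = 400_000
U = rng.exponential(size=(reps, n)) - 1.0                  # mean-zero, non-normal, unit variance
Y = U @ L.T                                                # E[Y] = 0, var[Y] = Sigma
S = Y @ Si @ X                                             # scores X' Sigma^{-1} Y, one per row

# The simulated second-moment matrix is approximately X' Sigma^{-1} X.
print("E[S S'] (simulated):\n", S.T @ S / reps)
print("X' Sigma^{-1} X     :\n", X.T @ Si @ X)
```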
5 Conclusion
A core question in econometric methodology is: Why do we use specific estimators?
Why not others? A standard answer is efficiency: the estimators are best (in some sense)
among all estimators (in a class) for all data distributions (in some set). The Gauss-
Markov Theorem is a core efficiency result, but it restricts attention to linear estimators,
an unnatural restriction. The present paper lifts this restric-
tion without imposing additional cost. Henceforth, least squares should be described
as the “best unbiased estimator” of the regression coefficient; the “linear” modifier is
unnecessary.
6 Proof of Theorem 4
We provide a proof of Theorem 4. Theorems 5 and 6 are special cases, so follow as
corollaries.
Proof of Theorem 4: Our approach is to calculate the Cramér-Rao bound for a carefully
crafted parametric model. This is based on an insight of Newey (1990, Appendix B) for
the simpler context of a population expectation.
Without loss of generality, assume that the true coefficient equals β0 = 0 and that σ² = 1. These are merely normalizations which simplify the notation.
For c > 0 define the truncated and recentered function
$$\psi_c(y) = y\,\mathbf{1}\{\|y\| \le c\} - \mathrm{E}\left[Y\,\mathbf{1}\{\|Y\| \le c\}\right],$$
which satisfies E[ψ_c(Y)] = 0 and the bound
$$\left\|\psi_c(y)\right\| \le 2c, \qquad (8)$$
and
$$\mathrm{E}\left[Y\psi_c(Y)'\right] = \mathrm{E}\left[YY'\,\mathbf{1}\{\|Y\| \le c\}\right] \stackrel{\mathrm{def}}{=} \Sigma_c,$$
the first equality using E[Y] = Xβ0 = 0.
As c → ∞, Σ_c → E[YY′] = Σ. Pick c sufficiently large so that Σ_c > 0, which is feasible because Σ > 0.
Define the auxiliary joint distribution function F_β(y) by the Radon-Nikodym derivative
$$\frac{dF_\beta(y)}{dF(y)} = 1 + \psi_c(y)'\Sigma_c^{-1}X\beta$$
for parameters β in the set
$$B_c = \left\{\beta \in \mathbb{R}^m : \left\|\Sigma_c^{-1}X\beta\right\| \le \frac{1}{4c}\right\}. \qquad (9)$$
The Schwarz inequality and the bounds (8) and (9) imply that for β ∈ B_c and all y,
$$\left|\psi_c(y)'\Sigma_c^{-1}X\beta\right| \le \left\|\psi_c(y)\right\|\left\|\Sigma_c^{-1}X\beta\right\| \le \frac{2c}{4c} = \frac{1}{2}.$$
This implies that F_β has the same support as F and satisfies the bounds
$$\frac{1}{2} \le \frac{dF_\beta(y)}{dF(y)} \le \frac{3}{2}. \qquad (10)$$
We calculate that
$$\int dF_\beta(y) = \int dF(y) + \int \psi_c(y)'\Sigma_c^{-1}X\beta\, dF(y) = 1 + \mathrm{E}\left[\psi_c(Y)\right]'\Sigma_c^{-1}X\beta = 1, \qquad (11)$$
the last equality because E[ψ_c(Y)] = 0. Together, these facts imply that F_β is a valid
distribution function. The expectation of Y in this parametric model is
$$\mathrm{E}_\beta[Y] = \int y\, dF_\beta(y) = \int y\, dF(y) + \int y\,\psi_c(y)'\Sigma_c^{-1}X\beta\, dF(y) = \mathrm{E}[Y] + \mathrm{E}\left[Y\psi_c(Y)'\right]\Sigma_c^{-1}X\beta = X\beta, \qquad (12)$$
using E[Y] = Xβ0 = 0 and E[Yψ_c(Y)′] = Σ_c. Thus F_β is a linear regression model with regression coefficient β.
The bound (10) implies
$$\mathrm{E}_\beta\left[\|Y\|^2\right] = \int \|y\|^2\, dF_\beta(y) \le \frac{3}{2}\int \|y\|^2\, dF(y) = \frac{3}{2}\mathrm{E}\left[\|Y\|^2\right] = \frac{3}{2}\,\mathrm{tr}(\Sigma) < \infty,$$
so F_β has a finite covariance matrix. Together with (12), this shows that F_β ∈ F2 for each β ∈ B_c.
The likelihood score of the model F_β at β0 = 0 is
$$S = \frac{\partial}{\partial\beta}\log\frac{dF_\beta(Y)}{dF(Y)}\bigg|_{\beta=0} = \frac{\partial}{\partial\beta}\log\left(1 + \psi_c(Y)'\Sigma_c^{-1}X\beta\right)\bigg|_{\beta=0} = X'\Sigma_c^{-1}\psi_c(Y).$$
The information matrix for β is
$$\mathcal{I}_c = \mathrm{E}\left[SS'\right] = X'\Sigma_c^{-1}\mathrm{E}\left[\psi_c(Y)\psi_c(Y)'\right]\Sigma_c^{-1}X \le X'\Sigma_c^{-1}X, \qquad (13)$$
the inequality because E[ψ_c(Y)ψ_c(Y)′] ≤ E[YY′1{‖Y‖ ≤ c}] = Σ_c.
By assumption, the estimator β̂ is unbiased for β for all F ∈ F2, which implies that it is unbiased under F_β for every β ∈ B_c. The model F_β is regular (it is correctly specified as it contains
the true distribution F , the support of Y does not depend on β, and the true value β0 = 0
lies in the interior of B c ). Thus by the Cramér-Rao Theorem (see, for example, Theorem
10.6 of Hansen (2022))
$$\mathrm{var}[\hat\beta] \ge \mathcal{I}_c^{-1} \ge \left(X'\Sigma_c^{-1}X\right)^{-1},$$
where the second inequality is (13). Because this holds for all c, and Σ_c → Σ as c → ∞,
$$\mathrm{var}[\hat\beta] \ge \limsup_{c\to\infty}\left(X'\Sigma_c^{-1}X\right)^{-1} = \left(X'\Sigma^{-1}X\right)^{-1}.$$
This is the stated lower bound (recall the normalization σ² = 1), which completes the proof. ∎
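The truncation construction can also be examined numerically. The sketch below (an illustration added here, with an arbitrary error distribution, design vector, truncation point c, and seed) forms ψ_c, Σ_c, and the Radon-Nikodym weights for a β on the boundary of B_c, and checks by simulation that the weights lie in [1/2, 3/2], average to one, and reproduce E_β[Y] = Xβ.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 3, 500_000
X = np.array([[1.0], [0.5], [-0.3]])                       # fixed n x 1 design (arbitrary)
L = np.array([[1.0, 0.0, 0.0], [0.4, 0.9, 0.0], [0.2, -0.3, 0.8]])
Y = (rng.standard_t(df=6, size=(reps, n)) / np.sqrt(1.5)) @ L.T   # E[Y] = 0 (beta_0 = 0), var[Y] = L L'

c = 10.0
keep = (np.linalg.norm(Y, axis=1) <= c)[:, None]           # indicator 1{||Y|| <= c}
mu_c = (Y * keep).mean(axis=0)                             # E[Y 1{||Y|| <= c}]
psi = Y * keep - mu_c                                      # psi_c(Y), mean zero by construction
Sigma_c = (Y * keep).T @ Y / reps                          # E[Y Y' 1{||Y|| <= c}]

v = np.linalg.solve(Sigma_c, X[:, 0])                      # Sigma_c^{-1} X
beta = 1.0 / (4 * c * np.linalg.norm(v))                   # puts beta on the boundary of B_c
w = 1.0 + psi @ (v * beta)                                 # Radon-Nikodym weights dF_beta/dF

print("weights within [1/2, 3/2]:", bool(w.min() >= 0.5 and w.max() <= 1.5))
print("mean weight (should be ~1):", w.mean())
print("E_beta[Y] vs X beta (approximately equal):", (w[:, None] * Y).mean(axis=0), X[:, 0] * beta)
```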
References
[1] Aitken, Alexander C. (1935): “On least squares and linear combinations of observations,” Proceedings of the Royal Society of Edinburgh, 55, 42-48.
[2] Begun, Janet M., W. J. Hall, Wei-Min Huang, and Jon A. Wellner (1983): “Informa-
tion and asymptotic efficiency in parametric-nonparametric models,” The Annals
of Statistics, 11, 432-452.
[3] Berk, Robert and Jiunn T. Hwang (1989): “Optimality of the least squares estimator,” Journal of Multivariate Analysis, 30, 245-254.
[4] Bickel, Peter J. (1982): “On adaptive estimation,” The Annals of Statistics, 10, 647-671.
[5] Bickel, Peter J., Chris A. J. Klaassen, Ya’acov Ritov, and Jon A. Wellner (1993): Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins University Press.
[6] Chamberlain, Gary (1987): “Asymptotic efficiency in estimation with conditional moment restrictions,” Journal of Econometrics, 34, 305-334.
[7] Doss, Hani and Jayaram Sethuraman (1989): “The price of bias reduction when
there is no unbiased estimate,” The Annals of Statistics, 17, 440-442.
[8] Gauss, Carl Friedrich (1809): Theoria Motus Corporum Coelestium. Hamburg: Perthes et Besser.
[9] Gauss, Carl Friedrich (1823): Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Göttingen: Dieterich.
[10] Gnot, S., H. Knautz, G. Trenkler, and R. Zmyslony (1992): “Nonlinear unbiased es-
timation in linear models,” Statistics, 23, 5-16.
[11] Hajek, Jaroslav (1972): “Local asymptotic minimax and admissibility in estima-
tion,” Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and
Probability, 1, 175-194.
[12] Hansen, Bruce E. (2022): Probability and Statistics for Economists, Princeton Uni-
versity Press, forthcoming.
[13] Harville, David A. (1981): “Unbiased and minimum-variance unbiased estimation
of estimable functions for fixed linear models with arbitrary covariance structure,”
The Annals of Statistics, 9, 633-637.
[14] Kariya, Takeaki (1985): “A nonlinear version of the Gauss-Markov theorem,” Journal
of the American Statistical Association, 80, 476-477.
[15] Kariya, Takeaki and Hiroshi Kurata (2002): “A maximal extension of the Gauss-
Markov theorem and its nonlinear version,” Journal of Multivariate Analysis, 83,
37-55.
[16] Koenker, Roger, and Gilbert Bassett (1978): “Regression quantiles,” Econometrica, 46, 33-50.
[17] Koopmann, R. (1982): Parameterschätzung bei a priori Information, Göttingen: Vandenhoeck & Ruprecht.
[18] Laplace, Pierre Simon (1811): “Mémoire sur les integrales définies et leur applica-
tion aux probabilités, et specialement à la recherche du milieu qu’il faut choisir
entre les resultats des observations,” Mémoires de l’Académie des sciences de Paris,
279-347.
[19] Legendre, Adrien Marie (1805): Nouvelles méthodes pour la détermination des orbites des comètes. Paris: Courcier.
[20] Lehmann, Erich L. and George Casella (1998): Theory of Point Estimation, Second
Edition, Springer.
[21] Levit, B. Y. (1975): “On the efficiency of a class of nonparametric estimates,” Theory of Probability and its Applications, 20, 723-740.
[22] Markov, Andrei A. (1912): Wahrscheinlichkeitsrechnung. Leipzig: Teubner.
[23] McDonald, James B. and Whitney K. Newey (1988): “Partially adaptive estimation of regression models via the generalized t distribution,” Econometric Theory, 4, 428-457.
[24] Newey, Whitney K. (1990): “Semiparametric efficiency bounds,” Journal of Applied Econometrics, 5, 99-135.
[25] Ritov, Ya’acov and Peter J. Bickel (1990): “Achieving information bounds in non and
semiparametric models,” The Annals of Statistics, 18, 925-938.
[26] Stein, Charles (1956): “Efficient nonparametric testing and estimation,” Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, 187-195.
[27] Stigler, Stephen M. (1986): The History of Statistics: The Measurement of Uncertainty
before 1900. Harvard University Press.
[28] van der Vaart, A.W. (1998): Asymptotic Statistics, Cambridge University Press.
[29] Zyskind, George, and Frank B. Martin (1969): “On best linear estimation and a gen-
eral Gauss-Markov theorem in linear models with arbitrary nonnegative covariance
structure,” SIAM Journal on Applied Mathematics, 17, 1190-1202.