Stat Mining 53
Notice that as the sample size grows, the bias of this estimator tends to zero:
\[
\lim_{n \to \infty} \left( -\frac{1}{n}\,\sigma^{2} \right) = 0
\]
Therefore, this estimator is asymptotically unbiased. It can be applied when the sample size
is large; in practice, n > 30.
Because the bias is negative for every finite n, the estimator systematically gives estimates
of the variance that are too low.
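The downward bias can be checked numerically. The following is a minimal sketch, assuming that the estimator under discussion is the sample variance with divisor n and that the observations are normal with σ = 2. The function names, the seed and the number of trials are illustrative choices, not taken from the text.

```python
import random
import statistics


def biased_variance(sample):
    """Variance estimator with divisor n (assumed to be the one analysed above)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def simulated_bias(n, sigma=2.0, trials=10000, seed=0):
    """Monte Carlo estimate of E[s^2] - sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    estimates = [
        biased_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.fmean(estimates) - sigma ** 2


# Compare the simulated bias with the theoretical value -sigma^2 / n.
for n in (5, 30, 100):
    print(n, round(simulated_bias(n), 3), "theory:", round(-(2.0 ** 2) / n, 3))
```

The simulated bias should be negative and shrink towards zero as n grows, in line with the limit above.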
It is easy to prove that the estimator
\[
\hat{s}^{2}(x) = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2} \tag{1.95}
\]
of the unknown variance of the general population is unbiased. It can be applied for
any sample size.
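The proof is short. A sketch, assuming (as the bias quoted earlier implies) that the estimator with divisor n has expectation σ² − σ²/n; writing that estimator as s² is a notational assumption of this sketch:

```latex
\begin{align*}
\hat{s}^{2} &= \frac{n}{n-1}\, s^{2},
  \qquad s^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2},\\
E\!\left[\hat{s}^{2}\right] &= \frac{n}{n-1}\,E\!\left[s^{2}\right]
  = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^{2} = \sigma^{2}.
\end{align*}
```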
If a sample of N elements is given and it consists of k groups of sizes n_i, i = 1, 2, …, k, the
following equation holds:
\[
s^{2}(x) = \overline{s_i^{2}} + s^{2}(\bar{x}_i)
         = \frac{\sum_{i=1}^{k} s_i^{2}\, n_i}{N}
         + \frac{\sum_{i=1}^{k} \left( \bar{x}_i - \bar{x} \right)^{2} n_i}{N} \tag{1.96}
\]
where:
\(s_i^{2}\) is the estimate of variance within the i-th group,
\(\bar{x}\) is the arithmetic mean of the whole sample,
\(\bar{x}_i\) is the arithmetic mean of the i-th group,
\(s^{2}(\bar{x}_i)\) is the variance between groups.
Looking more carefully at formula (1.96), a simple conclusion can be drawn: the more
differentiated the population, the greater its variance. The relationship (1.96) is
called the ‘variation identity’ (Sobczyk 1996 p. 46). It is useful in the calculation of
some combined machinery systems (Czaplicki 2010a p. 230) as well as in calculations
connected with the homogeneity of shovel-truck systems (ibidem p. 266).
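The variation identity can be verified on a small data set. This is a minimal sketch, assuming that all variances are computed with divisor n (the form for which the decomposition is exact); the group values and the function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def variance(values):
    """Variance with divisor n (assumed here so that the identity is exact)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def variation_identity(groups):
    """Return (total variance, within-group term, between-group term)."""
    pooled = [x for group in groups for x in group]
    N = len(pooled)
    grand_mean = mean(pooled)
    # Weighted mean of the within-group variances.
    within = sum(variance(g) * len(g) for g in groups) / N
    # Weighted variance of the group means around the overall mean.
    between = sum((mean(g) - grand_mean) ** 2 * len(g) for g in groups) / N
    return variance(pooled), within, between


# Hypothetical sample split into k = 3 groups of unequal size.
groups = [[2.0, 4.0, 6.0], [10.0, 12.0], [1.0, 1.0, 3.0, 5.0]]
total, within, between = variation_identity(groups)
print(total, within + between)  # the two values should agree up to rounding
```

Moving the group means further apart increases only the between-group term, which illustrates why a more differentiated population has a larger variance.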
The second important property that must be investigated before the application of a given
estimator is its consistency.
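An estimator T_n of the parameter θ is said to be consistent if it converges in probability
(stochastic convergence) to the true value of the parameter, i.e. for every ε > 0 the following equation holds:
\[
\lim_{n \to \infty} P\left( \left| T_n - \theta \right| < \varepsilon \right) = 1
\]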
Looking at this relationship, one may conclude that enlarging the sample size makes it
increasingly probable that the estimates obtained lie close to the true value of the
unknown parameter θ.
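Consistency can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the estimator is the sample mean, the observations are drawn from a Uniform(0, 1) distribution (so θ = 0.5), and the tolerance ε = 0.1, the seed and the number of trials are arbitrary choices. It estimates the probability that the estimate falls within ε of θ for several sample sizes.

```python
import random


def closeness_probability(n, eps=0.1, trials=2000, seed=1):
    """Estimate P(|T_n - theta| < eps) for the mean of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    theta = 0.5  # true mean of Uniform(0, 1), the assumed population
    hits = 0
    for _ in range(trials):
        t_n = sum(rng.random() for _ in range(n)) / n
        if abs(t_n - theta) < eps:
            hits += 1
    return hits / trials


# The estimated probability should approach 1 as the sample size grows.
for n in (5, 50, 500):
    print(n, closeness_probability(n))
```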
Suppose one has a sequence of observations {x1, x2, …} from a normal N(μ, σ) distribution.
In order to estimate the unknown expected value μ, one uses the sample mean determined
by formula (1.93). Now assume that every element of the sample is a random variable.
If so, the estimator (1.93) becomes a random variable. Denote it by T_n. From the properties
of the normal distribution, we know that T_n is itself normally distributed with mean μ
and variance σ²/n. Equivalently, the random variable (T_n − μ)/(σ/√n) has a standard
normal distribution. Therefore, the following relationship holds: