Introduction To State Estimation: What This Course Is About
[Block diagram: a SYSTEM driven by known input u and process noise v has internal state x and produces measured output z, which is corrupted by sensor noise w; an ESTIMATOR processes u and z to produce x̂, an estimate of the state x.]
• u: known input
• z: measured output
• v: process noise
• w: sensor noise
• x: internal state
• x̂: estimate of state x
Applications
• Generally, estimation is the dual of control. State feedback control: given the state x(k),
determine an input u(k). So one class of estimation problems is any state feedback control
problem where the state x(k) is not directly available. This is a very large class of problems.
• Estimation without closing the loop: state estimates of interest in their own right (for
example, system health monitoring, fault diagnosis, aircraft localization based on radar
measurements, economic development, medical health monitoring).
Resulting algorithms
We will adopt a probabilistic approach.
• Underlying technique: Bayesian Filtering
2. Probability Review
Engineering approach to probability: rigorous, but not the most general. We will not go into
measure theory, for example.
• A man has M pairs of trousers and L shirts in his wardrobe. Over a long period of time, we
observe the trousers/shirt combination he chooses. In particular, out of N observations, let
nts (i, j) be the number of times he wears trousers i with shirt j, nt (i) the number of times
he wears trousers i, and ns (j) the number of times he wears shirt j.
• Define
pts (i, j) := nts (i, j)/N , the likelihood of wearing trousers i with shirt j,
pt (i) := nt (i)/N , the likelihood of wearing trousers i,
ps (j) := ns (j)/N , the likelihood of wearing shirt j.
Note that pt (i) ≥ 0, and ∑_{i=1}^{M} pt (i) = ∑_{i=1}^{M} nt (i)/N = N/N = 1. The same holds for ps (j).
ns (j) = ∑_{i=1}^{M} nts (i, j), all the ways in which he chose shirt j.
2
Therefore pt (i) = ∑_{j=1}^{L} pts (i, j) and ps (j) = ∑_{i=1}^{M} pts (i, j).
This is called the marginalization, or sum rule.
• Define
pt|s (i|j) := nts (i, j)/ns (j), the likelihood of wearing trousers i given that he is wearing
shirt j
ps|t (j|i) := nts (i, j)/nt (i), the likelihood of wearing shirt j given that he is wearing
trousers i
Then it holds that
pts (i, j) = nts (i, j)/N = (nts (i, j)/nt (i)) · (nt (i)/N) = ps|t (j|i) pt (i)
           = (nts (i, j)/ns (j)) · (ns (j)/N) = pt|s (i|j) ps (j).
This is called the conditioning, or product rule.
• Everything we do in this class stems from these two simple rules. Understand them well.
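The two rules can be checked numerically on the trousers & shirts example. A minimal sketch (the observation counts nts below are made-up illustration data, not from the notes):

```python
# Hypothetical observation counts n_ts[(i, j)]: M = 2 trousers, L = 2 shirts.
n_ts = {(1, 1): 10, (1, 2): 20, (2, 1): 30, (2, 2): 40}
N = sum(n_ts.values())  # total number of observations

# Empirical joint PDF p_ts(i, j) = n_ts(i, j) / N
p_ts = {ij: n / N for ij, n in n_ts.items()}

# Sum rule (marginalization): p_t(i) = sum_j p_ts(i, j), p_s(j) = sum_i p_ts(i, j)
p_t = {i: sum(p for (ii, _), p in p_ts.items() if ii == i) for i in (1, 2)}
p_s = {j: sum(p for (_, jj), p in p_ts.items() if jj == j) for j in (1, 2)}

# Product rule (conditioning): p_ts(i, j) = p_s|t(j|i) p_t(i)
p_s_given_t = {(i, j): p_ts[(i, j)] / p_t[i] for (i, j) in p_ts}

for (i, j) in p_ts:
    assert abs(p_ts[(i, j)] - p_s_given_t[(i, j)] * p_t[i]) < 1e-12
assert abs(sum(p_t.values()) - 1.0) < 1e-12
assert abs(sum(p_s.values()) - 1.0) < 1e-12
```

Both marginals sum to one, and recombining the conditional with the marginal recovers the joint, exactly as the two rules state.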
• “Frequentist” approach to probability: captured by this example. Intuitive. Relative
frequency in a large number of trials. Great way to think about probability for physical
processes such as tossing coins, rolling dice, and other phenomena where the physical
process is essentially random.
• “Bayesian” approach. Probability is about beliefs and uncertainty. Measure of the state
of knowledge.
• The function px and the set X define the discrete random variable (DRV) x.
• The PDF can be used to define the notion of probability: the probability that a random
variable x is equal to some value x̄ ∈ X is px (x̄). This is written as Pr(x = x̄) = px (x̄).
• In order to simplify notation, we often use x to denote a DRV and a specific value the
DRV can take. For example we write p(x) to describe the probability that the discrete
random variable x takes the value x. The fact that this is a statement about the random
variable x is inferred from context.
While this is convenient, it may confuse you at first. If so, we encourage you to use the
more cumbersome notation until you are comfortable with the shorthand notation.
3
Examples
• X = {1, 2, 3, 4, 5, 6}, p(x) = 1/6, ∀x ∈ X , captures a fair die. The formal, longhand notation
corresponding to this is px (x̄) = 1/6, ∀x̄ ∈ X .
• X = {0, 1}, p(x) = 1 − h for x = 0 (“tails”) and p(x) = h for x = 1 (“heads”),
where 0 ≤ h ≤ 1, captures the flipping of a biased coin. The coin bias is parametrized by
h.
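The frequentist reading of the biased-coin model can be illustrated by simulation; the bias h and the number of trials below are arbitrary choices for the sketch:

```python
import random

random.seed(0)
h = 0.3          # assumed coin bias: Pr(x = 1) = h
N = 100_000      # number of trials

# Simulate N flips of the biased coin.
flips = [1 if random.random() < h else 0 for _ in range(N)]
freq_heads = sum(flips) / N   # relative frequency of x = 1

# For large N the relative frequency approaches the probability h.
assert abs(freq_heads - h) < 0.01
```

This is exactly the "relative frequency in a large number of trials" intuition: the empirical frequency converges to h as N grows.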
Given pxy (x̄, ȳ), define px (x̄) := ∑_{ȳ∈Y} pxy (x̄, ȳ).
– This is a definition: px is fully defined by pxy . (Recall trousers & shirts example.)
– Define px|y (x̄|ȳ) := pxy (x̄, ȳ)/py (ȳ), for py (ȳ) ̸= 0. This is a definition. px|y (x̄|ȳ) can
be thought of as a function of x̄, with ȳ fixed. It is easy to verify that it is a valid PDF
in x̄. “Given that the DRV y takes the value ȳ, what is the probability that the DRV x
takes the value x̄?” (Recall trousers & shirts example.)
– Short form: p(x|y)
– Usually written as p(x, y) = p(x|y) p(y) = p(y|x) p(x).
• We can combine these to give us our first theorem, the Total Probability Theorem:
px (x̄) = ∑_{ȳ∈Y} px|y (x̄|ȳ) py (ȳ). A weighted sum of probabilities.
4
• Multi-variable generalizations:
Sometimes x is used to denote a collection (or vector) of random variables x = (x1 , x2 , ..., xN ).
So when we write p(x) we implicitly mean p(x1 , x2 , ..., xN ). Note that p(x1 , x2 , . . . , xN ) is
always a scalar.
Marginalization: p(x) = ∑_{y∈Y} p(x, y), short form for
p(x1 , x2 , ..., xN ) = ∑_{(y1 ,...,yL )∈Y} p(x1 , ..., xN , y1 , ..., yL ).
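With the joint stored as a multi-dimensional array, marginalizing out a collection of variables is a sum over the corresponding axes. A sketch with numpy (the array shape and random joint are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Joint PDF p(x1, x2, y1, y2) over finite sets, stored as a 4-D array:
# axes 0, 1 index (x1, x2); axes 2, 3 index (y1, y2).
p = rng.random((2, 3, 4, 5))
p /= p.sum()  # normalize so that all entries sum to 1

# Marginalization: p(x1, x2) = sum over (y1, y2) of p(x1, x2, y1, y2)
p_x = p.sum(axis=(2, 3))

assert p_x.shape == (2, 3)          # one scalar per (x1, x2) pair
assert np.isclose(p_x.sum(), 1.0)   # the marginal is again a valid PDF
```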
• A continuous random variable (CRV) x is defined by a set X ⊆ ℝ and a PDF px satisfying
1. px (x̄) ≥ 0, ∀x̄ ∈ X ,
2. ∫_X px (x̄) dx̄ = 1.
• Relation to probability: it does not make sense to say that the probability of the CRV
x taking the value x̄ is px (x̄). As an illustration, we can think of the following sequence
of DRVs that approach a uniformly distributed CRV on [0, 1]. Consider the integers
{1, 2, . . . , N } divided by N ; that is, the numbers {1/N, 2/N, . . . , N/N }, which are in the
interval [0, 1]. Assume that all are of equal probability 1/N . As N goes to infinity, the
probability of any specific value i/N goes to 0.
So instead we talk about probability of being in an interval:
Pr(x ∈ [a, b]) := ∫_a^b px (x̄) dx̄.
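For the uniform CRV on [0, 1] this interval probability is simply b − a; a numerical check via a Riemann-sum approximation of the integral (the interval endpoints are arbitrary choices):

```python
from math import isclose

def p_x(xbar):
    """PDF of a CRV uniformly distributed on [0, 1]."""
    return 1.0 if 0.0 <= xbar <= 1.0 else 0.0

def prob_interval(a, b, n=10_000):
    """Pr(x in [a, b]) via a midpoint Riemann-sum approximation of the integral."""
    dx = (b - a) / n
    return sum(p_x(a + (k + 0.5) * dx) for k in range(n)) * dx

# For the uniform PDF, Pr(x in [0.2, 0.7]) = 0.7 - 0.2 = 0.5.
assert isclose(prob_interval(0.2, 0.7), 0.5, abs_tol=1e-9)
```

Note that Pr(x = x̄) = 0 for any single point x̄, consistent with the limiting argument above: only intervals carry probability.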
• All other definitions, properties, etc. stated for DRVs apply to CRVs; just replace “∑”
by “∫”. This holds in particular for the marginalization and conditioning.
– p(x) = ∫_0^1 p(x, y) dy = 1/2 for x = 0, and 1/2 for x = 1.
– p(x|y) = p(x, y)/p(y) = p(x, y), since p(y) = 1.
– p(y|x) = p(x, y)/p(x) = 2 p(x, y), since p(x) = 1/2.
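These results are consistent with the constant joint p(x, y) = 1/2 for x ∈ {0, 1} and y ∈ [0, 1] (an assumption about the example's joint, inferred from the marginals shown); a numerical check:

```python
from math import isclose

def p_xy(x, y):
    # Assumed joint: x in {0, 1} discrete, y in [0, 1] continuous, constant density.
    return 0.5 if x in (0, 1) and 0.0 <= y <= 1.0 else 0.0

# p(x) = integral over [0, 1] of p(x, y) dy  (midpoint Riemann sum)
n = 10_000
p_x0 = sum(p_xy(0, (k + 0.5) / n) for k in range(n)) / n

# p(y) = sum over x of p(x, y)  (marginalize the discrete variable)
p_y = p_xy(0, 0.5) + p_xy(1, 0.5)

assert isclose(p_x0, 0.5, abs_tol=1e-9)   # p(x = 0) = 1/2
assert isclose(p_y, 1.0)                  # so p(x|y) = p(x, y)/p(y) = p(x, y)
```

Marginalizing the continuous variable uses an integral, the discrete one a sum, exactly the DRV-to-CRV substitution described above.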