
AN INTRODUCTION TO NONLINEAR FILTERING

M.H.A. Davis
Department of Electrical Engineering
Imperial College, London SW7 2BT, England.

Steven I. Marcus
Department of Electrical Engineering
The University of Texas at Austin
Austin, Texas 78712, U.S.A.

ABSTRACT
In this paper we provide an introduction to nonlinear
filtering from two points of view: the innovations approach
and the approach based upon an unnormalized conditional density.
The filtering problem concerns the estimation of an unobserved
stochastic process {x_t} given observations of a related process
{y_t}; the classic problem is to calculate, for each t, the
conditional distribution of x_t given {y_s, 0 ≤ s ≤ t}. First, a
brief review of key results on martingales and Markov and
diffusion processes is presented. Using the innovations approach,
stochastic differential equations for the evolution of conditional
statistics and of the conditional measure of x_t given {y_s, 0 ≤ s ≤ t}
are given; these equations are the analogs for the filtering
problem of the Kolmogorov forward equations. Several examples
are discussed. Finally, a less complicated evolution equation is
derived by considering an "unnormalized" conditional measure.

M. Hazewinkel and J. C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identification and Applications, 53-75.
Copyright © 1981 by D. Reidel Publishing Company.

I. INTRODUCTION
Filtering problems concern "estimating" something about an
unobserved stochastic process {x t } given observations of a related
process {Yt}; the classic problem is to calculate, for each t,
the conditional distribution of x_t given {y_s, 0 ≤ s ≤ t}. This was
solved in the context of linear system theory by Kalman and Bucy
[1],[2] in 1960-1961, and the resulting "Kalman filter" has of
course enjoyed immense success in a wide variety of applications.
Attempts were soon made to generalize the results to systems with
nonlinear dynamics. This is an essentially more difficult problem,
being in general infinite-dimensional, but nevertheless equations
describing the evolution of conditional distributions were
obtained by several authors in the mid-sixties; for example, Bucy
[3], Kushner [4], Shiryaev [5], Stratonovich [6], Wonham [7]. In
1969 Zakai [8] obtained these equations in substantially simpler
form using the so-called "reference probability" method (see Wong
[9]).

In 1968 Kailath [10] introduced the "innovations approach" to
linear filtering, and the significance for nonlinear filtering was
immediately appreciated [11], namely that the filtering problem
ought to be formulated in the context of martingale theory. The
definitive treatment from this point of view was given in 1972 by
Fujisaki, Kallianpur and Kunita [12]. Textbook accounts including
all the mathematical background can be found in Liptser and
Shiryaev [13] and Kallianpur [14].
More recent work on nonlinear filtering has concentrated on
the following areas (this list and the references are not intended
to be exhaustive):
(i) Rigorous formulation of the theory of stochastic partial
differential equations (Pardoux [15], Krylov and Rozovskii [16]);
(ii) Introduction of Lie algebraic and differential
geometric methods (Brockett [17]);
(iii) Discovery of finite dimensional nonlinear filters
(Benes [18]);
(iv) Development of "robust" or "pathwise" solutions of the
filtering equations (Davis [19]);
(v) Functional integration and group representation methods
(Mitter [30]).
All of these topics are dealt with in this volume and all of
them use the basic equations of nonlinear filtering theory: the
Fujisaki et al. equation [12] and/or the Zakai equation [8].
These equations can be derived in a quick and self-contained way,
modulo some technical results, the statements of which are
readily appreciated and the details of which can be found in the
references [13],[14]. This is the purpose of the present article.


The general problem can be described as follows. The signal
or state process {x t } is a stochastic process which cannot be
observed directly. Information concerning {x t } is obtained from
the observation process {y_t}, which we will assume is given by

    y_t = ∫_0^t z_s ds + w_t,                                  (1)

where {z_t} is a process "related" to {x_t} (e.g., z_t = h(x_t)) and
{w_t} is a Brownian motion process. The process {y_t} is to be
thought of as noisy nonlinear observations of the signal {x_t}.
The objective is to compute least squares estimates of functions
of the signal x_t given the "past" observations {y_s, 0 ≤ s ≤ t} --
i.e., to compute quantities of the form E[φ(x_t) | y_s, 0 ≤ s ≤ t]. In
addition, it is desired that this computation be done recursively
in terms of a statistic {π_t} which can be updated using only new
observations:

    π_t = Φ(t, s, π_s, {y_u, s ≤ u ≤ t}),                      (2)

and from which estimates can be calculated in a "pointwise" or
"memoryless" fashion:

    E[φ(x_t) | y_s, 0 ≤ s ≤ t] = θ(t, y_t, π_t).               (3)

In general, π_t will be closely related to the conditional
distribution of x_t given {y_s, 0 ≤ s ≤ t}, but in certain special
cases π_t will be computable with a finite set of stochastic
differential equations driven by {y_t} (see [20] for some examples).
In order to obtain specific results, additional structure
will be assumed for the process {x_t}; we will assume throughout
that {x_t} is a semimartingale (see Section II), but more detailed
results will be derived under the assumption that {x_t} is a Markov
process or in particular a vector diffusion process of the form

    x_t = x_0 + ∫_0^t f(x_s) ds + ∫_0^t G(x_s) dβ_s,           (4)

where x_t ∈ R^n and β_t ∈ R^m is a vector of independent Brownian
motion processes. General terminology and precise assumptions
will be presented in Section II. In Section III, Markov processes
of the form (4) will be studied, and Kolmogorov's equations for
the evolution of the unconditional distribution (i.e., without
observations) of the process {x_t} will be presented. The
corresponding equations for the conditional distribution of x_t
given {y_s, 0 ≤ s ≤ t} will be derived in Section IV using the
"innovations approach". Finally, in Section V we derive a less
complex set of equations for an unnormalized conditional
distribution of x_t, in the form given by Zakai [8].
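The signal model (4) and observation model (1) are straightforward to simulate. The following minimal sketch (the scalar drift f(x) = -x, diffusion G ≡ 1, and observation function h(x) = x are illustrative assumptions, not choices made in the paper) generates a sample path of the signal and its noisy observations by the Euler-Maruyama method:

```python
import numpy as np

def simulate(f, g, h, x0, T=1.0, n=1000, seed=0):
    """Euler-Maruyama discretization of the signal (4) and observation (1):
       dx = f(x) dt + g(x) dbeta,   dy = h(x) dt + dw."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0], y[0] = x0, 0.0
    for k in range(n):
        db, dw = rng.normal(0.0, np.sqrt(dt), size=2)
        x[k + 1] = x[k] + f(x[k]) * dt + g(x[k]) * db   # signal step of (4)
        y[k + 1] = y[k] + h(x[k]) * dt + dw             # observation step of (1)
    return x, y

# Illustrative choices: Ornstein-Uhlenbeck signal, linear observation h(x) = x.
x, y = simulate(f=lambda x: -x, g=lambda x: 1.0, h=lambda x: x, x0=0.0)
```

Such simulated pairs (x, y) are what a filtering algorithm is tested against: only y is shown to the filter, and its estimates are compared with the hidden path x.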

II. TERMINOLOGY AND ASSUMPTIONS


In this section we review certain notions concerning
stochastic processes and martingales; for further tutorial
material on martingale integrals and stochastic calculus, the
reader is referred to the tutorial of R. Curtain in this volume
and the paper of Davis [21] (see also [9],[13],[22]-[24]). All
stochastic processes will be defined on a fixed probability space
(Ω,F,P) and a finite time interval [0,T], on which there is
defined an increasing family of σ-fields {F_t, 0 ≤ t ≤ T}. It is
assumed that each process {x_t} is adapted to F_t -- i.e., x_t is F_t-
measurable for all t. The σ-field generated by {x_s, 0 ≤ s ≤ t} is
denoted by X_t = σ{x_s, 0 ≤ s ≤ t}. (x_t,F_t) is a martingale if x_t is
adapted to F_t, E|x_t| < ∞, and E[x_t | F_s] = x_s for t ≥ s. (x_t,F_t) is
a supermartingale if E[x_t | F_s] ≤ x_s and a submartingale if
E[x_t | F_s] ≥ x_s. The process (x_t,F_t) is a semimartingale if it has
a decomposition x_t = x_0 + a_t + m_t, where (m_t,F_t) is a martingale and
{a_t} is a process of bounded variation. Given two square
integrable martingales (m_t,F_t) and (n_t,F_t), one can define the
predictable quadratic covariation (<m,n>_t, F_t) to be the unique
"predictable process of integrable variation" such that
(m_t n_t - <m,n>_t, F_t) is a martingale [29, p.34]. For the purposes of
this paper, however, the only necessary facts concerning <m,n>
are that (a) <m,n>_t = 0 if m_t n_t is a martingale; and (b) if β is a
standard Brownian motion process, then

    <β,β>_t = t   and   < ∫_0^t η¹_s dβ_s , ∫_0^t η²_s dβ_s >_t = ∫_0^t η¹_s η²_s ds.

In this tutorial exposition, the following hypotheses will
be assumed for all nonlinear estimation problems:
H1. {y_t} is a real-valued process;
H2. {w_t} is a standard Brownian motion process;
H3. E[∫_0^T z_s² ds] < ∞;
H4. {z_t} is independent of {w_t}.
Hypotheses (H1) and (H4) can be weakened, but the calculations
become more involved [8],[12],[13, Chapter 8]. Similar results
to those derived here can also be derived in the case that {w_t}
is replaced by the sum of a Brownian motion and a counting process
[25]. Hypotheses on the process {x_t} and the relationship between
x_t and w_t will be imposed as they are needed in the sequel.
Finally, we will need two special cases of Ito's differential
rule. Suppose that (ξ^i_t, F_t), i = 1,2, are semimartingales of the
form

    ξ^i_t = ξ^i_0 + a^i_t + m^i_t,                             (5)

where {m^i_t}, i = 1,2, are square integrable martingales
and {a^i_t} are sample continuous. Then

    ξ¹_t ξ²_t = ξ¹_0 ξ²_0 + ∫_0^t ξ¹_s dξ²_s + ∫_0^t ξ²_s dξ¹_s + <m¹,m²>_t.   (6a)

Also, if ψ is a twice continuously differentiable function of a
process x of the form (4), then

    ψ(x_t) = ψ(x_0) + Σ_{i=1}^n ∫_0^t (∂ψ/∂x^i)(x_s) dx^i_s
             + (1/2) Σ_{i,j=1}^n ∫_0^t a^{ij}(x_s) (∂²ψ/∂x^i∂x^j)(x_s) ds,   (6b)

where A(x) = [a^{ij}(x)] := G(x)G'(x) and x^i denotes the ith component
of x.

III. MARKOV AND DIFFUSION PROCESSES


A very clear account of the material in this section can be
found in Wong's book [9]. A stochastic process {x_t, t ∈ [0,T]} is
a Markov process if for any 0 ≤ s ≤ t ≤ T and any Borel set B of the
state space S,

    P(x_t ∈ B | X_s) = P(x_t ∈ B | x_s).

For any Markov process {x_t}, we can define the transition
probability function

    P(s,x,t,B) := P(x_t ∈ B | x_s = x),

which can easily be shown to satisfy the Chapman-Kolmogorov
equation: for any 0 ≤ s ≤ u ≤ t ≤ T,

    P(s,x,t,B) = ∫_S P(u,y,t,B) P(s,x,u,dy).                   (7)

In addition, all finite dimensional distributions of a Markov
process are determined by its initial distribution and transition
probability function. A Markov process {x_t} is homogeneous if
P(s+u,x,t+u,B) = P(s,x,t,B) for all 0 ≤ s ≤ t ≤ T and 0 ≤ s+u ≤ t+u ≤ T.
For a homogeneous Markov process {x_t} and f ∈ B(S) (i.e., f is
a bounded measurable real-valued function on S), define

    T_t f(x) = E_x[f(x_t)] := ∫_S f(y) P(0,x,t,dy).

The Chapman-Kolmogorov equation then implies that T_t is a
semigroup of operators acting on B(S); i.e., T_{t+s} f(x) = T_t(T_s f)(x)
for t,s ≥ 0. The generator L of T_t (or, of {x_t}) is the operator
acting on a domain D(L) ⊂ B(S) given by

    Lφ = lim_{t↓0} (1/t)(T_t φ - φ),

the limit being uniform in x ∈ S and D(L) consisting of all
functions such that this limit exists. It is immediate from this
and the semigroup property that

    (d/dt) T_t φ = L T_t φ = T_t Lφ,                           (8)

and (8) is, in abstract form, the backward equation for the
process. Writing it out in integral form and recalling the
definition of T_t gives the Dynkin formula:

    E_x[φ(x_t)] - φ(x) = E_x ∫_0^t Lφ(x_s) ds.                 (9)

This implies, using the Markov property again, that the process
M_t defined for φ ∈ D(L) by

    M_t = φ(x_t) - φ(x_0) - ∫_0^t Lφ(x_s) ds                   (10)

is a martingale [26, p.4]. This property can be used as a
definition of L; this is the approach pioneered by Stroock and
Varadhan [26]. Then L is known as the extended generator of {x_t},
since there may be functions φ for which M_t is a martingale but
which are not in D(L) as previously defined.
There is another semigroup of operators associated with {x_t},
namely the operators which transfer the initial distribution of
the process into the distributions at later times t. More
precisely, let M(S) be the set of probability measures on S and
denote

    <φ,μ> = ∫_S φ(x) μ(dx)

for φ ∈ B(S), μ ∈ M(S). Suppose x_0 has distribution π ∈ M(S); then
the distribution of x_t is given by

    U_t π(A) = P[x_t ∈ A] = E(I_A(x_t)) = <T_t I_A, π>.

This shows that U_t is adjoint to T_t in that

    <φ, U_t π> = <T_t φ, π>   (= Eφ(x_t))

for φ ∈ B(S), π ∈ M(S). Thus the generator of U_t is L*, the adjoint
of L, and π_t := U_t π satisfies

    (d/dt) π_t = L* π_t.                                       (11)

This is the forward equation of x_t in that it describes the
evolution of the distribution π_t of x_t. The objective of
filtering theory is to obtain a similar description of the
conditional distribution of x_t given {y_s, s ≤ t}.
In order to get these results in more explicit form we
consider in the remainder of this section a process {x_t}
satisfying a stochastic differential equation of the form (4),
where {β_t} is an R^m-valued standard Brownian motion process
independent of x_0. For simplicity we assume that f and G do not
depend explicitly on t (this is no loss of generality, since the
"process" τ(t) = t can be accommodated by augmenting (4) with the
equation dτ/dt = 1, τ(0) = 0). Under the usual Lipschitz and
growth assumptions which guarantee existence and uniqueness of
(strong) solutions of (4), the following results can be proved
[9],[22]-[24].
Theorem 1: The solution of (4) is a homogeneous Markov
process with infinitesimal generator

    L = Σ_{i=1}^n f^i(x) ∂/∂x^i + (1/2) Σ_{i,j=1}^n a^{ij}(x) ∂²/∂x^i∂x^j,   (12)

where A(x) = [a^{ij}(x)] := G(x)G'(x), and f^i and x^i denote the ith
components of f and x, respectively.
Hence Ito's rule (6b) in this case can be written as

    ψ(x_t) = ψ(x_0) + ∫_0^t Lψ(x_s) ds + ∫_0^t ∇ψ'(x_s) G(x_s) dβ_s,

emphasizing again that M_t (see (10)) is a martingale (here ∇ψ is
the gradient of ψ with respect to x, expressed as a column vector).
It can also be shown [24] that the solution of (4) satisfies the
Feller and strong Markov properties, and is a diffusion process
with drift vector f and diffusion matrix A. If this process has
a smooth density then the abstract equations (8) and (11)
translate into Kolmogorov's backward and forward equations for
the transition density.
Theorem 2 [24, p.104]: Assume that the solution {x_t} of (4)
has a transition density:

    P(s,x,t,B) = ∫_B p(s,x,t,y) dy

satisfying
a) for t-s ≥ δ > 0, p(s,x,t,y) is continuous and bounded in s, t,
and x;
b) the partial derivatives ∂p/∂s, ∂p/∂x^i, ∂²p/∂x^i∂x^j exist.
Then for 0 ≤ s < t, p satisfies the Kolmogorov backward equation

    (∂/∂s) p(s,x,t,y) + L p(s,x,t,y) = 0                       (13)

with lim_{s↑t} p(s,x,t,y) = δ(x-y) and L given by (12); i.e., p is the
fundamental solution of (13).
Outline of Proof: From (7), we have

    p(s+h,x,t,y) - p(s,x,t,y)
      = ∫ p(s,x,s+h,z) [p(s+h,x,t,y) - p(s+h,z,t,y)] dz.

Dividing both sides by h and letting h → 0 yields (13) by using
the definition of L.
More relevant to filtering problems is the Kolmogorov
forward equation.
Theorem 3 [24, p.102]: Assume that {x_t} satisfying (4) has
a transition density p(s,x,t,y), and that the partial derivatives
∂p/∂t, ∂(f^i p)/∂y^i, and ∂²(a^{ij} p)/∂y^i∂y^j exist. Then for
0 ≤ s < t, p satisfies the Kolmogorov forward equation

    (∂p/∂t)(s,x,t,y) = - Σ_{i=1}^n (∂/∂y^i)(f^i(y) p(s,x,t,y))
                       + (1/2) Σ_{i,j=1}^n (∂²/∂y^i∂y^j)(a^{ij}(y) p(s,x,t,y))   (14)
                     := L* p(s,x,t,y),

where L* is the formal adjoint of L. Also, the initial condition
is lim_{t↓s} p(s,x,t,y) = δ(y-x).
Outline of Proof: Assume, for simplicity of notation, that
{x_t} is a scalar diffusion (n=1). From (9), we have

    (∂/∂t) ∫ p(s,x,t,z) φ(z) dz = ∫ p(s,x,t,z) Lφ(z) dz        (15)

for some twice continuously differentiable function φ which vanishes
outside some finite interval. The derivative and integral on the
left-hand side of (15) can be interchanged, and an integration by
parts then yields

    ∫ p(s,x,t,z) f(z) (∂φ/∂z)(z) dz = - ∫ φ(z) (∂/∂z)(f(z) p(s,x,t,z)) dz,

    ∫ p(s,x,t,z) g²(z) (∂²φ/∂z²)(z) dz = ∫ φ(z) (∂²/∂z²)(g²(z) p(s,x,t,z)) dz;

hence

    ∫ { (∂p/∂t)(s,x,t,z) + (∂/∂z)[f(z) p(s,x,t,z)]
        - (1/2)(∂²/∂z²)[g²(z) p(s,x,t,z)] } φ(z) dz = 0.

Since the expression in curly brackets is continuous and φ(z) is
an arbitrary twice differentiable function vanishing outside a
finite interval, (14) follows.
We note that if x_0 has distribution P_0, then the density of
x_t is p(t,y) = ∫ p(0,x,t,y) P_0(dx), and p(t,y) also satisfies (14).
Conditions for the existence of a density satisfying the
differentiability hypotheses of Theorems 2 and 3 are given in
[24, pp.96-99] (see also Pardoux [15]).
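The forward equation (14) can be integrated numerically on a grid. The sketch below is a minimal illustration only: the Ornstein-Uhlenbeck coefficients f(x) = -x, g ≡ 1 and the periodic finite-difference treatment are assumptions made for simplicity. The central-difference form conserves total probability mass exactly, mirroring the fact that L* preserves ∫ p dy:

```python
import numpy as np

def forward_step(p, f, g2, dx, dt):
    """One explicit Euler step of the scalar Kolmogorov forward equation (14):
       dp/dt = -d(f p)/dx + (1/2) d^2(g^2 p)/dx^2   (periodic grid)."""
    flux = f * p
    drift = -(np.roll(flux, -1) - np.roll(flux, 1)) / (2 * dx)
    diff = g2 * p
    diffusion = (np.roll(diff, -1) - 2 * diff + np.roll(diff, 1)) / dx**2
    return p + dt * (drift + 0.5 * diffusion)

# Illustrative coefficients: f(x) = -x, g = 1 (Ornstein-Uhlenbeck).
xs = np.linspace(-5, 5, 201)
dx = xs[1] - xs[0]
p = np.exp(-xs**2)            # unnormalized Gaussian initial density
p /= p.sum() * dx             # normalize so that sum(p) * dx = 1
for _ in range(200):
    p = forward_step(p, -xs, np.ones_like(xs), dx, dt=1e-4)
```

Because the drift and diffusion terms are written as discrete divergences, summing either over the periodic grid gives zero, so the scheme preserves the normalization of p at every step.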

IV. THE INNOVATIONS APPROACH TO NONLINEAR FILTERING


In this section we derive stochastic differential equations
for the evolution of conditional statistics and of the conditional
density for nonlinear filtering problems of the types discussed
in Sections I and II; the equations will be the analogs of (9)
and the Kolmogorov forward equation for the filtering problem.
We will follow the innovations approach, as presented in [12] and
[13]; this approach was originally suggested by Kailath [10] (for
linear filtering) and Frost and Kailath [11].
Assume that the observations have the form (1) and that
(H1)-(H4) hold. Define Y_t := σ{y_s, 0 ≤ s ≤ t}; for any process n_t
we use the notation n̂_t := E[n_t | Y_t]. Now introduce the innovations
process:

    ν_t := y_t - ∫_0^t ẑ_s ds.                                 (16)

The incremental innovations ν_{t+h} - ν_t represent the "new
information" concerning the process {z_t} available from the
observations between t and t+h, in the sense that ν_{t+h} - ν_t is
independent of Y_t. The following properties of the innovations
are crucial.
Lemma 1: The process (ν_t, Y_t) is a standard Brownian motion
process. Furthermore, Y_s and σ{ν_u - ν_t, 0 ≤ s ≤ t < u ≤ T} are
independent.
Proof: From (16) we have for s < t,

    E[ν_t | Y_s] = ν_s + E[∫_s^t (z_u - ẑ_u) du + w_t - w_s | Y_s].   (17)

The second term on the right-hand side of (17) is zero; here we
have used the fact that w_t - w_s is independent of Y_s. Hence
(ν_t, Y_t) is a martingale. Consider now the quadratic variation
of {ν_t}: for t ∈ [0,T] fix an integer n and define

    Q^n_t = Σ_{0 ≤ k < 2^n t} [ν((k+1)/2^n) - ν(k/2^n)]².

The almost sure limit (as n → ∞) of Q^n_t =: Q_t is the quadratic
variation of ν_t. It is easy to see that the quadratic variation
of ∫_0^t (z_u - ẑ_u) du is zero, so that the quadratic variation of ν_t
is the same as that of w_t, or Q_t = t. But by a theorem of Doob
[12, Lemma 2.1], a square integrable martingale with continuous
sample paths and quadratic variation t is a standard Brownian
motion, and the lemma follows.
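The quadratic-variation step in this proof can be checked numerically: the absolutely continuous term in (16) contributes nothing to the sum of squared increments, so the discrete quadratic variation of y_t given by (1) converges to t whatever the bounded signal is. A small sketch (the sinusoidal choice of z is purely illustrative):

```python
import numpy as np

# Numerical check of the quadratic-variation argument in Lemma 1: the
# drift part of (1)/(16) has zero quadratic variation, so the sum of
# squared increments of y_t is close to t for fine partitions.
rng = np.random.default_rng(1)
T, n = 1.0, 2**16
dt = T / n
z = np.sin(np.linspace(0.0, T, n))      # any bounded "signal" z_t
dw = rng.normal(0.0, np.sqrt(dt), n)    # Brownian increments
dy = z * dt + dw                        # increments of the observation (1)
qv = np.sum(dy**2)                      # discrete quadratic variation Q_T
```

Here qv is close to T = 1: the cross term 2 Σ z dt dw and the term Σ z² dt² both vanish as the partition is refined, leaving only Σ (dw)² ≈ T.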
Notice that the very specific conclusions of Lemma 1
regarding the structure of the innovations process are valid
without any restrictions on the distributions of z_t. The next
lemma is related to Kailath's "innovations conjecture". By
definition ν_t is Y_t-measurable and σ{ν_s, 0 ≤ s ≤ t} ⊂ Y_t. The
innovations conjecture is that Y_t ⊂ σ{ν_s, 0 ≤ s ≤ t}, and hence that
the two σ-fields are equal; i.e., the observations and innovations
processes contain the same information. At the time that [12]
was written, the answer to this question was not known under very
general conditions on {z_t}; recently, it has been shown in [27]
that the conjecture is true under the conditions (H1)-(H4). It
is a well-known fact [13, Theorem 5.6] that all martingales of
Brownian motion are stochastic integrals, and the point of a
positive answer to the innovations conjecture is that it enables
any Y_t-martingale to be written as a stochastic integral with
respect to the innovations process {ν_t}. The essential
contribution of Fujisaki, Kallianpur and Kunita [12] was to show
that this representation holds whether or not the innovations
conjecture is valid. Specifically, they showed:
Lemma 2: Every square integrable martingale (m_t, Y_t) with
respect to the observation σ-fields Y_t is sample continuous and
has the representation

    m_t = E[m_0] + ∫_0^t η_s dν_s,                             (18)

where ∫_0^T E[η_s²] ds < ∞ and {η_t} is jointly measurable and adapted
to Y_t. In other words, m_t can be written as a stochastic integral
with respect to the innovations process. (But note that {η_t} is
adapted to Y_t and not necessarily to σ{ν_s, 0 ≤ s ≤ t}.)
In order to obtain a general filtering equation, let us
consider a real-valued F_t-semimartingale ξ_t and derive an equation
satisfied by ξ̂_t. We have in mind semimartingales φ(x_t) where φ
is some smooth real-valued function and {x_t} is the signal process,
but it is just as easy to consider a general semimartingale of the
form

    ξ_t = ξ_0 + ∫_0^t α_s ds + n_t,                            (19)

where (n_t, F_t) is a martingale.
Theorem 4: Assume that {ξ_t} and {y_t} are given by (19) and
(1), respectively, and that <n,w>_t = 0. Then {ξ̂_t} satisfies the
stochastic differential equation

    ξ̂_t = ξ̂_0 + ∫_0^t α̂_s ds + ∫_0^t [(ξ_s z_s)^ - ξ̂_s ẑ_s] dν_s,   (20)

where (ξ_s z_s)^ := E[ξ_s z_s | Y_s].
Proof: First we define

    μ_t := ξ̂_t - ξ̂_0 - ∫_0^t α̂_s ds
and show that (μ_t, Y_t) is a martingale. Now, for s < t,

    E[ξ̂_t - ξ̂_s | Y_s] = E[ξ_t - ξ_s | Y_s]
      = E[∫_s^t α_u du | Y_s] + E[n_t - n_s | Y_s]
      = E[∫_s^t E[α_u | Y_u] du | Y_s] + E[E[n_t - n_s | F_s] | Y_s].   (21)

The last term in (21) is zero, since (n_t, F_t) is a martingale;
thus (21) proves that (μ_t, Y_t) is a martingale. Hence,

    ξ̂_t = ξ̂_0 + ∫_0^t α̂_s ds + ∫_0^t η_s dν_s,                (22)

where the last term in (22) follows from Lemma 2.


It remains only to identify the precise form of η_t, using
Ito's differential rule (6a) and an idea introduced by Wong [28].
From (1) and (19), and since <n,w>_t = 0,

    ξ_t y_t = ξ_0 y_0 + ∫_0^t ξ_s (z_s ds + dw_s) + ∫_0^t y_s (α_s ds + dn_s).   (23)

Also, from (16) and (22),

    ξ̂_t y_t = ξ̂_0 y_0 + ∫_0^t ξ̂_s (ẑ_s ds + dν_s) + ∫_0^t y_s (α̂_s ds + η_s dν_s) + ∫_0^t η_s ds.   (24)

Now it follows immediately from properties of conditional
expectations that for t ≥ s,

    E[ξ_t y_t - ξ̂_t y_t | Y_s] = 0.

Calculating this from (23),(24) we see that

    η_t = (ξ_t z_t)^ - ξ̂_t ẑ_t.                                (25)

Inserting (25) into (22) gives the desired result (20).


Formula (20) is not very useful as it stands (it is not a
recursive equation for ξ̂_t), but we can use it to obtain more
explicit results for filtering of Markov processes.
Theorem 5: Assume that {x_t} is a homogeneous Markov process
with infinitesimal generator L, that {y_t} is given by (1) with
z_t = h(x_t), and that {x_t} and {w_t} are independent. Then for any
φ ∈ D(L), π_t(φ) := E[φ(x_t) | Y_t] satisfies

    π_t(φ) = π_0(φ) + ∫_0^t π_s(Lφ) ds + ∫_0^t [π_s(hφ) - π_s(h) π_s(φ)] dν_s.   (26)

Proof: Notice that (M_t, F_t) (see (10)) is a martingale, so
that ξ_t := φ(x_t) is of the form (19) with α_t := Lφ(x_t), n_t := M^φ_t.
Also, it is shown in [12, Lemma 4.2] that the independence of
{x_t} and {w_t} implies <M^φ, w>_t = 0. The theorem then follows
immediately from Theorem 4.
Remarks: (i) Since {π_t(φ): φ ∈ D(L)} determines a measure-
valued stochastic process π_t, (26) can be regarded as a recursive
(infinite-dimensional) stochastic differential equation for the
conditional measure π_t of x_t given Y_t, and π_t(φ) is a conditional
statistic computed from π_t in a memoryless fashion (see (2)-(3)).
In general, however, it is not possible to derive a finite
dimensional recursive filter, even for the conditional mean x̂_t;
some special cases in which finite dimensional recursive filters
exist are given in Examples 1 and 3 below.
(ii) If w_t in (1) were multiplied by r^{1/2} with r > 0, one would
suspect that as r → ∞ the observations would become infinitely
noisy, thus giving no information about the state; i.e., π_t(φ)
would reduce to the unconditional expectation E[φ(x_t)]. In fact,
in this case the last term in (26) is multiplied by r^{-1}, so (26)
reduces to (9) as r → ∞.
Example 1 [7]: Let {x_t} be a finite state Markov process
taking values in S = {s_1, ..., s_N}. Let p^i_t be the probability that
x_t = s_i, and assume that p_t := [p^1_t, ..., p^N_t]' satisfies

    (d/dt) p_t = A p_t.

(This is the forward equation for {x_t}; cf. (11).) Given the
observations (1), the conditional distribution of x_t given Y_t can
be determined from (26) as follows. Let φ(x) = [φ_1(x), ..., φ_N(x)]',
where

    φ_i(x) = 1,  x = s_i
           = 0,  x ≠ s_i.

Then applying (26) to each φ_i yields the following: let
B = diag(h(s_1), ..., h(s_N)) and let b = [h(s_1), ..., h(s_N)]'. Then
if p̂^i_t = P[x_t = s_i | Y_t] and p̂_t = [p̂^1_t, ..., p̂^N_t]', we have

    p̂_t = p̂_0 + ∫_0^t A p̂_s ds + ∫_0^t [B - (b' p̂_s) I] p̂_s (dy_s - (b' p̂_s) ds).

In this case, the conditional distribution is determined
recursively by N stochastic differential equations.
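A discrete-time sketch of this finite-state filter for an assumed two-state chain (N = 2; the rates and the values h(s_i) are chosen purely for illustration) is given below. The chain and the observation increments of (1) are simulated, and the N equations above are advanced by an Euler step; the re-normalization after each step is a practical stabilization of the discretization, not part of the continuous-time equation:

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n = 1e-3, 2000
A = np.array([[-1.0, 2.0],        # generator with dp/dt = A p
              [1.0, -2.0]])       # (columns sum to zero)
h_vals = np.array([0.0, 1.0])     # assumed observation values h(s_1), h(s_2)
B = np.diag(h_vals)

state = 0                         # simulated true chain state (index)
p = np.array([0.5, 0.5])          # conditional distribution estimate
for _ in range(n):
    # jump of the true chain: leave current state with rate -A[i,i]
    if rng.random() < -A[state, state] * dt:
        state = 1 - state
    # observation increment of (1) with z_t = h(x_t)
    dy = h_vals[state] * dt + rng.normal(0.0, np.sqrt(dt))
    # Euler step of the filter: dp = A p dt + [B - (b'p)I] p (dy - (b'p) dt)
    hbar = h_vals @ p
    p = p + (A @ p) * dt + (B @ p - hbar * p) * (dy - hbar * dt)
    p = np.clip(p, 0.0, None)
    p /= p.sum()                  # re-normalize after the Euler step
```

After the loop, p approximates the conditional distribution of the current state given the simulated observation path.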
Example 2: Assume that {x_t} is a diffusion process given by
(4) with infinitesimal generator (12) and that the conditional
distribution of x_t given Y_t has a density p(t,x). Then under
appropriate differentiability hypotheses [13, Theorem 8.6], one
can do an integration by parts in (26) (precisely as in Theorem 3
above) to obtain the stochastic partial differential equation

    dp(t,x) = L* p(t,x) dt + p(t,x) [h(x) - π_t(h)] dν_t,      (27)

where

    π_t(h) = ∫ h(x) p(t,x) dx.                                 (28)

This is a recursive equation for the computation of p(t,x); it is
not only infinite dimensional but has a complicated structure due
to the presence of the integral in (28). Equation (27) is the
analog of the Kolmogorov forward equation; in fact, (27) reduces
to (14) as the observation noise approaches ∞ (see Remark (ii)).

The conditional mean cannot in general be computed with a
finite dimensional recursive filter, as is seen by letting
φ(x) = x in (26):

    x̂_t = x̂_0 + ∫_0^t π_s(f) ds + ∫_0^t [π_s(hx) - π_s(h) x̂_s] dν_s.   (29)

Hence, π_t(f), π_t(hx), and π_t(h) are all necessary for the
computation of x̂_t, etc. One case in which this calculation is
possible is given in the next example.

Example 3: Let {x_t} be the scalar linear diffusion

    x_t = x_0 + ∫_0^t a x_s ds + b β_t

(i.e., (4) with f(x) = ax and G = b), and let the observations (1)
have z_t = c x_t, where x_0 is Gaussian and independent of {w_t} and
{β_t}. Then (29) yields

    x̂_t = x̂_0 + ∫_0^t a x̂_s ds + c ∫_0^t [π_s(x²) - x̂_s²] [dy_s - c x̂_s ds]
        = x̂_0 + ∫_0^t a x̂_s ds + c ∫_0^t P_s (dy_s - c x̂_s ds),   (30)

where P_t := E[(x_t - x̂_t)² | Y_t] is the conditional error covariance.
However, since {x_t} and {y_t} are jointly Gaussian, P_t is
nonrandom and constitutes a "gain" process which can be
precomputed and stored. P_t satisfies the differential equation
(derived from (26) by noticing that the third central moment of
a Gaussian distribution is zero):

    (d/dt) P_t = 2a P_t + b² - c² P_t².

Since P_t is nonrandom and the differential equation for x̂_t
involves no other conditional statistics, it constitutes a
recursive one-dimensional filter (the Kalman-Bucy filter) for
the computation of the conditional mean.
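The one-dimensional filter (30) together with the Riccati equation for P_t is easy to discretize. The sketch below (parameter values a = -1, b = 0.5, c = 1 are assumed for illustration) simulates the linear model and runs the Euler-discretized Kalman-Bucy filter; note that the Riccati equation is data-independent, so P_t can indeed be precomputed:

```python
import numpy as np

def kalman_bucy(a, b, c, ys, dt, x0=0.0, P0=1.0):
    """Euler discretization of (30) and of dP/dt = 2aP + b^2 - c^2 P^2
       for the linear model dx = a x dt + b dbeta, dy = c x dt + dw."""
    xhat, P = x0, P0
    for dy in ys:
        xhat = xhat + a * xhat * dt + c * P * (dy - c * xhat * dt)
        P = P + (2 * a * P + b**2 - c**2 * P**2) * dt
    return xhat, P

# Simulate the (assumed) linear signal and its observations.
rng = np.random.default_rng(3)
a, b, c, dt, n = -1.0, 0.5, 1.0, 1e-3, 5000
x, ys = 0.0, []
for _ in range(n):
    ys.append(c * x * dt + rng.normal(0.0, np.sqrt(dt)))   # dy increment
    x = x + a * x * dt + b * rng.normal(0.0, np.sqrt(dt))  # signal step

xhat, P = kalman_bucy(a, b, c, ys, dt)
```

With these parameters the Riccati equation has the steady state 2aP + b² - c²P² = 0, i.e. P* = (√5 - 2)/2, which the computed P approaches.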

V. THE UNNORMALIZED EQUATIONS

Throughout this section it will be assumed that {x_t} is a
homogeneous Markov process with infinitesimal generator L, {y_t}
is given by (1) with z_t = h(x_t), and {x_t} and {w_t} are
independent. In this case, the conditional measure π_t satisfies
the equation (26), but it is often more convenient to work with
a less complicated equation which is obtained by considering an
"unnormalized" version of π_t. The unnormalized equations are
derived in [9, Chapter 6] and [8]; the use of measure
transformations will follow these references, but we will use a
shorter derivation of the unnormalized equations, via (26) and
Ito's rule.
The first step is to define a new measure P_0 on the
measurable space (Ω,F) by

    P_0(A) = ∫_A Λ_T^{-1} dP

for all A ∈ F, where

    Λ_T^{-1} = exp( -∫_0^T h(x_s) dy_s + (1/2) ∫_0^T h²(x_s) ds )

is the Radon-Nikodym derivative of P_0 with respect to P.
Lemma 3 [9, p.232]: P_0 has the following properties:
(a) P_0 is a probability measure -- i.e., P_0(Ω) = 1;
(b) Under P_0, {y_t} is a standard Brownian motion;
(c) Under P_0, {x_t} and {y_t} are independent;
(d) {x_t} has the same distributions under P_0 as under P;
(e) P is absolutely continuous with respect to P_0 with Radon-
Nikodym derivative

    dP/dP_0 = Λ_T = exp( ∫_0^T h(x_s) dy_s - (1/2) ∫_0^T h²(x_s) ds ).

It can also be shown [13, Section 6.2] that

    Λ_t = exp( ∫_0^t h(x_s) dy_s - (1/2) ∫_0^t h²(x_s) ds )

is a martingale with respect to F_t and P_0, so that

    Λ_t = E_0[ dP/dP_0 | F_t ],

where E_0 is the expectation with respect to P_0. It can be shown
[9, p.234] that

    π_t(φ) = E[φ(x_t) | Y_t] = E_0[Λ_t φ(x_t) | Y_t] / E_0[Λ_t | Y_t] =: σ_t(φ) / σ_t(1).   (31)

Hence conditional statistics of x_t given Y_t, in terms of the
original measure P, can be calculated in terms of conditional
statistics under the measure P_0. We now proceed to derive a
recursive equation for the measure σ_t; an approach to solving
(31) by a path integration of the numerator and denominator is
pursued in some other papers in this volume.
Since σ_t(φ) = σ_t(1) · π_t(φ) and we have the equation (26)
for π_t(φ), an equation for σ_t(φ) is derived by finding a
stochastic differential equation for σ_t(1) := E_0[Λ_t | Y_t] and
applying Ito's rule.
Lemma 4: E_0[Λ_t | Y_t] is given by the formula

    E_0[Λ_t | Y_t] = exp( ∫_0^t π_s(h) dy_s - (1/2) ∫_0^t π_s²(h) ds ).   (32)

Proof: By Ito's rule, Λ_t satisfies

    Λ_t = 1 + ∫_0^t Λ_s h(x_s) dy_s.                           (33)

It follows as in the proof of Theorem 4 that Λ̂_t := E_0[Λ_t | Y_t]
is a martingale with respect to Y_t. Since {y_t} is a Brownian
motion under P_0, there must exist a Y_t-adapted process {η_t} such
that [13, Theorem 5.6]

    Λ̂_t = 1 + ∫_0^t η_s dy_s.                                  (34)

We identify η_t by the same technique as in Theorem 4: from (33)
and Ito's rule,

    Λ_t y_t = ∫_0^t Λ_s dy_s + ∫_0^t y_s Λ_s h(x_s) dy_s + ∫_0^t Λ_s h(x_s) ds.   (35)

From (34) and Ito's rule,

    Λ̂_t y_t = ∫_0^t Λ̂_s dy_s + ∫_0^t y_s η_s dy_s + ∫_0^t η_s ds.   (36)

Now E_0[Λ_t y_t - Λ̂_t y_t | Y_s] = 0 for t ≥ s, and calculating this from
(35) and (36) yields

    η_t = E_0[Λ_t h(x_t) | Y_t].

But from (31),

    E_0[Λ_t h(x_t) | Y_t] = Λ̂_t π_t(h),                        (37)

so (34) becomes

    Λ̂_t = 1 + ∫_0^t Λ̂_s π_s(h) dy_s.                           (38)

However, this has the unique solution (32), and the lemma is
proved.



Theorem 6: For any φ ∈ D(L), σ_t(φ) satisfies

    σ_t(φ) = σ_0(φ) + ∫_0^t σ_s(Lφ) ds + ∫_0^t σ_s(hφ) dy_s.   (39)

Proof: By Ito's rule, we have from (26) and (38):

    d(Λ̂_t π_t(φ)) = Λ̂_t [π_t(Lφ) dt + (π_t(hφ) - π_t(h) π_t(φ))(dy_t - π_t(h) dt)]
                    + π_t(φ) [Λ̂_t π_t(h) dy_t] + [π_t(hφ) - π_t(h) π_t(φ)] Λ̂_t π_t(h) dt
                  = Λ̂_t π_t(Lφ) dt + Λ̂_t π_t(hφ) dy_t,

which gives (39) since σ_t(φ) = Λ̂_t π_t(φ).


The remarks following Theorem 5 are also applicable here.
In addition, we note that the Stratonovich version of (39), which
is utilized in a number of papers in this volume, is:

    σ_t(φ) = σ_0(φ) + ∫_0^t σ_s(L̄φ) ds + ∫_0^t σ_s(hφ) ∘ dy_s,   (40)

where

    L̄φ(x) = Lφ(x) - (1/2) h²(x) φ(x)

and ∘ denotes a Stratonovich (symmetric) stochastic integral
[9],[22].
Example 4: Under the assumptions of Example 2, we can
derive a stochastic differential equation for q(t,x) := Λ̂_t p(t,x);
this is interpreted as an unnormalized conditional density, since
then

    p(t,x) = q(t,x) / ∫ q(t,x) dx.

As in Example 2, an integration by parts in (39) yields the
stochastic partial differential equation:

    dq(t,x) = L* q(t,x) dt + h(x) q(t,x) dy_t.                 (41)

Notice that (41) has a much simpler structure than (27): it does
not involve an integral such as π_t(h), and it is a bilinear
stochastic differential equation with {y_t} as its input. This
structure is utilized by a number of papers in this volume.
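The bilinear structure of (41) lends itself to a simple operator-splitting discretization on a grid: a forward-equation step for L*q, then a multiplicative update q ← q·exp(h dy - ½h² dt) for the observation term (the exact one-step solution of dq = h q dy for frozen h). The coefficients below (f(x) = -x, g ≡ 1, h(x) = x) are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
xs = np.linspace(-5, 5, 201)
dx, dt, n = xs[1] - xs[0], 1e-4, 200
f, g2, h = -xs, np.ones_like(xs), xs     # assumed model coefficients

q = np.exp(-(xs - 1.0)**2)               # unnormalized initial density
x_true = 1.0                             # simulated true signal state
for _ in range(n):
    # L* step of (41): central differences on a periodic grid (cf. (14))
    flux = f * q
    q = q + dt * (-(np.roll(flux, -1) - np.roll(flux, 1)) / (2 * dx)
                  + 0.5 * (np.roll(g2 * q, -1) - 2 * g2 * q
                           + np.roll(g2 * q, 1)) / dx**2)
    # observation increment dy = h(x_true) dt + dw, then the h q dy term
    dy = x_true * dt + rng.normal(0.0, np.sqrt(dt))
    q *= np.exp(h * dy - 0.5 * h**2 * dt)
    # advance the true signal (same assumed dynamics)
    x_true = x_true - x_true * dt + rng.normal(0.0, np.sqrt(dt))

p = q / (q.sum() * dx)                   # normalize as in Example 4
mean_est = (xs * p).sum() * dx           # conditional-mean estimate
```

Because (41) never requires π_t(h), each step of this scheme is local in x apart from the final normalization, which is performed only when a conditional statistic is actually needed.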

ACKNOWLEDGMENT
The work of S. I. Marcus was supported in part by the U.S.
National Science Foundation under grant ENG-76-11106.

REFERENCES
1. R. E. Kalman, "A new approach to linear filtering and
prediction problems," J. Basic Eng. ASME, 82, 1960,
pp. 33-45.
2. R. E. Kalman and R. S. Bucy, "New results in linear
filtering and prediction theory," J. Basic Engr. ASME
Series D, 83, 1961, pp. 95-108.
3. R. S. Bucy, "Nonlinear filtering," IEEE Trans. Automatic
Control, AC-10, 1965, p. 198.
4. H. J. Kushner, "On the differential equations satisfied by
conditional probability densities of Markov processes," SIAM
J. Control, 2, 1964, pp. 106-119.
5. A. N. Shiryaev, "Some new results in the theory of controlled
stochastic processes [Russian]," Trans. 4th Prague Conference
on Information Theory, Czech. Academy of Sciences, Prague,
1967.
6. R. L. Stratonovich, Conditional Markov Processes and Their
Application to the Theory of Optimal Control. New York:
Elsevier, 1968.
7. W. M. Wonham, "Some applications of stochastic differential
equations to optimal nonlinear filtering," SIAM J. Control,
2, 1965, pp. 347-369.
8. M. Zakai, "On the optimal filtering of diffusion processes,"
Z. Wahr. Verw. Geb., 11, 1969, pp. 230-243.
9. E. Wong, Stochastic Processes in Information and Dynamical
Systems. New York: McGraw-Hill, 1971.
74 M. H. A. DAVIS AND S. I. MARCUS

10. T. Kailath, "An innovations approach to least-squares
estimation -- Part I: Linear filtering in additive white
noise," IEEE Trans. Automatic Control, AC-13, 1968,
pp. 646-655.
11. P. A. Frost and T. Kailath, "An innovations approach to
least-squares estimation III," IEEE Trans. Automatic Control,
AC-16, 1971, pp. 217-226.
12. M. Fujisaki, G. Kallianpur, and H. Kunita, "Stochastic
differential equations for the nonlinear filtering problem,"
Osaka J. Math., 1, 1972, pp. 19-40.
13. R. S. Liptser and A. N. Shiryaev, Statistics of Random
Processes I. New York: Springer-Verlag, 1977.
14. G. Kallianpur, Stochastic Filtering Theory. Berlin-
Heidelberg-New York: Springer-Verlag, 1980.
15. E. Pardoux, "Stochastic partial differential equations and
filtering of diffusion processes," Stochastics, 2, 1979,
pp. 127-168 [see also Pardoux' article in this volume].
16. N. V. Krylov and B. L. Rozovskii, "On the conditional
distribution of diffusion processes [Russian]," Izvestia
Akad. Nauk SSSR, Math Series 42, 1978, pp. 356-378.
17. R. W. Brockett, this volume.
18. V. E. Benes, "Exact finite dimensional filters for certain
diffusions with nonlinear drift," Stochastics, to appear.
19. M. H. A. Davis, this volume.
20. J. H. Van Schuppen, "Stochastic filtering theory: A
discussion of concepts, methods, and results," in
Stochastic Control Theory and Stochastic Differential
Systems, M. Kohlmann and W. Vogel, eds. New York:
Springer-Verlag, 1979.
21. M. H. A. Davis, "Martingale integrals and stochastic
calculus," in Communication Systems and Random Process
Theory, J. K. Skwirzynski, ed. Leiden: Noordhoff, 1978.
22. L. Arnold, Stochastic Differential Equations. New York:
Wiley, 1974.
23. A. Friedman, Stochastic Differential Equations and
Applications, Vol. 1. New York: Academic Press, 1975.
AN INTRODUCTION TO NONLINEAR FILTERING 75

24. I. I. Gihman and A. V. Skorohod, Stochastic Differential
Equations. New York: Springer-Verlag, 1972.
25. I. Gertner, "An alternative approach to nonlinear filtering,"
Stochastic Processes and their Applications, 7, 1978,
pp. 231-246.
26. D. W. Stroock and S. R. S. Varadhan, Multidimensional
Diffusion Processes. New York: Springer-Verlag, 1979.
27. D. Allinger and S. K. Mitter, "New results on the innovations
problem for nonlinear filtering," Stochastics, to appear.
28. E. Wong, "Recent progress in stochastic process -- a survey,"
IEEE Trans. Inform. Theory, IT-19, 1973, pp. 262-275.
29. J. Jacod, Calcul Stochastique et Problèmes de Martingales.
Berlin-Heidelberg-New York: Springer-Verlag, 1979.
30. S. K. Mitter, this volume.
