The document proposes approximating the posterior probability density function of a noisy dynamic system's state conditioned on measurements, using a convex combination of Gaussian densities. This approximation allows circumventing difficulties in evaluating Bayesian recursion relations for nonlinear or non-Gaussian systems. Specifically, the approximation involves representing the density as a weighted sum of Gaussian terms. As more terms are added, the approximation converges uniformly to the true density. Further, any finite sum is a valid density function. The paper applies the approximation to a linear system with non-Gaussian noise to determine the posterior density and minimum variance estimates.
Automatica, Vol. 7, pp. 465-479. Pergamon Press, 1971. Printed in Great Britain.
Recursive Bayesian Estimation Using Gaussian Sums*
H. W. SORENSON† and D. L. ALSPACH‡

The approximation of the probability density p(x_k|Z_k) of the state of a noisy dynamic system conditioned on available measurement data using a convex combination of gaussian densities is proposed as a practical means for accomplishing non-linear filtering.

Summary--The Bayesian recursion relations which describe the behavior of the a posteriori probability density function of the state of a time-discrete stochastic system conditioned on available measurement data cannot generally be solved in closed form when the system is either non-linear or non-gaussian. In this paper a density approximation involving convex combinations of gaussian density functions is introduced and proposed as a meaningful way of circumventing the difficulties encountered in evaluating these relations and in using the resulting densities to determine specific estimation policies. It is seen that as the number of terms in the gaussian sum increases without bound, the approximation converges uniformly to any density function in a large class. Further, any finite sum is itself a valid density function, unlike many other approximations that have been investigated. The problem of determining the a posteriori density and minimum variance estimates for linear systems with non-gaussian noise is treated using the gaussian sum approximation. This problem is considered because it can be dealt with in a relatively straightforward manner using the approximation but still contains most of the difficulties that one encounters in considering non-linear systems, since the a posteriori density is non-gaussian. After discussing the general problem from the point of view of applying gaussian sums, a numerical example is presented in which the actual statistics of the a posteriori density are compared with the values predicted by the gaussian sum and by the Kalman filter approximations.

* Received 24 August 1970; revised 18 December 1970. The original version of this paper was not presented at any IFAC meeting. It was recommended for publication in revised form by associate editor L. Meier.
† Department of the Aerospace and Mechanical Engineering Sciences, University of California at San Diego, La Jolla, California 92037, U.S.A.
‡ Department of Electrical Engineering, Colorado State University, Fort Collins, Colorado, U.S.A.

1. INTRODUCTION

THE PROBLEM of estimating the state of a non-linear stochastic system from noisy measurement data is considered. This problem has been the subject of considerable research interest during the past few years, and JAZWINSKI [1] gives a thorough discussion of the subject. Although a great deal has been published on the subject, the basic objective of obtaining a solution that can be implemented in a straightforward manner for specific applications has not been satisfactorily realized. This is manifested by the fact that the Kalman filter equations [2, 3], which are valid only for linear, gaussian systems, continue to be widely used for non-linear, non-gaussian systems. Of course, continued application has resulted in the development of ad hoc techniques [e.g. Refs. 4, 5] that have improved the performance of the Kalman filter and which give it some of the characteristics of non-linear filters.
Central to the non-linear estimation and stochastic control problems is the determination of the probability density function of the state conditioned on the available measurement data. If this a posteriori density function were known, an estimate of the state for any performance criterion could be determined. Unfortunately, although the manner in which the density evolves with time and additional measurement data can be described in terms of differential, or difference, equations [6-8], these relations are generally very difficult to solve either in closed form or numerically, so that it is usually impossible to determine the a posteriori density for specific applications. Because of this difficulty it is natural to investigate the possibility of approximating the density with some tractable form. It is to this approximation problem that this discussion is directed.

1.1. The general problem

The approximation that is discussed below is introduced as a means of dealing with the following system and filtering problem. Suppose that the state x evolves according to

    x_{k+1} = f_k(x_k, w_k)                                              (1)

and that the behavior of the state is observed imperfectly through the measurement data

    z_k = h_k(x_k, v_k).                                                 (2)

The w_k and v_k represent white noise sequences and are assumed to be mutually independent. The basic problem that is considered is that of estimating the state x_k from the measurement data* Z_k for each k, that is, the filtering problem. Generally, one attempts to determine a "best" estimate by choosing the estimate to extremize some performance criterion. For example, the estimate could be selected to minimize the mean-square error. Regardless of the performance criterion, given the a posteriori density function p(x_k|Z_k), any type of estimate can be determined. Thus, the estimation problem can first be approached as the problem of determining the a posteriori density. This is generally referred to as the Bayesian approach [9].

* The upper case letter Z_k denotes the set (z_0, z_1, ..., z_k).

1.2. The Bayesian approach

As has been demonstrated by the great interest in the Kalman filter, it is frequently desirable to determine the estimates recursively. That is, an estimate of the current state is up-dated as a function of a previous estimate and the most recent or new measurement data. In the Bayesian case, the a posteriori density can be determined recursively according to the following relations:

    p(x_k|Z_k) = p(x_k|Z_{k-1}) p(z_k|x_k) / p(z_k|Z_{k-1})              (3)

    p(x_k|Z_{k-1}) = ∫ p(x_{k-1}|Z_{k-1}) p(x_k|x_{k-1}) dx_{k-1}        (4)

where the normalizing constant p(z_k|Z_{k-1}) in equation (3) is given by

    p(z_k|Z_{k-1}) = ∫ p(x_k|Z_{k-1}) p(z_k|x_k) dx_k.                   (5)

The initial density p(x_0|Z_0) is given by

    p(x_0|z_0) = p(z_0|x_0) p(x_0) / p(z_0).                             (6)

The density p(z_k|x_k) in equation (3) is determined by the a priori measurement noise density p(v_k) and the measurement equation (2). Similarly, the p(x_k|x_{k-1}) in equation (4) is determined by p(w_{k-1}) and equation (1). Knowledge of these densities and p(x_0) determines the p(x_k|Z_k) for all k. However, it is generally impossible to accomplish the integration indicated in equation (4) in closed form, so that the density cannot actually be determined for most applications. The principal exception occurs when the plant and measurement equations are linear and the initial state and the noise sequences are gaussian. Then, equations (3-6) can be evaluated and the a posteriori density p(x_k|Z_k) is gaussian for all k. The mean and covariances for this system are known as the Kalman filter equations.
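When these relations cannot be solved in closed form they can still be evaluated by brute force on a discretized state grid, which is the kind of computation the gaussian sum approximation developed below is meant to avoid. The sketch that follows shows one filtering cycle of (3)-(5) under assumed gaussian noise densities; the grid limits, noise levels and function names are illustrative choices made here, not quantities taken from the paper.

```python
import numpy as np

# Minimal sketch: one step of the Bayesian recursion (3)-(5) on a state grid.
# The grid, the transition density p(x_k | x_{k-1}) and the likelihood
# p(z_k | x_k) used here are illustrative stand-ins, not the paper's example.

x = np.linspace(-10.0, 10.0, 1001)          # state grid
dx = x[1] - x[0]

def transition_density(x_next, x_prev):
    # p(x_k | x_{k-1}) for x_k = x_{k-1} + w_{k-1}, with w ~ N(0, 1) (assumed)
    return np.exp(-0.5 * (x_next - x_prev) ** 2) / np.sqrt(2.0 * np.pi)

def likelihood(z, x_now):
    # p(z_k | x_k) for z_k = x_k + v_k, with v ~ N(0, 0.5^2) (assumed)
    return np.exp(-0.5 * ((z - x_now) / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))

def bayes_step(p_prev, z):
    """One filtering cycle: prediction (4), then measurement update (3)."""
    # Prediction: p(x_k | Z_{k-1}) = integral of p(x_{k-1} | Z_{k-1}) p(x_k | x_{k-1})
    p_pred = transition_density(x[:, None], x[None, :]) @ p_prev * dx
    # Update: multiply by the likelihood and renormalize by p(z_k | Z_{k-1}), eq. (5)
    unnorm = p_pred * likelihood(z, x)
    return unnorm / (np.sum(unnorm) * dx)

# Example: start from a diffuse prior and assimilate two measurements.
p = np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2.0 * np.pi))
for z_k in [1.2, 0.7]:
    p = bayes_step(p, z_k)
print("posterior mean ~", np.sum(x * p) * dx)
```

The cost of such a grid computation grows quickly with the state dimension, which is part of the motivation for seeking a tractable parametric approximation.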
When the system is non-linear and/or the a priori distributions are non-gaussian, two problems are encountered. First, the integration in equation (4) cannot be accomplished in closed form and, second, the moments are not easily obtained from equation (3). These problems lead to the investigation of density approximations for which the required operations can be accomplished in a straightforward manner. A particularly promising approximation is considered here. Emphasis in this discussion is on the approximation itself rather than the non-linear filtering problem, but the latter has been stated because it provides the motivation for considering an approximation. The approximation of probability density functions is discussed in section 2. The use of the gaussian sum approximation for a linear system with non-gaussian a priori density functions for the initial state and for the plant and measurement noise sequences is discussed in section 3.

2. APPROXIMATION OF DENSITIES

The problem of approximating density functions in order to facilitate the determination of the a posteriori density and its moments has been considered previously [10] using Edgeworth and Gram-Charlier expansions. While this approach has several advantages and its utility has been demonstrated in some applications, it has the distinct disadvantage that when these series are truncated, the resulting density approximation is not positive for all values of the independent variable. Thus, the approximation is not a valid density function itself. To avoid, or at least reduce, the negativity of the density approximation, it is sometimes necessary to retain a large number of terms in the series, which can make the approximation computationally unattractive. Further, it is well known that the series converges to a given density only under somewhat restrictive conditions. Thus, although the Edgeworth expansion is a convenient choice in many ways, it has proven desirable to seek an approximation which eliminates these difficulties. The gaussian sum approximation that is described here exhibits the basic advantages of the Edgeworth expansion but has none of the disadvantages already noted. That is, the approximation is always a valid density function and, further, converges uniformly to any density of practical concern. An approximation of this type was suggested briefly by AOKI [11]. More recently, CAMERON [12] and LO [13] have assumed that the a priori density function for the initial state of linear systems with gaussian noise sequences has the form of a gaussian sum.

2.1. Theoretical foundations of the approximation

Consider a probability density function p which is assumed to have the following properties:

1. p is defined and continuous at all but a finite number of locations;
2. ∫_{-∞}^{∞} p(x) dx = 1;
3. p(x) ≥ 0 for all x.
It is convenient, although not necessary, to consider only scalar random variables. The generalization to the vector case is not difficult but complicates the presentation and, it is felt, unnecessarily detracts from the basic ideas.

The problem of approximating p can be conveniently considered within the context of delta families of positive type [14]. Basically, these are families of functions which converge to a delta, or impulse, function as a parameter characterizing the family converges to a limit value. More precisely, let {δ_λ} be a family of functions on (-∞, ∞) which are integrable over every interval. This is called a delta family of positive type if the following conditions are satisfied:

(i) ∫_{-a}^{a} δ_λ(x) dx tends to one as λ tends to some limit value λ_0 for some a;
(ii) for every constant γ > 0, no matter how small, δ_λ tends to zero uniformly for γ ≤ |x| < ∞ as λ tends to λ_0;
(iii) δ_λ(x) ≥ 0 for all x and λ.

Using the delta families, the following result can be used for the approximation of a density function p.

Theorem 2.1. The sequence p_λ(x) which is formed by the convolution of δ_λ and p

    p_λ(x) = ∫_{-∞}^{∞} δ_λ(x - u) p(u) du                               (7)

converges uniformly to p(x) on every interior subinterval of (-∞, ∞).

For a proof of this result, see KOREVAAR [14]. When p has a finite number of discontinuities, the theorem is still valid except at the points of discontinuity. It should be noted that essentially the same result is given by Theorem 2.1 in FELLER [15]. If {δ_λ} is required to satisfy the condition that ∫_{-∞}^{∞} δ_λ(x) dx = 1, it follows from equation (7) that p_λ is a probability density function for all λ.

It is basically the presence of the gaussian weighting function that has made the Edgeworth expansion attractive for use in the Bayesian recursion relations. The operations defined by equations (3-6) are simplified when the a priori densities are gaussian or closely related to the gaussian. Bearing this in mind, the following delta family is a natural choice for density approximations. Let

    δ_λ(x) ≜ N_λ(x) = (2πλ²)^{-1/2} exp[-x²/2λ²].                        (8)

It is shown without difficulty that N_λ(x) forms a delta family of positive type as λ → 0. That is, as the variance tends to zero, the gaussian density tends to the delta function. Using equations (7) and (8), the density approximation p_λ is written as

    p_λ(x) = ∫_{-∞}^{∞} p(u) N_λ(x - u) du.                              (9)

It is this form that provides the basis for the gaussian sum approximation that is the subject of this discussion.

While equation (9) is an interesting result, it does not immediately provide the approximation that can be used for specific application. However, it is clear that p(u) N_λ(x - u) is integrable on (-∞, ∞) and is at least piecewise continuous. Thus, (9) can itself be approximated on any finite interval by a Riemann sum. In particular, consider an approximation of p_λ over some bounded interval (a, b) given by

    p_{n,λ}(x) = (1/k) Σ_{i=1}^{n} p(x_i) N_λ(x - x_i)[ξ_i - ξ_{i-1}]    (10)

where the interval (a, b) is divided into n subintervals by selecting points ξ_i such that

    a = ξ_0 < ξ_1 < ... < ξ_n = b.

In each subinterval, choose the point x_i such that

    p(x_i)[ξ_i - ξ_{i-1}] = ∫_{ξ_{i-1}}^{ξ_i} p(x) dx,

which is possible by the mean-value theorem. The constant k is a normalizing constant equal to

    k = Σ_{i=1}^{n} p(x_i)[ξ_i - ξ_{i-1}]

and insures that p_{n,λ} is a density function. Clearly, for (b - a) sufficiently large, k can be made arbitrarily close to 1. Note that it follows that

    (1/k) Σ_{i=1}^{n} p(x_i)[ξ_i - ξ_{i-1}] = 1                          (11)

so that p_{n,λ} essentially is a convex combination of gaussian density functions N_λ. It is basically this form that will be used in all future discussion and which is referred to hereafter as the gaussian sum approximation. It is important to recognize that the p_{n,λ} that are formed in this manner are valid probability density functions for all n, λ.
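The construction in equation (10) can be realized directly in a few lines: pick the points ξ_i, find the mean-value points x_i, and normalize by k. The sketch below does this for an assumed triangular density; the example density, the interval, and the grid sizes are illustrative choices only.

```python
import numpy as np

def gaussian(x, mean, lam):
    # N_lambda(x - mean): gaussian density with standard deviation lam
    return np.exp(-0.5 * ((x - mean) / lam) ** 2) / (lam * np.sqrt(2.0 * np.pi))

def riemann_gaussian_sum(p, a, b, n, lam):
    """Build p_{n,lambda} of eq. (10): weights p(x_i)[xi_i - xi_{i-1}]/k on N_lambda(x - x_i)."""
    xi = np.linspace(a, b, n + 1)                     # a = xi_0 < ... < xi_n = b
    widths = np.diff(xi)
    # Choose x_i so that p(x_i)*width equals the subinterval probability (mean-value theorem);
    # here that point is located approximately on a fine grid inside each subinterval.
    weights = np.empty(n)
    centers = np.empty(n)
    for i in range(n):
        grid = np.linspace(xi[i], xi[i + 1], 200)
        mass = np.trapz(p(grid), grid)                # ~ integral of p over the subinterval
        weights[i] = mass
        centers[i] = grid[np.argmin(np.abs(p(grid) * widths[i] - mass))]
    k = weights.sum()                                 # normalizing constant of eq. (10)
    weights /= k                                      # now a convex combination, eq. (11)
    return lambda x: sum(w * gaussian(x, c, lam) for w, c in zip(weights, centers))

# Illustration with an (assumed) triangular density on (-1, 1).
tri = lambda x: np.clip(1.0 - np.abs(x), 0.0, None)
approx = riemann_gaussian_sum(tri, -1.0, 1.0, n=15, lam=0.1)
xs = np.linspace(-1.5, 1.5, 7)
print(np.round(approx(xs), 3))
```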
2.2. Implementation of the gaussian sum approximation

The preceding discussion has indicated that a probability density function p that has a finite number of discontinuities can be approximated arbitrarily closely, outside of a region of arbitrarily small measure around each point of discontinuity, by the gaussian sum p_{n,λ} as defined in equation (10). These asymptotic properties are certainly necessary for the continued investigation of the gaussian sum. However, for practical purposes it is desirable, in fact imperative, that p can be approximated to within an acceptable accuracy by a relatively small number of terms of the series. This requirement furnishes an additional facet to the problem that is considered in this section.

For the subsequent discussion, it is convenient to write the gaussian sum approximation as

    p_n(x) = Σ_{i=1}^{n} α_i N_{σ_i}(x - μ_i)                            (12)

where

    Σ_{i=1}^{n} α_i = 1;   α_i ≥ 0 for all i.

The relation of equation (12) to equation (10) is obvious by inspection. Unlike equation (10), in which the variance λ² is common to every term of the gaussian sum, it has been assumed that the variance σ_i² can vary from one term to another. This has been done to obtain greater flexibility for approximations using a finite number of terms. Certainly, as the number of terms increases, it is necessary to require that the σ_i tend to become equal and vanish.

The problem of choosing the parameters α_i, μ_i, σ_i to obtain the "best" approximation p_n to some density function p can be considered. To define this more precisely, consider the L_k norm. The distance between p and p_n can be defined as

    ||p - p_n||_k = [∫_{-∞}^{∞} |p(x) - p_n(x)|^k dx]^{1/k}.             (13)

Thus, one can attempt to choose α_i, μ_i, σ_i (i = 1, 2, ..., n) so that the distance ||p - p_n|| is minimized. As the number of terms n increases and as the variance decreases to zero, the distance must vanish. However, for finite n and nonzero variance, it is reasonable to attempt to minimize the distance in a manner such as this. In doing this, the stochastic estimation problem has been recast at this point as a deterministic curve-fitting problem. There are other norms and problem formulations that could be considered.

In many problems, it may be desirable to cause the approximation to match some of the moments, for example the mean and variance, of the true density exactly. If this were a requirement, then one could consider the moments as constraints on the minimization problem and proceed appropriately. For example, if the mean associated with p is μ, then the constraint that p_n have mean value μ would be

    μ = Σ_{i=1}^{n} α_i μ_i.                                             (14)

Thus, equation (14) would be considered in addition to the constraints on the α_i stated after equation (12).
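For any candidate sum of the form (12), the distance (13) and the mean constraint (14) are straightforward to evaluate numerically. A small sketch follows; the target density, the mixture parameters, and the quadrature interval are assumed here purely for illustration.

```python
import numpy as np

def mixture(x, alphas, mus, sigmas):
    # p_n(x) of eq. (12): a convex combination of gaussian densities
    x = np.asarray(x, dtype=float)[..., None]
    comps = np.exp(-0.5 * ((x - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    return comps @ alphas

def lk_distance(p, q, k=1, lo=-10.0, hi=10.0, m=20001):
    # ||p - q||_k of eq. (13), computed by quadrature on [lo, hi]
    x = np.linspace(lo, hi, m)
    return np.trapz(np.abs(p(x) - q(x)) ** k, x) ** (1.0 / k)

# Assumed target: a zero-mean gaussian with unit variance.
target = lambda x: np.exp(-0.5 * np.asarray(x) ** 2) / np.sqrt(2.0 * np.pi)

alphas = np.array([0.5, 0.5])
mus = np.array([-0.5, 0.5])
sigmas = np.array([0.9, 0.9])
approx = lambda x: mixture(x, alphas, mus, sigmas)

print("L1 distance :", lk_distance(target, approx, k=1))
# Constraint (14): the mixture mean must match the target mean (here 0).
print("mixture mean:", alphas @ mus)
```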
Studies related to the problem of approximating p with a small number of terms have been conducted for a large number of density functions. These investigations have indicated, not surprisingly, that densities which have discontinuities generally are more difficult to approximate than are continuous functions. The results for two density functions, the uniform and the gamma, are discussed below. The uniform density is discontinuous and is of interest from that point of view. The gamma function is nonzero only for positive values of x, so it is an example of a nonsymmetric density that extends over a semi-infinite range.

Consider the following uniform density function

    p(x) = 1/4 for -2 ≤ x ≤ 2;   p(x) = 0 elsewhere.                     (15)

This distribution has a mean value of zero and variance of 1.333.

Two different methods of fitting equation (15) have been considered. First, consider an approximation that is suggested directly by (10) and referred to subsequently as a theorem fit. The parameters of the approximation are chosen in the following general manner:

(1) Select the mean value μ_i of each gaussian so that the densities are equally spaced on (-2, 2). By an appropriate location of the densities the mean value constraint (14) can be satisfied immediately.
(2) The weighting factors α_i are set equal to 1/n so that Σ_{i=1}^{n} α_i = 1.
(3) The variance λ² of each gaussian is the same and is selected so that the L_1 distance between p and p_n is minimized.

This approximation procedure requires only a one-dimensional search to determine λ.
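The theorem fit of the uniform density thus reduces to a one-dimensional search over λ. A minimal sketch follows, using a simple grid search for λ; the particular equal spacing of the means (interior points of (-2, 2)) and the search-grid details are choices made here, not taken from the paper.

```python
import numpy as np

def uniform_pdf(x):
    # Eq. (15): uniform density on (-2, 2)
    return np.where(np.abs(x) <= 2.0, 0.25, 0.0)

def theorem_fit_uniform(n, lam, x):
    # Steps (1)-(2): equally spaced means on (-2, 2), equal weights 1/n
    mus = np.linspace(-2.0, 2.0, n + 2)[1:-1]
    comps = np.exp(-0.5 * ((x[:, None] - mus) / lam) ** 2) / (lam * np.sqrt(2.0 * np.pi))
    return comps.mean(axis=1)

def l1_error(n, lam, x):
    return np.trapz(np.abs(theorem_fit_uniform(n, lam, x) - uniform_pdf(x)), x)

# Step (3): one-dimensional grid search over lambda minimizing the L1 error.
x = np.linspace(-6.0, 6.0, 4001)
n = 10
lams = np.linspace(0.05, 1.0, 200)
errors = [l1_error(n, lam, x) for lam in lams]
best = lams[int(np.argmin(errors))]
print(f"n = {n}: best lambda ~ {best:.3f}, L1 error ~ {min(errors):.4f}")
```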
To investigate the accuracy and the convergence of the approximation, the number of terms in the sum was varied. Figure 1(a) shows the approximation when 6, 10, 20 and 49 terms were included. It is interesting to observe that the approximation retains the general character of the uniform density even for the six-term case. As should be expected, the largest errors appear in the vicinity of the discontinuities at ±2. These approximations exhibit an apparent oscillation about the true value that is not visually satisfying. This oscillation can be eliminated by using a slightly larger value for the variance of the gaussian terms, as is depicted in Fig. 1(b). The second and fourth moments and the L_1 error are listed in Table 1 for these two sets of approximations. The individual terms were located symmetrically about zero so that the mean value of the gaussian sum agrees with that of the uniform density in all cases. Note that for the best fit the error in the variance is only 1.25 per cent when 20 terms are used. As should be expected, higher order moments converge more slowly since errors farthest away from the mean assume more importance. For example, the fourth central moment has an error of 5 per cent for the 20-term approximation. The errors in the moments of the smoothed fit are aggravated only slightly, although the L_1 error increases in a nontrivial manner.

[Fig. 1. Gaussian sum approximations of the uniform density function. (a) Best theorem fit: 6, 10, 20 and 49 term approximations. (b) Smoothed theorem fit: 6, 10, 20 and 49 term approximations. (c) L_2 search fit compared with the 10-term best and smoothed theorem fits.]

TABLE 1. UNIFORM DENSITY APPROXIMATION

                         Variance   Fourth central moment   L_1 error
True values              1.333      3.200                   --
Best theorem fit
  6 terms                1.581      5.320                   0.2199
  10 terms               1.417      3.884                   0.1271
  20 terms               1.354      3.363                   0.0623
  49 terms               1.336      3.226                   0.0272
Smoothed theorem fit
  6 terms                1.690      6.387                   0.2444
  10 terms               1.456      4.224                   0.1426
  20 terms               1.363      3.442                   0.0701
  49 terms               1.338      3.238                   0.0280
L_2 search fit
  6 terms                1.419      3.626                   0.0968

As an alternative approach, the parameters α_i, μ_i, σ_i² were chosen to minimize the L_2 distance; this is hereafter referred to as an L_2 search fit. These results are summarized in Fig. 1(c) for n = 6. Included for comparison in this figure are the 10-term theorem fits from Figs. 1(a) and 1(b). Because of the number of terms involved, obtaining a search fit is significantly more difficult than obtaining a theorem fit for the same number of terms. Thus, the theorem fit may be more desirable from a practical standpoint. The moments and L_1 error for the search fit are also included in Table 1. These values are slightly better than the theorem fit involving ten terms. It is interesting in Fig. 1(c) to note the "spikes" that have appeared at the points of discontinuity. This appears to be analogous to the Gibbs phenomenon of Fourier series.

The second example that is discussed here is the gamma density function. It is defined as

    p(x) = 0 for x < 0;   p(x) = x³ e^{-x}/6 for x ≥ 0.                  (16)

The distribution has a mean value of 4 and second, third, and fourth central moments of 4, 8 and 72, respectively.

First, consider a theorem fit of this density in which the mean values are distributed uniformly on (0, 10). For the uniform density, the uniform placement was natural; for the gamma density it is not as appropriate. For example, for n = 6 or 10, it is seen in Fig. 2(a) that the approximating density is not as good, at least visually, as one might hope. The first four central moments are listed in Table 2. Clearly, the higher order moments contain large errors and even the mean value is incorrect, in contrast with the uniform density.

TABLE 2. GAMMA DENSITY APPROXIMATION

                          Mean   Variance   Third central   Fourth central   L_1 error
                                            moment          moment
True values               4      4          8               72               --
Theorem fit
  6 terms in (0, 10)      3.94   4.345      5.496           60.78            0.119
  10 terms in (0, 10)     3.94   3.861      4.941           47.84            0.053
  20 terms in (0, 10)     3.93   3.611      4.653           41.23            0.023
  20 terms in (0, 12)     4.00   4.206      7.477           71.41            0.042
L_2 search fit
  1 term in (0, 10)       3.51   3.510      0               10.53            0.203
  2 terms in (0, 10)      3.82   3.427      3.04            34.96            0.078
  3 terms in (0, 10)      3.91   3.632      4.711           43.39            0.036
  4 terms in (0, 10)      3.95   3.744      5.682           49.57            0.018
[Fig. 2. Gaussian sum approximations of the gamma density function. (a) Best theorem fit: 6 and 10 term approximations. (b) Twenty term approximations over different intervals, (0, 10) and (0, 12). (c) L_2 search fit: 3 and 4 term approximations.]

Two different 20-term approximations are depicted in Fig. 2(b). In one, the mean values of the gaussian terms are selected in the interval (0, 10), whereas the second approximation is distributed in (0, 12). Note in the first case that the gaussian sum tends to zero much more rapidly than the gamma function for x > 10. Thus, to improve the approximation and the moments it is necessary to increase the interval over which terms are placed, and the second curve indicates the influence of this change. The results of these two cases are included in Table 2 and indicate that increasing the interval over which the approximation is valid has significantly improved the moments.

The theorem fit provides a very simple method for obtaining an approximation. However, it is clear that better results could be obtained by choosing at least the mean values of the individual gaussian terms more carefully. Consider now some L_2 search fits. In Fig. 2(c), the search fits for three and four terms are depicted and the moments are listed in Table 2. Clearly, the values of the mean and variance appear to be converging to the true values of 4. Note also that the 4-term approximation is considerably better than the 10-term theorem fit. Thus, the search technique, while more difficult to carry out, points out the desirability of judicious placement of the gaussian terms in order to obtain the most suitable approximation.

3. LINEAR SYSTEMS WITH NONGAUSSIAN NOISE

It is envisioned that the gaussian sum approximation will be very useful in dealing with non-linear stochastic systems. However, many of the properties and concomitant difficulties of the approximation are exhibited by considering linear systems which are influenced by nongaussian noise.

As is well known, the a posteriori density p(x_k|Z_k) is gaussian for all k when the system is linear and the initial state and the plant and measurement noise sequences are gaussian. The mean and variance of the conditional density are described by the Kalman filter equations. When nongaussian distributions are ascribed to the initial state and/or noise sequences, p(x_k|Z_k) is no longer gaussian and it is generally impossible to determine p(x_k|Z_k) in a closed form. Furthermore, in the linear, gaussian problem, the conditional mean, i.e. the minimum variance estimate, is a linear function of the measurement data and the conditional variance is independent of the measurement data. These characteristics are generally not true for a system which is either non-linear or nongaussian.
For the following discussion, consider a scalar system whose state evolves according to

    x_k = Φ_{k,k-1} x_{k-1} + w_{k-1}                                    (17)

and whose behavior is observed through measurement data z_k described by

    z_k = H_k x_k + v_k.                                                 (18)

Suppose that the density function describing the initial state has the form

    p(x_0) = Σ_{i=1}^{l_0} α_{i0} N_{σ_{i0}}(x_0 - μ_{i0}).              (19)

Assume that the plant and measurement noise sequences (i.e. {w_k} and {v_k}) are mutually independent, white noise sequences with density functions represented by

    p(w_k) = Σ_{i=1}^{s_k} β_{ik} N_{q_{ik}}(w_k - ω_{ik})               (20)

    p(v_k) = Σ_{i=1}^{m_k} γ_{ik} N_{r_{ik}}(v_k - ν_{ik}).              (21)

There are a variety of ways in which the gaussian sum approximation could be introduced. For example, it is natural to proceed in the manner that is to be discussed here, in which the a priori distributions are represented by gaussian sums as in equations (19), (20) and (21). This approach has the advantage that the approximation can be determined off-line and then used directly in the Bayesian recursion relations. An alternative approach would be to perform the approximation in more of an on-line procedure. Instead of approximating the a priori densities, one could deal with p(x_k|Z_k) in equation (3) and the integrand in (4) and derive approximations at each stage. This would be more direct but has the disadvantage that considerable computation may be required during the processing of data. Discussion of the implementation of this approach will not be attempted in this paper.

3.1. Determination of the a posteriori distributions

Suppose that the a priori density functions are given by equations (19), (20) and (21). In using these representations in the Bayesian recursion relations, it is useful to note the following properties of gaussian density functions.

Scholium 3.1. For a ≠ 0,

    N_σ(x - ay) = (1/|a|) N_{σ/a}(y - x/a).                              (22)

Scholium 3.2.

    N_{σ_i}(x - μ_i) N_{σ_j}(x - μ_j) = N_{(σ_i² + σ_j²)^{1/2}}(μ_i - μ_j) N_{σ_{ij}}(x - μ_{ij})

where

    μ_{ij} = (μ_i σ_j² + μ_j σ_i²)/(σ_i² + σ_j²),   σ_{ij}² = σ_i² σ_j²/(σ_i² + σ_j²).   (23)

The proofs of (22) and (23) are omitted. Armed with these two results, it is a simple matter to prove that the following descriptions of the filtering and prediction densities are true.

Theorem 3.1. Suppose that p(x_k|Z_{k-1}) is described by

    p(x_k|Z_{k-1}) = Σ_{i=1}^{l_k} α'_{ik} N_{σ'_{ik}}(x_k - μ'_{ik}).   (24)

Then p(x_k|Z_k) is given by

    p(x_k|Z_k) = Σ_{i=1}^{l_k} Σ_{j=1}^{m_k} c_{ij} N_{σ_{ij}}(x_k - ε_{ij})   (25)

where

    c_{ij} = α'_{ik} γ_{jk} N_{λ_{ij}}(z_k - ρ_{ij}) / Σ_{i=1}^{l_k} Σ_{j=1}^{m_k} α'_{ik} γ_{jk} N_{λ_{ij}}(z_k - ρ_{ij}),

    ρ_{ij} = H_k μ'_{ik} + ν_{jk},   λ_{ij}² = σ'_{ik}² H_k² + r_{jk}²,

    ε_{ij} = μ'_{ik} + [σ'_{ik}² H_k/(σ'_{ik}² H_k² + r_{jk}²)](z_k - ν_{jk} - H_k μ'_{ik}),

    σ_{ij}² = σ'_{ik}² - σ'_{ik}⁴ H_k²/(σ'_{ik}² H_k² + r_{jk}²).

It is obvious that the c_{ij} > 0 and that

    Σ_{i=1}^{l_k} Σ_{j=1}^{m_k} c_{ij} = 1.

Thus, equation (25) is a gaussian sum and for convenience one can rewrite it as

    p(x_k|Z_k) = Σ_{i=1}^{n_k} α_{ik} N_{σ_{ik}}(x_k - μ_{ik})           (26)

where n_k = (l_k)(m_k) and the α_{ik}, σ_{ik} and μ_{ik} are formed in an obvious fashion from the c_{ij}, σ_{ij} and ε_{ij}.

The proof of the theorem is straightforward. From the definition of the measurement relation (18) and the measurement noise density (21), one sees that

    p(z_k|x_k) = Σ_{j=1}^{m_k} γ_{jk} N_{r_{jk}}(z_k - H_k x_k - ν_{jk}).

Using this and (24) in (3) and applying the two scholiums, one obtains (25).
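Theorem 3.1 maps directly onto array operations over the l_k × m_k term combinations. The sketch below is a minimal illustration of that measurement update for the scalar model (17)-(18), with the mixtures held as weight/mean/standard-deviation arrays; the two-term mixtures and the numerical values are assumed here purely for illustration.

```python
import numpy as np

def gauss(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def measurement_update(alpha, mu, sig, gamma, nu, r, H, z):
    """Theorem 3.1: combine the prior sum (24) with the measurement-noise sum (21) into (25)/(26).

    alpha, mu, sig  -- weights, means, std devs of p(x_k | Z_{k-1})
    gamma, nu, r    -- weights, means, std devs of p(v_k)
    """
    A, G = np.meshgrid(alpha, gamma, indexing="ij")
    MU, NU = np.meshgrid(mu, nu, indexing="ij")
    SIG, R = np.meshgrid(sig, r, indexing="ij")
    lam = np.sqrt(H ** 2 * SIG ** 2 + R ** 2)               # residual spread lambda_ij
    c = A * G * gauss(z - H * MU - NU, lam)                 # unnormalized c_ij
    c /= c.sum()
    gain = SIG ** 2 * H / (SIG ** 2 * H ** 2 + R ** 2)      # per-term Kalman gain
    eps = MU + gain * (z - NU - H * MU)                     # eps_ij
    var = SIG ** 2 - SIG ** 4 * H ** 2 / (SIG ** 2 * H ** 2 + R ** 2)
    return c.ravel(), eps.ravel(), np.sqrt(var).ravel()     # n_k = l_k * m_k terms

# Illustrative two-term prior and two-term measurement noise (assumed values).
alpha, mu, sig = np.array([0.6, 0.4]), np.array([-1.0, 1.5]), np.array([0.8, 0.8])
gamma, nu, r = np.array([0.5, 0.5]), np.array([-0.3, 0.3]), np.array([0.4, 0.4])
w, m, s = measurement_update(alpha, mu, sig, gamma, nu, r, H=1.0, z=0.9)
print("posterior mean:", w @ m)      # eq. (29)
```

Each of the n_k resulting terms is exactly one Kalman measurement update, weighted by how well that particular prior/noise combination explains z_k.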
The prediction density is determined from (4) and leads to the following result.

Theorem 3.2. Assume that p(x_k|Z_k) is given by (26). Then, for the linear system (17) and the plant noise (20), the prediction density p(x_{k+1}|Z_k) is

    p(x_{k+1}|Z_k) = Σ_{i=1}^{n_k} Σ_{j=1}^{s_k} α_{ik} β_{jk} N_{λ_{ij}}(x_{k+1} - Φ_{k+1,k} μ_{ik} - ω_{jk})   (27)

where

    λ_{ij}² = Φ_{k+1,k}² σ_{ik}² + q_{jk}².

It is convenient to redefine terms so that (27) can be written as

    p(x_{k+1}|Z_k) = Σ_{i=1}^{l_{k+1}} α'_{i(k+1)} N_{σ'_{i(k+1)}}(x_{k+1} - μ'_{i(k+1)}).   (28)

Clearly, the definition of p(x_0) has the form (28), as does the p(x_k|Z_{k-1}) assumed in Theorem 3.1. Thus, it follows that the gaussian sums repeat themselves from one stage to the next and that (26) and (28) can be regarded as the general forms for an arbitrary stage. Thus, the gaussian sum can almost be regarded as a reproducing density [16]. It is important, however, to note that the number of terms in the gaussian sum increases at each stage, so that the density is not described by a fixed number of parameters. The density would be truly reproducing if only the initial state were non-gaussian; for example, see Ref. [12]. It is clear in this case that the number of terms in the gaussian sum remains equal to the number used to define p(x_0), so that p(x_k|Z_k) is described by a fixed number of parameters.

There are several aspects that require comment at this point. First, if the gaussian sums for the a priori densities all contain only one term, that is, they are gaussian, the Kalman filter equations and the gaussian a posteriori density are obtained. In fact, the ε_{ij} and σ_{ij}² in (25) and the means and variances λ_{ij}² in (27) each represent the Kalman filter equations for the ij-th density combination. Thus, the gaussian sum, in a manner of speaking, describes a combination of Kalman filters operating in concert. To examine this further, consider the first and second moments associated with the prediction and filtering densities.

Theorem 3.3.

    E[x_k|Z_k] ≜ x̂_{k/k} = Σ_{i=1}^{l_k} Σ_{j=1}^{m_k} c_{ij} ε_{ij}                            (29)

    E[(x_k - x̂_{k/k})²|Z_k] ≜ P_{k/k}² = Σ_{i=1}^{l_k} Σ_{j=1}^{m_k} c_{ij}[σ_{ij}² + (x̂_{k/k} - ε_{ij})²]   (30)

    E[x_{k+1}|Z_k] ≜ x̂_{k+1/k} = Φ_{k+1,k} x̂_{k/k} + E[w_k]                                     (31)

    E[(x_{k+1} - x̂_{k+1/k})²|Z_k] ≜ P_{k+1/k}² = Φ_{k+1,k}² P_{k/k}² + E{[w_k - E(w_k)]²}        (32)

where

    E[w_k] = Σ_{i=1}^{s_k} β_{ik} ω_{ik},

    E{[w_k - E(w_k)]²} = Σ_{i=1}^{s_k} β_{ik}(q_{ik}² + ω_{ik}²) - E²[w_k].

In equation (29), the mean value x̂_{k/k} is formed as the convex combination of the mean values ε_{ij} of the individual terms, or Kalman filters, of the gaussian sum. It is important to recognize that the c_{ij}, as is apparent from equation (25), depend upon the measurement data. Thus, the conditional mean is a non-linear function of the current measurement data.

The conditional variance P_{k/k}² described by (30) is more than a convex combination of the variances of the individual terms because of the presence of the term (x̂_{k/k} - ε_{ij})². This shows that the variance is increased by the presence of terms whose mean values differ significantly from the conditional mean x̂_{k/k}. The influence of these terms is tempered by the weighting factor c_{ij}. Note also that the conditional variance (in contrast to the linear Kalman filter) is a function of the measurement data because of the c_{ij} and the (x̂_{k/k} - ε_{ij}).

The mean and variance of the prediction density are described in an obvious manner.
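In the same array form, the time update of Theorem 3.2 and the moment expressions (29)-(30) take only a few lines. The sketch below is self-contained; the filtering-density and plant-noise parameters are again assumed only for illustration.

```python
import numpy as np

def time_update(alpha, mu, sig, beta, omega, q, phi):
    """Theorem 3.2: propagate the filtering sum (26) through x_{k+1} = phi*x_k + w_k, eq. (27).

    alpha, mu, sig -- weights, means, std devs of p(x_k | Z_k)
    beta, omega, q -- weights, means, std devs of the plant-noise sum (20)
    """
    A, B = np.meshgrid(alpha, beta, indexing="ij")
    MU, OM = np.meshgrid(mu, omega, indexing="ij")
    SIG, Q = np.meshgrid(sig, q, indexing="ij")
    w = (A * B).ravel()                                   # alpha_ik * beta_jk
    m = (phi * MU + OM).ravel()                           # shifted means
    s = np.sqrt(phi ** 2 * SIG ** 2 + Q ** 2).ravel()     # lambda_ij of eq. (27)
    return w, m, s

def mixture_moments(w, m, s):
    # Eqs. (29)-(30): conditional mean and variance of a gaussian sum
    mean = w @ m
    var = w @ (s ** 2 + (mean - m) ** 2)
    return mean, var

# Illustrative filtering density and plant noise (assumed values).
alpha, mu, sig = np.array([0.7, 0.3]), np.array([0.2, 1.0]), np.array([0.5, 0.6])
beta, omega, q = np.array([0.5, 0.5]), np.array([-0.4, 0.4]), np.array([0.3, 0.3])
w, m, s = time_update(alpha, mu, sig, beta, omega, q, phi=1.0)
print("predicted mean and variance:", mixture_moments(w, m, s))
```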
If the gaussian sum is an approximation to the true noise density, these relations suggest the desirability of matching the first two moments exactly in order to obtain an accurate description of the conditional mean x̂_{k+1/k} and variance P_{k+1/k}².

As discussed earlier, it is convenient to assign the same variance to all terms of the gaussian sum. Thus, if the initial state and the measurement and noise sequences are identically distributed, it is reasonable to consider the variances for all terms to be identical and to determine the consequences of this assumption. Note from Scholium 3.2 that if

    σ_i² = σ_j² = σ²

then

    σ_{ij}² = σ²/2 for all i, j.

Thus, the variance remains the same for all terms in the gaussian sum when p(x_k|Z_k) is formed, and the concentration as described by the variance becomes greater. Furthermore, from Scholium 3.2 it follows that the mean value is given by

    μ_{ij} = (μ_i + μ_j)/2,

so the new mean value is the average of the previous means. This suggests the possibility that the mean values of some terms, since they are the average of two other terms, may become equal, or almost equal. If two terms of the sum have equal means and variances, they could be combined by adding their respective weighting factors. This would reduce the total number of terms in the gaussian sum.

The c_{ij} in (25) are essentially determined by a gaussian density. Thus, if z_k - ρ_{ij} becomes very large, then the c_{ij} may become sufficiently small that the entire term is negligible. If terms could be neglected, then the total number of terms in the gaussian sum could be reduced.

The prediction and filtering densities are represented at each stage by a gaussian sum. However, it has been seen that the sums have the characteristic that the number of terms increases at each stage as the product of the number of terms in the two constituent sums from which the densities are formed. This fact could seriously reduce the practicality of this approximation if there were no alleviating circumstances. The discussion above regarding the diminishing of the weighting factors and the combining of terms with nearly equal moments has introduced the mechanisms which significantly reduce the apparent ill effects caused by the increase in the number of terms in the sum. It is an observed fact that the mechanisms whereby terms can be neglected or combined are indeed operative and, in fact, can sometimes permit the number of terms in the series to be reduced by a substantial amount.

Since the weighting factors for individual terms do not vanish identically, nor do the means and variances of two or more gaussian densities become identical, it is necessary to establish criteria by which one can determine when terms are negligible or are approximately the same. This is accomplished by defining numerical thresholds which are prescribed to maintain the numerical error less than acceptable limits.

Consider the effects on the L_1 error of neglecting terms with small weighting factors. Suppose that the density is

    p(x) = Σ_{i=1}^{n} α_i N_σ(x - a_i)                                  (33)

and that α_1, α_2, ..., α_{m-1} (m < n) are less than some positive number δ_1. Note that the variance has been assumed to be the same for each term. Consider replacing p by p_A, where

    p_A(x) = [1/Σ_{i=m}^{n} α_i] Σ_{i=m}^{n} α_i N_σ(x - a_i).           (34)

The following bound is determined without difficulty.
Theorem 3.4.

    ∫_{-∞}^{∞} |p(x) - p_A(x)| dx ≤ 2 Σ_{i=1}^{m-1} α_i                  (35)

                                  ≤ 2(m - 1)δ_1.                         (36)

The L_1 error caused by neglecting (m - 1) terms, each of which is less than δ_1, is seen in (35) to be less than twice the sum of the neglected terms. Thus, the threshold δ_1 can be selected by using (36) or (35) to keep the increased L_1 error within acceptable limits.

Consider the situation in which the absolute value of the difference of the mean values of two terms is small. In particular, suppose that a_1 and a_2 are approximately the same and consider the L_1 error that results if the p(x) given in (33) is replaced by

    p_A(x) = Σ_{i=3}^{n} α_i N_σ(x - a_i) + (α_1 + α_2) N_σ(x - ā)       (37)

where

    ā = (α_1 a_1 + α_2 a_2)/(α_1 + α_2).

Using (37), one can prove the following bound. For a detailed proof of this and other results, see Ref. [17].

Theorem 3.5.

    ∫_{-∞}^{∞} |p(x) - p_A(x)| dx ≤ 4 [α_1 α_2/(α_1 + α_2)] M |a_2 - a_1|.   (38)

Thus, terms can be combined if the right-hand side of (38) is less than some positive number δ_2 which represents the allowable L_1 error. The M in (38) is the maximum value of N_σ and is given by

    M = 1/(σ√(2π)).

Observe that as the variance σ² decreases, the distance between two terms to be combined must also decrease in order to retain the same error bound.
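Theorems 3.4 and 3.5 translate into simple pruning and pairwise-combining rules for a common-variance mixture. The sketch below applies them; the thresholds, the example mixture, and the restriction to comparing adjacent terms are choices made here for illustration, not a prescription from the paper.

```python
import numpy as np

def prune(w, m, s, delta1):
    # Drop terms with weight below delta1 and renormalize, as in (34);
    # by (35)-(36) the added L1 error is at most twice the dropped weight.
    keep = w >= delta1
    return w[keep] / w[keep].sum(), m[keep], s[keep]

def merge_pairs(w, m, s, delta2):
    # Combine two equal-variance terms when the bound (38) is below delta2;
    # only adjacent terms are compared in this sketch (scan order is a choice made here).
    w, m, s = list(w), list(m), list(s)
    i = 0
    while i < len(w) - 1:
        j = i + 1
        M = 1.0 / (s[i] * np.sqrt(2.0 * np.pi))                  # max of N_sigma
        bound = 4.0 * (w[i] * w[j] / (w[i] + w[j])) * M * abs(m[j] - m[i])
        if np.isclose(s[i], s[j]) and bound < delta2:
            m[i] = (w[i] * m[i] + w[j] * m[j]) / (w[i] + w[j])   # weighted mean, as in (37)
            w[i] = w[i] + w[j]
            del w[j], m[j], s[j]
        else:
            i += 1
    return np.array(w), np.array(m), np.array(s)

# Illustrative mixture (assumed): one negligible term, two nearly coincident terms.
w = np.array([0.0005, 0.4995, 0.25, 0.25])
m = np.array([5.0, 0.0, 1.000, 1.001])
s = np.array([0.3, 0.3, 0.3, 0.3])
w, m, s = prune(w, m, s, delta1=0.001)
w, m, s = merge_pairs(w, m, s, delta2=0.001)
print(len(w), "terms remain:", np.round(w, 3), np.round(m, 3))
```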
3.2. A numerical example

In this section the results presented in section 3.1 are applied to a specific example. To make (17) and (18) more specific, suppose that the system is

    x_k = x_{k-1} + w_{k-1}                                              (39)

    z_k = x_k + v_k                                                      (40)

where the x_0, w_k, v_k (k = 0, 1, ...) are assumed to be uniformly distributed on (-2, 2) as defined by (15). The problem that is considered represents something of a worst case for the approximation because the initial state and the noise sequences are assumed to be uniformly distributed with the density discussed in section 2.2. As discussed there, the discontinuities at ±2 make it difficult to fit this density and necessitate the use of many terms in the gaussian sum. The specific approximation used here contains 10 terms and is shown in Fig. 1(b). It is apparent that this approximation has nontrivial errors in the neighborhood of the discontinuities but nonetheless retains the basic character of the uniform distribution.

The performance of the gaussian sum approximation for this example is described below by comparing the conditional mean and variance provided by the approximation with those predicted by the Kalman filter and with the statistics obtained by considering the true uniform distribution. The latter have been determined for this example after a nontrivial amount of numerical computation. In addition, the true a posteriori density p(x_k|Z_k) has been computed and is compared with that obtained using the approximation.

The Kalman error variance is independent of the measurement sequence and can cause misleading filter response. For example, for some measurement sequences perfect knowledge of the state is possible, that is, the variance is zero, but the Kalman variance still predicts a large uncertainty. For example, suppose that the measurements at each stage are equal to

    z_k = 2k + 4,   k = 0, 1, ...

Then, the minimum mean-square estimate for the state based on the uniform distribution is

    x̂_{k/k} = 2k + 2,   k = 0, 1, ...

and the variance of this estimate is P_{k/k}² = 0 for all k. Thus, for this measurement realization the minimum mean-square estimate is error-free. Since the Kalman variance is independent of the measurements, it is necessarily a poor approximation of the actual conditional variance. The square root of the Kalman variance σ_KAL and the error in the best linear estimate, together with the square root of the variance σ_GS and the error in the estimate of the state for the ten-term gaussian sum, are shown for this measurement realization in Fig. 3(a). This shows that a considerable improvement in both the mean and the variance is provided by the gaussian sum when compared with the results provided by the Kalman filter.

[Fig. 3. Gaussian sum and Kalman filters compared with the best non-linear filter. (a) Perfect knowledge example. (b) Random noise example.]
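The "true" statistics referred to above can be reproduced by evaluating the recursion (3)-(4) numerically on a state grid. A minimal sketch for the perfect-knowledge sequence z_k = 2k + 4 follows; the grid limits, resolution, and boundary tolerance are choices made here, not taken from the paper.

```python
import numpy as np

# Brute-force evaluation of the true a posteriori density for the example
# (39)-(40), with x_0, w_k, v_k uniform on (-2, 2), under the perfect-knowledge
# measurements z_k = 2k + 4.

x = np.linspace(-6.0, 30.0, 7201)
dx = x[1] - x[0]
# a small tolerance guards the closed support boundary at +/-2 on the grid
uniform = lambda u: np.where(np.abs(u) <= 2.0 + 1e-9, 0.25, 0.0)
kernel = uniform(np.arange(-3.0, 3.0 + dx / 2, dx))   # plant-noise density on a symmetric support

p = uniform(x)                                        # p(x_0)
for k in range(6):
    z = 2 * k + 4
    post = p * uniform(z - x)                         # measurement update, eq. (3)
    post /= post.sum() * dx
    mean = np.sum(x * post) * dx
    std = np.sqrt(np.sum((x - mean) ** 2 * post) * dx)
    print(f"k={k}: mean = {mean:.3f} (2k+2 = {2 * k + 2}), std = {std:.2e}")
    p = np.convolve(post, kernel, mode="same") * dx   # time update, eq. (4)
    p /= p.sum() * dx
```

For this measurement realization the posterior collapses onto the single point 2k + 2 at every stage, which is the error-free behavior described in the text.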
The measurement sequence described above is highly improbable. A more representative case is depicted in Fig. 3(b), in which the plant and measurement noise sequences were chosen using a random number generator for the uniform distribution prescribed above. This figure again shows that the gaussian sum approximation to the true variance is considerably improved over the Kalman estimate and, in fact, agrees very closely with the true standard deviation σ_TRUE of the a posteriori density. However, the Kalman variance is representative enough that the error in the estimate of the mean is not particularly different from that provided by the gaussian sum.

The a posteriori density function for the third and fourteenth stages is shown in Figs. 4(a) and 4(b). In these figures, the actual density, the gaussian approximation provided by the Kalman filter and the gaussian sum approximation are all included. At stage 3, the true variance is smaller than the Kalman variance and the Kalman mean has a significant error, whereas at stage 14 the true variance is larger than that predicted by the Kalman filter. Note that the error in the a priori density approximation at the discontinuities is still evident in the a posteriori approximation, but that the general character of the density is reproduced by the gaussian sum.

[Fig. 4. Influence of δ_1, δ_2 on the a posteriori density. (a) Third stage: δ_1 = δ_2 = 0.001. (b) Fourteenth stage: δ_1 = δ_2 = 0.001. (c) Third stage: δ_1 = 0.001, δ_2 = 0.005, and δ_1 = 0.005, δ_2 = 0.001. (d) Fourteenth stage: δ_1 = 0.005, δ_2 = 0.001, and δ_1 = 0.001, δ_2 = 0.005.]

In Figs. 4(a) and 4(b) the approximations at each stage are based on terms being eliminated when their weighting factors are less than δ_1 = 0.001 and combined when the difference in mean values causes the L_1 error to be less than δ_2 = 0.001. The effect of changing these parameters, that is, δ_1 and δ_2, can be seen in Figs. 4(c) and 4(d). In one case δ_1 is increased from 0.001 to 0.005 while keeping δ_2 equal to 0.001. Alternatively, δ_2 is increased to 0.005 while the value of δ_1 is maintained equal to 0.001. It is apparent in this example that δ_2 has the larger effect on the density approximation. The increase in this parameter can be seen to introduce a "ripple" into the density function and indicates that the individual terms have become too widely separated to provide a smooth approximation. It is interesting that the effect appears to be cumulative, as the ripple is not apparent after three stages but is quite marked at the 14th stage.

The effect of improving the accuracy of the density approximation by including more terms in the a priori representations and by retaining more terms for the a posteriori representation can be seen in Fig. 5. Twenty terms are included in the a priori densities and δ_1 and δ_2 are reduced to 0.0001. The figure presents the actual a posteriori density, the gaussian sum approximation, and the gaussian approximation provided by the Kalman filter equations. Comparison of the results for stages 3 and 14 with Fig. 4 indicates the improvements in the approximation that have occurred.

4. CONCLUSIONS

The approximation of density functions by a sum of gaussian densities has been discussed as a reasonable framework within which estimation policies for non-linear and/or nongaussian stochastic systems can be established. It has been shown that a probability density function can be approximated arbitrarily closely, except at discontinuities, by such a gaussian sum. In contrast with the Edgeworth or Gram-Charlier expansions that have been investigated earlier, this approximation has the advantage of converging to a broader class of density functions. Furthermore, any finite sum of these terms is itself a valid density function.

The gaussian sum approximation is a departure from more classical approximation techniques because the sum is restricted to be positive for all possible values of the independent variable.
As a result, the series is not orthogonalizable, so the manner in which the parameters appearing in the sum are chosen is not obvious. Two numerical procedures are discussed in which certain parameters are chosen to satisfy constraints or are somewhat arbitrarily selected, and the others are chosen to minimize the L_k error.

It is anticipated that the gaussian sum approximation will find its greatest application in developing estimation policies for non-linear stochastic systems. However, many of the characteristics exhibited by non-linear systems, and some of the difficulties in using the gaussian sum in these cases, are exhibited by treating linear systems with nongaussian noise sources.

[Fig. 5. A posteriori density for sixteen stages.]

Of course, when the noise is entirely gaussian, the problem degenerates and the familiar Kalman filter equations are obtained as the exact solution of the problem. The linear, nongaussian estimation problem is discussed and it is shown that, if the a priori density functions are represented as gaussian sums, then the number of terms required to describe the a posteriori density is equal to the product of the number of terms of the a priori densities used to form it. The apparent disadvantage is seen to cause little difficulty, however, because the moments of many individual terms converge to common values, which allows them to be combined. Further, the weighting factors associated with many other terms become very small and permit those terms to be neglected without introducing significant error to the approximation.

Numerical results for a specific system are presented which provide a demonstration of some of the effects discussed in the text. These results confirm dramatically that the gaussian sum approximation can provide considerable insight into problems that hitherto have been intractable to analysis.

REFERENCES

[1] A. H. JAZWINSKI: Stochastic Processes and Filtering Theory. Academic Press, New York (1970).
[2] R. E. KALMAN: A new approach to linear filtering and prediction problems. J. bas. Engng 82D, 35-45 (1960).
[3] H. W. SORENSON: Advances in Control Systems, Vol. 3, Ch. 5. Academic Press, New York (1966).
[4] R. COSAERT and E. GOTTZEIN: A decoupled shifting memory filter method for radio tracking of space vehicles. 18th International Astronautical Congress, Belgrade, Yugoslavia (1967).
[5] A. JAZWINSKI: Adaptive filtering. Automatica 5, 475-485 (1969).
[6] Y. C. HO and R. C. K. LEE: A Bayesian approach to problems in stochastic estimation and control. IEEE Trans. Aut. Control 9, 333-339 (1964).
[7] H. J. KUSHNER: On the differential equations satisfied by conditional probability densities of Markov processes. SIAM J. Control 2, 106-119 (1964).
[8] J. R. FISHER and E. B. STEAR: Optimal non-linear filtering for independent increment processes, Parts I and II. IEEE Trans. Inform. Theory 3, 558-578 (1967).
[9] M. AOKI: Optimization of Stochastic Systems, Topics in Discrete-Time Systems. Academic Press, New York (1967).
[10] H. W. SORENSON and A. R. STUBBERUD: Nonlinear filtering by approximation of the a posteriori density. Int. J. Control 18, 33-51 (1968).
[11] M. AOKI: Optimal Bayesian and min-max control of a class of stochastic and adaptive dynamic systems. Proceedings IFAC Symposium on Systems Engineering for Control System Design, Tokyo, pp. 77-84 (1965).
[12] A. V. CAMERON: Control and estimation of linear systems with nongaussian a priori distributions. Proceedings of the Third Annual Conference on Circuit and System Science (1969).
[13] J. T. LO: Finite dimensional sensor orbits and optimal nonlinear filtering. University of Southern California, Report USCAE 114, August 1969.
[14] J. KOREVAAR: Mathematical Methods, Vol. 1, pp. 330-333. Academic Press, New York (1968).
[15] W. FELLER: An Introduction to Probability Theory and Its Applications, Vol. II, p. 249. John Wiley, New York (1966).
[16] J. D. SPRAGINS: Reproducing distributions for machine learning. Stanford Electronics Laboratories, Technical Report No. 6103-7 (November 1963).
[17] D. L. ALSPACH: A Bayesian approximation technique for estimation and control of time-discrete stochastic systems. Ph.D. Dissertation, University of California, San Diego (1970).