Poisson Point Processes: Imaging, Tracking, and Sensing

Roy L. Streit

Roy L. Streit
Metron Inc.
1818 Library St
Reston, Virginia 20190-6281, USA
[email protected]
www.roystreit.com
Preface
that would have facilitated my own understanding, had I but known the material at
the outset.
I am indebted to many individuals and institutions for their help in writing this
book. I thank Prof. Don Tufts (University of Rhode Island) for the witty, but appro-
priate, phrase “the alternative tradition in signal processing.” This phrase captures
to my mind the novelty I feel about the subject of PPPs. I thank Dr. Wolfgang Koch
(Fraunhofer-FKIE/University of Bonn) for cluing me in to the splendid aphorism
that begins this book. It is a great moral support against the ubiquity of the advocates
of simulation. I thank Dr. Dale Blair (Georgia Tech Research Institute, GTRI) for
suggesting a tutorial as a way to socialize PPPs. That the project eventually became
a book is my own fault, not his.
I thank Dr. Keith Davidson (Office of Naval Research) for supporting the research
that became the basis of Chapter 6 on multitarget intensity tracking. I thank Metron,
Inc., for providing a nurturing mathematical environment that encouraged me to
explore seriously the various applications of PPPs. Such working environments are
the result of sustaining leadership and management over many years.
I thank Dr. Lawrence Stone, one of the founders of Metron, for many helpful
comments on early drafts of several chapters. These resulted in improvements of
content and clarity. I thank Dr. James Ferry (Metron) for helpful discussions over
many months. I have learned much from him. I thank Dr. Grant Boquet (Metron)
for his insight into wedge products, and for helping me to learn and use LaTeX. His
patience is remarkable. I also thank Dr. Lance Kaplan (US Army Research Labo-
ratory), Dr. Marcus Graham (US Naval Undersea Warfare Center), and Dr. Frank
Ehlers (NATO Undersea Research Centre, NURC) for their encouragement and
helpful comments on early drafts of the tutorial that started it all.
I thank my wife Nancy, our family Adam, Kristen, Andrew, and Katherine, and
our ever-hopeful four-legged companions, Sam and Eddie, for their steadfast love
and support. They are first to me, now and always.
Contents

1 Introduction
   1.1 Chapter Parade
      1.1.1 Part I: Fundamentals
      1.1.2 Part II: Applications
      1.1.3 Part III: Beyond the Poisson Point Process
      1.1.4 Appendices
   1.2 The Real Line Is Not Enough
   1.3 General Point Processes
   1.4 An Alternative Tradition

Part I Fundamentals
3 Intensity Estimation
   3.1 Maximum Likelihood Algorithms
      3.1.1 Necessary Conditions
      3.1.2 Gaussian Crosshairs and Edge Effects
   3.2 Superposed Intensities with Sample Data
      3.2.1 EM Method with Sample Data
      3.2.2 Interpreting the Weights
      3.2.3 Simple Examples
      3.2.4 Affine Gaussian Sums
   3.3 Superposed Intensities with Histogram Data
      3.3.1 EM Method with Histogram Data
      3.3.2 Affine Gaussian Sums
   3.4 Regularization
      3.4.1 Parametric Tying
      3.4.2 Bayesian Methods
Glossary
References
Index
Chapter 1
Introduction
Poisson point processes (PPPs) are very useful theoretical models for diverse appli-
cations involving the geometrical distribution of random occurrences of points in a
multidimensional space. Both the number of points and their locations are modeled
as random variables. Nonhomogeneous PPPs are designed specifically for applica-
tions in which spatial and/or temporal nonuniformity is important. Homogeneous
PPPs are idealized models useful only for applications involving spatial or temporal
uniformity.
The exposition shows that nonhomogeneous PPPs require little or no additional
conceptual and mathematical burden above that required by homogeneous PPPs.
1 Physics community folklore attributes this aphorism to Maxwell, but the available reference [68]
is to Kurt Lewin (1890-1947), a pioneer of social, organizational, and applied psychology.
important properties of PPPs follow from the simulation. This approach enables
those new to the subject to understand quickly what PPPs are about; however, it
is the reverse of the usual textbook approach in which the simulation procedure is
derived almost as an afterthought from a few “idealized” assumptions. The style
is informal throughout. The main properties of PPPs that are deemed most useful
to practitioners are discussed first. Many basic operations are applied to PPPs to
produce new point processes that are also PPPs. Several of the most important of
these operations are superposition, independent thinning, nonlinear mappings, and
stochastic transformations such as transition and measurement processes. Examples
are presented to assist understanding.
Chapter 3 discusses estimation problems for PPPs. The defining parameter of a
PPP is its intensity function, or intensity for short. When the intensity is known, the
PPP is fully characterized. In many applications the intensity is unknown and is esti-
mated from data. This chapter discusses the case when the form of intensity function
is specified in terms of a finite number of parameters. The estimation problem is to
determine appropriate values for these parameters from given measured data. Max-
imum likelihood (ML) and maximum a posteriori (MAP) methods are the primary
estimation methods explored in this book. ML algorithms for estimating an intensity
that is specified as a Gaussian sum are obtained by the method of Expectation-
Maximization (EM). Gaussian sums, when normalized to integrate to one, are called
Gaussian mixtures, and are widely used in applications to model probability density
functions (pdfs). Two different kinds of data are considered—PPP sample data that
comprise the points of a realization of a PPP, and histogram data that comprise only
the numbers of points that fall into a specified grid of histogram cells.
Chapter 4 explores the quality of estimators of intensity, where quality is quan-
tified in terms of the Cramér-Rao Bound (CRB). The CRB is a lower bound on the
variance of any unbiased estimator, and it is determined directly from the mathemat-
ical form of the likelihood function of the data. It is a remarkable fact that the CRB
for general PPP intensity estimation takes a simple form. The CRB of the Gaussian
sum intensity model in Chapter 3 is given in this chapter. The CRB is presented for
PPP sample data, sometimes called "count record" data, as well as for histogram data.
Sample data constitute a realization of the PPP, so the points are i.i.d. (independent
and identically distributed).
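To make the "simple form" concrete, here is a sketch of the standard expression for PPP sample data; the notation, with a parameter vector θ, is illustrative and may differ slightly from that used in Chapter 4. For a parametric intensity λ(x; θ) on the window R, the Fisher information matrix whose inverse bounds the covariance of any unbiased estimator θ̂ is

$$J(\theta) \;=\; \int_R \frac{1}{\lambda(x;\theta)}\, \bigl[\nabla_\theta \lambda(x;\theta)\bigr]\, \bigl[\nabla_\theta \lambda(x;\theta)\bigr]^{T}\, dx\,, \qquad \operatorname{Cov}\bigl(\hat{\theta}\bigr) \;\succeq\; J(\theta)^{-1}.$$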
Richardson in 1972 [102] and then again by Lucy in 1974 [72]. In these applications
it is known as the Richardson-Lucy algorithm.
SPECT (single photon emission computed tomography) is much more com-
monly used diagnostically than PET. The reconstructed image is estimated from
multiple snapshots made by a movable gamma camera. A reconstruction algorithm
for SPECT based on EM was derived by Miller, Snyder, and Miller [82] in 1985.
Loosely speaking, the algorithm averages several Shepp-Vardi algorithms, one for
each gamma camera snapshot; that is, it is a multi-snapshot average. It is presented
in Section 5.4.
Transmission tomography, commonly called computed tomography (CT), is discussed in
Section 5.5. A reconstruction algorithm based on EM was derived by Lange and
Carson [65] in 1984. While based on EM, its detailed structure differs significantly
from that of Shepp-Vardi algorithms. CRBs for PET and CT are presented in Sec-
tion 5.6.
Chapter 6 presents multitarget tracking applications of PPPs. The multitarget
state is modeled as a PPP. The Bayesian posterior point process is not a PPP, but
it is approximated by a PPP. It is called an intensity filter because it recursively
updates the intensity of the PPP approximation. An augmented state space enables
“on line” estimates of the clutter and target birth PPPs to be produced as intrinsic
parts of the filter. The information update of the intensity filter is seen to be identical
to the first step of the Shepp-Vardi algorithm for PET discussed in Chapter 5.
The PHD (Probability Hypothesis Density) filter is obtained from the intensity
filter by modifying the posterior PPP intensity with a priori knowledge of the clut-
ter and target birth PPPs. The PHD filter was first derived by other methods in
the multitarget application by Mahler in a series of papers beginning about 1994.
For details, see [76] and the papers referenced therein. The relationship between
Mahler’s method and the approach taken here is discussed.
The multisensor intensity filter is also presented in Chapter 6. The PPP multiple
target model is the same as in the single sensor case. The sensors are assumed condi-
tionally independent. The resulting intensity filter is essentially the same as the first
step of the Miller-Snyder-Miller algorithm for SPECT given in Chapter 5. Sensor
data are, by analogy, equivalent to the gamma camera snapshots.
Chapter 7 discusses distributed networked sensors using an approach based
on ideas from stochastic geometry, one of the classic roots of PPPs. These
results typically provide ensemble system performance estimates rather than a
performance estimate for a given sensor configuration. The contrast between
point-to-event and event-to-event properties is highlighted, and Slivnyak’s The-
orem is discussed as a method that relates these two concepts. Thresh-
old effects are dramatic and provide significant insights. For example, recent
results from geometric random graph theory show that—with very high
probability—randomly distributed networks very abruptly achieve communication
diversity as the sensor communication range increases. This result guarantees that
the overwhelming majority of random sensor distributions achieve (do not achieve)
communication diversity if the sensor communication range R is larger (smaller)
than some threshold, say R_thresh.
1.1.4 Appendices
Several appendices contain material that would interfere with the flow of the expo-
sition in one way or another. Others contain material that supplements the text.
whose realizations are sets in this class. For most applications, point processes are
defined on R^m, m ≥ 1.
A finite point process is a random variable whose realizations in any bounded
subset R of S are sets containing only a finite number of points of R. The number of
points and their locations can be chosen in many ways. An important subclass com-
prises finite point processes with independently and identically distributed (i.i.d.)
points.
There are many members of the class of i.i.d. finite point processes, at least two of
which are named processes. One is the binomial point process (BPP). The BPP is a
finite point process in which the number of points n is binomially distributed on the
integers {0, 1, . . . , K }, where K ≥ 0 is an integer parameter. The points of the BPP
are located according to a spatial random variable X on S with probability density
function (pdf) p X (x). Explicitly, for any bounded subset R ⊂ S, the probability of
n points occurring in R is
$$\Pr[n] \;=\; \binom{K}{n}\, p_R^{\,n}\, \bigl(1 - p_R\bigr)^{K-n}\,. \qquad (1.1)$$
this case often referred to as random finite sets. However, PPPs are also sometimes
defined on discrete spaces and discrete-continuous spaces (see Section 2.12). In
such cases, the discrete points of the PPP can be repeated with nonzero probability.
Sets do not, by definition, have repeated elements, so it is more accurate to speak of
the points in a PPP realization as a random finite list, or multiset, as such lists are
sometimes called. To avoid making too much of these subtleties, random finite sets
and lists are both referred to simply as PPP realizations.
PPPs constitute an alternative tradition to that of the more widely known stochastic
processes, with which they are sometimes confused. A stochastic process is a family
X (t) of random variables indexed by a parameter t that is usually, but not always,
thought of as time. The mathematics of stochastic processes, especially Gaussian
stochastic processes, is widely known and understood. They are pervasive, with
applications ranging from physics and engineering to finance and biology.
Poisson stochastic processes (again, not to be confused with PPPs) provided one
of the first models of white Gaussian noise. Specifically, in vacuum tubes, electrons
are emitted by a heated cathode and travel to the anode, where they contribute to
the anode current. The anode current is modeled as shot noise, and the fluctuations
of the current about the average are approximately a white Gaussian noise process
when the electron arrival rate is high [11, 101]. The emission times of the electrons
constitute a one dimensional PPP. Said another way, the occurrence times of the
jump discontinuities in the anode current are a realization of a PPP.
Wiener processes are also known as Brownian motion processes. The sample
paths are continuous, but nowhere differentiable. The process is sometimes thought
of more intuitively as integrated white Gaussian noise. Armed with this intuition it
may not be too surprising to the reader that nonhomogeneous PPPs approximate the
level crossings of the correlation function of Gaussian processes [48] and sequential
probability ratio tests.
The concept of independent increments is important for both point processes
and stochastic processes; however, the concept is not exactly the same for both. It is
therefore more appropriate to speak of independent scattering in point processes and
of independent increments for stochastic processes. As is seen in Chapter 2, every
PPP is an independent scattering process. In contrast, every independent increments
stochastic process is a linear combination of a Wiener process and a Poisson process
[119]. Further discussion is given in Section 2.9.1. The mathematics of PPPs is not
as widely known as that of stochastic processes.
PPPs have many properties such as linear superposition and invariance under
nonlinear transformation that are useful in various ways in many applications. These
fundamental properties are presented in the next chapter.
Part I
Fundamentals
Chapter 2
The Poisson Point Process
Readers new to PPPs are urged to read the first four subsections below in order.
After that, they are free to move about the chapter as their fancy dictates. There is
a lot of information here. It cannot be otherwise, for there are many wonderful and
useful properties of PPPs.
1 What he really said [27]: “It can scarcely be denied that the supreme goal of all theory is to make
the irreducible basic elements as simple and as few as possible without having to surrender the
adequate representation of a single datum of experience.”
The emphasis throughout the chapter is on the PPP itself, although applica-
tions are alluded to in several places. The event space of PPPs and other finite
point processes is described in Section 2.1. The concept of intensity is discussed
in Section 2.2. The important concept of orderliness is also defined. PPPs that are
orderly are discussed in Sections 2.3 through 2.11. PPPs that are not orderly are
discussed in the last section, which is largely devoted to PPPs on discrete and
discrete-continuous spaces.
The points of a PPP occur in the state space S. This space is usually the Euclidean
space, S = R^m, m ≥ 1, or some subset thereof. Discrete and discrete-continuous
spaces S are discussed in Section 2.12. PPPs can be defined on even more abstract
spaces, but this kind of generality is not needed for the applications discussed in this
book.
Realizations of PPPs on a subset R of S comprise the number n ≥ 0 and the
locations x1 , . . . , xn of the points in R. The realization is denoted by the ordered
pair
ξ = (n, {x1 , . . . , xn }) .
The set notation signifies only that the ordering of the points x j is irrelevant, but
not that the points are necessarily distinct. It is better to think of {x1 , . . . , xn } as an
unordered list. Such lists are sometimes called multisets. Context will always make
clear the intended usage, so for simplicity of language, the term set is used here and
throughout the book.
It is standard notation to include n explicitly in ξ even though n is determined
by the size of the set {x1 , . . . , xn }. There are many technical reasons to do so; for
instance, including n makes expectations easier to define and manipulate.
If n = 0, then ξ is the trivial event (0, ∅), where ∅ denotes the empty set. The
event space is the collection of all possible finite subsets of R:
$$\mathcal{E}(R) \;=\; \{(0, \emptyset)\} \,\cup\, \bigcup_{n=1}^{\infty} \bigl\{ (n, \{x_1, \ldots, x_n\}) : x_j \in R,\ j = 1, \ldots, n \bigr\}\,. \qquad (2.1)$$
The event space is clearly very much larger in some sense than the space S in which
the individual points reside.
2.2 Intensity
Every PPP is parameterized by a quantity called the intensity. Intensity is an intuitive
concept, but it takes different mathematical forms depending largely on whether the
state space S is continuous, discrete, or discrete-continuous. The continuous case
for all bounded subsets R of S, i.e., subsets contained in some finite radius m-
dimensional sphere. The sets R include—provided they are bounded—convex sets,
sets with “holes” and internal voids, disconnected sets such as the union of disjoint
spheres, and sets that are interwoven like chain links.
The intensity function λ(s) need not be continuous, e.g., it can have step dis-
continuities. The only requirement on λ(s) is the finiteness of the integral (2.2).
The special case of homogeneous PPPs on S = R^m with R = S shows that the
inequality (2.2) does not imply that ∫_S λ(s) ds < ∞. Finally, in physical problems,
the integral (2.2) is a dimensionless number, so λ(s) has units of number per unit
volume of R^m.
The intensity for general PPPs on the continuous space S takes the form

$$\lambda_D(s) \;=\; \lambda(s) \;+\; \sum_{j} w_j\, \delta(s - a_j)\,, \qquad (2.3)$$

where δ( · ) is the Dirac delta function and, for all j, the weights w_j are nonnegative
and the points a_j ∈ S are distinct: a_i ≠ a_j for i ≠ j. The intensity λ_D(s) is not a
function in the strict meaning of the term, but a “generalized” function. It is seen in
the next section that the PPP corresponding to the intensity λ D (s) is orderly if and
only if w j = 0 for all j; equivalently, a PPP is orderly if and only if the intensity
λ D (s) is a function, not a generalized function.
The concept of orderliness can be generalized so that finite point processes other
than PPPs can also be described as orderly. There are several nonequivalent defini-
tions of the general concept, as discussed in [118]; however, these variations are not
used here.
2.3 Realizations
The discussion in this section and through to Section 2.11 is implicitly restricted to
orderly PPPs, that is, to PPPs with a well defined intensity function on a continuous
space S ⊂ Rm . Realizations and other properties of PPPs on discrete and discrete-
continuous spaces are discussed in Section 2.12.
Realizations are conceptually straightforward to simulate for bounded subsets
of continuous spaces S ⊂ Rm . Bounded subsets are “windows” in which PPP
realizations are observed. Stipulating a window avoids issues with infinite sets; for
example, realizations of homogeneous PPPs on S = Rm have an infinite number
of points but only a finite number in any bounded window.
Every realization of a PPP on a bounded set R is an element of the event space
E(R). The realization ξ therefore comprises the number n ≥ 0 and the locations
{x1 , . . . , xn } of the points in R.
A two-step procedure, one step discrete and the other continuous, generates (or,
simulates) one realization ξ ∈ E(R) of a nonhomogeneous PPP with intensity λ(s)
on a bounded subset R of S. The procedure also fully reveals the basic statistical
structure of the PPP. If ∫_R λ(s) ds = 0, ξ is the trivial event. If ∫_R λ(s) ds > 0,
the realization is obtained as follows:

Step 1. Generate the number of points n ≥ 0 as a realization of the Poisson distributed random variable (2.4), i.e., Pr[n] = e^{-μ} μ^n / n! with mean μ = ∫_R λ(s) ds.

Step 2. Generate n i.i.d. points x_1, ..., x_n in R as realizations of a random variable X with pdf

$$p_X(s) \;=\; \frac{\lambda(s)}{\int_R \lambda(s)\, ds}\,, \qquad s \in R\,. \qquad (2.5)$$
The output is the ordered pair ξo = (n, (x1 , . . . , xn )). Replacing the ordered
n-tuple (x1 , . . . , xn ) with the set {x1 , . . . , xn } gives the PPP realization ξ =
(n, {x1 , . . . , xn }).
The careful distinction between ξo and ξ is made to avoid annoying, and some-
times confusing, problems later when order is important. For example, it is seen in
Section 2.4 that the pdfs (probability density functions) of ξo and ξ differ by a factor
of n! . Also, the points {x1 , . . . , xn } are i.i.d. when conditioned on the number n of
points. The conditioning on n is implicit in the statement of Step 2.
For continuous spaces S ⊂ Rm , an immediate consequence of Step 2 is that the
points {x1 , . . . , xn } are distinct with probability one: repeated elements are allowed
in theory, but in practice they never occur (with probability one). Another way to
say this is that the list, or multiset, {x1 , . . . , xn } is a set with probability one. The
statement fails to hold when the PPP is not orderly, that is, when the intensity (2.3)
has one or more Dirac delta function components. It also does not hold when the
state space S is discrete or discrete-continuous (see Section 2.12).
An acceptance-rejection procedure (see, e.g., [56]) is used to generate the i.i.d.
samples of (2.5). Let
$$\alpha \;=\; \max_{s \in R}\, \frac{p_X(s)}{g(s)}\,, \qquad (2.6)$$
where g(s) > 0 is any bounded pdf on R from which i.i.d. samples of R can
be generated via a known procedure. The function g( · ) is called the importance
function. For each point x with pdf g, compute t = p X (x)/(α g(x)). Next, generate
a uniform variate u on [0, 1] and compare u and t: if u > t, reject x; if u ≤ t, accept
it. The accepted samples are distributed as p X (x).
The acceptance-rejection procedure is inefficient for some problems, that is, large
numbers of i.i.d. samples from the pdf (2.5) may be drawn before finally accepting
n samples. As is well known, efficiency depends heavily on the choice of the impor-
tance function g( · ). Table 2.1 outlines the overall procedure and indicates how
the inefficiency can occur. If inefficiency is a concern, other numerical procedures
may be preferred in practice. Also, evaluating R λ(s) ds may require care in some
problems.
Table 2.1 Outline of the two-step procedure with acceptance-rejection sampling
• Step 1. Draw n from the Poisson distribution (2.4) with mean μ = ∫_R λ(s) ds.
   – IF n = 0, STOP.
• Step 2.
   – FOR j = 1 : c_eff · n
      • Draw random sample x with pdf g.
      • Compute t = λ(x) / (α μ g(x)).
      • Draw u uniformly on [0, 1]; accept x if u ≤ t, otherwise reject it.
   – Retain the first n accepted samples x_1, ..., x_n.
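The following is a minimal sketch of the two-step procedure with acceptance-rejection, written in Python with NumPy; the function names, the grid approximation of the integral, and the specific test intensity (loosely modeled on (2.8) without the notch) are illustrative assumptions, not the book's Table 2.1.

```python
import numpy as np

def sample_ppp(intensity, bound, region, rng, n_grid=200):
    """Draw one realization of a PPP on the rectangle region = (xlo, xhi, ylo, yhi).

    intensity : function (x, y) -> nonnegative intensity value
    bound     : any constant >= the maximum of the intensity on the region
    """
    xlo, xhi, ylo, yhi = region
    area = (xhi - xlo) * (yhi - ylo)

    # Step 1: the number of points is Poisson with mean mu = integral of the intensity.
    # Here mu is approximated on a grid; an exact integral can be used when available.
    xs = np.linspace(xlo, xhi, n_grid)
    ys = np.linspace(ylo, yhi, n_grid)
    mu = intensity(*np.meshgrid(xs, ys)).mean() * area
    n = rng.poisson(mu)

    # Step 2: draw n i.i.d. points with pdf lambda(s)/mu by acceptance-rejection,
    # using the uniform density on the region as the importance function g.
    points = []
    while len(points) < n:
        x = rng.uniform(xlo, xhi)
        y = rng.uniform(ylo, yhi)
        if rng.uniform() * bound <= intensity(x, y):
            points.append((x, y))
    return np.array(points)

rng = np.random.default_rng(0)
lam = lambda x, y: 20.0 / 64.0 + 80.0 * np.exp(-0.5 * (x**2 + y**2)) / (2 * np.pi)
pts = sample_ppp(lam, bound=20.0 / 64.0 + 80.0 / (2 * np.pi), region=(-4, 4, -4, 4), rng=rng)
print(len(pts), "points")
```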
Example 2.1 The two-step procedure is used to generate i.i.d. samples from a PPP
whose intensity function is nontrivially structured. These samples also show the
difficulty of observing this structure in small sample sets. Denote the multivariate
Gaussian pdf on Rm with mean μ and covariance matrix Σ by
$$N(s\,;\,\mu, \Sigma) \;=\; \frac{1}{\sqrt{\det(2\pi \Sigma)}}\, \exp\!\left( -\tfrac{1}{2}\, (s - \mu)^T \Sigma^{-1} (s - \mu) \right). \qquad (2.7)$$
$$\lambda(s) \;\equiv\; \lambda(x, y) \;=\; \frac{a}{64\,\sigma^2} \;+\; b\, f(x, y)\,, \qquad (2.8)$$
Fig. 2.1 Realizations of the PDF (2.9) of the intensity function (2.8) for σ = 1, a = 20, and b = 80. Samples are generated by the acceptance-rejection method. The prominent horizontal notch in the intensity is hard to see from the samples alone
For σ = 1, numerical integration gives the mean intensity μ = ∫_R λ(x, y) dx dy ≈ 92.25. A pseudo-random integer realization of the Poisson discrete variable (2.4) is n = 90, so 90 i.i.d. samples of the pdf (cf. (2.5))

$$p(x, y) \;=\; \frac{\lambda(x, y)}{\mu} \qquad (2.9)$$

are drawn via the acceptance-rejection procedure with g(x, y) = 1/(64 σ²). The pdf (2.9) is shown as a 3-D plot in Fig. 2.1a and as a set of equispaced contours in Fig. 2.1b. Figures 2.1c and 2.1d, respectively, show the 90 sample points with and without the intensity contours.
The horizontal “notch” is easily missed using these 90 samples in Fig. 2.1c. The
detailed structure of an intensity function can be estimated reliably only in special
circumstances, e.g., when a large number of realizations is available, or when the
PPP has a known parametric form (see Section 3.1).
2.4 Likelihood Function

The random variable Ξ with realizations in E(R) for every bounded subset R of
S is a PPP if its realizations are generated via the two-step procedure. Let pΞ (ξ )
denote the pdf of Ξ evaluated at Ξ = ξ . Let Ξ ≡ (N , X ), where N is the
number of points and X ≡ {x1 , . . . , x N } is the point set. Let the realization be
ξ = (n, {x_1, ..., x_n}). From the definition of conditioning,

$$p_\Xi(\xi) \;=\; p_N(n)\, p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n)\,. \qquad (2.10)$$

Because the points are i.i.d. conditioned on the number n,

$$p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n) \;=\; n!\, \prod_{j=1}^{n} p_X(x_j)\,, \qquad (2.11)$$
where X is the random variable corresponding to a single sample point whose pdf
is (2.5). The n! in (2.11) arises from the fact that there are n! equally likely ordered
i.i.d. trials that generate the unordered set X . Substituting (2.4) and (2.11) into (2.10)
gives the pdf of Ξ evaluated at ξ = (n, {x1 , . . . , xn }) ∈ E(R):
$$
\begin{aligned}
p_\Xi(\xi) &= p_N(n)\, p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n) \\
&= \exp\!\left( -\int_R \lambda(s)\, ds \right) \frac{\bigl( \int_R \lambda(s)\, ds \bigr)^n}{n!}\; n! \prod_{j=1}^{n} \frac{\lambda(x_j)}{\int_R \lambda(s)\, ds} \\
&= \exp\!\left( -\int_R \lambda(s)\, ds \right) \prod_{j=1}^{n} \lambda(x_j)\,, \qquad \text{for } n \ge 1. \qquad (2.12)
\end{aligned}
$$
$$
\begin{aligned}
p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n) &\equiv \frac{1}{n!}\, p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n) &\quad (2.13)\\
&= \prod_{j=1}^{n} p_X(x_j) &\quad (2.14)\\
&= \prod_{j=1}^{n} \frac{\lambda(x_j)}{\int_R \lambda(s)\, ds}\,. &\quad (2.15)
\end{aligned}
$$
Let ξ_o = (n, (x_1, ..., x_n)). Using (2.15) and the definition of conditioning gives

$$
\begin{aligned}
p_\Xi(\xi_o) &= p_N(n)\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n) &\quad (2.16)\\
&= \frac{1}{n!}\, \exp\!\left( -\int_R \lambda(s)\, ds \right) \prod_{j=1}^{n} \lambda(x_j)\,. &\quad (2.17)
\end{aligned}
$$
This notation interprets arguments in the usual way, so it is easier to understand and
manipulate than (2.12). For example, the discrete pdf p N (n) of (2.4) is merely the
integral of (2.17) over x_1, ..., x_n, but taking the same integral of (2.12) requires
additional thought to restore the missing n!.
The argument ξo in (2.17) is written simply as ξ below. This usage may cause
some confusion, since then the left hand side of (2.17) becomes pΞ (ξ ), which is
the same as the first equation in (2.12), a quantity that differs from it by a factor of
n!. A similar ambiguity arises from using the same subscript X | N on both sides of
(2.13). Context makes the intended meaning clear, so these abuses of notation will
not cause confusion.
In practice, when the number of points in a realization is very large, the points of
a PPP realization are often replaced by a smaller data set. If the smaller data set also
reduces the information content, the likelihood function obtained in this section no
longer applies. An example of a smaller data set (called histogram count data) and
its likelihood function is given in Section 2.9.1.
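For later reference, the following is a minimal sketch (Python with NumPy; the function and argument names are illustrative assumptions) of evaluating the log of the sample-data likelihood (2.12) for a given intensity function.

```python
import numpy as np

def ppp_loglik(points, intensity, integral_of_intensity):
    """Log of (2.12): minus the integral of lambda over R plus the sum of log lambda(x_j).

    points                : array of shape (n, d) holding one realization
    intensity             : function mapping an (n, d) array to n intensity values
    integral_of_intensity : value of the integral of lambda over the window R
    """
    return -integral_of_intensity + np.sum(np.log(intensity(points)))

# Example: homogeneous intensity lambda0 = 50 on the unit square.
rng = np.random.default_rng(1)
lambda0 = 50.0
pts = rng.uniform(size=(rng.poisson(lambda0), 2))
print(ppp_loglik(pts, lambda x: np.full(len(x), lambda0), lambda0))
```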
2.5 Expectations
Expectations are decidedly more interesting for point processes than for ordinary
random variables. Expectations are taken of real valued functions F defined on the
event space E(R), where R is a bounded subset of S. Thus F(ξ ) evaluates to a real
number for all ξ ∈ E(R). The expectation of F(ξ) is written in the very general form

$$E[F] \;=\; \sum_{\xi \,\in\, \mathcal{E}(R)} F(\xi)\, p_\Xi(\xi)\,,$$

where the sum, properly defined, is matched to the likelihood function of the point
process. In the case of PPPs, the likelihood function is that of the two-step simu-
lation procedure. The sum is often referred to as an “ensemble average” over all
realizations of the point process.
The sum is daunting because of the huge size of the set E(R). Defining the
expectation carefully is the first and foremost task of this section. The second is
to show that for PPPs the expectation, though fearsome, can be evaluated explicitly
for many functions of considerable application interest.
2.5.1 Definition
Let ξ = (n, {x1 , . . . , xn }). For analytical use, it is convenient to rewrite the
function F(ξ ) = F (n, {x1 , . . . , xn }) in terms of a function that uses an easily
understood argument list, that is, let
The sum in (2.21) is an odd looking discrete-continuous sum that needs interpreta-
tion. The conditional factorization

$$p_\Xi(\xi) \;=\; p_N(n)\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n)$$

gives the expectation the form

$$E[F] \;=\; \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R F(n, x_1, \ldots, x_n)\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n)\, dx_1 \cdots dx_n\,. \qquad (2.23)$$
The expectation is formidable, but it is not as bad as it looks. Its inherently straight-
forward structure is revealed by verifying that E[F] = 1 for F(n, x1 , . . . , xn ) ≡ 1.
The details of this trivial exercise are omitted.
The expectation of non-symmetric functions is undefined. The definition is extended, formally, to general functions, say G(n, x_1, ..., x_n), via the symmetrized version

$$G^{\mathrm{Sym}}(n, x_1, \ldots, x_n) \;=\; \frac{1}{n!} \sum_{\sigma} G\bigl(n, x_{\sigma(1)}, \ldots, x_{\sigma(n)}\bigr)\,, \qquad (2.24)$$

where the sum is over all n! permutations σ of the indices 1, ..., n.
The expectation of G is defined by E[G] = E G Sym . This definition works
because G Sym is a symmetric function of its arguments, a fact that is straightfor-
ward to verify. The definition is clearly compatible with the definition for symmetric
functions since G Sym (n, x1 , . . . , xn ) ≡ G(n, x1 , . . . , xn ) if G is symmetric.
The expectation is defined by (2.23) for any finite point process with events in
E (R), not just PPPs. For PPPs and other i.i.d. finite point processes (such as BPPs),
$$p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n) \;=\; \prod_{j=1}^{n} p_X(x_j)\,, \qquad (2.25)$$

so that

$$E[F] \;\equiv\; \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R F(n, x_1, \ldots, x_n) \prod_{j=1}^{n} p_X(x_j)\, dx_1 \cdots dx_n\,. \qquad (2.26)$$
PPPs are assumed throughout the remainder of this chapter, so the discrete proba-
bility distribution p N (n) and pdf p X (x) are given by (2.4) and (2.5).
The expected number of points in R is E[N (R)]. When the context clearly
identifies the set R, the expectation is written simply as E[N ]. By substituting
F(n, x_1, ..., x_n) ≡ n into (2.26) and observing that the integrals all integrate to one,

$$E[N] \;\equiv\; \sum_{n=0}^{\infty} n\, p_N(n) \;=\; \int_R \lambda(s)\, ds\,. \qquad (2.27)$$
Similarly, the variance of the number of points is

$$\mathrm{Var}[N] \;\equiv\; \sum_{n=0}^{\infty} n^2\, p_N(n) \;-\; \bigl(E[N]\bigr)^2 \;=\; \int_R \lambda(s)\, ds\,. \qquad (2.28)$$

The explicit sums in (2.27) and (2.28) are easily verified by direct calculation using (2.4).
Consider random sums of the form

$$F(\Xi) \;=\; \sum_{j=1}^{N} f(X_j)\,, \qquad (2.29)$$

that is, in terms of realizations,

$$F(n, x_1, \ldots, x_n) \;=\; \sum_{j=1}^{n} f(x_j)\,, \qquad \text{for } n \ge 1\,, \qquad (2.30)$$

and, for n = 0, F(0, ∅) ≡ 0. The special case of (2.30) for which f(x) ≡ 1 reduces to F(n, x_1, ..., x_n) = n, the number of points in R. The mean of F is given by
22 2 The Poisson Point Process
⎡ ⎤
N
E[F] = E ⎣ f (X j )⎦ (2.31)
j=1
= f (x) λ(x) dx . (2.32)
R
To verify (2.32), substitute (2.30) into (2.26):

$$E[F] \;=\; \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R \left( \sum_{j=1}^{n} f(x_j) \right) \prod_{j=1}^{n} p_X(x_j)\, dx_1 \cdots dx_n\,.$$

Because the points are i.i.d., each of the n terms of the inner sum contributes equally, so

$$E[F] \;=\; \sum_{n=0}^{\infty} p_N(n)\, n \int_R f(x)\, p_X(x)\, dx \;=\; E[N] \int_R f(x)\, p_X(x)\, dx \;=\; \int_R f(x)\, \lambda(x)\, dx\,.$$
Now let G be a second random sum,

$$G(\xi) \;=\; \sum_{j=1}^{n} g(x_j)\,, \qquad (2.33)$$

where g(x) is a real valued function. Then the expected value of the product is

$$E[F\,G] \;=\; \int_R f(x)\, \lambda(x)\, dx \int_R g(x)\, \lambda(x)\, dx \;+\; \int_R f(x)\, g(x)\, \lambda(x)\, dx\,. \qquad (2.34)$$
Before verifying this result in the next paragraph, note that since the means of F and G are determined as in (2.32), the result is equivalent to

$$\mathrm{Cov}[F, G] \;\equiv\; E[F\,G] - E[F]\, E[G] \;=\; \int_R f(x)\, g(x)\, \lambda(x)\, dx\,. \qquad (2.35)$$

Setting g = f gives the variance

$$\mathrm{Var}[F] \;=\; \int_R f^2(x)\, \lambda(x)\, dx\,. \qquad (2.36)$$

In the special case f(x) ≡ 1, (2.36) reduces to the variance (2.28) of the number of points in R.
The result (2.34) is verified by direct evaluation. Write
$$F(\xi)\, G(\xi) \;=\; \sum_{\substack{i, j = 1 \\ i \ne j}}^{n} f(x_i)\, g(x_j) \;+\; \sum_{j=1}^{n} f(x_j)\, g(x_j)\,. \qquad (2.37)$$
The second term in (2.37) is (2.30) with f (x j ) g(x j ) replacing f (x j ), so its expec-
tation is the second term of (2.34). The expectation of the first term is evaluated in
much the same way as (2.32); details are omitted. The identity (2.34) is sometimes
written
$$E\!\left[ \sum_{\substack{i, j = 1 \\ i \ne j}}^{N} f(X_i)\, g(X_j) \right] \;=\; \int_R f(x)\, \lambda(x)\, dx \int_R g(x)\, \lambda(x)\, dx\,. \qquad (2.38)$$
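The first and second moment formulas (2.32) and (2.34) are easy to check by simulation. The following is a quick Monte Carlo sketch (Python with NumPy; the homogeneous intensity and the test functions are illustrative choices, not from the book).

```python
import numpy as np

rng = np.random.default_rng(7)
lam, trials = 100.0, 20_000           # homogeneous intensity lam on [0, 1]
f = lambda x: x                        # test functions f(x) = x and g(x) = x**2
g = lambda x: x**2

F = np.empty(trials)
G = np.empty(trials)
for k in range(trials):
    pts = rng.uniform(size=rng.poisson(lam))   # one PPP realization on [0, 1]
    F[k], G[k] = f(pts).sum(), g(pts).sum()

# (2.32): E[F] = integral of f*lam = lam/2
# (2.34): E[FG] = (lam/2)(lam/3) + lam/4
print(F.mean(), lam / 2)
print((F * G).mean(), (lam / 2) * (lam / 3) + lam / 4)
```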
2 This section can be skipped entirely on a first reading of the chapter. The material presented is
used only in Chapter 8.
2.6 Campbell's Theorem

Under mild regularity conditions, Campbell's Theorem says that when θ is purely imaginary,

$$E\bigl[e^{\theta F}\bigr] \;=\; \exp\!\left( \int_R \bigl( e^{\theta f(x)} - 1 \bigr)\, \lambda(x)\, dx \right), \qquad (2.40)$$
R
where f (x) is a real valued function. The expectation exists for any complex θ for
which the integral converges. It is obtained by algebraic manipulation. Substitute
the explicit form (2.17) into the definition of expectation and churn:
$$
\begin{aligned}
E\bigl[e^{\theta F}\bigr] &= \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R e^{\theta \sum_{j=1}^{n} f(x_j)}\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n)\, dx_1 \cdots dx_n \\
&= e^{-\int_R \lambda(s)\, ds} \sum_{n=0}^{\infty} \frac{1}{n!} \int_R \cdots \int_R \left\{ \prod_{j=1}^{n} e^{\theta f(x_j)}\, \lambda(x_j) \right\} dx_1 \cdots dx_n \\
&= e^{-\int_R \lambda(s)\, ds} \sum_{n=0}^{\infty} \frac{1}{n!} \left( \int_R e^{\theta f(s)}\, \lambda(s)\, ds \right)^{\!n} \\
&= e^{-\int_R \lambda(s)\, ds}\, \exp\!\left( \int_R e^{\theta f(s)}\, \lambda(s)\, ds \right). \qquad (2.41)
\end{aligned}
$$
The last expression is obviously equivalent to (2.40). See [49, 57, 63] for further
discussion.
The characteristic function of F is given by (2.40) with θ = iω, where ω is
real and i = √−1, and R = ℝ. The convergence of the integral requires that the
Fourier transform of f exist as an ordinary function, i.e., it cannot be a generalized
function. As is well known, the moment generating function is closely related to the
characteristic function [93, Section 7.3]. Expanding the exponential gives
$$E\bigl[e^{i\omega F}\bigr] \;=\; E\!\left[ 1 + i\omega F + (i)^2\, \frac{\omega^2 F^2}{2!} + \cdots \right] \;=\; 1 + i\omega\, E[F] + (i)^2\, \frac{\omega^2}{2!}\, E[F^2] + \cdots,$$

so the moments of F are obtained by differentiation:

$$E\bigl[F^n\bigr] \;=\; (-i)^n \left. \frac{d^n}{d\omega^n}\, E\bigl[e^{i\omega F}\bigr] \right|_{\omega = 0}. \qquad (2.42)$$
The joint characteristic function of F and the random sum G of (2.33) is

$$E\bigl[e^{\,i\omega_1 F \,+\, i\omega_2 G}\bigr] \;=\; \exp\!\left( \int_R \bigl( e^{\,i\omega_1 f(x) \,+\, i\omega_2 g(x)} - 1 \bigr)\, \lambda(x)\, dx \right). \qquad (2.43)$$
To see this, simply use ω1 f (x) + ω2 g(x) in place of f (x) in (2.40). An imme-
diate by-product of this result is an expression for the joint moments of F and G.
Expanding (2.43) in a joint power series and assuming term by term integration is
valid gives
$$
\begin{aligned}
E\bigl[e^{\,i\omega_1 F + i\omega_2 G}\bigr] &= E\!\left[ 1 + i\omega_1 F + i\omega_2 G + \frac{(i\omega_1)^2}{2!} F^2 + (i\omega_1)(i\omega_2)\, F G + \frac{(i\omega_2)^2}{2!} G^2 + \cdots \right] \\
&= 1 + i\omega_1 E[F] + i\omega_2 E[G] + \frac{(i\omega_1)^2}{2!} E[F^2] + (i\omega_1)(i\omega_2)\, E[F G] + \frac{(i\omega_2)^2}{2!} E[G^2] + \cdots,
\end{aligned}
$$

where terms of order larger than two are omitted. Taking partial derivatives gives the joint moment of order (r, s) as

$$E\bigl[F^r G^s\bigr] \;=\; (-i)^{r+s} \left. \frac{\partial^r}{\partial \omega_1^r}\, \frac{\partial^s}{\partial \omega_2^s}\, E\bigl[e^{\,i\omega_1 F + i\omega_2 G}\bigr] \right|_{\omega_1 = \omega_2 = 0}. \qquad (2.44)$$
In particular, a direct calculation for the case r = s = 1 verifies the earlier result
(2.34).
The form (2.40) of the characteristic function also characterizes the PPP; that is,
a finite point process whose expectations of random sums satisfies (2.40) is neces-
sarily a PPP. The details are given in the next subsection.
Suppose that Ξ is a finite point process such that, for every nonnegative function f, the random sum

$$F(\Xi) \;=\; \sum_{j=1}^{N} f(X_j)\,, \qquad N \ge 1\,, \qquad (2.45)$$

satisfies

$$E\bigl[e^{-F}\bigr] \;=\; \exp\!\left( \int_R \bigl( e^{-f(x)} - 1 \bigr)\, \lambda(x)\, dx \right) \qquad (2.46)$$
for some nonnegative function λ(x). The goal is to show that (2.46) implies that
the finite point process Ξ is necessarily a PPP with intensity function λ(x). This is
done by showing that Ξ satisfies the independent scattering property for any finite
number k of sets A_j such that S = ∪_{j=1}^k A_j and A_i ∩ A_j = ∅ for i ≠ j.
Consider a nonnegative function f with values f 1 , f 2 , . . . , f k on the specified
sets A1 , A2 , . . . , Ak , respectively, so that
$$A_j \;=\; \{\, x : f(x) = f_j \,\}\,.$$

Let

$$m_j \;=\; \int_{A_j} \lambda(x)\, dx\,.$$
Observe that
$$\sum_{j=1}^{N} f(X_j) \;\equiv\; \sum_{j=1}^{k} f_j\, N(A_j)\,, \qquad (2.48)$$
where N (A j ) is the number of points in A j . For the given function f , the assumed
identity (2.46) is equivalent to
$$E\Bigl[ e^{-\sum_{j=1}^{k} f_j\, N(A_j)} \Bigr] \;=\; \exp\!\left( \sum_{j=1}^{k} \bigl( e^{-f_j} - 1 \bigr)\, m_j \right). \qquad (2.49)$$
Substituting z_j = e^{−f_j}, the identity (2.49) becomes

$$E\!\left[ \prod_{j=1}^{k} z_j^{\,N(A_j)} \right] \;=\; \prod_{j=1}^{k} e^{\,m_j (z_j - 1)}\,. \qquad (2.50)$$

By varying the choice of function values f_j ≥ 0, the result (2.50) is seen to hold for all z_j ∈ (0, 1).
The joint characteristic function of several random variables is the product
of the individual characteristic functions if and only if the random variables are
independent [93], and the characteristic function of the Poisson distribution with
mean m_j is (in this notation) e^{m_j (z_j − 1)}. Therefore, the counts N(A_j) are indepen-
dent and Poisson distributed with mean m j . Since the sets A j are arbitrary, the finite
point process Ξ is a PPP.
The class of functions for which the identity (2.46) holds must include the class
of all nonnegative functions that are piecewise constant, with arbitrarily specified
values f j , on an arbitrarily specified finite number of disjoint sets A j . The discus-
sion here is due to Kingman [63].
$$G_\Xi(f) \;=\; L_\Xi(-\log f) \;=\; E\!\left[ \exp\!\left( \sum_{j=1}^{N} \log f(X_j) \right) \right] \;=\; E\!\left[ \prod_{j=1}^{N} f(X_j) \right]. \qquad (2.52)$$
The probability generating functional is the analog for finite point processes of the
probability generating function for random variables.
The Laplace and probability generating functionals are defined for general finite
point processes Ξ , not just PPPs. If Ξ is a PPP with intensity function λ(x), then
$$G_\Xi(f) \;=\; \exp\!\left( \int_R \bigl( f(x) - 1 \bigr)\, \lambda(x)\, dx \right). \qquad (2.53)$$
2.7 Superposition
A very useful property of independent PPPs is that their sum is a PPP. Two PPPs
on S are superposed, or summed, if realizations of each are combined into one
event. Let Ξ and Υ denote these PPPs, and let their intensities be λ(s) and ν(s).
If (m, {x1 , . . . , xm }) and (n, {y1 , . . . , yn }) are realizations of Ξ and Υ , then the
combined event is (m + n, {x1 , . . . , xm , y1 , . . . , yn }). Knowledge of which points
originated from which realization is assumed lost.
The combined event is probabilistically equivalent to a realization of a PPP
whose intensity function is λ(s) + ν(s). To see this, let ξ = (r, {z 1 , . . . , zr }) ∈
E(R) be an event constructed in the manner just described. The partition of this
event into an m point realization of Ξ and an r − m point realization of Υ is
unknown. Let the set P_m and its complement P_m^c be such a partition, where
P_m ∪ P_m^c = {z_1, ..., z_r}. Let 𝒫_m denote the collection of all partitions of size m.
There are

$$\binom{r}{m} \;\equiv\; \frac{r!}{m!\,(r-m)!}$$

partitions in 𝒫_m. The pdf of the combined event ξ is

$$p(\xi) \;=\; \sum_{m=0}^{r} \frac{1}{\binom{r}{m}} \sum_{P_m \in \mathcal{P}_m} p_\Xi(m, P_m)\; p_\Upsilon\bigl(r - m,\, P_m^{\,c}\bigr)\,.$$
Substituting the PPP pdfs (2.17) for p_Ξ and p_Υ gives

$$p(\xi) \;=\; \frac{e^{-\mu}}{r!} \sum_{m=0}^{r}\; \sum_{P_m \in \mathcal{P}_m} \left( \prod_{z \in P_m} \lambda(z) \right) \left( \prod_{z \in P_m^{\,c}} \nu(z) \right),$$

where μ ≡ ∫_R (λ(s) + ν(s)) ds. The double sum in the last expression is recognized (after some thought) as an elaborate way to write an r-term product. Thus,
$$p(\xi) \;=\; \frac{e^{-\mu}}{r!} \prod_{i=1}^{r} \bigl( \lambda(z_i) + \nu(z_i) \bigr)\,. \qquad (2.54)$$
Comparing (2.54) to (2.12) shows that p(ξ ) is the pdf of a PPP with intensity func-
tion given by λ(s) + ν(s).
More refined methods that do not rely on partitions show that superposition holds
for a countable number of independent PPPs. The intensity of the superposed PPP
is the sum of the intensities of the constituent PPPs, provided the sum converges.
For details, see [63].
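The superposition property is immediate to exercise in simulation. The following is a minimal sketch (Python with NumPy; the helper function and the homogeneous intensities are illustrative assumptions): realizations of two independent PPPs are simply pooled, and the pooled count is Poisson with mean equal to the sum of the two mean counts.

```python
import numpy as np

rng = np.random.default_rng(2)

def homogeneous_ppp(rate, region, rng):
    """Realization of a homogeneous PPP with the given rate on a rectangle."""
    xlo, xhi, ylo, yhi = region
    n = rng.poisson(rate * (xhi - xlo) * (yhi - ylo))
    return np.column_stack([rng.uniform(xlo, xhi, n), rng.uniform(ylo, yhi, n)])

region = (0.0, 1.0, 0.0, 1.0)
xi = homogeneous_ppp(30.0, region, rng)       # intensity lambda
upsilon = homogeneous_ppp(20.0, region, rng)  # intensity nu
combined = np.vstack([xi, upsilon])           # superposition: intensity lambda + nu
print(len(xi), len(upsilon), len(combined))   # combined count ~ Poisson(50)
```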
The Central Limit Theorem for sums of random variables has an analog for point
processes called the Poisson Limit Theorem: the superposition of a large number
of “uniformly sparse” independent point processes converges in distribution to a
homogeneous PPP. These point processes need not be PPPs. The first statement and
proof of this difficult result dates to the mid-twentieth century. For details on R1 ,
see [62, 92]. The Poisson Limit Theorem also holds in the multidimensional case.
For these details, see [15, 40].
Example 2.2 Superposition of Gaussian Intensities. Consider the superposition of nine PPPs whose circular Gaussian intensities are centered on an equispaced grid:

$$\lambda_c(x, y) \;=\; \sum_{i \in \{-1,0,1\}}\; \sum_{j \in \{-1,0,1\}} c\, N\!\left( \begin{pmatrix} x \\ y \end{pmatrix} ;\, \begin{pmatrix} i \\ j \end{pmatrix},\, \sigma^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right). \qquad (2.55)$$
Fig. 2.2 Superposition of an equispaced grid of nine PPPs with circular Gaussian intensities (2.55) of equal weight and spread, σ = 1. Samples from the PPP components are generated independently and superposed to generate samples from the unimodal flat-topped intensity function
To see this, consider first the special case that α(x) is constant on R. Let
$$\mu \;=\; \int_R \lambda(x)\, dx\,, \qquad \mu_\alpha \;=\; \int_R \lambda_\alpha(x)\, dx\,, \qquad \beta \;=\; \frac{\mu_\alpha}{\mu}\,.$$
Conditioned on n points in the realization, the number m of points that survive the thinning is binomially distributed:

$$\Pr[m \mid n] \;=\; \binom{n}{m}\, \beta^m (1 - \beta)^{n-m}\,, \qquad m \le n\,.$$
$$
\begin{aligned}
\Pr[m] &= \sum_{n=m}^{\infty} \binom{n}{m}\, \beta^m (1 - \beta)^{n-m}\, \Pr[n] \qquad (2.57)\\
&= \sum_{n=m}^{\infty} \frac{n!}{m!\,(n-m)!}\, \beta^m (1 - \beta)^{n-m}\, \frac{\mu^n}{n!}\, e^{-\mu} \\
&= \frac{(\beta\mu)^m}{m!}\, e^{-\mu} \sum_{n=m}^{\infty} \frac{\bigl( (1 - \beta)\mu \bigr)^{n-m}}{(n-m)!} \\
&= \frac{(\beta\mu)^m}{m!}\, e^{-\beta\mu} \;\equiv\; \frac{\mu_\alpha^{\,m}}{m!}\, e^{-\mu_\alpha}\,.
\end{aligned}
$$
The probability that ξ_α has m points after thinning ξ is, by the preceding argument,
Poisson distributed with mean μ_α. The samples x′_1, ..., x′_m are i.i.d., and their
pdf on the cell R′ is λ_α(x)/μ_α. Now extend the intensity function from the cell R′ to
all of R by setting it to zero outside the cell. Superposing these cell-level PPPs and
taking the limit as cell size goes to zero shows that λ_α(x) is the intensity function on
the full set R. Further details are omitted.
An alternative demonstration exploits the acceptance-rejection method. Generate
a realization of the PPP with intensity function λ(x) from the homogeneous PPP
with intensity function Λ = max_{x∈R} λ(x). Redefine μ_α = ∫_R λ_α(x) dx, and let
|R| = ∫_R dx. The probability that no points remain in R after thinning by α(x) is

$$v(R) \;=\; \exp\!\left( -\int_R \lambda_\alpha(x)\, dx \right) \;=\; e^{-\mu_\alpha}\,.$$
The void probabilities v(R) for a sufficiently large class of “test” sets R characterize
a PPP, a fact whose proof is unfortunately outside the scope of the present book.
(A clean, relatively accessible derivation is given in [136, Theorem 1.2].) Given the
result, it is clear that the thinned process is a PPP with intensity function λα (x).
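Independent thinning is also straightforward to simulate. The following is a minimal sketch (Python with NumPy; the thinning function and the homogeneous parent intensity are illustrative assumptions): each point of a realization is kept with probability α(x), and the survivors form a realization of the PPP with intensity α(x)λ(x).

```python
import numpy as np

def thin(points, alpha, rng):
    """Independently keep each point x with probability alpha(x)."""
    keep = rng.uniform(size=len(points)) < np.array([alpha(p) for p in points])
    return points[keep]

rng = np.random.default_rng(3)
# Homogeneous PPP with intensity 200 on the unit square.
n = rng.poisson(200.0)
pts = rng.uniform(size=(n, 2))
# Thinning probability alpha(x, y) = x: survivors form a PPP with intensity 200*x.
survivors = thin(pts, lambda p: p[0], rng)
print(len(pts), "->", len(survivors))   # survivor count ~ Poisson(100)
```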
Example 2.3 Triple Thinning. The truncated and scaled zero-mean Gaussian inten-
sity function on the rectangle [−2σ, 2σ ] × [−2σ, 3σ ],
$$\lambda_c(x, y) \;=\; c\, N(x\,;\, 0, \sigma^2)\, N(y\,;\, 0, \sigma^2)\,,$$
is depicted in Fig. 2.3a for c = 2000 and σ = 1. Its mean intensity (i.e., the inte-
gral of λ2000 over the rectangle) is μ0 = 1862.99. Sampling the discrete Poisson
variate with mean μ0 gives, in this realization, 1892 points. Boundary conditions
are imposed by the thinning functions
Fig. 2.3 Triply thinning the Gaussian intensity function by (2.58) for σ = 1 and c = 2000 yields samples of an intensity with hard boundaries on three sides
$$
\begin{aligned}
\alpha_1(x, y) &= 1 - e^{-y} & &\text{if } y \ge 0 \\
\alpha_2(x, y) &= 1 - e^{\,x - 2} & &\text{if } x \le 2 \qquad (2.58)\\
\alpha_3(x, y) &= 1 - e^{-x - 2} & &\text{if } x \ge -2,
\end{aligned}
$$
where α j (x, y) = 0 for conditions not specified in (2.58). The overall thinning
function, α1 α2 α3 , is depicted in Fig. 2.3b overlaid on the surface corresponding to
λ1 . The intensity of the thinned PPP, namely α1 α2 α3 λ2000 , is nonzero only on the
rectangle [−2σ, 2σ ] × [0, 3σ ]. It is depicted in Fig. 2.3c. Thinning the 1892 points
of the realization of λ2000 leaves the 264 points depicted in Fig. 2.3d. These 264
points are statistically equivalent to a sample generated directly from the thinned
PPP. The mean thinned intensity is 283.19.
3 This name conveys genuine meaning in the point process context, but it seems of fairly recent
vintage [84, Section 3.1.2] and [123, p. 33]. It is more commonly called independent increments,
which can be confusing because the same name is used for a similar, but different, property of
stochastic processes. See Section 2.9.4.
Let A ⊂ R and B ⊂ R denote bounded subsets of R. The point processes Ξ(A) and Ξ(B)
are obtained by restricting realizations of Ξ to A and B, respectively. Simply put,
the points in ξ(A) are the points of ξ that are in A ∩ R, and the same for ξ(B).
This somewhat obscures the fact that the realizations ξ A and ξ B are obtained from
the same realization ξ . Intuition may suggest that constructing ξ A and ξ B from the
very same realization ξ will force the point processes Ξ (A) and Ξ (B) to be highly
correlated in some sense. Such intuition is in need of refinement, for it is incorrect.
This is the subtlety mentioned above.
Let ξ denote an arbitrary realization of a point process Ξ (A∪ B) on the set A∪ B.
The point process Ξ(A ∪ B) is an independent scattering process if

$$p_{\Xi(A \cup B)}(\xi) \;=\; p_{\Xi(A)}(\xi_A)\; p_{\Xi(B)}(\xi_B) \qquad (2.59)$$

for all disjoint subsets A and B of R, that is, for all subsets such that A ∩ B = ∅.
The pdfs in (2.59) are determined by the specific character of the point process,
so they are not in general those of a PPP. The product in (2.59) is the reason the
property is called independent scattering.
A nonhomogeneous multidimensional PPP is an independent scattering point
process. To see this it is only necessary to verify that (2.59) holds. Define thinning
probability functions, α(x) and β(x), by
$$\alpha(x) \;=\; \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases} \qquad \text{and} \qquad \beta(x) \;=\; \begin{cases} 1, & \text{if } x \in B \\ 0, & \text{if } x \notin B. \end{cases}$$
The point processes Ξ (A) and Ξ (B) are obtained by α-thinning and β-thinning
realizations ξ of the PPP Ξ (A ∪ B), so they are PPPs. Let λ(x) be the intensity
function of the PPP Ξ (A ∪ B). Let ξ = (n, {x1 , . . . , xn }) be an arbitrary realiza-
tion of Ξ (A ∪ B). The pdf of ξ is, from (2.12),
$$p_{\Xi(A \cup B)}(\xi) \;=\; e^{-\int_{A \cup B} \lambda(x)\, dx} \prod_{j=1}^{n} \lambda(x_j)\,. \qquad (2.60)$$
Because the points of the α-thinned and β-thinned realizations are on disjoint sets
A and B, the realizations ξ_A = (i, {y_1, ..., y_i}) and ξ_B = (k, {z_1, ..., z_k}) are
necessarily such that i + k = n and

$$\{y_1, \ldots, y_i\} \cup \{z_1, \ldots, z_k\} \;=\; \{x_1, \ldots, x_n\}\,.$$
Because Ξ (A) and Ξ (B) are PPPs, the pdfs of ξ A and ξ B are
$$p_{\Xi(A)}(\xi_A) \;=\; e^{-\int_A \lambda(x)\, dx} \prod_{j=1}^{i} \lambda(y_j)\,, \qquad p_{\Xi(B)}(\xi_B) \;=\; e^{-\int_B \lambda(x)\, dx} \prod_{j=1}^{k} \lambda(z_j)\,.$$
The product of these two pdfs is clearly equal to that of (2.60). The key elements of
the argument are that the thinned processes are PPPs, and that the thinned realiza-
tions are free of overlap when the sets are disjoint. The argument extends easily to
any finite number of disjoint sets.
Example 2.4 Likelihood Function for Histogram Data. A fine illustration of the util-
ity of independent scattering is the way it makes the pdf of histogram data easy to
determine. Denote the cells of a histogram by R1 , . . . , R K , K ≥ 1. The cells are
assumed disjoint, so R j ∩ R j = ∅ for i = j. Histogram data are nonnegative
integers that count the number of points of a realization of a point process that fall
within the various cells. No record is kept of the locations of the points within any
cell. Histogram data are very useful for compressing large volumes of sample (point)
data.
Denote the histogram data by n 1:K ≡ {n 1 , . . . , n K }, where n j ≥ 0 is the number
of points of the process that lie in R j . Let the point process Ξ be a PPP, and let
Ξ (R j ) denote
the PPP obtained by restricting Ξ to R j . The intensity function of
Ξ(R_j) is ∫_{R_j} λ(s) ds. The histogram cells are disjoint. By independent scattering,
the PPPs Ξ (R1 ), . . . , Ξ (R K ) are independent and the pdf of the histogram data is
$$
\begin{aligned}
p(n_{1:K}) &= \prod_{j=1}^{K} \exp\!\left( -\int_{R_j} \lambda(s)\, ds \right) \frac{\Bigl( \int_{R_j} \lambda(s)\, ds \Bigr)^{n_j}}{n_j!} \qquad (2.61)\\
&= \exp\!\left( -\int_{R} \lambda(s)\, ds \right) \prod_{j=1}^{K} \frac{\Bigl( \int_{R_j} \lambda(s)\, ds \Bigr)^{n_j}}{n_j!}\,, \qquad (2.62)
\end{aligned}
$$
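The histogram-data log-likelihood implied by (2.62) is also simple to compute numerically. The following is a minimal sketch (Python with NumPy; the function name and the example cell means are illustrative assumptions).

```python
import numpy as np
from math import lgamma

def histogram_loglik(counts, cell_means):
    """Log of (2.62): counts n_j and cell means m_j = integral of lambda over cell R_j."""
    counts = np.asarray(counts, dtype=float)
    cell_means = np.asarray(cell_means, dtype=float)
    log_fact = np.array([lgamma(n + 1.0) for n in counts])
    return np.sum(-cell_means + counts * np.log(cell_means) - log_fact)

# Example: 4 cells with means 2, 5, 1, 3 and observed counts 1, 6, 0, 2.
print(histogram_loglik([1, 6, 0, 2], [2.0, 5.0, 1.0, 3.0]))
```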
The point process has realizations in the event space E([0, 1]), but it is not a PPP
because of the way the points are sampled for n = 3.
For any c ∈ [0, 1], define the random variable
$$X_c(x) \;=\; \begin{cases} 1, & \text{if } x < c \\ 0, & \text{if } x \ge c. \end{cases} \qquad (2.64)$$
The number of points in a realization of the point process in the interval [a, b]
conditioned on n points in [0, 1] is
$$G_n(a, b, m) \;=\; \Pr\bigl[\, \text{exactly } m \text{ points of } \{x_1, \ldots, x_n\} \text{ are in } [a, b] \,\bigr]\,. \qquad (2.65)$$
$$N_h + N_t \;=\; n\,.$$

$$N_h + N_t \;=\; N$$
holds, but it is not enough to induce any dependence whatever between Nh and Nt .
This property is counterintuitive when first encountered, but it plays an important
role in many applications. To give it a name, since one seems to be lacking in the
literature, Poisson’s gambit4 is the assumption that the number of Bernoulli trials is
Poisson distributed. Poisson’s gambit is realistic in many applications, but in oth-
ers it is only an approximation. The name is somewhat whimsical—it is not used
elsewhere in the literature.
Invoking Poisson’s gambit, the number N is an integer valued, Poisson dis-
tributed random variable with intensity λ > 0. Sampling N gives the length n of
the sequence of Bernoulli trials performed. Then n = n h + n t , where n h and n t
are the observed numbers of heads and tails. The random variables Nh and Nt are
independent Poisson distributed with mean intensities pλ and (1− p)λ, respectively.
To see this, note that the probability of a Poisson distributed number of n Bernoulli
trials with outcomes n_h and n_t is

$$\Pr[n_h, n_t] \;=\; \frac{n!}{n_h!\, n_t!}\, p^{\,n_h} (1-p)^{\,n_t}\; e^{-\lambda}\, \frac{\lambda^{\,n}}{n!} \;=\; \left( e^{-p\lambda}\, \frac{(p\lambda)^{\,n_h}}{n_h!} \right) \left( e^{-(1-p)\lambda}\, \frac{\bigl( (1-p)\lambda \bigr)^{\,n_t}}{n_t!} \right).$$
4 A gambit in chess involves sacrifice or risk with hope of gain. The sacrifice here is loss of control
over the number of Bernoulli trials, and the gain is independence of the numbers of different
outcomes.
The final product is the statement that the numbers of heads and tails have independent
Poisson distributions with the required parameters. For further comments, see, e.g.,
[52, Section 9.3] or [42, p. 48].
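Poisson's gambit is easy to see numerically. The following is a small simulation sketch (Python with NumPy; the parameter values are illustrative): when the number of Bernoulli trials is Poisson, the head and tail counts are independent Poisson variables.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, p, trials = 10.0, 0.3, 200_000

n = rng.poisson(lam, size=trials)        # Poisson number of Bernoulli trials
heads = rng.binomial(n, p)               # heads among the n trials
tails = n - heads

print(heads.mean(), tails.mean())        # ~ p*lam and (1-p)*lam
print(np.corrcoef(heads, tails)[0, 1])   # ~ 0: the two counts are uncorrelated
```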
Example 2.6 Independence of Thinned and Culled PPPs. The points of a PPP that
are retained and those that are culled during Bernoulli thinning are both PPPs. Their
intensities are p(x)λ(x) and (1 − p(x))λ(x), respectively, where p(x) is the prob-
ability that a point at x ∈ S is retained. Poisson’s gambit implies that the numbers
of points in these two PPPs are independent. Step 2 of the realization procedure
guarantees that the sample points of the two processes are independent. The
thinned and culled PPPs are therefore independent, and superposing them recovers
the original PPP, since the intensity function of the superposition is the sum of the
component intensities. In other words, splitting a PPP into two parts using Bernoulli
thinning, and subsequently merging the parts via superposition recovers the original
PPP.
Example 2.7 Coloring Theorem. Replace the Bernoulli trials in Example 2.6 by
independent multinomial trials with k ≥ 2 different outcomes, called “colors” in
[63, Chapter 5], with probabilities { p1 (x), . . . , pk (x)}, where
p1 (x) + · · · + pk (x) = 1 .
Poisson’s gambit and Step 2 of the realization procedure shows that the PPPs inde-
pendent. The intensity of their superposition is
k
λ j (x) = p j (x) λ(x) = λ(x),
j=1 j=1
If an orderly point process satisfies the independent scattering property and the num-
ber of points in any bounded set R is finite and not identically zero (with probability
one), then the number of points of the process in a given set R is necessarily Poisson
distributed—the Poisson distribution is inevitable (as Kingman wryly observes).
This result shows that if the number of points in realizations of the point process is
not Poisson distributed for even one set R, then it is not an independent scattering
process, and hence not a PPP. To see this, a physics-style argument (due to Kingman
[63, pp. 9–10]) is adopted.
Given a set A ≠ ∅ with no "holes", or voids, define the family of sets A_t, t ≥ 0, by

$$A_t \;=\; \bigcup_{a \in A} \bigl\{\, x \in \mathbb{R}^m : \|x - a\| \le t \,\bigr\}\,,$$

where ‖·‖ is the usual Euclidean distance. Because A has no voids, the boundary
of At encloses the boundary of As if t > s. Let
$$p_n(t) \;=\; \Pr\bigl[ N(A_t) = n \bigr] \qquad \text{and} \qquad q_n(t) \;=\; \Pr\bigl[ N(A_t) \le n \bigr]\,,$$
where N (At ) is the random variable that equals the number of points in a realization
that lie in At . The point process is orderly, so it is assumed that the function pn (t)
is differentiable. Let
μ(t) ≡ E [N (At )] .
Finding an explicit mathematical form for this expectation is not the goal here. The
goal is to show that
$$p_n(t) \;=\; e^{-\mu(t)}\, \frac{\mu^n(t)}{n!}\,.$$
$$A_t^h \;=\; A_{t+h} \setminus A_t\,.$$
Another way to write this probability uses independent scattering. For sufficiently
small h > 0, the probability that one point falls in Ath is
$$\mu(t+h) - \mu(t) \;=\; \Pr\bigl[ N(A_t^h) = 1 \bigr] \;\ge\; 0\,.$$
$$-\frac{d p_0(t)}{dt} \;=\; \frac{d\mu(t)}{dt}\, p_0(t) \qquad \Longleftrightarrow \qquad \frac{d}{dt}\bigl( \mu(t) + \log p_0(t) \bigr) \;=\; 0\,.$$
where the last step follows from pn (t) = qn (t) − qn−1 (t). Multiplying both sides
by e μ(t) and using the product differentiation rule gives
$$\frac{d}{dt}\Bigl( p_n(t)\, e^{\mu(t)} \Bigr) \;=\; \frac{d\mu(t)}{dt}\, p_{n-1}(t)\, e^{\mu(t)}\,.$$
Solving the recursion starting with (2.70) gives p_n(t) = e^{−μ(t)} μ^n(t)/n!, the Poisson density (2.4) with mean μ(t).
The class of sets without voids is a very large class of “test” sets. To see that
the Poisson distribution is inevitable for more general sets requires more elaborate
theoretical methods. Such methods are conceptually lovely and mathematically rig-
orous. They confirm but do not deepen the insights provided by the physics-style
argument, so they are not presented here.
where
$$\Lambda(t) \;=\; \int_{t_0}^{t} \lambda(\tau)\, d\tau\,, \qquad t \ge t_0\,.$$
The interarrival times are identically exponentially distributed if the PPP is homo-
geneous. Explicitly, for λ(t) ≡ λ0 ,
$$p_{t_{j-1}}(\tau) \;\equiv\; p_0(\tau) \;=\; \lambda_0\, e^{-\lambda_0 \tau}\,.$$
Because of the independent scattering property of PPPs, the interarrival times are also
independent in this case.
In contrast to the discontinuous sample paths of the Poisson process, the sample
paths of the Wiener process are continuous with probability one. For Wiener pro-
cesses, the random variable X (t1 ) is zero mean Gaussian distributed with variance
42 2 The Poisson Point Process
$$\nu(y) \;=\; \frac{1}{|A|}\, \lambda\bigl( A^{-1}(y - b) \bigr)\,, \qquad (2.75)$$
[100, Chapter 4]. Suppose that Ξ is a PPP with intensity function λ(x) > 0 for
all x ∈ S ≡ R¹, and let

$$y \;=\; f(x) \;=\; \int_0^x \lambda(t)\, dt \qquad \text{for } -\infty < x < \infty\,. \qquad (2.76)$$
The point process f (Ξ ) is a PPP with intensity one. To see this, use (2.74) to obtain
$$\nu(y) \;=\; \frac{\lambda\bigl( f^{-1}(y) \bigr)}{\bigl|\, \partial f(x)/\partial x \,\bigr|} \;=\; \frac{\lambda(x)}{\lambda(x)} \;=\; 1\,,$$

where the chain rule is used to show that |∂f^{-1}(y)/∂y| = 1/|∂f(x)/∂x|. An
alternative, but more direct, way to see the same thing is to observe that since f is
monotone, its inverse exists and the mean number of points in any bounded interval
[a, b] is

$$\int_{f^{-1}(a)}^{f^{-1}(b)} df(x) \;=\; \int_a^b dy \;=\; b - a\,. \qquad (2.77)$$
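The mapping (2.76) also gives a practical simulation recipe for one-dimensional nonhomogeneous PPPs: generate a unit-intensity PPP on the y axis and map its points back through f^{-1}. A minimal sketch follows (Python with NumPy; the particular intensity λ(t) = 2t on [0, T], for which f(t) = t² and f^{-1}(y) = √y, is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(5)
T = 5.0                          # window [0, T]
mu = T**2                        # f(T) = integral of lambda(t) = 2t over [0, T]

# Unit-intensity PPP on [0, mu]: Poisson count, then i.i.d. uniform locations.
n = rng.poisson(mu)
y = np.sort(rng.uniform(0.0, mu, n))

# Map back through the inverse of f(t) = t**2.
t = np.sqrt(y)                   # realization of the PPP with intensity lambda(t) = 2t
print(n, t[:5])
```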
are all commensurate, that is, all have the same intrinsic dimension. For these func-
tions, if Ξ is a PPP, then so is f (Ξ ). The intensity function of f (Ξ ) is
$$\nu(y) \;=\; \int_{M(y)} \lambda\bigl( f^{-1}(y) \bigr)\, dM(y)\,, \qquad (2.81)$$
where dM(y) is the differential in the tangent space at the point f −1 (y) of the set
M(y). The special case of projection mappings provides the basic intuitive insight
into the nonlinear mapping property of PPPs. To see that the result holds requires
a more careful and mathematically subtle analysis than is deemed appropriate here.
See [63, Section 2.3] for further details.
In practice, the sets M(y) are commensurate for most nonlinear mappings. For
example, it is easy to see that the projections have this property. However, some
nonlinear functions do not. As the next example shows, the problem with forbidden
mappings is that they lead to “intensities” that are generalized functions.
Example 2.10 A Forbidden Nonlinear Mapping. The sets M(y) of the function f :
R2 → R1 defined by
$$y \;=\; f(x_1, x_2) \;=\; \begin{cases} 0, & \text{if } x_1^2 + x_2^2 < 1 \\[3pt] \sqrt{x_1^2 + x_2^2} - 1, & \text{if } x_1^2 + x_2^2 \ge 1 \end{cases}$$
and

$$\nu(y) \;=\; \int_{M(y)} 1\, d\theta \;=\; 2\pi\,(y + 1)\,, \qquad y > 0\,.$$
This gives

$$\nu(y) \;=\; 2\pi\,(y + 1) \;+\; \pi\,\delta(y)\,, \qquad y \ge 0\,,$$

a generalized function: the entire unit disk, of area π, is mapped to the single point y = 0.
Example 2.11 Polar Coordinate Projections. The change of variables from Carte-
sian to polar coordinates in the plane, given by
$$(y_1, y_2) \;=\; f(x_1, x_2) \;\equiv\; \Bigl( \bigl( x_1^2 + x_2^2 \bigr)^{1/2},\; \arctan(x_1, x_2) \Bigr),$$

maps a PPP with intensity function λ(x_1, x_2) on R² to a PPP with intensity function

$$\nu(y_1, y_2) \;=\; y_1\, \lambda\bigl( y_1 \cos y_2,\; y_1 \sin y_2 \bigr)\,.$$

If λ(x_1, x_2) ≡ 1, then ν(y_1, y_2) = y_1. From (2.79), the projection onto the range
y_1 gives a PPP on [0, ∞) ⊂ R¹ with intensity function ν(y_1) = 2π y_1, and the
projection onto the angle y_2 is of infinite intensity on [0, 2π]. Alternatively, if
λ(x_1, x_2) = (x_1² + x_2²)^{−1/2}, then ν(y_1, y_2) ≡ 1. The projection onto range is
ν(y_1) = 2π; the projection onto angle is ∞.
2.11 Stochastic Transformations

A PPP that undergoes a Markovian transition remains a PPP. Let Ψ be the transition
pdf, so that the likelihood that the point x in the state space S transforms to the
point y ∈ S is Ψ (y | x). Let Ξ be the PPP on S with intensity function λ(s), and let
ξ = (m, {x1 , . . . , xm }) be a realization of Ξ . After transitioning the constituent
points, this realization is η ≡ (m, {y1 , . . . , ym }), where y j is a realization of the
pdf Ψ ( · | x j ), j = 1, . . . , m. The realizations {y j } are independent. The transition
process, denoted by Ψ (Ξ ), is a PPP on S with intensity function
$$\nu(y) \;=\; \int_S \Psi(y \mid x)\, \lambda(x)\, dx\,. \qquad (2.83)$$
To see this, let R be any bounded subset of S. Let μ = ∫_R λ(s) ds and observe
that the likelihood of the transition event η is, by construction,

$$
\begin{aligned}
p(\eta) &= \int_R \cdots \int_R \left( \prod_{j=1}^{m} \Psi(y_j \mid x_j) \right) p_\Xi(m, \{x_1, \ldots, x_m\})\, dx_1 \cdots dx_m \\
&= \frac{e^{-\mu}}{m!} \int_R \cdots \int_R \left( \prod_{j=1}^{m} \Psi(y_j \mid x_j) \right) \left( \prod_{j=1}^{m} \lambda(x_j) \right) dx_1 \cdots dx_m \\
&= \frac{e^{-\mu}}{m!} \prod_{j=1}^{m} \int_R \Psi(y_j \mid x_j)\, \lambda(x_j)\, dx_j\,.
\end{aligned}
$$

By the definition (2.83) of ν,

$$p(\eta) \;=\; \frac{e^{-\mu}}{m!} \prod_{j=1}^{m} \nu(y_j)\,. \qquad (2.84)$$
Since

$$\int_R \nu(y)\, dy \;=\; \int_R \int_R \Psi(y \mid x)\, \lambda(x)\, dx\, dy \;=\; \int_R \lambda(x)\, dx \;=\; \mu\,, \qquad (2.85)$$
it follows from (2.12) that the transition Poisson process Ψ (Ξ ) is also a PPP.
A common sensor model takes the form

$$z \;=\; h(x) + w\,,$$
where h(x) is the measurement the sensor produces of a target at x in the absence
of noise, and the error w is zero mean Gaussian distributed with covariance matrix
Σ. The conditional pdf form of the very same equation is N(z | h(x), Σ), denoted here
by ℓ(z | x). The pdf form is general and not limited to additive noise, so it is used here.
Because ℓ(z | x) is a pdf,

$$\int_T \ell(y \mid x)\, dy \;=\; 1$$

for every x ∈ S.
Now, as in the previous section, let ξ = (m, {x₁, . . . , x_m}) be the PPP realiza-
tion and λ(x) the PPP intensity function. Each point x_j is observed by a sensor. The
sensor generates a measurement z_j ∈ T ≡ R^κ, κ ≥ 1, for the target x_j. The pdf of
this measurement is ℓ(y | x). In words, ℓ(z_j | x_j) is the pdf of z_j conditioned on x_j.
Let η = (m, {z₁, . . . , z_m}). Then η is a realization of a PPP defined on the range
T of the pdf ℓ. To see this, it is only necessary to follow the same reasoning used to
establish (2.83). The intensity function of this PPP is

    ν(y) = ∫_S ℓ(y | x) λ(x) dx ,    y ∈ T .    (2.86)

The PPP ℓ(Ξ) is called a "measurement" process because it includes the effects of
measurement errors. It is also an appropriate name for many applications, including
tracking. (It is called a translated process in [119, Chapter 3].)
Example 2.12 PPP Target Modeling. This example is multi-purpose. At the sim-
plest level, it is merely an example of a measurement process. Another purpose is
described shortly. For concreteness, the example is presented in terms of an active
sonar sensor. Such sensors generate a measurement of target location by transmitting
a “ping” and detecting the same ping after it reflects off a target, e.g., a ship. The sen-
sor estimates target direction θ from the arrival angle of the reflected ping, and it esti-
mates range r from the travel time difference between the transmitted and reflected
ping. In two dimensions, target measurements are range, r = (x 2 + y 2 )1/2 , and
angle, θ = arctan(x, y). In the notation above,
    h(x, y) = [ (x² + y²)^{1/2} , arctan(x, y) ]ᵀ .    (2.87)
The errors in these measurements are assumed to be additive zero mean Gaussian
distributed with variances σr2 and σθ2 , respectively. The measurement pdf condi-
tioned on target state is therefore
    ℓ(r, θ | x, y) = N( r ; (x² + y²)^{1/2} , σ_r² ) N( θ ; arctan(x, y) , σ_θ² ) .    (2.88)
Fig. 2.4 (four panels a–d) The predicted measurement PPP intensity function in polar coordinates
of a Gaussian-shaped PPP intensity function in the x-y plane: σ_x = σ_y = 1, σ_r = 0.1,
σ_θ = 0.15 (radians), and c = 200, x₀ = 6, y₀ = 0
Figure 2.4a, b give the intensities (2.89) and (2.90), respectively. A realization of the
PPP with intensity function λc (x, y) generated by the two step procedure is given
in Fig. 2.4c. Randomly perturbing each of these samples gives the realization in
Fig. 2.4d. The predicted intensity ν(r, θ ) is nearly Gaussian in the r -θ plane. If the
likelihood function (2.88) is truncated to the semi-infinite strip (2.82), the predicted
intensity (2.90) is also restricted to the semi-infinite strip.
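A minimal sketch of the two step procedure behind Fig. 2.4 follows (Python/NumPy). It assumes the planar intensity is λ_c(x, y) = c N((x, y) ; (x₀, y₀), diag(σ_x², σ_y²)) with the parameter values quoted in the figure caption; equations (2.89) and (2.90) themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Parameters quoted in the Fig. 2.4 caption (Gaussian-shaped planar intensity).
c, x0, y0 = 200, 6.0, 0.0
sigma_x = sigma_y = 1.0
sigma_r, sigma_theta = 0.1, 0.15

# Two step realization of the PPP with intensity c * N((x, y); (x0, y0), diag).
n = rng.poisson(c)
x = x0 + sigma_x * rng.standard_normal(n)
y = y0 + sigma_y * rng.standard_normal(n)

# Noise-free measurement h(x, y) = (range, bearing), as in (2.87).
r = np.hypot(x, y)
theta = np.arctan2(y, x)

# Perturbing each point with the errors of (2.88) gives a realization of the
# measurement PPP with intensity nu(r, theta), as in Fig. 2.4d.
r_meas = r + sigma_r * rng.standard_normal(n)
theta_meas = theta + sigma_theta * rng.standard_normal(n)
print(n, r_meas.mean(), theta_meas.mean())
```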
λ = {λ1 , λ2 , . . .}.
For a bounded set R ⊂ Φ, let

    μ(R) = Σ_{j ∈ R} λ_j .
In Step 1, the total number of samples n is drawn from the Poisson random variable
with parameter μ(R). In Step 2, these n samples, denoted by φx j , are i.i.d. draws
from the multinomial distribution with pdf
    { λ_j / μ(R)  :  j ∈ R } .

The integers x_j range over the set of indices of the discrete points in R, but they are
otherwise unrestricted. The PPP realization is ξ = (n, {φ_{x₁}, . . . , φ_{x_n}}).
Nothing prevents the same discrete point, say φ j ∈ R, from occurring more than
once in the list {φx1 , . . . , φxn }; that is, repeated samples of the points in R are per-
mitted. The number n j of occurrences of φ j ∈ R as a point of the PPP realization ξ
is a Poisson distributed random variable with parameter λ j and pdf (2.91). Because
of Poisson’s gambit, these Poisson variates are independent. The two definitions are
therefore equivalent.
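The equivalence of the two definitions is easy to check by simulation. The sketch below (Python/NumPy, illustrative intensity values) draws discrete-space realizations both ways and compares the mean count at each point with the intensity vector.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([0.5, 2.0, 3.5])   # intensity vector on three discrete points

def counts_two_step(lam):
    """Step 1: total count ~ Poisson(mu(R)); Step 2: multinomial split."""
    n = rng.poisson(lam.sum())
    return rng.multinomial(n, lam / lam.sum())

def counts_per_point(lam):
    """Equivalent definition: independent Poisson count at each point."""
    return rng.poisson(lam)

trials = 50_000
a = np.mean([counts_two_step(lam) for _ in range(trials)], axis=0)
b = np.mean([counts_per_point(lam) for _ in range(trials)], axis=0)
print(a, b, lam)   # all three rows agree up to Monte Carlo error
```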
The event space of PPPs on Φ is

    E(R) = {(0, ∅)} ∪ ( ∪_{n=1}^∞ { (n, {φ_{x₁}, . . . , φ_{x_n}}) : φ_{x_j} ∈ R, j = 1, . . . , n } ) .    (2.92)
Except for the small change in notation that highlights the indices x j , it is identical
to (2.1). The pdf of the unordered realization ξ is
    p_Ξ(ξ) = e^{−Σ_{j∈R} λ_j} ∏_{j=1}^{n} λ_{x_j} .    (2.93)
This is the discrete space analog of the continuous space expression (2.12). The
expectation operator is changed only in that integrals are everywhere replaced by
sums over the discrete points of R ⊂ Φ. The notions of superposition and thinning
are also unchanged.
The intensity functions of transition and measurement processes are similar to
(2.83) and (2.86), but are modified to accommodate discrete spaces. The transition
pdf Ψ(φ_j | φ_i) is now a transition matrix whose (i, j)-entry is the probability that
the discrete state φ_i maps to the discrete state φ_j. The intensity of the transition
process Ψ(Ξ) is

    ν_j = Σ_i Ψ(φ_j | φ_i) λ_i .

Similarly, if the measurement pdf ℓ(φ_j | x) is the probability of the discrete measurement
φ_j conditioned on a continuous state x, the measurement intensity vector has components

    ν_j = ∫_S ℓ(φ_j | x) λ(x) dx ,

where λ(x) is the intensity function of a PPP, say Υ, on the state space S. If the
conditioning variable takes values u in a discrete space U, the pdf ℓ(φ_j | u) is the
probability of φ_j given u ∈ U and the measurement intensity vector is

    ν_j = Σ_{u ∈ U} ℓ(φ_j | u) λ(u) ,

where in this case λ(u) is the intensity vector of the discrete PPP defined on U. The
discrete-continuous case is discussed in the next section.
Example 2.13 Histograms. The cells {R j } of a histogram are probably the most
natural example of a set of discrete isolated points. Consider a PPP Ξ defined on
the underlying continuous space in which the histogram cells reside. Aggregating,
or quantizing, the i.i.d. points of realizations of Ξ into the nonoverlapping cells
{R j } and reporting only the total counts in each cell yields a realization of a PPP
on a discrete space with points φ j ≡ R j . The intensity vector of this discrete PPP,
call it Ξ_H, is

    λ_j = ∫_{R_j} λ_c(s) ds ,
where λc (s) is the intensity function of Ξ . By the independent scattering, since the
histogram cells {R j } are disjoint, the number of elements in cell R j is Poisson
distributed with parameter λ j . The fact that the points φ j are, or can be, repeated in
realizations of the discrete PPP Ξ H hardly needs saying.
Concrete examples of discrete spaces occur in emission and transmission tomog-
raphy. In these examples, the points in Φ correspond to the individual detectors in a
detector array, and the number of occurrences of φ j in a realization is the number of
detected photons (or other particle) in the j-th detector. These topics are discussed
in Chapter 5.
The event space of a PPP on the augmented space is E(R+ ). The event space E(R)
is a proper subset of E(R+ ).
Realizations are generated as before for the bounded sets R. For bounded sets
R+ , the integrals in (2.4) are replaced by the integrals over R+ as defined in (2.97);
otherwise, Step 1 is unchanged. Step 2 is modified slightly. If n is the outcome of
Step 1, then n i.i.d. Bernoulli trials with probabilities
    Pr[φ] = λ(φ) / ( λ(φ) + ∫_R λ(s) ds )

    Pr[R] = ∫_R λ(s) ds / ( λ(φ) + ∫_R λ(s) ds )
are performed. The number n(φ) is the number of occurrences of φ in the realiza-
tion. The number of i.i.d. samples drawn from R is n − n(φ).
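A realization on a one point augmented space can be generated in a few lines. The sketch below (Python/NumPy) assumes, purely for illustration, a homogeneous intensity on R = [0, 1]² and a scalar intensity λ(φ) at the augmented point.

```python
import numpy as np

rng = np.random.default_rng(4)

lam_phi = 0.7      # intensity assigned to the augmented point phi
lam_R = 25.0       # integral of lambda(s) over R (homogeneous on [0,1]^2 here)

# Step 1: total number of points on the augmented space S+ = R union {phi}.
n = rng.poisson(lam_phi + lam_R)

# Step 2: Bernoulli split between phi and R, then place the R-points i.i.d.
n_phi = rng.binomial(n, lam_phi / (lam_phi + lam_R))
pts_R = rng.uniform(0.0, 1.0, size=(n - n_phi, 2))

# n_phi is Poisson(lam_phi); repeated occurrences of phi are allowed.
print(n_phi, len(pts_R))
```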
The number n(φ) is a realization of a random variable, denoted by N (φ), that is
Poisson distributed with parameter λ(φ). This is seen from the discussion in Sec-
tion 2.9.2. The expected number of occurrences of φ is λ(φ). Also, the probability of
repeated occurrences of φ is never zero. The possibility of repeated occurrences of
φ is important to understanding augmented PPP models for applications such as
multitarget tracking.
The probability that the list {x1 , . . . , xn } is a set is the probability that no more
than one realization of φ occurs in the n Bernoulli trials. Consequently, if λ(φ) > 0,
the probability that the list {x1 , . . . , xn } is a set is strictly less than one. In aug-
mented spaces, random finite sets are more accurately described as random finite
lists.
The likelihood function and expectation operator are unchanged, except that the
integrals are over either R or R+ , as the case may be. Superposition and thinning
are unchanged. The intensity of the diffusion and prediction processes are also
unchanged from (2.83) and (2.86), except that the integrals are over S + .
It is necessary to define the transitions Ψ (y | φ) and Ψ (φ | y) for all y ∈ S,
as well as Ψ (φ | φ) = Pr[φ | φ]. The measurement, or data, likelihood function
L( · | φ) must also be defined. These quantities have natural interpretations in target
tracking.
Example 2.14 Tracking Interpretations. A one-point augmented space is used in
Chapter 6. The state φ is the hypothesis that no target is present in the tracking
region R, and the point x ∈ R is the hypothesis that a target is present with state x.
State transitions and the measurement likelihood function are interpreted in tracking
applications as follows: transitions Ψ(x | φ) from φ to a state x ∈ R model the appearance
of a new target (track initiation); transitions Ψ(φ | x) model the disappearance of a target
with state x (track termination); Ψ(φ | φ) is the probability that no target remains absent;
and L(z | φ) is the likelihood of a measurement z when no target is present, that is, a clutter
measurement.
Initiation and termination of target track is therefore an intrinsic part of the tracking
function when using a Bayesian tracking method (see Appendix C) on an augmented
state space S + .
As is seen in Chapter 6, augmented spaces play an important role in simplifying
difficult enumerations related to joint detection and tracking of targets. Only one
state φ is considered here, but there is no intrinsic limitation.
A non-orderly PPP on S is one whose intensity function contains a
number of weighted Dirac delta functions located at the isolated points {a_j}. The
points {a j } are identified with the discrete points Φ = {φ j }. Let S + = S ∪ Φ.
Realizations on the augmented space S + generated in the manner outlined above for
the one point augmented case map directly to realizations of the non-orderly PPP
on S via the identification φ j ↔ a j . Other matters are similarly handled.
Chapter 3
Intensity Estimation
The basic geometric and algebraic properties of PPPs are discussed in Chapter 2.
These fundamental properties are exploited in this chapter to obtain algorithms for
estimating the PPP intensity function from data using the method of maximum
likelihood (ML). Estimation algorithms are critically important in applications in
which intensity functions are not fully determined a priori. Many heuristic methods
have also been proposed in the literature; though not without interest, they are not
discussed here.
The fundamental notion is that the intensity function is specified parametrically,
and that appropriate numerical values of one or more of the parameters are unknown
and must be estimated from collected data. Two kinds of intensity models are nat-
ural for PPPs: those that do not involve superposition, and those that do. Moreover,
two kinds of data are natural for PPPs: sample data and histogram data (see below
in Section 3.1). This gives four combinations, each of which requires a different,
though closely related, ML algorithm. In all cases, the parameterized intensity is
written λ(s; θ ), where the parameter vector θ is estimated from data.
The first section of this chapter discusses intensity models that do not involve
superposition. The natural method of ML estimation is used: Find the right
likelihood function for the available data, set the gradient with respect to the param-
eters to be estimated equal to zero, and solve. The likelihood functions discussed in
this section correspond to PPP sample and histogram data. This takes care of two of
the four combinations mentioned above.
The remaining sections are all about intensity functions that involve superposi-
tion, or linear combinations, of intensity functions. The EM method is the natural
method for deriving ML estimation algorithms, or estimators, in this case. Read-
ers with little or no familiarity with it may want to consult Appendix A or other
references before reading these sections. Other readers, who wish merely to get
quickly to the algorithm, are provided with two tables that outline the steps of the
EM algorithm for affine Gaussian sums, perhaps the most important example. All
readers, even those for whom EM may have lost some of its charm, are provided
with enough details to “read along” with little need to lift a pencil. These details
reside primarily in the E-step and can be skipped at the reader’s pleasure with little
loss of continuity or essential content.
Parametric models of superposed intensities come in at least two forms—
Gaussian sums and step functions. In the latter, the steps often correspond to pixels
(or voxels) in an image. With sufficiently many terms, both models can approximate
any continuous intensity function arbitrarily closely. For that reason they are some-
times described as nonparametric even though they have parameters that must be
either specified or estimated. To distinguish them from truly nonparametric “mod-
ern” sequential Monte Carlo (SMC) models involving particles, Gaussian sums and
step functions are herein referred to as parametric models.
Gaussian sum models are the main objects of study in this chapter. Step function
models are also very important, but they are discussed mostly in Chapter 5 on
medical imaging. SMC methods are discussed in the context of the tracking appli-
cations in Section 6.3.
The ML estimate is, by definition, the global maximum. The problem in practice is
that the likelihood function is often plagued by multiple local maxima. For example,
when the intensity is a Gaussian shape of unknown location plus a constant (see
(4.38) below), the likelihood function of the location parameter may have multiple
local maxima. When it is important to find the global maximum, locally convergent
algorithms are typically restarted with different initializations, and the ML estimate
is taken to be the estimate with the largest likelihood found. This chapter is con-
tent to investigate algorithms that converge to a local maximum of the likelihood
function.
Two important kinds of data for PPPs are considered. One kind is PPP sample
data, sometimes called “count record” data, which means simply that the available
data set is a realization of a PPP with intensity λ(s; θ ). The other is histogram data,
in which a realization of the PPP is represented not by the points themselves, but by
the number of points that fall in a fixed number of nonoverlapping cells that partition
the space S. Such data are equivalent to a realization of a PPP on a discrete space.
In practice, both kinds of data must be carefully collected to avoid unintentionally
altering their essential Poisson character. The quality of the ML estimators may be
adversely affected if the data do not match the underlying PPP assumption.
The loglikelihood function of PPP sample data x = (m, {x₁, . . . , x_m}) is

    L_iid(θ ; x) = log p(x ; θ)
                 = − ∫_R λ(s ; θ) ds + Σ_{j=1}^m log λ(x_j ; θ) .    (3.1)

Setting the gradient of (3.1) with respect to θ equal to zero gives the necessary conditions

    Σ_{j=1}^m ( 1 / λ(x_j ; θ) ) ∇_θ λ(x_j ; θ) = ∫_R ∇_θ λ(s ; θ) ds .    (3.3)

This system of equations may have multiple solutions, depending on the form of the
intensity λ(x ; θ) and the particular PPP sample data x.
A similar system of equations holds for histogram data. Adopting the notation of
Section 2.9.1 for a K cell histogram with counts data m_{1:K} = {m₁, . . . , m_K}, the
logarithm of the pdf is, from (2.62),

    L_hist(θ ; m_{1:K}) = − ∫_R λ(s ; θ) ds − Σ_{j=1}^K log(m_j !)
                          + Σ_{j=1}^K m_j log( ∫_{R_j} λ(s ; θ) ds ) .    (3.4)
Setting the gradient of (3.4) with respect to θ equal to zero gives

    Σ_{j=1}^K ( m_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds = ∫_R ∇_θ λ(s ; θ) ds .    (3.5)
As for PPP sample data, the system may have multiple solutions.
It is necessary to verify that the loglikelihood function of the data is concave
at the ML solution, that is, that the negative Hessian matrix of the loglikelihood
function is positive definite. In practice, intuition often replaces verification.
This example is a two dimensional problem that involves training the “crosshairs” of
a receiver on a dim light source. In optical communications, this means estimating
the brightest direction to the source [114, 115, 119, Section 4.5]. The receiver in this
case is a photodetector with a (flat) photoemissive surface. (Photoemissive materials
give off electrons when they absorb energetic photons.) Photons are detected over
a specified finite time period. Recording the number and locations of each detected
photon provides i.i.d. data x = { x_j ∈ R² : j = 1, . . . , m }. This kind of data may
or may not be practical for large m, depending on the application. Feasible or not,
it is nonetheless interesting to consider. In practice, the photodetector surface R is
often divided into a number of disjoint regions that constitute histogram cells, and a
count of the number of photons arriving in each cell is recorded. The photodetector
surface is assumed to be rectangular of known size, and its center is taken as the
coordinate system origin. The axes are taken parallel to the sides of R.
Example 3.1 Sample Data. The intensity of the light distributed across the photode-
tector surface is proportional to a Gaussian pdf. The 2 × 2 covariance matrix Σ
determines the elliptical shape of the "spotlight". For simplicity, suppose the shape
is circular with known width ρ, so that Σ = Diag(ρ², ρ²). The light intensity is
λ(x ; I0 , μ) = I0 N (x ; μ, Σ) , (3.6)
where I0 /(2 π ρ 2 ) is the peak intensity and the vector μ = [μ1 , μ2 ]T ∈ R2 is the
location of the peak. The parameters to be estimated are θ = (I0 , μ1 , μ2 ). The
only constraint is that I0 > 0.
For sample data, the necessary conditions yield three equations in three
unknowns. Setting the derivative in (3.3) with respect to I0 to zero gives the ML
estimate
    Î₀ = m / ∫_R N(s ; μ, Σ) ds ,    (3.7)
where the integral is a double integral over the photoemissive surface. The estimate
Iˆ0 automatically satisfies the nonnegativity constraint. The value of μ in (3.7) is
the ML estimate μ̂, so it is coupled to the necessary equations for μ. Setting the
gradient1 in (3.3) with respect to the vector μ equal to zero, substituting the estimate
(3.7), and rearranging terms gives [114]
    ∫_R s N(s ; μ, Σ) ds / ∫_R N(s ; μ, Σ) ds = (1/m) Σ_{j=1}^m x_j .    (3.8)
The left hand side is the mean vector of the Gaussian pdf restricted to the set R.
The equation thus says that ML estimate μ̂ is such that the conditional mean on
R equals the sample mean. The conditional mean equation is uniquely solvable for
rectangular domains, as shown in Appendix B.
If the bulk of the source distribution is visible on the photoemissive surface, then
∫_R N(s ; μ̂, Σ) ds ≈ 1. This gives the approximation Î₀ ≈ m. Also, the left
hand side of (3.8) is approximately the unconditional mean μ, so the ML estimate
μ̂ approximates the sample mean in this case.
Example 3.2 Compensating for Edge Effects. When the peak of the light distribu-
tion lies near the edge of the photodetector, many source photons are not counted.
This example shows that the estimator (3.8) automatically compensates for the
missed measurements. Intuitively, this is because the sample mean in (3.8) is, by
its very nature, an estimate of the conditional mean expressed analytically by the
left hand side.
A realization of the PPP (3.6) with intensity I0 = 250 and mean μ =
[0.8, 0.5]T and ρ = 0.5 is generated. Only 143 of the points in the realization
fall on the photodetector surface, which is taken to be the square (−1, 1) × (−1, 1).
These points are depicted in the left hand side of Fig. 3.1. The ML estimated
mean is computed by solving (3.8) by the method discussed in Appendix B,
giving μ̂_ML = [0.75007, 0.53021]ᵀ. The error is remarkably small, especially
compared to the error in the sample mean of the 143 detected photons, namely
[0.49600, 0.37698]ᵀ. The estimated intensity, computed from (3.7) with μ̂_ML,
gives Î₀ = 250.72.
The right hand side of Fig. 3.1 repeats the procedure but with the mean shifted
further into the corner to [0.8, 0.8]T . Only 112 of the PPP samples fall on the
photodetector. The ML estimate is μ̂ M L = [0.74687, 0.79664]T . The sample mean
for the 112 detected photons is, by contrast, [0.49445, 0.51792]T . The estimated
intensity is Iˆ0 = 245.57.
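The conditional mean equation (3.8) is straightforward to solve numerically. The sketch below (Python with NumPy and SciPy) uses a simple fixed point iteration on each coordinate, which is one possible method but not necessarily the one described in Appendix B; the data generation mirrors Example 3.2, and all numerical choices are illustrative.

```python
import numpy as np
from scipy.stats import norm

def trunc_mean(mu, sigma, a, b):
    """Conditional mean of N(mu, sigma^2) restricted to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = norm.cdf(beta) - norm.cdf(alpha)
    return mu + sigma * (norm.pdf(alpha) - norm.pdf(beta)) / z

def solve_conditional_mean(xbar, sigma, a=-1.0, b=1.0, iters=100):
    """Solve E[s | s in [a,b]] = xbar for the location mu, one axis at a time.
    Sigma = diag(sigma^2, sigma^2) and a rectangular R make the axes separable."""
    mu = np.array(xbar, dtype=float)            # initialize at the sample mean
    for _ in range(iters):
        g = np.array([trunc_mean(mu[0], sigma, a, b),
                      trunc_mean(mu[1], sigma, a, b)])
        mu = mu + (xbar - g)                    # fixed-point correction step
    return mu

rng = np.random.default_rng(5)
I0, mu_true, rho = 250, np.array([0.8, 0.5]), 0.5
pts = mu_true + rho * rng.standard_normal((rng.poisson(I0), 2))
pts = pts[np.all(np.abs(pts) < 1.0, axis=1)]    # photons landing on (-1,1)x(-1,1)
mu_hat = solve_conditional_mean(pts.mean(axis=0), rho)
I0_hat = len(pts) / (
    (norm.cdf((1 - mu_hat) / rho) - norm.cdf((-1 - mu_hat) / rho)).prod())
print(mu_hat, I0_hat)
```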
Example 3.3 Histogram Data. ML estimates of the parameters of (3.6) are now
given for histogram data. The necessary condition (3.5) for I₀ gives
Fig. 3.1 ML mean and coefficient estimates for the model (3.6) obtained by solving equation (3.8).
The mean in the left hand figure is (0.8, 0.5). The mean in the right hand figure is (0.8, 0.8). Filled
circle = μ̂ M L ; Open circle = uncorrected sample mean; Square = true mean
    Î₀ = Σ_{j=1}^K m_j / ∫_R N(s ; μ, Σ) ds .    (3.9)
This estimator is identical to (3.7) since m = Σ_{j=1}^K m_j. It is also coupled to the
estimate μ̂. Manipulating the necessary conditions for μ in much the same way as
done in (3.8) gives

    ∫_R s N(s ; μ, Σ) ds / ∫_R N(s ; μ, Σ) ds
        = (1/m) Σ_{j=1}^K m_j ( ∫_{R_j} s N(s ; μ, Σ) ds / ∫_{R_j} N(s ; μ, Σ) ds ) .    (3.10)
If ∫_R N(s ; μ, Σ) ds ≈ 1, the left hand side of (3.10) is approximately μ, so that

    μ̂ ≈ (1/m) Σ_{j=1}^K m_j γ_j(μ̂) ,

where

    γ_j(μ̂) = ∫_{R_j} s N(s ; μ̂, Σ) ds / ∫_{R_j} N(s ; μ̂, Σ) ds

is the conditional mean of cell R_j. Since γ_j(μ̂) ∈ R_j regardless of μ̂, replace
the Gaussian density in each cell with a uniform density. This final approximation is

    μ̂ ≈ (1/m) Σ_{j=1}^K m_j γ̄_j ,

where

    γ̄_j = ( 1 / |R_j| ) ∫_{R_j} s ds

is the centroid of cell R_j.
once, while the M-step is the parameter update step. For readers unfamiliar with
EM, there may be no easier way to learn it than by first reading the appendix, or some
equivalent discussion elsewhere, and subsequently plunging headlong into the prob-
lem addressed in the next section. This problem is an excellent first application of
EM. Readers new to EM are forewarned that much of the action lies in understand-
ing and manipulating subscripts. Further background on EM and its many variations
are given in [80, 81].
The superposed intensity is the sum of L component intensities,

    λ(x ; θ) = Σ_{ℓ=1}^L λ_ℓ(x ; θ_ℓ) ,    (3.11)

where the parameter vector of the ℓ-th PPP is θ_ℓ, and θ = (θ₁, . . . , θ_L) is the
parameter vector for the superposition. Different components of λ(x ; θ) can take
different parametric forms. The parameters of the intensities λ_ℓ(x ; θ_ℓ) are assumed
parametrically untied, i.e., there is no functional relationship between the vectors θ_i
and θ_j for i ≠ j. This simplifying assumption is unnecessary theoretically, as well
as inappropriate in some applications. An example of the latter is when the centroids
of λ_ℓ(x ; θ_ℓ) are required to form an equispaced grid whose orientation and spacing
are to be estimated from data.
3.2.1.1 E-step
The natural choice of the “missing data” are the conditionally independent random
indices k j , 1 ≤ k j ≤ L , that identify which of the superposed PPPs generated the
point x j . Let
    x_c = ( (x₁, k₁), . . . , (x_m, k_m) )    (3.12)

denote the complete data. (In the language of Section 8.1, x_c is a realization of a
marked PPP.) For ℓ = 1, . . . , L, let

    x_c(ℓ) = { (x_j, k_j) : k_j = ℓ } .    (3.13)

Let n_c(ℓ) ≥ 0 denote the number of indices j such that k_j = ℓ, and let
ξ_c(ℓ) ≡ ( n_c(ℓ), { x_j : k_j = ℓ } ).
It follows from the definition of k_j that ξ_c(ℓ) is a realization of the PPP whose
intensity is λ_ℓ( · ). The pdf of ξ_c(ℓ) is (from (2.12))

    p(ξ_c(ℓ) ; θ_ℓ) = exp( − ∫_R λ_ℓ(s ; θ_ℓ) ds ) ∏_{j : k_j = ℓ} λ_ℓ(x_j ; θ_ℓ) .

Because the L superposed PPPs are independent, the complete data pdf is the product

    p(x_c ; θ) = ∏_{ℓ=1}^L p(ξ_c(ℓ) ; θ_ℓ)
               = exp( − ∫_R λ(s ; θ) ds ) ∏_{j=1}^m λ_{k_j}(x_j ; θ_{k_j}) .

The complete data loglikelihood is therefore

    L(θ ; x_c) = log p(x_c ; θ)
               = − ∫_R λ(s ; θ) ds + Σ_{j=1}^m log λ_{k_j}(x_j ; θ_{k_j}) .    (3.15)
Two identities are useful in the E-step. The first is

    Σ_{k₁, ..., k_m = 1}^{L} ∏_{j=1}^m ( λ_{k_j}(x_j ; θ_{k_j}) / λ(x_j ; θ) ) = 1 ,    (3.17)

and the second is

    Σ_{k₁, ..., k_{j−1}, k_{j+1}, ..., k_m = 1}^{L} ∏_{i=1}^m ( λ_{k_i}(x_i ; θ_{k_i}) / λ(x_i ; θ) )
        = λ_{k_j}(x_j ; θ_{k_j}) / λ(x_j ; θ) .    (3.18)

The E-step evaluates the conditional expectation of the complete data loglikelihood,

    Q(θ ; θ^{(n)}) ≡ Σ_{k₁=1}^{L} ··· Σ_{k_m=1}^{L} ( ∏_{j=1}^m λ_{k_j}(x_j ; θ_{k_j}^{(n)}) / λ(x_j ; θ^{(n)}) ) L(θ ; x_c) .    (3.19)

Substituting (3.15) gives, after interchanging the summation order and using (3.17)
and (3.18),

    Q(θ ; θ^{(n)}) = − ∫_R λ(s ; θ) ds
        + Σ_{j=1}^m Σ_{ℓ=1}^L ( λ_ℓ(x_j ; θ_ℓ^{(n)}) / λ(x_j ; θ^{(n)}) ) log λ_ℓ(x_j ; θ_ℓ) .    (3.20)

Equivalently, Q(θ ; θ^{(n)}) separates into the L term sum

    Q(θ ; θ^{(n)}) = Σ_{ℓ=1}^L Q_ℓ(θ_ℓ ; θ^{(n)}) ,    (3.21)

where

    Q_ℓ(θ_ℓ ; θ^{(n)}) = − ∫_R λ_ℓ(s ; θ_ℓ) ds
        + Σ_{j=1}^m ( λ_ℓ(x_j ; θ_ℓ^{(n)}) / λ(x_j ; θ^{(n)}) ) log λ_ℓ(x_j ; θ_ℓ) .    (3.22)
3.2.1.2 M-step
The M-step maximizes (3.21) over all feasible θ, that is,

    θ^{(n+1)} = arg max_θ Q(θ ; θ^{(n)}) .

Then θ^{(n+1)} satisfies the necessary conditions

    Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) ( 1 / λ_ℓ(x_j ; θ_ℓ) ) ∇_{θ_ℓ} λ_ℓ(x_j ; θ_ℓ) = ∫_R ∇_{θ_ℓ} λ_ℓ(s ; θ_ℓ) ds ,    (3.24)

where the weights are

    w_ℓ(x_j ; θ^{(n)}) = λ_ℓ(x_j ; θ_ℓ^{(n)}) / λ(x_j ; θ^{(n)}) .    (3.25)

Solving the L uncoupled systems (3.24) gives the recursive update for θ, namely
θ^{(n+1)} = ( θ₁^{(n+1)}, . . . , θ_L^{(n+1)} ). This completes the M-step.
The weights have a simple interpretation. The probability that the ℓ-th PPP generates a
point in an infinitesimal volume |dx| centered at x_j is λ_ℓ(x_j ; θ_ℓ^{(n)}) |dx|, where |dx| is
the infinitesimal volume. On the other hand, the probability that a point is generated by the
superposed PPP in the same infinitesimal is λ(x_j ; θ^{(n)}) |dx|. Therefore, the probability
that x_j is generated by the ℓ-th PPP conditioned on the event that it was generated by the
superposed PPP is the ratio of these two probabilities. Cancelling |dx| in the ratio gives the
weight w_ℓ(x_j ; θ^{(n)}). A more careful proof that avoids the use of infinitesimals is omitted.
The solution of (3.24) depends on the form of λ_ℓ(x ; θ_ℓ). It differs from the
direct ML necessary conditions (3.3) primarily by the presence of the weights
w_ℓ(x_j ; θ^{(n)}). Several illustrative examples are now given for the components of
the superposed intensity λ(x ; θ) in (3.11).
Example 3.4 Constant Intensity. Let the ℓ-th PPP intensity be the constant
λ_ℓ(s ; θ_ℓ) = I_ℓ on R. The only parameter is θ_ℓ = I_ℓ. The full parameter vector θ includes not only θ_ℓ
but also the parameters θ_j, j ≠ ℓ, of the other PPPs. Given θ^{(n)} and, in particular,
θ_ℓ^{(n)} = I_ℓ^{(n)}, the EM update I_ℓ^{(n+1)} is, from (3.24),

    I_ℓ^{(n+1)} = ( 1 / |R| ) Σ_{j=1}^m  I_ℓ^{(n)} / λ(x_j ; θ^{(n)}) ,    (3.27)

where |R| = ∫_R ds. The fraction

    w_ℓ(x_j ; θ^{(n)}) = I_ℓ^{(n)} / λ(x_j ; θ^{(n)})

is the probability that the point x_j is generated by the ℓ-th PPP, so the summation in
(3.27) is the expected number of points generated by the ℓ-th PPP conditioned on the
current parameter set θ^{(n)}. The division by |R| converts this number to intensity.
EM updates for θ_j, j ≠ ℓ, depend on the form of the PPP intensities.
Example 3.5 Scaling a Known Component. Let the ℓ-th PPP intensity be

    λ_ℓ(s ; θ_ℓ) = I_ℓ f_ℓ(s) ,    (3.28)

where f_ℓ(s) ≥ 0 is a known function and only the scale factor θ_ℓ = I_ℓ is estimated.
Proceeding as in Example 3.4, the EM update from (3.24) is

    I_ℓ^{(n+1)} = ( 1 / ∫_R f_ℓ(s) ds ) Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) ,    (3.29)

where

    w_ℓ(x_j ; θ^{(n)}) = I_ℓ^{(n)} f_ℓ(x_j) / λ(x_j ; θ^{(n)}) .

The denominator in (3.29) corrects for the intensity that lies outside of R. Substi-
tuting into (3.29) and moving I_ℓ^{(n)} outside the sum gives

    I_ℓ^{(n+1)} = ( I_ℓ^{(n)} / ∫_R f_ℓ(s) ds ) Σ_{j=1}^m  f_ℓ(x_j) / λ(x_j ; θ^{(n)}) .    (3.30)

If f_ℓ is the indicator function of a cell R_ℓ ⊂ R and the L cells {R_ℓ} are disjoint, the
update (3.30) becomes

    I_ℓ^{(n+1)} = ( 1 / |R_ℓ| ) Σ_{x_j ∈ R_ℓ}  I_ℓ^{(n)} f_ℓ(x_j) / λ(x_j ; θ^{(n)}) .    (3.32)

It is seen through the notational fog that the EM algorithm converges in a single
step to

    Î_ℓ = #{ j : x_j ∈ R_ℓ } / |R_ℓ| ,

where #{ · } is the number of points in the set. In other words, the ML estimator is
proportional to the histogram if the cells are of equal size.
The most important case in applications is the Gaussian component

    λ_ℓ(s ; θ_ℓ) = I_ℓ N(s ; μ_ℓ, Σ_ℓ) ,    (3.33)

where the constant I_ℓ is the total "signal level." In some applications one or more of
the parameters {I_ℓ, μ_ℓ, Σ_ℓ} are known. The more common possibilities are consid-
ered here.

Case 1. If I_ℓ is the only estimated parameter because μ_ℓ and Σ_ℓ are known, then
θ_ℓ = I_ℓ is estimated using the recursion (3.30) with f_ℓ(x) = N(x ; μ_ℓ, Σ_ℓ).

Case 2. If the signal level and location vector μ_ℓ are estimated, and Σ_ℓ is known,
then θ_ℓ = (I_ℓ, μ_ℓ). Given θ^{(n)} and thus θ_ℓ^{(n)} = ( I_ℓ^{(n)}, μ_ℓ^{(n)} ), the EM updates I_ℓ^{(n+1)}
and μ_ℓ^{(n+1)} are coupled. They are manipulated into a nested form in which μ_ℓ^{(n+1)} is
computed as the solution of a nonlinear equation, and then I_ℓ^{(n+1)} is computed from
μ_ℓ^{(n+1)}. To see this, proceed in the same manner as (3.29) to obtain
    I_ℓ^{(n+1)} = ( 1 / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds ) Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) ,    (3.34)

where in this equation μ_ℓ is the sought-after EM update, and the weights are

    w_ℓ(x_j ; θ^{(n)}) = I_ℓ^{(n)} N(x_j ; μ_ℓ^{(n)}, Σ_ℓ) / λ(x_j ; θ^{(n)}) .    (3.35)

The necessary condition with respect to μ_ℓ in (3.24) is, after multiplying both sides
by Σ_ℓ,

    Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) ( x_j − μ_ℓ ) = I_ℓ ∫_R N(s ; μ_ℓ, Σ_ℓ) ( s − μ_ℓ ) ds .    (3.36)

Substituting for I_ℓ the estimate I_ℓ^{(n+1)} from (3.34) and simplifying gives μ_ℓ^{(n+1)} as
the solution of a nonlinear equation:

    ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds
        = Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) x_j / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) .    (3.37)
(Compare this expression to the direct ML equation (3.8) to see how the EM method
exploits Bayes Theorem to “split” the data into L parts, one for each component.)
The right hand side of this equation is a probabilistic mean of the data, that is, a
convex combination of {x_j}. In general, solving (3.37) for μ_ℓ^{(n+1)} requires numerical
methods. The estimate μ_ℓ^{(n+1)} is then substituted into (3.34) to evaluate I_ℓ^{(n+1)}.

If it is known that ∫_R N(s ; μ_ℓ, Σ_ℓ) ds ≈ 1, then ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds ≈ μ_ℓ.
Given this approximation, the EM updates to (3.34) and (3.37) are "self solving", so that

    I_ℓ^{(n+1)} ≈ Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) = I_ℓ^{(n)} Σ_{j=1}^m  N(x_j ; μ_ℓ^{(n)}, Σ_ℓ) / λ(x_j ; θ^{(n)})    (3.38)

    μ_ℓ^{(n+1)} ≈ Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) x_j / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) .    (3.39)

The approximate iteration requires that ∫_R N(s ; μ_ℓ^{(n)}, Σ_ℓ) ds ≈ 1 at every EM
iteration, so it cannot be used if the iterates μ_ℓ^{(n)} drift too close to the edge of the
set R.
Case 3. Finally, if the signal level, mean vector, and covariance matrix are all
estimated, then θ_ℓ = (I_ℓ, μ_ℓ, Σ_ℓ). Given θ^{(n)} and thus θ_ℓ^{(n)} = ( I_ℓ^{(n)}, μ_ℓ^{(n)}, Σ_ℓ^{(n)} ),
all three EM updates are coupled.
The gradients in (3.24) with respect to μ_ℓ and Σ_ℓ ² involve I_ℓ^{(n+1)}. The gradient
equation for μ_ℓ is essentially identical to (3.36); the gradient for Σ_ℓ is very similar
but messier. Substituting the estimate I_ℓ^{(n+1)} for I_ℓ in both equations using (3.34)
and simplifying gives μ_ℓ^{(n+1)} and Σ_ℓ^{(n+1)} as the solution of a coupled system of
nonlinear equations:

    ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds
        = Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) x_j / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)})    (3.41)

    ∫_R N(s ; μ_ℓ, Σ_ℓ) (s − μ_ℓ)(s − μ_ℓ)ᵀ ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds
        = Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) (x_j − μ_ℓ)(x_j − μ_ℓ)ᵀ / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) .    (3.42)
The update

    I_ℓ^{(n+1)} = ( 1 / ∫_R N(s ; μ_ℓ^{(n+1)}, Σ_ℓ^{(n+1)}) ds ) Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)})    (3.43)

is evaluated after solving for μ_ℓ^{(n+1)} and Σ_ℓ^{(n+1)}.
Again, if ∫_R N(s ; μ_ℓ, Σ_ℓ) ds ≈ 1, the left hand sides of the equations (3.41)
and (3.42) are the mean vector μ_ℓ and covariance matrix Σ_ℓ, respectively. Given
this approximation, the EM update equations are simply
² Use the matrix identity ∇_R N(s ; μ, R) = ½ N(s ; μ, R) [ −R⁻¹ + R⁻¹ (s − μ)(s − μ)ᵀ R⁻¹ ],
where the gradient is written in matrix form as ∇_R = [ ∂/∂ρ_{ij} ] and R = [ ρ_{ij} ].
    I_ℓ^{(n+1)} ≈ Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)})    (3.44)

    μ_ℓ^{(n+1)} ≈ Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) x_j / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)})    (3.45)

    Σ_ℓ^{(n+1)} ≈ Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) ( x_j − μ_ℓ^{(n+1)} )( x_j − μ_ℓ^{(n+1)} )ᵀ / Σ_{j=1}^m w_ℓ(x_j ; θ^{(n)}) .    (3.46)
Table 3.1 EM algorithm for affine Gaussian sum with PPP sample data

Given data: x_{1:m} = {x₁, . . . , x_m}
Fit the intensity: λ(s ; θ) = λ_bgnd(s) + Σ_{ℓ=1}^L I_ℓ N(s ; μ_ℓ, Σ_ℓ),  s ∈ R
Estimate the parameters: θ = {(I_ℓ, μ_ℓ, Σ_ℓ) : ℓ = 1, . . . , L}

• FOR ℓ = 1 : L, initialize coefficients, means, and covariance matrices:
  I_ℓ(0) > 0, μ_ℓ(0) ∈ R, and Σ_ℓ(0) positive definite
• END FOR
• FOR EM iteration index n = 0, 1, 2, . . . until convergence:
  – FOR j = 1 : m and ℓ = 1 : L, compute:
        w_{ℓj}(n) = I_ℓ(n) N(x_j ; μ_ℓ(n), Σ_ℓ(n)) / ( λ_bgnd(x_j) + Σ_{ℓ'=1}^L I_{ℓ'}(n) N(x_j ; μ_{ℓ'}(n), Σ_{ℓ'}(n)) )
  – END FOR
  – FOR ℓ = 1 : L, compute:
      • N_ℓ(n) = Σ_{j=1}^m w_{ℓj}(n)
      • E_ℓ(n) = ( 1 / N_ℓ(n) ) Σ_{j=1}^m w_{ℓj}(n) x_j
      • V_ℓ(n) = ( 1 / N_ℓ(n) ) Σ_{j=1}^m w_{ℓj}(n) ( x_j − μ_ℓ(n) )( x_j − μ_ℓ(n) )ᵀ
      • Solve
            ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds = E_ℓ(n)
            ∫_R N(s ; μ_ℓ, Σ_ℓ) (s − μ_ℓ)(s − μ_ℓ)ᵀ ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds = V_ℓ(n)
        for μ_ℓ(n + 1) and Σ_ℓ(n + 1)
      • Compute:
            I_ℓ(n + 1) = N_ℓ(n) / ∫_R N(s ; μ_ℓ(n + 1), Σ_ℓ(n + 1)) ds
  – END FOR
• END FOR EM iteration (Test for convergence)
• If converged: FOR ℓ = 1 : L,  Î_ℓ = I_ℓ(n_last),  μ̂_ℓ = μ_ℓ(n_last),  Σ̂_ℓ = Σ_ℓ(n_last)  END FOR
The mean and covariance updates are nested in the approximate EM update, that
is, the update μ_ℓ^{(n+1)} is used in Σ_ℓ^{(n+1)}.
Table 3.1 outlines the steps of the EM algorithm for affine Gaussian sums, that
is, for a PPP that is a Gaussian sum plus an arbitrary intensity, λbgnd (s). This
intensity models the superposition of a PPP whose intensity is a Gaussian sum and
a background PPP of known intensity. In many applications, the background inten-
sity is assumed homogeneous, but this restriction is unnecessary for the estimation
algorithm to converge.
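For concreteness, the sketch below (Python/NumPy) implements the approximate form of Table 3.1, i.e., it uses the updates (3.44)–(3.46) and therefore assumes each Gaussian component keeps essentially all of its mass inside R. The test data, the background level, and all function names are illustrative assumptions rather than quantities taken from the text.

```python
import numpy as np

def gauss_pdf(x, mu, cov):
    """N(x; mu, cov) evaluated at the rows of x."""
    d = x - mu
    inv = np.linalg.inv(cov)
    q = np.einsum('ij,jk,ik->i', d, inv, d)
    return np.exp(-0.5 * q) / np.sqrt(np.linalg.det(2.0 * np.pi * cov))

def em_gaussian_sum(x, L, lam_bgnd, iters=200, seed=0):
    """Approximate EM of Table 3.1 using the updates (3.44)-(3.46)."""
    rng = np.random.default_rng(seed)
    m, dim = x.shape
    I = np.full(L, m / (2.0 * L))                       # coefficients
    mu = x[rng.choice(m, L, replace=False)].copy()      # means
    cov = np.array([np.cov(x.T) for _ in range(L)])     # covariances
    for _ in range(iters):
        # E-step: weights w_{lj} = I_l N(x_j; mu_l, cov_l) / lambda(x_j)
        comp = np.array([I[l] * gauss_pdf(x, mu[l], cov[l]) for l in range(L)])
        lam = lam_bgnd(x) + comp.sum(axis=0)
        w = comp / lam
        # M-step (approximate): (3.44)-(3.46)
        N = w.sum(axis=1)
        I = N
        mu = (w @ x) / N[:, None]
        for l in range(L):
            d = x - mu[l]
            cov[l] = (w[l, :, None, None] * np.einsum('ij,ik->ijk', d, d)).sum(0) / N[l]
    return I, mu, cov

# Illustrative use: two Gaussian clusters plus a homogeneous background on [0,10]^2.
rng = np.random.default_rng(1)
x = np.vstack([
    rng.normal([3, 3], 0.5, size=(rng.poisson(150), 2)),
    rng.normal([7, 6], 0.8, size=(rng.poisson(100), 2)),
    rng.uniform(0, 10, size=(rng.poisson(50), 2)),
])
I, mu, cov = em_gaussian_sum(x, L=2, lam_bgnd=lambda x: np.full(len(x), 50 / 100.0))
print(I, mu)
```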
The EM method is widely used for superposition (mixture) problems with either
independent or conditionally independent data, but it is versatile and is applicable
to any problem in which useful missing data can be found. In this section EM is
used for histogram data.
Histogram data are difficult to treat because the loglikelihood function (3.4) involves
integrals over the individual histogram cells. The key insight, for those who wish to
skip the notationally burdensome details until need arises, is that the points of the
PPP are the missing data. Said another way, the integrals are sums, and the variables
of integration are—like the index of the summation in the superposition—a very
appropriate choice for the missing data. Other choices are possible, but this is the
choice made here.
3.3.1.1 E-step
The histogram notation of Section 3.1 is retained, as is the superposed intensity
(3.11). In EM parlance, the incomplete data loglikelihood function is (2.62). Missing
data arise from the histogram nature of the data and from the intensity superposition.
For j = 1, . . . , K , the count m j in cell R j corresponds to m j points in cell R j , but
the precise locations of these points are not observed. Let
ξ j ≡ x j1 , . . . , x jm j , x jr ∈ R j , r = 1, . . . , m j ,
denote these locations. Denote the collection of all missing points by ξ1:K =
(ξ1 , . . . , ξ K ). The points in ξ1:K are the (missing) points of the PPP realization
from which the histogram data are generated. The missing data that arise from the
superposition are the same as before. The point x jr is generated by one of the com-
ponents of the superposition; denote the index of this component by k_{jr}, where
1 ≤ k_{jr} ≤ L. Let K_j = ( k_{j1}, . . . , k_{jm_j} ), and denote all missing indices by
K_{1:K} = ( K₁, . . . , K_K ).
The complete data are (m 1:K , ξ1:K , K1:K ). Because the points x jr are equivalent
to points of the PPP realization, and the indices k jr indicate the components of the
superposition that generated them, the definition of the complete data likelihood
function is
    p_hc( m_{1:K}, ξ_{1:K}, K_{1:K} ; θ ) = exp( − ∫_R λ(s ; θ) ds ) ∏_{j=1}^K ∏_{r=1}^{m_j} λ_{k_{jr}}( x_{jr} ; θ_{k_{jr}} ) .    (3.47)

Missing data do not affect the integral over all R because exp( − ∫_R λ(s ; θ) ds )
is the normalization constant that makes p_hc( · ) a pdf. The conditional pdf of the
missing data is, by Bayes Theorem, proportional to the product over j and r of the weights

    w_ℓ( s ; θ ) = λ_ℓ( s ; θ_ℓ ) / ∫_{R_j} λ( s' ; θ ) ds' ,    s ∈ R_j ,

evaluated at ℓ = k_{jr} and s = x_{jr}.
Unlike the weights (3.25), these weights are not dimensionless; they carry the same
units as the intensity. For j = 1, . . . , K , let
    ∫_{R_j} ··· ∫_{R_j} dx_{j1} ··· dx_{jm_j} ≡ ∫_{(R_j)^{m_j}} dξ_j ,

where dξ_j = dx_{j1} ··· dx_{jm_j}, and

    ∫_{(R₁)^{m₁}} ··· ∫_{(R_K)^{m_K}} dξ₁ ··· dξ_K ≡ ∫_{(R₁)^{m₁} × ··· × (R_K)^{m_K}} dξ_{1:K} ,

where dξ_{1:K} ≡ dξ₁ ··· dξ_K. It is easy to verify from the definition of the weights
that

    Σ_{K_{1:K}} ∫_{(R₁)^{m₁} × ··· × (R_K)^{m_K}} ∏_{j=1}^K ∏_{r=1}^{m_j} w_{k_{jr}}( x_{jr} ; θ ) dξ_{1:K} = 1 ,

where

    Σ_{K_{1:K}} ≡ Σ_{k_{11}=1}^L ··· Σ_{k_{1m₁}=1}^L  ···  Σ_{k_{K1}=1}^L ··· Σ_{k_{Km_K}=1}^L .
Integrating and summing over all missing data except x_{jr} and k_{jr} shows that

    Σ_{K_{1:K} \ k_{jr}} ∫_{(R₁)^{m₁} × ··· × (R_K)^{m_K} \ R_j} ∏_{j'=1}^K ∏_{r'=1}^{m_{j'}} w_{k_{j'r'}}( x_{j'r'} ; θ ) d( ξ_{1:K} \ x_{jr} )
        = w_{k_{jr}}( x_{jr} ; θ ) .    (3.50)
3.3.1.2 M-step
Let n ≥ 0 denote the EM iteration index, and let θ^{(0)} be given. The auxiliary
function is, by definition of the E-step,

    Q( θ ; θ^{(n)} ) = E[ log p_hc( m_{1:K}, ξ_{1:K}, K_{1:K} ; θ ) | θ^{(n)} ]    (3.51)
        ≡ Σ_{K_{1:K}} ∫_{(R₁)^{m₁} × ··· × (R_K)^{m_K}} log p_hc( m_{1:K}, ξ_{1:K}, K_{1:K} ; θ )
              ∏_{j=1}^K ∏_{r=1}^{m_j} w_{k_{jr}}( x_{jr} ; θ^{(n)} ) dξ_{1:K} .
Substituting the logarithm of (3.47) and paying attention to the algebra gives

    Q( θ ; θ^{(n)} ) = − ∫_R λ(s ; θ) ds
        + Σ_{j=1}^K Σ_{r=1}^{m_j} Σ_{k_{jr}=1}^L ∫_{R_j} log λ_{k_{jr}}( x_{jr} ; θ_{k_{jr}} ) w_{k_{jr}}( x_{jr} ; θ^{(n)} ) dx_{jr}
          × { Σ_{K_{1:K} \ k_{jr}} ∫_{(R₁)^{m₁} × ··· × (R_K)^{m_K} \ R_j} ∏_{(j', r') ≠ (j, r)} w_{k_{j'r'}}( x_{j'r'} ; θ^{(n)} ) d( ξ_{1:K} \ x_{jr} ) } .

From the identity (3.50), the term in braces is identically one. Carefully examining
the sum over k_{jr} shows that the index can be replaced by an index, say ℓ, that does
not depend on j and r. Making the sum over ℓ the first sum, and recognizing that
the integrals over x_{jr} for each index r are identical, gives the simplified auxiliary
function

    Q( θ ; θ^{(n)} ) = − ∫_R λ(s ; θ) ds + Σ_{ℓ=1}^L Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) log λ_ℓ( s ; θ_ℓ ) ds .    (3.52)
As in (3.21), Q( θ ; θ^{(n)} ) separates into the L term sum

    Q( θ ; θ^{(n)} ) = Σ_{ℓ=1}^L Q_ℓ( θ_ℓ ; θ^{(n)} ) ,    (3.53)

where

    Q_ℓ( θ_ℓ ; θ^{(n)} ) = − ∫_R λ_ℓ(s ; θ_ℓ) ds + Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) log λ_ℓ( s ; θ_ℓ ) ds .    (3.54)
For the Gaussian component

    λ_ℓ(s ; θ_ℓ) = I_ℓ N(s ; μ_ℓ, Σ_ℓ) ,    (3.55)

setting the gradient of (3.54) to zero and proceeding as in the sample data case gives

    I_ℓ^{(n+1)} = ( 1 / ∫_R N(s ; μ_ℓ^{(n+1)}, Σ_ℓ^{(n+1)}) ds ) Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ds ,    (3.56)

together with

    ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds
        = Σ_{j=1}^K m_j ∫_{R_j} s w_ℓ( s ; θ^{(n)} ) ds / Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ds    (3.57)

and

    ∫_R N(s ; μ_ℓ, Σ_ℓ) (s − μ_ℓ)(s − μ_ℓ)ᵀ ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds
        = Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) (s − μ_ℓ)(s − μ_ℓ)ᵀ ds / Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ds .    (3.58)

The equations (3.57) and (3.58) are solved jointly for μ_ℓ^{(n+1)} and Σ_ℓ^{(n+1)}. Then,
I_ℓ^{(n+1)} is evaluated using (3.56). This completes the M-step.
If ∫_R N(s ; μ_ℓ, Σ_ℓ) ds ≈ 1, the updates simplify significantly. The left hand
side of (3.57) is the mean, so that

    μ_ℓ^{(n+1)} ≈ Σ_{j=1}^K m_j ∫_{R_j} s w_ℓ( s ; θ^{(n)} ) ds / Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ds .    (3.59)

Similarly, the left hand side of (3.58) is the covariance matrix, so that

    Σ_ℓ^{(n+1)} ≈ Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ( s − μ_ℓ^{(n+1)} )( s − μ_ℓ^{(n+1)} )ᵀ ds
                    / Σ_{j=1}^K m_j ∫_{R_j} w_ℓ( s ; θ^{(n)} ) ds .    (3.60)
Table 3.2 EM algorithm for affine Gaussian sum with PPP histogram data

Given data: m_{1:K} = {m₁, . . . , m_K} (K is the number of histogram cells)
Fit the intensity: λ(s ; θ) = λ_bgnd(s) + Σ_{ℓ=1}^L I_ℓ N(s ; μ_ℓ, Σ_ℓ),  s ∈ R
Estimate the parameters: θ = {(I_ℓ, μ_ℓ, Σ_ℓ) : ℓ = 1, . . . , L}

• FOR ℓ = 1 : L, initialize coefficients, means, and covariance matrices:
  – I_ℓ(0) > 0, μ_ℓ(0) ∈ R, and Σ_ℓ(0) positive definite
• END FOR
• FOR EM iteration index n = 0, 1, 2, . . . until convergence:
  – FOR ℓ = 1 : L and s ∈ R_j, j = 1 : K, define (be able to evaluate):
        w_ℓ(s ; n) = I_ℓ(n) N(s ; μ_ℓ(n), Σ_ℓ(n)) / ∫_{R_j} ( λ_bgnd(s') + Σ_{ℓ'=1}^L I_{ℓ'}(n) N(s' ; μ_{ℓ'}(n), Σ_{ℓ'}(n)) ) ds'
  – END FOR
  – FOR ℓ = 1 : L, compute:
      • N_ℓ(n) = Σ_{j=1}^K m_j ∫_{R_j} w_ℓ(s ; n) ds
      • E_ℓ(n) = ( 1 / N_ℓ(n) ) Σ_{j=1}^K m_j ∫_{R_j} s w_ℓ(s ; n) ds
      • V_ℓ(n) = ( 1 / N_ℓ(n) ) Σ_{j=1}^K m_j ∫_{R_j} w_ℓ(s ; n) ( s − μ_ℓ(n) )( s − μ_ℓ(n) )ᵀ ds
      • Solve
            ∫_R s N(s ; μ_ℓ, Σ_ℓ) ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds = E_ℓ(n)
            ∫_R N(s ; μ_ℓ, Σ_ℓ) (s − μ_ℓ)(s − μ_ℓ)ᵀ ds / ∫_R N(s ; μ_ℓ, Σ_ℓ) ds = V_ℓ(n)
        for μ_ℓ(n + 1) and Σ_ℓ(n + 1)
      • Compute:
            I_ℓ(n + 1) = N_ℓ(n) / ∫_R N(s ; μ_ℓ(n + 1), Σ_ℓ(n + 1)) ds
  – END FOR
• END FOR EM iteration (Test for convergence)
• If converged: FOR ℓ = 1 : L,  Î_ℓ = I_ℓ(n_last),  μ̂_ℓ = μ_ℓ(n_last),  Σ̂_ℓ = Σ_ℓ(n_last)  END FOR
Table 3.2 outlines the steps of the EM algorithm for affine Gaussian sums with
histogram data. It is structurally similar to Table 3.1.
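A corresponding sketch for histogram data is given below (Python with NumPy/SciPy). It approximates every cell integral by the integrand at the cell midpoint times the cell area and again uses the approximations (3.59)–(3.60); all of these are simplifying assumptions, not the full algorithm of Table 3.2, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_sum_hist(counts, centers, cell_area, L, lam_bgnd,
                         iters=200, seed=0):
    """Histogram-data EM sketch: cell integrals are approximated by
    (integrand at the cell midpoint) * cell_area."""
    rng = np.random.default_rng(seed)
    K, dim = centers.shape
    I = np.full(L, counts.sum() / (2.0 * L))
    mu = centers[rng.choice(K, L, replace=False, p=counts / counts.sum())].copy()
    cov = [np.cov(centers.T, aweights=counts) for _ in range(L)]
    for _ in range(iters):
        comp = np.array([I[l] * multivariate_normal.pdf(centers, mean=mu[l], cov=cov[l])
                         for l in range(L)])
        lam_cell = (lam_bgnd(centers) + comp.sum(axis=0)) * cell_area
        # w[l, j] ~ integral of w_l(s) over cell j: the component's share of the cell mass.
        w = comp * cell_area / lam_cell
        N = (counts * w).sum(axis=1)
        I = N
        mu = (counts * w) @ centers / N[:, None]
        for l in range(L):
            d = centers - mu[l]
            cov[l] = ((counts * w[l])[:, None, None]
                      * np.einsum('ij,ik->ijk', d, d)).sum(0) / N[l]
    return I, mu, cov
```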
3.4 Regularization
The affine Gaussian sum intensity of Tables 3.1 and 3.2 is

    λ(s) = λ_bgnd(s) + Σ_{ℓ=1}^L I_ℓ N(s ; μ_ℓ, Σ_ℓ) .    (3.61)

These heteroscedastic sums, as they are picturesquely called in the statistics litera-
ture, have L ( ½ n_x (n_x + 1) + n_x + 1 ) free parameters. There are so many param-
eters, in fact, that the loglikelihood function (3.1) is unbounded for conditionally
independent data for L ≥ 2. It is bounded only for L = 1, and then only if the
data are full rank.
As often as not, the ugly fact of unboundedness intrudes during the EM iteration
when a covariance matrix abruptly becomes numerically singular. What is happen-
ing is that the covariance matrix is “shrink-wrapping” itself onto some less than
full n x -dimensional data subspace. Over-fitting is a more classical name for the
phenomenon. The likelihood of the corresponding Gaussian component therefore
grows without bound as the EM algorithm bravely iterates toward a maximum it
cannot attain. In practice, unfortunately, it is all too easy to encounter initializations
that are in the domain of attraction of an unbounded point of the likelihood function.
Hence, the need for regularization.
Example 3.7 Contaminated Data. Tukey proposed an interesting mixture model for
data contaminated with outliers. The model is sometimes called a homothetic Gaus-
sian sum, after the Greek thetos, meaning “placed.” The components of a homo-
thetic Gaussian sum have the same mean vector, called the homothetic center. The
covariance matrices are linearly proportional to a common covariance matrix, so the
general form is
    λ(s) = Σ_{ℓ=1}^L I_ℓ N( s ; μ, ρ_ℓ² Σ ) .    (3.62)

The scale factors, or similitude ratios, ρ_ℓ, are specified. The estimated parameters
are the mean μ and the coefficients {I_ℓ}, as well as the matrix Σ if it is not specified
in the application. In some settings, it may be useful to allow components of the
homothetic model to have different covariance matrices.
A related model arises in estimating the power spectrum of a periodic signal with
fundamental frequency ω₀ and L harmonics. If the harmonics are pure spectral lines,
the intensity is

    λ(s) = N(s) + Σ_{ℓ=1}^L I_ℓ δ( s ; ℓ ω₀ ) ,

where s denotes frequency and N(s) is a known noise spectrum. However, if the
fundamental is a randomly modulated "narrow" broadband signal with mean fun-
damental frequency ω₀ and spectral width σ₀, a reasonable model of the power
spectrum is

    λ(s) = N(s) + Σ_{ℓ=1}^L I_ℓ N( s ; ℓ ω₀ , ℓ² σ₀² ) .    (3.63)
This model differs from the homothetic model because both the locations and the
widths of the harmonics are multiples of the location ω₀ and width σ₀ of the funda-
mental. The objective is to estimate ω̂0 and σ̂0 from DFT data. In low signal to noise
ratio (SNR) applications, and in adverse environments where individual harmonics
can fade in and out over time, measurements of the full harmonic structure over
a sliding time window may improve the stability and accuracy of these estimates.
An immediate technical problem arises—DFT data are real numbers, not integer
counts. However, artificially quantizing the DFT measurements renders the data as
nonnegative integers. Estimates of the fundamental frequency can be computed by
thinking of the DFT cells as cells of a histogram and the integer data as histogram
counts of a PPP. A partial justification for this interpretation is given in Appendix
F. The spectral model (3.63) was proposed for generalized (noninteger) harmonic
structure in [73].
As the above discussion shows, the data likelihood function for Gaussian sums is
bounded above only if the covariance matrices are bounded away from singularity.
One way to do this is to force the condition number of the covariance matrices to
remain below a specified threshold. In practice, this strategy is easily implemented
in every EM iteration by appropriately modifying the eigenvalues of Σ̂_ℓ^{(n)}, the EM
covariance matrix estimate at iteration n. However, this abusive practice destroys
the convergence properties of EM.
A principled method that avoids singularities entirely while also preserving EM
convergence properties employs a Bayesian methodology. In a Bayesian method,
an appropriate prior pdf is assigned to each of the parameters of the Gaussian sum.
For the covariance matrices, the natural prior is the Wishart matrix-valued pdf. The
hyperparameters of the Wishart density are the specified number of degrees of free-
dom and a positive definite target matrix. Similarly, the natural priors for the coeffi-
cients and mean vectors of the sum are Dirichlet and Gaussian densities, respec-
tively. Incorporating these Bayesian priors into the PPP likelihood function and
invoking the EM method leads to Bayesian estimators. In particular, the Bayesian
estimates for the covariance matrices of the Gaussian sum are necessarily bounded
away from singularity because the Wishart target matrix is positive definite. The
details for Gaussian mixtures, that is, for Gaussian sums that integrate to one, are
given in [99].
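As a concrete illustration of the idea (not necessarily the exact estimator derived in [99]), a conjugate inverse-Wishart style M-step shrinks each covariance update toward a positive definite target matrix, which keeps the update nonsingular even when a component collapses onto very few points:

```python
import numpy as np

def regularized_cov_update(scatter, N_l, target, dof, dim):
    """MAP-style covariance update for one component under an inverse-Wishart
    prior with scale matrix `target` and `dof` degrees of freedom (a sketch;
    the exact prior form used in [99] may differ).
    scatter = sum_j w_lj (x_j - mu_l)(x_j - mu_l)^T,  N_l = sum_j w_lj."""
    return (scatter + target) / (N_l + dof + dim + 1.0)

# Example: even with no effective data (N_l = 0, zero scatter), the update is
# a scaled copy of the target matrix and can never become singular.
print(regularized_cov_update(np.zeros((2, 2)), N_l=0.0,
                             target=0.1 * np.eye(2), dof=5.0, dim=2))
```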
The full potential of the Bayesian method is not really exploited here since the
prior densities are used to avoid matrix singularities and other numerical issues.
These methods add robustness to the numerical procedures, so they are certainly
valuable. Nonetheless, in practice, this kind of Bayesian parameter estimate does not
directly address the over-fitting problem itself. The story changes entirely, however,
if there are application-specific justifications for invoking Bayesian priors.
Chapter 4
Cramér-Rao Bound (CRB) for Intensity
Estimates
The most common measure of estimator quality is the Cramér-Rao bound (CRB)
on estimation variance. It is an important bound in both theory and practice. It
is important theoretically because no unbiased estimator of θ can have a smaller
variance than the CRB. Versions of the CRB are known for biased estimators, but
the bound in this case depends on the specific estimator used.
An estimator whose variance equals the CRB is said to be an “efficient” esti-
mator. Although efficient estimators do not exist for every problem, the CRB is still
useful in practice because, under mild assumptions, maximum likelihood estimators
are asymptotically unbiased and efficient. The CRB is also useful in another way.
Approximate estimation algorithms are often implemented to reduce computational
complexity or accommodate data quality, and it is highly desirable to know how
close these approximate estimators are to the CRB. Such studies, performed by
simulation or otherwise, may reveal that the approximate estimator is good enough for the application at hand.
1 The oldest surviving astronomical text in dialetto toscano (the Italian dialect spoken in Tuscany).
4.1 Background
Several facts about the CRB are worth mentioning explicitly at the outset. Perhaps
the most interesting to readers new to the CRB is that the CRB is determined solely
by the data pdf. In other words, the CRB does not involve any actual data. Data
influence the CRB only via the parametric form of the data pdf. For the CRB to be
useful, it is imperative that the pdf accurately describe the real data.
Classical thinking says the only good estimators are the unbiased estimators
which are defined in the next section. Such thinking is incorrect—there are use-
ful biased estimators in several applications. Moreover, techniques are known for
trading off variance and bias to minimize the mean squared error, a quantity of great
utility in practice. Nonetheless, the classical CRB for unbiased estimators is the
focus of the discussion here. The CRB for biased estimators is given in Section 4.1.4
for estimators with known bias.
For the special case of one parameter, the crucial insight that makes the CRB
tick is the Cauchy-Schwarz inequality. Multiparameter problems require a modi-
fied approach because of the necessity of defining what a lower bound means in
more than one dimension. Maximizing a ratio of positive definite quadratic forms
(a Rayleigh quotient) replaces the Cauchy-Schwarz inequality in the multiparameter
case. This approach parallels the one given in [46]. The CRB for multiple parameters
is discussed in this section in a general context.
Let p X (x ; θ ) denote the parameterized data pdf, where X is the random data, x
is a realization of the data, and θ is a real valued parameter vector.
Let
θ ∈ Θ ⊂ Rn θ ,
where n θ denotes the number of parameters and Θ is the set of all valid parameter
vectors. The pdf p X (x ; θ ) is assumed differentiable with respect to every compo-
nent of the vector θ for all x ∈ R ⊂ Rn x , where n x is the dimension of the data
space. The data space R is independent of θ . The gradient ∇θ p X (x ; θ ) is a column
vector in Rn θ .
For any parameter vector θ ∈ Θ, the Fisher Information Matrix (FIM) of θ is the
n θ × n θ matrix defined by
2 A square (real) matrix A is positive semidefinite if and only if c T A c ≥ 0 for all vectors c.
    J(θ) = E[ (∇_θ log p_X(x ; θ)) (∇_θ log p_X(x ; θ))ᵀ ]    (4.2)
         ≡ ∫_R (∇_θ log p_X(x ; θ)) (∇_θ log p_X(x ; θ))ᵀ p_X(x ; θ) dx ,    (4.3)
Several lower bounds on the variance of unbiased estimators are available, but the
best known and most widely used is the CRB. To see that (4.7) holds, first write the
equations that are equivalent to the statement that θ̂ (X ) is unbiased:
    ∫_R θ̂_i(x) p_X(x ; θ) dx = θ_i ,    i = 1, . . . , n_θ ,

where θ_i and θ̂_i(X) are the i-th components of θ and θ̂(X), respectively. Differenti-
ating each of these equations with respect to θ_j, j = 1, . . . , n_θ, gives

    ∫_R θ̂_i(x) p_X(x ; θ) ( ∂/∂θ_j ) log p_X(x ; θ) dx = δ_{ij} ,
Now substitute U(x) = aᵀ θ̂(x) and V(x) = bᵀ s(x ; θ₀). The numerator simpli-
fies to aᵀ b using (4.5) and (4.8). The first factor in the denominator is aᵀ Var(θ̂) a.
From (4.6), the second factor in the denominator is bᵀ J(θ₀) b. Making these substi-
tutions, squaring both sides of (4.9) and then multiplying through by aᵀ Var(θ̂) a
gives

    ( aᵀ b )² / ( bᵀ J(θ₀) b ) ≤ aᵀ Var(θ̂) a .    (4.10)
The maximum value of the Rayleigh quotient on the left hand side is unchanged
if b is multiplied by an arbitrary nonzero real number. The equality constraint
bᵀ J(θ₀) b = 1 eliminates this scale factor. The maximum value is attained at
the solution of the constrained optimization problem:

    b* = J⁻¹(θ₀) a / ( aᵀ J⁻¹(θ₀) a )^{1/2} .

Substituting b* into (4.10) gives

    aᵀ J⁻¹(θ₀) a ≤ aᵀ Var(θ̂) a    (4.13)

for all nonzero vectors a. Hence, Var(θ̂) − J⁻¹(θ₀) is positive semidefinite, and
this establishes (4.7).
4.1.4 Spinoffs
There is more to learn from (4.13). For all i and j, denote the (i, j)-th elements of
Var(θ̂) and J⁻¹(θ₀) by Var_{ij}(θ̂) and J⁻¹_{ij}(θ₀), respectively. Suppose all the com-
ponents of a are zero except the j-th, which equals one. Then (4.13) gives

    Var_{jj}(θ̂) ≥ J⁻¹_{jj}(θ₀) .    (4.14)
In words, this says that the smallest variance of any unbiased estimator of the j-th
parameter θ j is the ( j, j)-th element of the inverse of the FIM of θ . The result is
important in the many applications in which the off-diagonal elements of the CRB
are of no intrinsic interest.
The inequality (4.13) yields still more. If all the components of a are zero except
for the subset with indices in M ⊂ {1, . . . , n θ }, then
    Var_{M×M}(θ̂) ≥ J⁻¹_{M×M}(θ₀) .    (4.15)
Here, Var M×M θ̂ denotes the M × M submatrix of Var θ̂ obtained by removing
all rows and columns of Var θ̂ that are not in the index set M, and similarly for
−1
J M×M (θ0 ). The inequality (4.15) allows selected off-diagonal elements of the CRB
to be evaluated as needed in the application.
If p_X(x ; θ) is twice differentiable, then it can be shown that

    J(θ₀) = − E[ ∇_θ (∇_θ)ᵀ log p_X(x ; θ₀) ] ,    (4.16)

that is,

    J_{ij}(θ₀) = − ∫_R ( ∂² / ∂θ_i ∂θ_j ) log p_X(x ; θ₀) p_X(x ; θ₀) dx .    (4.17)
This form of the FIM is widely known and used in many applications. It is also the
inspiration behind the observed information matrix (OIM) that is often used when
the FIM is unavailable. For more details in a PPP
context, see Section 4.7.
An unbiased estimator is efficient if Var(θ̂) = CRB(θ₀). This definition is
standard in the statistical literature, but it is misleading in one respect: the best
unbiased estimator is not necessarily efficient. Said more carefully, there are pdfs
p X (x ; θ ) for which the unbiased estimator with the smallest covariance matrix is
known explicitly, but this estimator does not achieve the CRB.
An estimator θ̂ (x) is biased if
E θ̂ = θ0 + b(θ0 ) (4.18)
and b(θ0 ) is nonzero. The nonzero term b(θ0 ) is called the estimator bias. The bias
clearly depends on the particular estimator, and it is often difficult to evaluate. If the
form of b(θ0 ) is known, and b(θ0 ) is differentiable with respect to θ , then the CRB
for the biased estimator θ̂ is
    Var( θ̂ − θ₀ − b(θ₀) ) ≥ [ I + ∇_θ bᵀ(θ) |_{θ=θ₀} ] J⁻¹(θ₀) [ I + ∇_θ bᵀ(θ) |_{θ=θ₀} ]ᵀ ,    (4.19)

where I is the n_θ × n_θ identity matrix, and the gradients are evaluated at the true
value θ₀. The matrix dimensions are consistent since bᵀ(θ) = ( b₁(θ), . . . , b_{n_θ}(θ) )
is a row, and its gradient is the n_θ × n_θ matrix

    ( ∇_θ b₁(θ), . . . , ∇_θ b_{n_θ}(θ) ) .
The matrix J (θ0 ) is the FIM for θ0 . The bound depends on the estimator via the
derivative of the bias.
A Bayesian version of the CRB called the posterior CRB (PCRB) is useful when
the parameter θ is a random variable with a specified prior pdf. A good discussion
of the PCRB, and in fact the very first discussion of it anywhere, is found in Van
Trees.

4.2 CRB for PPP Intensity with Sample Data

For PPP sample data on the bounded set R, the FIM is amazingly simple. If
λ(s ; θ ) > 0 for all s ∈ R, then the FIM for unbiased estimators of θ is
    J(θ) = ∫_R ( 1 / λ(s ; θ) ) [ ∇_θ λ(s ; θ) ] [ ∇_θ λ(s ; θ) ]ᵀ ds .    (4.21)
To see this, start with the loglikelihood of a realization ξ = (n, {x₁, . . . , x_n}),

    log p_Ξ(ξ ; θ) = − log n! − ∫_R λ(s ; θ) ds + Σ_{j=1}^n log λ(x_j ; θ) .    (4.22)

Its gradient with respect to θ is

    ∇_θ log p_Ξ(ξ ; θ) = − ∫_R ∇_θ λ(s ; θ) ds + Σ_{j=1}^n ( 1 / λ(x_j ; θ) ) ∇_θ λ(x_j ; θ) .    (4.23)
The outer product of (4.23) with itself is the sum of three terms:

    ∇_θ log p_Ξ(ξ ; θ) [ ∇_θ log p_Ξ(ξ ; θ) ]ᵀ
        = [ ∫_R ∇_θ λ(s ; θ) ds ] [ ∫_R ∇_θ λ(s ; θ) ds ]ᵀ
        − 2 [ ∫_R ∇_θ λ(s ; θ) ds ] ( Σ_{j=1}^n ( 1 / λ(x_j ; θ) ) ∇_θ λ(x_j ; θ) )ᵀ    (4.24)
        + ( Σ_{i=1}^n ( 1 / λ(x_i ; θ) ) ∇_θ λ(x_i ; θ) ) ( Σ_{j=1}^n ( 1 / λ(x_j ; θ) ) ∇_θ λ(x_j ; θ) )ᵀ .
The FIM is the sum of the expected values of the three terms in (4.24), where the
expectation is given by (2.23). The expectation of the first term is trivial—it is the
expectation of a constant. The other two expectations look formidable, but they are
not. The sums in both terms are the same form as (2.30), so the expectation of the
second term is, from (2.32),

    − 2 [ ∫_R ∇_θ λ(s ; θ) ds ] [ ∫_R ∇_θ λ(s ; θ) ds ]ᵀ ,    (4.25)
Example 4.1 Scaled Intensity. Let λ(x ; I) = I f(x), where f(x) ≥ 0 is a known
function and I > 0 is the unknown scale factor. From (4.21), the FIM is
J(I) = ∫_R f(s) ds / I, so

    CRB(I) = I / ∫_R f(s) ds .

The ML estimator is

    Î_ML = m / ∫_R f(s) ds ,

where m is the number of points in the realization. It is unbiased because

    E[ Î_ML ] = E[m] / ∫_R f(s) ds = I ∫_R f(s) ds / ∫_R f(s) ds = I .

Its variance is

    Var[ Î_ML ] = Var[m] / ( ∫_R f(s) ds )² = I ∫_R f(s) ds / ( ∫_R f(s) ds )² = CRB(I) ,

so the ML estimator attains the lower bound and is, by definition, an efficient esti-
mator.
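The efficiency claim is easy to confirm by simulation. In the sketch below (Python/NumPy, illustrative numbers), only the total count m matters for Î_ML, so the simulation reduces to Poisson draws; the sample variance of the estimates should match CRB(I).

```python
import numpy as np

rng = np.random.default_rng(6)

# lambda(x; I) = I * f(x) on R = [0, 1], with f known; here f(x) = 1 + x (illustrative).
I_true = 40.0
f_integral = 1.5                   # integral of f over R
crb = I_true / f_integral          # CRB(I) = I / int_R f(s) ds

est = []
for _ in range(20_000):
    m = rng.poisson(I_true * f_integral)   # only the count matters for I_hat
    est.append(m / f_integral)             # I_hat_ML = m / int_R f(s) ds
est = np.asarray(est)
print(est.mean(), est.var(), crb)          # mean ~ I_true, variance ~ CRB
```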
Example 4.2 Linear Combination of Known Intensities. Generalizing the previous
example, let λ(x ; θ) = Iᵀ f(x), where the components of the vector f(x) ≡
( f₁(x), . . . , f_L(x) )ᵀ are known nonnegative functions and I = ( I₁, . . . , I_L )ᵀ is the
vector of unknown scale factors. From (4.21), the FIM is

    FIM(I) = ∫_R ( 1 / Iᵀ f(x) ) f(x) f(x)ᵀ dx .

Evaluating FIM(I) and its inverse CRB(I) requires numerical methods. The ML
estimator Î_ML is also not explicitly known, but is found numerically by solving
the necessary equations (3.3), or by the EM recursion (3.29). Even in this simple
example, it is not clear whether or not Î_ML is unbiased.
Example 4.3 Oops. For the parameter vector θ ∈ R², define the intensity (4.27) to be
constant on R with a value that depends on θ only through the distance ‖θ − c‖,
where c ∈ R² is given and R = [0, 1] × [0, 1]. The FIM is, from (4.21),

    J(θ) = ( θ − c )( θ − c )ᵀ / ‖θ − c‖² .
The matrix J (θ ) is clearly rank one, so the CRB fails to exist. The problem lies in
the parameterization, not the FIM. The intensity function (4.27) is constant on R, so
the “intrinsic” dimensionality of the intensity function parameter is only one. The
dimension of θ is two.
4.3 CRB for PPP Intensity with Histogram Data

For histogram data on K disjoint cells R₁, . . . , R_K, the FIM is

    J(θ) = Σ_{j=1}^K ( 1 / ∫_{R_j} λ(s ; θ) ds ) [ ∫_{R_j} ∇_θ λ(s ; θ) ds ] [ ∫_{R_j} ∇_θ λ(s ; θ) ds ]ᵀ .    (4.28)
This expression reduces to (4.21) in the limit as the number of histogram cells goes to
infinity and their size goes to zero. To see (4.28) requires nothing more than matrix
algebra, but it is worth presenting nonetheless. The rest of the section up to Example
4.4 can be skipped on a first reading.
Start with the loglikelihood function of the data. Let n_{1:K} = (n₁, . . . , n_K)
denote the integer counts in the histogram cells. From (2.62), the logarithm of the pdf
of the data is

    log p(n_{1:K} ; θ) = − Σ_{j=1}^K log( n_j ! ) − ∫_R λ(s ; θ) ds + Σ_{j=1}^K n_j log( ∫_{R_j} λ(s ; θ) ds ) .
Its gradient is

    ∇_θ log p(n_{1:K} ; θ) = − ∫_R ∇_θ λ(s ; θ) ds + Σ_{j=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds .    (4.29)
The FIM is the expectation of the outer product of this gradient with itself. As was
done with conditionally independent data, the outer product is written as the sum of
three terms:

    [ ∇_θ log p(n_{1:K} ; θ) ] [ ∇_θ log p(n_{1:K} ; θ) ]ᵀ
        = [ ∫_R ∇_θ λ(s ; θ) ds ] [ ∫_R ∇_θ λ(s ; θ) ds ]ᵀ
        − 2 [ ∫_R ∇_θ λ(s ; θ) ds ] ( Σ_{j=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds )ᵀ    (4.30)
        + ( Σ_{j=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds )
          ( Σ_{j'=1}^K ( n_{j'} / ∫_{R_{j'}} λ(s ; θ) ds ) ∫_{R_{j'}} ∇_θ λ(s ; θ) ds )ᵀ .
The first term is independent of the data, so it is identical to its expectation. The
second term is a product of two factors, the first of which is independent of the data
so it multiplies the expectation of the other factor, which is a sum over K terms.
Since K is the number of histogram cells and is fixed, the expectation of the sum is
    E[ Σ_{j=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds ]
        = Σ_{j=1}^K ( ∫_{R_j} λ(s ; θ) ds / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds
        = Σ_{j=1}^K ∫_{R_j} ∇_θ λ(s ; θ) ds = ∫_R ∇_θ λ(s ; θ) ds .    (4.31)
The expectation of the second term of (4.30) is therefore

    − 2 [ ∫_R ∇_θ λ(s ; θ) ds ] [ ∫_R ∇_θ λ(s ; θ) ds ]ᵀ .    (4.32)
The third term of (4.30) is the double sum

    Σ_{j=1}^K Σ_{j'=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds
                        ( ( n_{j'} / ∫_{R_{j'}} λ(s ; θ) ds ) ∫_{R_{j'}} ∇_θ λ(s ; θ) ds )ᵀ .
There are two cases. In the first case, j ≠ j' and the cells R_j and R_{j'} are disjoint.
The summands are therefore independent and the expectation of their product is the
product of their expectations. In the same manner as done in (4.31), the expectation
simplifies to

    E[ Σ_{j, j' = 1, j ≠ j'}^K ( n_j / ∫_{R_j} λ ds ) ( n_{j'} / ∫_{R_{j'}} λ ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds ( ∫_{R_{j'}} ∇_θ λ(s ; θ) ds )ᵀ ]
        = Σ_{j, j' = 1, j ≠ j'}^K [ ∫_{R_j} ∇_θ λ(s ; θ) ds ] [ ∫_{R_{j'}} ∇_θ λ(s ; θ) ds ]ᵀ .    (4.33)
In the other case, j = j' and the double sum reduces to the single sum

    Σ_{j=1}^K ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds ( ( n_j / ∫_{R_j} λ(s ; θ) ds ) ∫_{R_j} ∇_θ λ(s ; θ) ds )ᵀ .

Because K is fixed, the expectation of the sum is the sum of the expectations. Denote
the summand by e_{jj} and write it as a product of sums:

    e_{jj} = ( Σ_{ρ=1}^{n_j} ∫_{R_j} ∇_θ λ(s ; θ) ds / ∫_{R_j} λ(s ; θ) ds )
             ( Σ_{ρ'=1}^{n_j} ∫_{R_j} ∇_θ λ(s ; θ) ds / ∫_{R_j} λ(s ; θ) ds )ᵀ .

Taking the expectation, and using E[ n_j² ] = ∫_{R_j} λ(s ; θ) ds + ( ∫_{R_j} λ(s ; θ) ds )² for the
Poisson count n_j, gives

    E[ e_{jj} ] = [ ∫_{R_j} ∇_θ λ(s ; θ) ds ] [ ∫_{R_j} ∇_θ λ(s ; θ) ds ]ᵀ
                 + ( 1 / ∫_{R_j} λ(s ; θ) ds ) [ ∫_{R_j} ∇_θ λ(s ; θ) ds ] [ ∫_{R_j} ∇_θ λ(s ; θ) ds ]ᵀ .    (4.34)
Now add the sum over j of (4.34) to (4.33). The double sum no longer has the
exception j = j
, so it becomes the product of single term sums over j. The single
term sums are identical to integrals over all R. Therefore, the expectation of the
third term is
    [ ∫_R ∇_θ λ(s ; θ) ds ] [ ∫_R ∇_θ λ(s ; θ) ds ]ᵀ + J(θ) .
Finally, adding the expectations of the three terms gives the FIM (4.28).
Example 4.4 Scaled Intensity (continued). Let the intensity be the same as in Exam-
ple 4.1. The FIM for the scale factor I using histogram data is

    J(I) = Σ_{j=1}^K ( 1 / ( I ∫_{R_j} f(x) dx ) ) ( ∫_{R_j} f(x) dx )²
         = ( 1 / I ) Σ_{j=1}^K ∫_{R_j} f(x) dx = ∫_R f(x) dx / I .
The CRB of I is therefore the same for both conditionally independent sample data
and histogram data. The ML estimator for histogram data is given by (3.9) with the
Gaussian pdf replaced by f(x). Its variance is

    Var[ Î_ML ] = Σ_{j=1}^K Var[ m_j ] / ( ∫_R f(s) ds )²
                = Σ_{j=1}^K I ∫_{R_j} f(s) ds / ( ∫_R f(s) ds )²
                = I ∫_R f(s) ds / ( ∫_R f(s) ds )² = I / ∫_R f(s) ds ,

so the ML estimator is efficient for histogram data as well.
4.4 CRB for PPP Intensity on Discrete Spaces

For a PPP on a discrete space Φ = {φ₁, φ₂, . . .} with intensity vector λ(θ) =
( λ₁(θ), λ₂(θ), . . . ), the FIM for unbiased estimators of θ is

    J(θ) = Σ_j ( 1 / λ_j(θ) ) [ ∇_θ λ_j(θ) ] [ ∇_θ λ_j(θ) ]ᵀ ,    (4.35)

where the derivatives are evaluated at the true value of θ. To see that (4.35) holds, it is only
necessary to follow the steps of the proof of (4.21) for PPPs on continuous spaces,
replacing integrals with the appropriate sums. This requires verifying that certain
results hold in the discrete case, e.g., Campbell's Theorem. Details are omitted.
A quick intuitive way to see that the result must hold is as follows: imagine that
the points of Φ are isolated points of the region over which the integral in (4.21) is
performed. Let the intensity function be a test function sequence for these isolated
points (of the kind used to define the Dirac delta function). Then (4.21) goes in the
limit to (4.35) as the test function sequence “converges” to the Dirac delta function.
Example 4.5 Intensity Vector. Let Ξ denote a PPP on the discrete space Φ =
{φ1 , φ2 } with intensity vector λ(θ ) = (λ1 (θ ), λ2 (θ )) , where θ = (θ1 , θ2 ) and
\[
\lambda_1(\theta) = \theta_1\,, \qquad \lambda_2(\theta) = \theta_2\,. \tag{4.36}
\]
Example 4.6 Parametrically Tied Intensity Vector. Let Ξ denote a PPP on the discrete space Φ = {φ_1, φ_2} with intensity vector λ(θ) = (I cos θ, I sin θ), where the scale I is known and the angle θ is the estimated parameter. The FIM for θ is
\[
J(\theta) \;=\; \frac{1}{I\cos\theta}\,(-I\sin\theta)^{2} \;+\; \frac{1}{I\sin\theta}\,(I\cos\theta)^{2}
\;=\; I\,\frac{\sin^{3}\theta + \cos^{3}\theta}{\sin\theta\,\cos\theta}\,.
\]
\[
\gamma_{SNR} \;=\; \frac{\lambda_S}{\lambda_N}\,.
\]
\[
\nabla_\mu\,\lambda(x\,;\mu) \;=\; \lambda_S\, N(x\,;\mu,\Sigma)\, \Sigma^{-1}(x-\mu)\,. \tag{4.39}
\]
From (4.21), the FIM for μ using conditionally independent sample data is written in the form
\[
J_R(\mu) \;=\; \lambda_S\, \Sigma^{-1}\, W_R(\mu)\, \Sigma^{-1}\,, \tag{4.41}
\]
where W_R(μ) is a weighted covariance matrix determined by the signal and noise intensities over the gate R. The matrix W_R(μ) is evaluated at the correct value of μ. The CRB is the inverse of the FIM, so
\[
CRB_R(\mu) \;=\; \frac{1}{\lambda_S}\, \Sigma\, W_R^{-1}(\mu)\, \Sigma\,. \tag{4.42}
\]
The coefficient λ_S sets the expected number of signal-generated points in a realization, and thus inversely scales the estimation variance at a given SNR. On the other hand, the trade-off between the signal and noise pdfs and their effect on the shape (eigenvalues and eigenvectors) of the weighted covariance matrix W_R(μ) is determined by the average fraction of points in the realization that originate from the signal. This fraction is governed by γ_SNR. Good estimation therefore depends on both λ_S and γ_SNR.
The CRB for the covariance matrix Σ can also be found using the general result
(4.21). The results seem to provide little insight and are somewhat tedious to obtain,
so they are not presented here. Details for the closely related problem of evaluating
the CRB of Σ for the classical multivariate Gaussian density using i.i.d. data can be
found in [116].
The general result (4.21) will also give the CRB for non-Gaussian signals, that
is, for data that are a realization of the PPP with intensity
Example 4.7 Effect of Gating on Estimated Mean. The structure of the CRB is
explored numerically for the special case R = R1 . Let
\[
R(\rho) \;=\; \{\, x : (x-\mu)^T \Sigma^{-1} (x-\mu) \le \rho^2 \,\},
\]
The term in brackets has units of length, so the CRB has units of length squared, as
required in one dimension.
The effect of SNR and gate size on the CRB is seen by plotting C R BR(ρ) (μ) as
a function of ρ for several values of SNR and for, say, σ = 1. Let pnoise (x) ≡ 1.
It is seen in Fig. 4.1 that the CRB decreases with increasing ρ and SNR. It is also
seen that there is no practical reason to use gates larger than ρ = 2.5 in the one
dimensional case, regardless of SNR.
Fig. 4.1 CRB of (4.43) for μ as a function of gate size ρ: λ_N = √(2π) ≈ 2.5 points per unit length; SNR = 3(1)10; σ = 1. Gates larger than ρ = 2.5 do little to improve estimation, at least in R¹
\[
\lambda(x\,;\mu,\Lambda) \;\equiv\; \lambda(x\,;\mu_1,\ldots,\mu_L,\lambda_1,\ldots,\lambda_L)
\;=\; \lambda_N\, p_{noise}(x) \;+\; \sum_{\ell=1}^{L} \lambda_\ell\, N(x\,;\mu_\ell,\Sigma_\ell)\,, \tag{4.44}
\]
L
λnoise( j) (x) ≡ λ N pnoise (x) + λ N (x ; μ , Σ ) (4.45)
=1
= j
in (4.38). The FIM and CRB are then evaluated using the appropriately interpreted
versions of (4.41) and (4.42).
\[
\begin{aligned}
\bigl[\lambda_i\, N(x\,;\mu_i,\Sigma_i)\, \Sigma_i^{-1}(x-\mu_i)\bigr]\,
\bigl[\lambda_j\, N(x\,;\mu_j,\Sigma_j)\, \Sigma_j^{-1}(x-\mu_j)\bigr]^T \qquad&\\
= \lambda_i \lambda_j\, N(x\,;\mu_i,\Sigma_i)\, N(x\,;\mu_j,\Sigma_j)\,
\Sigma_i^{-1}(x-\mu_i)(x-\mu_j)^T \Sigma_j^{-1}&\,.
\end{aligned} \tag{4.47}
\]
The FIM, J(μ), is an L × L block matrix. Its entries are matrices of size n_x × n_x, so its full dimension is (L n_x) × (L n_x). The ij-th block of J(μ) is the integral over R of (4.47) divided by λ(x; μ, Λ). Collecting the blocks, the FIM takes the form
\[
J(\mu) \;=\; D^{-1}(\Lambda)\; W_R(\mu,\Lambda)\; D^{-1}(\Lambda)\,,
\]
where
\[
D^{-1}(\Lambda) \;=\;
\begin{bmatrix}
\lambda_1 \Sigma_1^{-1} & 0 & \cdots & 0 \\
0 & \lambda_2 \Sigma_2^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_L \Sigma_L^{-1}
\end{bmatrix} \tag{4.51}
\]
\[
CRB(\mu) \;=\; D(\Lambda)\; W_R^{-1}(\mu,\Lambda)\; D(\Lambda)\,, \tag{4.52}
\]
where
\[
D(\Lambda) \;=\;
\begin{bmatrix}
\tfrac{1}{\lambda_1}\Sigma_1 & 0 & \cdots & 0 \\
0 & \tfrac{1}{\lambda_2}\Sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \tfrac{1}{\lambda_L}\Sigma_L
\end{bmatrix}. \tag{4.53}
\]
The CRB depends jointly on all the means μ j because, in general, none of the block
matrix elements of WR (μ, Λ) are zero.
The joint CRB separates into CRBs for the individual PPPs in the superposition if
the matrix WR (μ, Λ) is block diagonal. This happens if all the off-diagonal n x × n x
blocks are approximately zero, that is, if
\[
W_R^{ij}(\mu,\Lambda) \;\approx\; 0 \quad \text{for all } i \neq j\,.
\]
The approximation is a good one if the coefficient of the outer product in the integral (4.49) is nearly zero for i ≠ j. If the Gaussian pdfs are all well separated, that is, if
\[
\max_{i \neq j}\; (\mu_i - \mu_j)^T \bigl(\Sigma_i + \Sigma_j\bigr)^{-1} (\mu_i - \mu_j) \;\gg\; 1\,,
\]
then the joint CRB is block diagonal and splits into separate CRBs for each mean.
More generally, the joint CRB splits the mean vectors into disjoint groups or clus-
ters. The CRBs of the means within each cluster are coupled, but the CRBs of means
in different clusters are independent.
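The separation criterion above can be checked numerically before deciding whether to invert the full joint FIM or to treat clusters of means separately. A minimal sketch is given below; the threshold value and the union-find clustering rule are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def separation_clusters(mus, sigmas, threshold=10.0):
    """Group Gaussian components whose pairwise separation statistic
    (mu_i - mu_j)^T (Sigma_i + Sigma_j)^{-1} (mu_i - mu_j) falls below
    `threshold`; well-separated clusters have approximately decoupled CRBs."""
    L = len(mus)
    parent = list(range(L))                      # union-find over overlapping components
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(L):
        for j in range(i + 1, L):
            d = mus[i] - mus[j]
            stat = float(d @ np.linalg.solve(sigmas[i] + sigmas[j], d))
            if stat < threshold:                 # not well separated: same cluster
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(L):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Example: the first two 2-D components overlap, the third is far away.
mus = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([20.0, 0.0])]
sigmas = [np.eye(2)] * 3
print(separation_clusters(mus, sigmas))          # e.g. [[0, 1], [2]]
```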
If the coefficients Λ and the means μ are estimated together, the CRB changes yet
again. The FIM in this case, denoted by J (μ, Λ), is larger than J (μ) by exactly L
rows and columns, so it has dimension (L + L n x ) × (L + L n x ). Partition it so
that
\[
J(\mu,\Lambda) \;=\;
\begin{bmatrix}
U_R(\mu,\Lambda) & V_R(\mu,\Lambda) \\
V_R^T(\mu,\Lambda) & W_R(\mu,\Lambda)
\end{bmatrix}. \tag{4.54}
\]
Finally, the L × (L n_x) matrix in the upper right hand corner of the partition, V_R(μ, Λ), contains the cross terms, that is, the terms corresponding to the outer products of the gradients with respect to the coefficients Λ and the means μ.
The lower left submatrix is clearly the transpose of V_R(μ, Λ). The joint CRB of Λ and μ is the inverse of J(μ, Λ). It differs significantly from the CRB for μ alone, which is the inverse of J(μ).
The inverse of the OIM is always positive definite when θ̂ M L is a local maximum
interior to the domain of allowed parameter vectors, Θ. Intuitively, it is tempting to
think of the OIM as the FIM without the expectation over the data. Unfortunately,
this succinct description is slightly inaccurate technically, since the FIM is evaluated
at the true parameter value while the OIM is evaluated at the ML estimate.
\[
\lambda(x\,;\theta) \;=\; \sum_{\ell=1}^{L} f_\ell(x\,;\theta_\ell)\,, \tag{4.58}
\]
where ξ = (n, {x1 , . . . , xn }) is the given realization of the PPP. The following
algebraic identity—whose terms are defined in the next paragraph—is obtained by
straightforward, but tedious, differentiation and matrix algebra:
\[
-\nabla_\theta\,[\nabla_\theta]^T \log p(\xi\,;\theta)
\;=\; A(\theta) + B(\theta) + C(\theta) + \sum_{j=1}^{n} S_j(\theta)\, S_j^T(\theta)\,. \tag{4.60}
\]
Those who wish to verify this result will find the identity
\[
\nabla_\theta \log \lambda(x\,;\theta) \;=\; \sum_{\ell=1}^{L} \frac{f_\ell(x\,;\theta_\ell)}{\lambda(x\,;\theta)}\, \nabla_\theta \log f_\ell(x\,;\theta_\ell)
\]
useful.
\[
w_\ell(x_j\,;\theta) \;=\; \frac{f_\ell(x_j\,;\theta_\ell)}{\lambda(x_j\,;\theta)}\,, \qquad j=1,\ldots,n\,,\;\; \ell=1,\ldots,L\,. \tag{4.62}
\]
The matrices A(θ), B(θ), and C(θ) are block diagonal, and their ℓ-th diagonal blocks are of size dim(θ_ℓ) × dim(θ_ℓ). The ℓ-th blocks are, for ℓ = 1, . . . , L,
\[
\begin{aligned}
A_\ell(\theta) &= -\sum_{j=1}^{n} w_\ell(x_j\,;\theta)\, \bigl[\nabla_{\theta_\ell} \log f_\ell(x_j\,;\theta_\ell)\bigr] \bigl[\nabla_{\theta_\ell} \log f_\ell(x_j\,;\theta_\ell)\bigr]^T \\
B_\ell(\theta) &= -\sum_{j=1}^{n} w_\ell(x_j\,;\theta)\, \nabla_{\theta_\ell} \bigl[\nabla_{\theta_\ell}\bigr]^T \log f_\ell(x_j\,;\theta_\ell) \\
C_\ell(\theta) &= \int_R \nabla_{\theta_\ell} \bigl[\nabla_{\theta_\ell}\bigr]^T f_\ell(x\,;\theta_\ell)\, dx\,.
\end{aligned}
\]
\[
\lambda(x\,;\theta) \;=\; \lambda_0(x) \;+\; \sum_{\ell=1}^{L} \lambda_\ell\, N(x\,;\mu_\ell,\Sigma_\ell)\,,
\]
where λ0 (x) is the intensity of a known PPP background process. Except for λ0 (x),
this sum is the same as (4.58) with
\[
f_\ell(x\,;\theta_\ell) \;=\; \lambda_\ell\, N(x\,;\mu_\ell,\Sigma_\ell)\,, \qquad \ell = 1,\ldots,L\,.
\]
After EM algorithm convergence, the OIM is easily evaluated using the weights that are computed during the last EM iteration. Here, for simplicity, the coefficients λ_ℓ and covariance matrices Σ_ℓ are specified, and only the mean vectors are estimated, so in the above θ_ℓ = μ_ℓ and θ ≡ μ = (μ_1, . . . , μ_L). The only change in the OIM calculation required to accommodate the affine term λ_0(x) is to adjust the weights to include it in the denominator; explicitly,
\[
w_\ell(x_j\,;\mu) \;=\; \frac{\lambda_\ell\, N(x_j\,;\mu_\ell,\Sigma_\ell)}{\lambda_0(x_j) + \sum_{\ell'=1}^{L} \lambda_{\ell'}\, N(x_j\,;\mu_{\ell'},\Sigma_{\ell'})}\,, \qquad j=1,\ldots,n\,,\;\; \ell=1,\ldots,L\,. \tag{4.63}
\]
With these weights, the terms in (4.60) become
\[
\begin{aligned}
A_\ell(\mu) &= -\,\Sigma_\ell^{-1} \left[\, \sum_{j=1}^{n} w_\ell(x_j\,;\mu)\, (x_j-\mu_\ell)(x_j-\mu_\ell)^T \right] \Sigma_\ell^{-1} \\
B_\ell(\mu) &= \left(\, \sum_{j=1}^{n} w_\ell(x_j\,;\mu) \right) \Sigma_\ell^{-1} \\
C_\ell(\mu) &= \lambda_\ell\, \Sigma_\ell^{-1} \left[\, \int_R N(x\,;\mu_\ell,\Sigma_\ell)\, \bigl(-\Sigma_\ell + (x-\mu_\ell)(x-\mu_\ell)^T\bigr)\, dx \right] \Sigma_\ell^{-1} \\
S_j(\mu) &=
\begin{bmatrix}
w_1(x_j\,;\mu)\, \Sigma_1^{-1}(x_j-\mu_1) \\
\vdots \\
w_L(x_j\,;\mu)\, \Sigma_L^{-1}(x_j-\mu_L)
\end{bmatrix}.
\end{aligned}
\]
In these equations, μ is taken to be the ML estimate μ̂_ML ≡ (μ̂_1, . . . , μ̂_L).
The equations are more intuitive when written in a different form. Adding A (μ)
and B (μ) gives
\[
A_\ell(\mu) + B_\ell(\mu) \;=\; \left(\, \sum_{j=1}^{n} w_\ell(x_j\,;\mu) \right) \Sigma_\ell^{-1} \bigl(\Sigma_\ell - \widetilde{\Sigma}_\ell \bigr)\, \Sigma_\ell^{-1}\,, \tag{4.64}
\]
where the weighted covariance matrix for the ℓ-th Gaussian term is
\[
\widetilde{\Sigma}_\ell \;=\; \frac{\sum_{j=1}^{n} w_\ell(x_j\,;\mu)\, (x_j-\mu_\ell)(x_j-\mu_\ell)^T}{\sum_{j=1}^{n} w_\ell(x_j\,;\mu)}\,.
\]
The coefficient of (4.64) is the conditional expected number of samples that originate from the ℓ-th Gaussian component. Similarly,
\[
C_\ell(\mu) \;=\; -\left(\int_R \lambda_\ell\, N(x\,;\mu_\ell,\Sigma_\ell)\, dx \right) \Sigma_\ell^{-1} \bigl(\Sigma_\ell - \overline{\Sigma}_\ell\bigr)\, \Sigma_\ell^{-1}\,, \tag{4.65}
\]
where Σ̄_ℓ is the covariance of the ℓ-th Gaussian pdf restricted to R.
The negative of the coefficient of (4.65) is the expected number of samples from the ℓ-th Gaussian component. If the bulk of the ℓ-th Gaussian component lies within R, the term C_ℓ(μ) is approximately zero.
With these forms it is clear that the first three terms in the OIM are comparable
in some situations, in which case it is not clear whether or not their sum is positive
definite. In these situations, the fourth term is an important one in the OIM, since
the OIM as a whole must be positive definite at the ML estimate μ̂_ML. The (ρ, ℓ)-th block component of the sum of outer products of S_j(μ) is
\[
\left[\, \sum_{j=1}^{n} S_j(\mu)\, S_j^T(\mu) \right]_{\rho \ell}
\;=\; \Sigma_\rho^{-1} \left(\, \sum_{j=1}^{n} w_\rho(x_j\,;\mu)\, w_\ell(x_j\,;\mu)\, (x_j-\mu_\rho)(x_j-\mu_\ell)^T \right) \Sigma_\ell^{-1}\,. \tag{4.66}
\]
The only off-diagonal block terms of the OIM come from (4.66). When the Gaussian
components are well separated, the off-diagonal blocks are small, and the OIM is
approximately block diagonal. Given good separation, then, for every ℓ the sum of the four terms is positive definite at the ML estimate μ̂_ML.
The OIM calculation requires only L numerical integrals in this case, significantly fewer than the analogous FIM calculation. Some of these integrals are unnecessary if the bulk of the corresponding Gaussian density evaluated at θ̂_ℓ lies inside R, for in this case the integral in C_ℓ(θ̂) vanishes. Avoiding numerical integration is an advantage in practice, provided it is verified by simulation or otherwise that the OIM performs satisfactorily as a surrogate for the FIM.
Abstract PPP methods for tomographic imaging are presented in this chapter. The
primary emphasis is on methods for emission tomography, but transmission tomog-
raphy is also included. The famous Shepp-Vardi algorithm for positron emission
tomography (PET) is obtained via the EM method for time-of-flight data. Single-
photon emission computed tomography (SPECT) is used in practice much more
often than PET. It differs from PET in many ways, yet the models and the mathemat-
ics of the two methods are similar. (Both PET and SPECT are also closely related
to multitarget tracking problems discussed in Chapter 6.) Transmission tomogra-
phy is the final topic discussed. The Lange-Carson algorithm is derived via the EM
method. CRBs for unbiased estimators for emission and transmission tomography
are discussed. Regularization and Grenander’s method of sieves are reviewed in the
last section.
Fig. 5.1 PET processing configuration. (Image released to the Wikimedia Commons by J. Langner
[66])
Positron-electron annihilation events emit pairs of (gamma) photons that move in opposite directions. Due to conservation of momentum, these directions are essentially collinear if the positrons and electrons have effectively zero velocity. Departures from straight-line motion degrade spatial resolution, so some systems model these effects to avoid losing resolution. Straight-line propagation is assumed here.
The raw measurement data are the arrival times of photons at an array of detectors that comprise scintillator crystals and photomultipliers. A pair of photons arriving within a sufficiently short time window (measured in nanoseconds, ns) at two appropriately sited detectors determines that an annihilation event occurred: the event lies on the chord segment connecting the detectors, and the specific location on the chord is determined by the time difference of arrivals. Photons without partners within the time window are discarded. The measurement procedure produces occasional spurious annihilation events.
Many subtleties attend to the data collection and preprocessing steps for PET
and SPECT. An excellent review of these issues as well as the field of emission
tomography as a whole is found in [70].
PET is now often part of a multisensor fusion system that combines other imag-
ing methodologies such as CT (computed tomography) and MRI (magnetic reso-
nance imaging). These topics are an active area of research.
Transmission tomography, more popularly known as computed tomography
(CT), uses multiple sets of fan beams or cone beams to determine the spatial dis-
tribution of material density, that is, the local spatial variability of the attenuation
coefficient. Both fan and cone beams employ a single source that produces radia-
tion in many directions simultaneously. This shortens the data gathering time. Fan
beams are used for two dimensional imaging problems, and their detector array is
one dimensional. Similarly, cone beams are used for three dimensional problems,
and they require a two dimensional detector array. In practice, CT scans are much
less expensive than PET because CT does not require the production of short-lived radioisotopes. Contrast enhancement agents may be used to improve imaging of
certain tissue types in medical applications. PET, SPECT, and CT are used diag-
nostically for very different purposes. PET and CT provide significantly higher
resolution images than SPECT.
EM methods are discussed for PET, SPECT, and transmission tomography.
Numerical issues arise with EM algorithms when the number of estimated parame-
ters is large. For PET and SPECT, the parameters are the intensities in the array of
pixels/voxels of the image; for CT, the parameters are the pixel attenuation coeffi-
cients. Good resolution therefore requires a large number of parameters. Regular-
ization methods compatible with the EM method can alleviate many of these issues.
See [119] for further discussion of this and other topics involving medical imaging
applications of PPPs and related point processes.
A prominent alternative method for transmission tomography is Fourier recon-
struction. This approach is fully discussed in the book [28], as well as in the paper
[22]. These are Fourier analysis based methods, of which the Projection-Slice The-
orem is perhaps the best known result in the field. The approach is classical analysis
in the sense that it is ultimately grounded on the Radon transform (1917) and the
Funk-Hecke Theorem, a lovely but little known result that stimulated Radon’s inter-
est in these problems. Fourier methods are not discussed further here.
Detection errors occur when one or both of the gamma photons causes detections on the wrong pair of detectors, that is, when the annihilation is deemed to occur on the wrong chord.
In time-of-flight (TOF) systems, the arrival times at the detectors are recorded,
but with a much smaller time window of about 0.5 ns. The differential propagation
time data are preprocessed to estimate the locations of every positron-electron anni-
hilation along the detector chord. The time window of 0.5 ns corresponds to a local-
ization uncertainty of about 7.5 cm along the chord. These TOF PET systems have
met with only limited success in practice as of about 2003 [70].
The PET reconstruction problem is presented in two forms. The first uses PPP
sample data that comprises the estimated locations of every annihilation event, i.e.,
TOF data. The algorithm given here for this data is a variant of the original Shepp-
Vardi algorithm. The discussion is based on [119, Chapter 3]. Even though TOF
PET is not in widespread current use, it is presented here because of the insight it
provides and its intuitive mathematical appeal.
The other PET reconstruction problem uses histogram data, that is, the data com-
prises only the numbers of annihilation events measured by the detectors. These
counts are a realization of a PPP defined on the discrete space of detectors. In the
language and notation of Section 2.12.1 for PPPs on discrete spaces, the detectors
are the points of the discrete space Φ. Histogram data are used in the original paper
by Shepp and Vardi [110].
\[
\lambda(x) \;=\; \sum_{r=1}^{K} \lambda_r\, I_r(x)\,, \qquad x \in R\,, \tag{5.1}
\]
where I_r(x) is the indicator function of pixel R_r.
The parameter vector is Λ = (λ_1, . . . , λ_K). From the physics, the input process is a PPP on R. Realizations of this PPP model the locations of positron-electron annihilations. An input point x ∈ R is the location of a positron-electron annihilation, and the detector array estimates that it occurred at the point y in the output (measurement) space T ⊂ R^{n_y}. For PET imaging, T ≡ R, but this restriction is not used here. The pdf of the measurement is ℓ(y | x), so that ∫_T ℓ(y | x) dy = 1
for all x. This pdf is assumed to be known. From (2.86) and (5.1), the measurement
point process is a PPP with intensity
\[
\mu(y) \;=\; \int_R \ell(y \mid x)\, \lambda(x)\, dx \tag{5.2}
\]
\[
\;=\; \sum_{r=1}^{K} \lambda_r\, f_r(y)\,, \qquad y \in T\,, \tag{5.3}
\]
where
\[
f_r(y) \;=\; \int_{R_r} \ell(y \mid x)\, dx\,. \tag{5.4}
\]
Denote the measured sample data by
\[
Y \;=\; (y_1, \ldots, y_m)\,, \qquad y_j \in T\,. \tag{5.5}
\]
Because this is TOF data, the points correspond to the estimated locations of
positron-electron annihilation events. The order of the points in Y is irrelevant, so
the incomplete data pdf is, from (2.12),
\[
p(Y\,;\Lambda) \;=\; e^{-\int_T \mu(y)\, dy}\; \prod_{j=1}^{m} \mu(y_j) \tag{5.6}
\]
\[
\;=\; e^{-\sum_{r=1}^{K} \lambda_r |R_r|}\; \prod_{j=1}^{m} \left(\, \sum_{r=1}^{K} \lambda_r\, f_r(y_j) \right), \tag{5.7}
\]
where |R_r| = ∫_{R_r} dx < ∞. The ML estimate of Λ maximizes (5.7). The Hessian matrix of the loglikelihood is
\[
\begin{aligned}
H(\Lambda) &= \nabla_\Lambda\, [\nabla_\Lambda]^T \log p(Y\,;\Lambda) \\
&= -\sum_{j=1}^{m} \frac{1}{\left(\sum_{r=1}^{K} \lambda_r f_r(y_j)\right)^{2}}
\begin{bmatrix} f_1(y_j) \\ \vdots \\ f_K(y_j) \end{bmatrix}
\bigl[\, f_1(y_j)\; \cdots\; f_K(y_j) \,\bigr].
\end{aligned} \tag{5.9}
\]
The quadratic form z T H (Λ) z is strictly negative for nonzero vectors z if the num-
ber of data points m is at least as great as the number of pixels K , and the m × K
matrix
\[
F \;=\;
\begin{bmatrix}
f_1(y_1) & f_2(y_1) & \cdots & f_K(y_1) \\
f_1(y_2) & f_2(y_2) & \cdots & f_K(y_2) \\
\vdots & \vdots & & \vdots \\
f_1(y_m) & f_2(y_m) & \cdots & f_K(y_m)
\end{bmatrix} \tag{5.10}
\]
is full rank. Because the Hessian matrix is negative definite for any Λ ≠ 0, the function p(Y ; Λ) is strictly log-concave; equivalently, its logarithm is strictly concave.
5.2.1.3 E-step
The EM method is used to derive a recursive algorithm to compute the ML estimate Λ̂. The recursion avoids solving the nonlinear system ∇_Λ p(Y ; Λ) = 0 directly. For the PET problem, EM yields a recursion for Λ̂ known as the Shepp-Vardi algorithm. Because the pdf (5.7) is unimodal, the EM method is guaranteed to converge to the ML estimate.
Let x j be the unknown location of the annihilation event that generated the
data point y j . The question is, “Which cell/pixel/voxel contains x j ?” Since this is
unknown, let the index k j ∈ {1, . . . , K } indicate the correct cell, that is, let
x j ∈ Rk j , j = 1, . . . , m. (5.11)
The missing data in the sense of EM are the indices \(\mathcal{K} = \{k_1, \ldots, k_m\}\). The joint pdf of \((\mathcal{K}, Y)\) is defined by
\[
p(\mathcal{K}, Y\,;\Lambda) \;=\; e^{-\sum_{r=1}^{K} \lambda_r |R_r|}\; \prod_{j=1}^{m} \lambda_{k_j}\, f_{k_j}(y_j)\,, \tag{5.12}
\]
\[
\begin{aligned}
p(k_1,\ldots,k_m\,;\Lambda) \;\equiv\; p(\mathcal{K} \mid Y\,;\Lambda)
&= \frac{p(\mathcal{K}, Y\,;\Lambda)}{p(Y\,;\Lambda)} \\
&= \prod_{j=1}^{m} \frac{\lambda_{k_j}\, f_{k_j}(y_j)}{\sum_{r=1}^{K} \lambda_r\, f_r(y_j)}\,.
\end{aligned} \tag{5.13}
\]
The dependence of the left hand expression on the data Y is suppressed to simplify
the notation. The logarithm of the joint density (5.12) is
\[
\log p(\mathcal{K}, Y\,;\Lambda) \;=\; -\sum_{r=1}^{K} \lambda_r |R_r| \;+\; \sum_{j=1}^{m} \log\bigl(\lambda_{k_j}\, f_{k_j}(y_j)\bigr)\,.
\]
Dropping the terms log f_{k_j}(y_j), which do not depend on Λ, gives
\[
\mathcal{L}(\Lambda) \;=\; -\sum_{r=1}^{K} \lambda_r |R_r| \;+\; \sum_{j=1}^{m} \log \lambda_{k_j}\,. \tag{5.14}
\]
In some applications, the functions log f_{k_j}(y_j) are retained because they incorporate a Bayesian a priori pdf and depend on one or more of the estimated parameters.
5.2.1.4 M-step
Let n ≥ 0 denote the EM recursion index, and let the initial value for the intensity parameter be \(\Lambda^{(0)} = \bigl(\lambda_1^{(0)}, \ldots, \lambda_K^{(0)}\bigr)\), where \(\lambda_r^{(0)} > 0\) for r = 1, . . . , K. The EM auxiliary function is
\[
Q\bigl(\Lambda\,;\Lambda^{(n)}\bigr) \;=\; E\bigl[\mathcal{L}(\Lambda)\,;\Lambda^{(n)}\bigr]
\;=\; \sum_{k_1=1}^{K} \cdots \sum_{k_m=1}^{K} \mathcal{L}(\Lambda)\; p\bigl(k_1,\ldots,k_m\,;\Lambda^{(n)}\bigr)\,. \tag{5.15}
\]
\[
Q\bigl(\Lambda\,;\Lambda^{(n)}\bigr) \;=\; -\sum_{r=1}^{K} \lambda_r |R_r|
\;+\; \sum_{k=1}^{K} \left(\, \sum_{j=1}^{m} \frac{\lambda_k^{(n)}\, f_k(y_j)}{\sum_{r=1}^{K} \lambda_r^{(n)}\, f_r(y_j)} \right) \log \lambda_k\,. \tag{5.16}
\]
Setting ∇_Λ Q(Λ ; Λ^{(n)}) = 0 gives the EM update:
\[
\lambda_k^{(n+1)} \;=\; \frac{1}{|R_k|} \sum_{j=1}^{m} \frac{\lambda_k^{(n)}\, f_k(y_j)}{\sum_{r=1}^{K} \lambda_r^{(n)}\, f_r(y_j)}\,, \qquad k = 1, \ldots, K\,. \tag{5.17}
\]
The Shepp-Vardi algorithm evaluates (5.17) until it satisfies the stipulated conver-
gence criterion. An intuitive interpretation of the recursion is given below. It is
summarized in Table 5.1.
Table 5.1 Shepp-Vardi algorithm for TOF data

• FOR j = 1 : m and r = 1 : K, compute and store F(j, r) = f_r(y_j)
• END FOR
• Initialize the PET image: Λ(0) = [λ_1(0), . . . , λ_K(0)]^T
• FOR EM iteration index n = 0, 1, 2, . . . until convergence:
  – Update the expected detector count vector, D(n) ≡ [D_1(n), . . . , D_m(n)]:
      D_j(n) = Σ_{r=1}^{K} λ_r(n) F(j, r),  j = 1, . . . , m
  – FOR r = 1 : K, update the pixel intensity:
      λ_r(n + 1) = (λ_r(n) / |R_r|) Σ_{j=1}^{m} F(j, r) / D_j(n)
  – END FOR
  – Update vectorized PET image: Λ(n + 1) = [λ_1(n + 1), . . . , λ_K(n + 1)]^T
• END FOR EM iteration (Test for convergence)
• If converged: Estimated PET image is Λ̂_ML = Λ(n_last) = [λ_1(n_last), . . . , λ_K(n_last)]^T
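A minimal sketch of the recursion (5.17) / Table 5.1 is given below, assuming the matrix F with entries F[j, r] = f_r(y_j) and the pixel measures |R_r| have already been computed; the function and argument names are illustrative.

```python
import numpy as np

def shepp_vardi_tof(F, pixel_area, lam0, n_iter=100, tol=1e-6):
    """EM (Shepp-Vardi) recursion (5.17) for TOF PET sample data.
    F:          (m, K) array, F[j, r] = f_r(y_j)
    pixel_area: (K,) array of |R_r|
    lam0:       (K,) positive initial intensities"""
    lam = np.asarray(lam0, dtype=float).copy()
    for _ in range(n_iter):
        D = F @ lam                                   # D_j(n) = sum_r lam_r F(j, r)
        lam_new = (lam / pixel_area) * (F / D[:, None]).sum(axis=0)
        if np.max(np.abs(lam_new - lam)) < tol:
            return lam_new
        lam = lam_new
    return lam
```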
The expression (5.17) is an ugly duckling in that it is prettier after taking the limit as |R_k| → 0. The limiting form for small cells (pixels or voxels) is not only insightful, but also useful in other applications. If
\[
\lambda_k^{(n+1)} \;\to\; \lambda^{(n+1)}(x)\,, \qquad
\lambda_k^{(n)} \;\to\; \lambda^{(n)}(x)\,,
\]
\[
\frac{f_k(y_j)}{|R_k|} \;\to\; \ell(y_j \mid x)\,, \qquad
\sum_{r=1}^{K} \lambda_r^{(n)} f_r(y_j) \;=\; \sum_{r=1}^{K} \lambda_r^{(n)} \frac{f_r(y_j)}{|R_r|}\, |R_r| \;\to\; \int_R \ell(y_j \mid s)\, \lambda^{(n)}(s)\, ds\,,
\]
then (5.17) becomes
\[
\lambda^{(n+1)}(x) \;=\; \lambda^{(n)}(x) \sum_{j=1}^{m} \frac{\ell(y_j \mid x)}{\int_R \ell(y_j \mid s)\, \lambda^{(n)}(s)\, ds}\,, \qquad x \in R\,. \tag{5.18}
\]
The form of the algorithm used in the PET application is (5.17), not (5.18).
Because there is at most one annihilation event per measurement, the sum over j
is the estimated number of annihilations originating in (x, x + dx) that generate a
measurement. Since annihilations form a PPP, this number is equated to the updated
intensity over (x, x + dx); that is, it is equated to λ(n+1) (x) |dx|. Dividing by |dx|
gives (5.18). A similar interpretation also holds for (5.17).
Most current PET systems record only photon detector pairs and do not collect TOF
data. In such systems, the absence of differential photon propagation times means
that the measurements of the location of the annihilation event along the chord con-
necting the detector pairs are not available. The only change to the PET problem
is that histogram data are used, not PPP sample data, but this change modifies the
likelihood function of the data. As a result the EM formulation is altered, as is the
reconstruction algorithm. These changes are detailed in this section.
The precise location of the event x within R_r is irrelevant because the intensity of annihilation events in pixel R_r is λ_r. The vector Λ of these parameters is estimated from the measured counts m_{1:L}.
\[
m_j \;=\; \sum_{r=1}^{K} m(r, j)\,, \qquad 1 \le j \le L\,. \tag{5.22}
\]
\[
E[m_j] \;=\; E\!\left[\, \sum_{r=1}^{K} m(r, j) \right] \;=\; \sum_{r=1}^{K} E[m(r, j)] \;=\; \sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r\,. \tag{5.23}
\]
The observed data are independent Poisson distributed variables, so their joint pdf
is the product
\[
p(m_{1:L}\,;\Lambda) \;=\; \prod_{j=1}^{L} e^{-\sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r}\;
\frac{\left(\sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r\right)^{m_j}}{m_j!}\,. \tag{5.24}
\]
\[
p(m_{1:K,\,1:L}\,;\Lambda) \;=\; \prod_{j=1}^{L} \prod_{r=1}^{K} e^{-\ell(j \mid r)\, \lambda_r}\;
\frac{\bigl(\ell(j \mid r)\, \lambda_r\bigr)^{m(r,j)}}{m(r,j)!}\,. \tag{5.25}
\]
Since the numerator of the ratio (5.26) is a multinomial expansion, it follows easily
that
where the sum is over all indices m(r, j) that satisfy the measurement constraints
(5.22).
The logarithm of the complete data pdf is, from (5.25),
\[
\begin{aligned}
\log p(m_{1:K,\,1:L}\,;\Lambda)
&= \sum_{j=1}^{L} \sum_{r=1}^{K} \bigl\{ -\ell(j \mid r)\, \lambda_r + m(r,j) \log\bigl(\ell(j \mid r)\, \lambda_r\bigr) - \log m(r,j)! \bigr\} \\
&= c \;+\; \sum_{r=1}^{K} \left\{ -\lambda_r + \left(\, \sum_{j=1}^{L} m(r,j) \right) \log \lambda_r \right\},
\end{aligned} \tag{5.29}
\]
where in the last equation the constant c contains only terms not involving Λ.
5.3.2.2 E-step
Let n denote the EM iteration index, and let \(\Lambda^{(0)} = \bigl(\lambda_1^{(0)}, \ldots, \lambda_K^{(0)}\bigr) > 0\) be an initial set of intensities. For n ≥ 0, the auxiliary function of the EM method is,
by definition,
\[
\begin{aligned}
Q\bigl(\Lambda\,;\Lambda^{(n)}\bigr) &= E_{\Lambda^{(n)}}\bigl[\log p(m_{1:K,\,1:L}\,;\Lambda)\bigr] \\
&= \sum_{\{m(r,j)\}_{r,j=1}^{K,L}} \log p\bigl(m_{1:K,\,1:L}\,;\Lambda\bigr)\; p\bigl(m_{1:K,\,1:L} \,\big|\, m_{1:L}\,;\Lambda^{(n)}\bigr)\,,
\end{aligned} \tag{5.30}
\]
where the sum in (5.30) is over all indices m(r, j) that satisfy the L measurement
constraints (5.22). Substituting (5.29), dropping the irrelevant constant c, and using
the linearity of the expectation operator gives
\[
Q\bigl(\Lambda\,;\Lambda^{(n)}\bigr) \;=\; \sum_{r=1}^{K} \left\{ -\lambda_r + \left(\, \sum_{j=1}^{L} E_{\Lambda^{(n)}}\bigl[m(r,j)\bigr] \right) \log \lambda_r \right\}, \tag{5.31}
\]
where
\[
E_{\Lambda^{(n)}}\bigl[m(r,j)\bigr] \;=\; m_j\, \frac{\ell(j \mid r)\, \lambda_r^{(n)}}{\sum_{r'=1}^{K} \ell(j \mid r')\, \lambda_{r'}^{(n)}}\,. \tag{5.32}
\]
Substituting p(m 1:K ,1:L | m 1:L ; Λ) and summing over all indices except
m(1, j), . . . , and m(K , j) gives
\[
E_{\Lambda}\bigl[m(r,j)\bigr]
\;=\; \frac{\displaystyle \sum_{m(1,j),\ldots,m(K,j)=0}^{m_j}
\binom{m_j}{m(1,j)\,\cdots\, m(K,j)}\; m(r,j) \prod_{r'=1}^{K} \bigl(\ell(j \mid r')\, \lambda_{r'}\bigr)^{m(r',j)}}
{\displaystyle \left(\, \sum_{r'=1}^{K} \ell(j \mid r')\, \lambda_{r'} \right)^{m_j}}\,. \tag{5.34}
\]
Making the sum on m(r, j) the outermost sum gives the numerator of the ratio on
the right hand side of (5.34) as
\[
\sum_{m(r,j)=0}^{m_j} \frac{m_j!}{m(r,j)!\, \bigl(m_j - m(r,j)\bigr)!}\; m(r,j)\, \bigl(\ell(j \mid r)\, \lambda_r\bigr)^{m(r,j)}
\left[\, \sum_{r' \neq r} \ell(j \mid r')\, \lambda_{r'} \right]^{m_j - m(r,j)}.
\]
Canceling the factor m(r, j) in m(r, j)!, changing the index of summation to
m̃(r, j) = m(r, j) − 1, and shuffling terms yields
\[
m_j\, \ell(j \mid r)\, \lambda_r \sum_{\tilde m(r,j)=0}^{m_j-1}
\frac{(m_j-1)!}{\tilde m(r,j)!\, \bigl(m_j - 1 - \tilde m(r,j)\bigr)!}\,
\bigl(\ell(j \mid r)\, \lambda_r\bigr)^{\tilde m(r,j)}
\left[\, \sum_{r' \neq r} \ell(j \mid r')\, \lambda_{r'} \right]^{m_j - 1 - \tilde m(r,j)}.
\]
The sum in this last expression simplifies immediately using the binomial theorem:
\[
m_j\, \ell(j \mid r)\, \lambda_r \left[\, \sum_{r'=1}^{K} \ell(j \mid r')\, \lambda_{r'} \right]^{m_j - 1}.
\]
Substituting this form of the numerator into the expectation (5.34) and canceling
terms gives the desired expectation for any Λ. The expression (5.32) is the special
case Λ = Λ(n) . This concludes the E-step.
Maximizing (5.31) with respect to Λ gives the EM update
\[
\lambda_r^{(n+1)} \;=\; \lambda_r^{(n)} \sum_{j=1}^{L} \frac{m_j\, \ell(j \mid r)}{\sum_{r'=1}^{K} \ell(j \mid r')\, \lambda_{r'}^{(n)}}\,, \qquad 1 \le r \le K\,. \tag{5.35}
\]
Table 5.2 Shepp-Vardi algorithm for histogram data

• FOR j = 1 : L and r = 1 : K, compute and store the likelihood matrix entries: L(j, r) = ℓ(j | r)
• END FOR
• Initialize the vectorized PET image: Λ(0) = [λ_1(0), . . . , λ_K(0)]^T
• FOR EM iteration index n = 0, 1, 2, . . . until convergence:
  – Update the expected detector count vector, D(n) ≡ [D_1(n), . . . , D_L(n)]:
      D_j(n) = Σ_{r=1}^{K} L(j, r) λ_r(n),  j = 1, . . . , L
  – FOR r = 1 : K, update the pixel intensity:
      λ_r(n + 1) = λ_r(n) Σ_{j=1}^{L} m_j L(j, r) / D_j(n)
  – END FOR
  – Update vectorized PET image: Λ(n + 1) = [λ_1(n + 1), . . . , λ_K(n + 1)]^T
• END FOR EM iteration (Test for convergence)
• If converged: Estimated PET image is Λ̂_ML = Λ(n_last) = [λ_1(n_last), . . . , λ_K(n_last)]^T
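A minimal sketch of the histogram-data recursion (5.35) / Table 5.2 is given below, assuming the L × K likelihood matrix with entries ℓ(j | r) and the detector counts m_{1:L} are given. Unlike the TOF recursion, no pixel-size normalization 1/|R_k| appears. Names are illustrative.

```python
import numpy as np

def shepp_vardi_histogram(Lmat, counts, lam0, n_iter=200, tol=1e-8):
    """EM (Shepp-Vardi) recursion (5.35) for PET histogram data.
    Lmat:   (L, K) array, Lmat[j, r] = l(j | r)
    counts: (L,) detector counts m_1:L
    lam0:   (K,) positive initial intensities"""
    lam = np.asarray(lam0, dtype=float).copy()
    for _ in range(n_iter):
        D = Lmat @ lam                                   # expected detector counts
        lam_new = lam * (Lmat.T @ (counts / D))          # eq. (5.35)
        if np.max(np.abs(lam_new - lam)) < tol:
            return lam_new
        lam = lam_new
    return lam
```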
5.3.2.5 Convergence
General EM theory guarantees convergence of the iteration (5.35) to a stationary
point of the likelihood function. The strict concavity of the loglikelihood func-
tion guarantees that it converges to the global ML estimate, Λ̂_ML. To see that the observed data pdf p(m_{1:L} ; Λ) is strictly log-concave, differentiate (5.24) to find the Hessian matrix of its logarithm:
\[
\begin{aligned}
H(\Lambda) &= \nabla_\Lambda\, [\nabla_\Lambda]^T \log p(m_{1:L}\,;\Lambda) \\
&= -\sum_{j=1}^{L} \frac{m_j}{\left(\sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r\right)^{2}}
\begin{bmatrix} \ell(j \mid 1) \\ \vdots \\ \ell(j \mid K) \end{bmatrix}
\bigl[\, \ell(j \mid 1)\; \cdots\; \ell(j \mid K) \,\bigr].
\end{aligned} \tag{5.36}
\]
The quadratic form z^T H(Λ) z is strictly negative for nonzero vectors z if the L × K likelihood matrix
\[
\mathcal{L} \;=\;
\begin{bmatrix}
\ell(1 \mid 1) & \ell(1 \mid 2) & \cdots & \ell(1 \mid K) \\
\ell(2 \mid 1) & \ell(2 \mid 2) & \cdots & \ell(2 \mid K) \\
\vdots & \vdots & & \vdots \\
\ell(L \mid 1) & \ell(L \mid 2) & \cdots & \ell(L \mid K)
\end{bmatrix} \tag{5.37}
\]
is full rank. Because the Hessian matrix is negative definite for any Λ ≠ 0, the observed data loglikelihood is strictly concave.
An older and lower resolution imaging procedure is called SPECT (single photon emission computed tomography). In SPECT, a radioisotope (most commonly, technetium-99) is introduced into the body. As the isotope decays, gamma photons are emitted in all directions. A gamma camera (also often called an Anger camera after its developer, Hal Anger, in 1957) takes a "snapshot" of the photons emitted in the direction of the camera. Unlike PET, which can be treated as a stack of two dimensional slice problems, SPECT requires solving an inherently three dimensional reconstruction problem.
A simplistic depiction of a gamma camera is given in Fig. 5.2. The camera com-
prises several parts:
• The emitted gamma photons are collimated using one of several methods (a
lead plate with parallel drilled holes is common). Fewer than 1% of the incident
gamma photons emerge from the collimator.
• Photons that emerge enter a thallium-activated sodium iodide (NaI(Tl)) crystal. The NaI(Tl) crystal is a flat circular plate approximately 1 cm thick and 30 cm
in diameter. (The thinner the crystal, the better the resolution but the less the
efficiency.) If a gamma photon is fully absorbed by an atom in the crystal (pho-
toelectric effect), the atom ejects an electron; if it is partially absorbed (Compton
effect), the atom ejects an electron and another gamma photon.
• Ejected electrons encounter other atoms in the crystalline lattice and produce visible light in a physical process called scintillation. The number of scintillated light photons is proportional to the energy incident on the crystal.
• The scintillated photons are few in number, so an array of hexagonally packed photo-multiplier tubes (PMTs) is affixed to the back of the crystal to boost the count (without also boosting the noise level). Typical gamma cameras use
Fig. 5.2 Sketch of the basic components of a gamma (Anger) camera used in SPECT
between 37 and 120 PMTs. The face of a PMT ranges in size from 5 to 7 cm
across.
• Position logic circuits are used to estimate the locations of the crystal atoms that
absorb gamma photons. An estimation procedure is necessary because several
PMTs detect light from the same event. The position estimate is a convex combi-
nation of the PMT data.
The output is a two dimensional “snapshot” of the estimated locations of the atoms
that absorb gamma photons in the NaI(Tl) crystal. The estimated locations depend
on the location of the camera.
In a typical SPECT imaging procedure, the gamma camera is moved to a fixed
number of different viewing positions around the object (and, naturally, is never in
physical contact with it). A snapshot is made at each camera position. The multiple
view snapshots are used to reconstruct the image. The reconstructed SPECT image
is the estimated intensity of radioisotope decay within the three dimensional volume
of the imaged object.
The clinical use of SPECT is well established. A common use of SPECT
is cardiac blood perfusion studies, in which an ECG/EKG (electrocardio-
gram/elektrokardiogram) acquires data from a beating heart and the heart rhythm
is used to time gate the SPECT data collection procedure.
SPECT is much more widely used than PET. There are several reasons for this
difference. One is that SPECT and PET are used for diagnostically different pur-
poses. Many diagnostic needs do not require the high resolution provided by PET.
126 5 Tomographic Imaging
Another is that the radioisotopes needed for SPECT are readily accessible compared
to those for PET. Yet another reason is simply the cost—SPECT procedures are rel-
atively inexpensive. In 2009, SPECT procedures cost on the order of US$500 and
were often performed in physician offices and walk-in medical facilities. In contrast,
PET cost US$30,000 or more and required specialized hospital-based equipment
and staff.
where n Θ is the fixed number of camera viewing positions. See Fig. 5.3. Let F(θ j )
denote the two dimensional plane of the camera face at view angle θ j . An arbitrary
point in F(θ j ) is denoted by (y, θ j ), where y ≡ (y1 , y2 ) ∈ R2 is a two-dimensional
coordinate system. The data comprising the snapshot at view angle θ j are the events
in the list
\[
S_j \;=\; \bigl\{\, (y_{j1}, \theta_j),\; (y_{j2}, \theta_j),\; \ldots,\; (y_{j,n_j}, \theta_j) \,\bigr\} \subset \mathbb{R}^3\,, \tag{5.38}
\]
where n j ≥ 1 is the number of position estimates for the snapshot at camera angle
θ_j. Let S ≡ {S_0, S_1, . . . , S_{n_Θ-1}}. The intensity function λ(x) is estimated from
the data S.
The intensity functions needed to formulate the SPECT likelihood function are
defined on the Cartesian product R3 × F(θ j ) of decay points x in the imaged object
Fig. 5.3 Geometry and coordinates for SPECT imaging of a three dimensional object for a gamma
camera with view angle θ. The center P of the camera face lies in the x1 –x2 plane for all camera
view angles θ
and observed gamma photon absorption points y on the crystal face of the camera
at view angle θ j . Let μ(x, y, θ j ) denote the intensity function of the j-th PPP.
There are n Θ of these PPPs. In terms of these intensity functions, the intensity to be
estimated is
\[
\lambda(x) \;=\; \sum_{j=0}^{n_\Theta - 1} \int_{F(\theta_j)} \mu(x, y, \theta_j)\, dy\,. \tag{5.39}
\]
The j-th integral in (5.39) is a double integral over the coordinates y ∈ F(θ j ) that
correspond to points on the camera face.
The intensity function μ(x, y, θ j ) is the superposition of the intensity functions
of two independent PPPs. One is determined by the detected photons. Its intensity
function is denoted by μ0 (x, y, θ j ). The other is determined by the photons that
arrive at the camera face but are not detected. The intensity function of the unde-
tected photons is denoted by μ_1(x, y, θ_j). Thus,
\[
\mu(x, y, \theta_j) \;=\; \mu_0(x, y, \theta_j) \;+\; \mu_1(x, y, \theta_j)\,. \tag{5.40}
\]
Both μ0 ( · ) and μ1 ( · ) are expressed in terms of three input functions. These inputs
are assumed known. They are discussed next.
and the engineering details of the system design. Its detailed mathematical form
depends on where the decay x occurred and the camera position θ . It is denoted by
pY |X Θ (y | x, θ ). A significant difference between this pdf and the analogous pdf for
PET is that it is depth dependent—it broadens with increasing distance between the
decay point x and the camera face. This pdf is assumed known.
Another function is the survival probability function
\[
\beta(x, y, \theta) \;=\; \Pr\!\left[\,
\begin{array}{l}
\text{a photon emitted at decay point } x \text{ moving}\\
\text{toward location } y \text{ on the camera face at}\\
\text{view angle } \theta \text{ arrives at the camera face}
\end{array}
\right]. \tag{5.41}
\]
The survival function depends on many factors. These factors include the efficiency
of the detector and the three dimensional attenuation density of the object being
imaged. The efficiency is determined as part of a calibration procedure. The atten-
uation is clearly a complex issue since it depends on the object being imaged. It is,
moreover, important for successful imaging. Methods for estimating it are discussed
in some detail in [82] and the references therein. The function β(x, y, θ ) is assumed
known.
The third required function is the conditional pdf p_{Θ|X}(θ_j | x). It is the fraction
of photons emanating from x that propagate toward the camera at view angle θ j .
This fraction is dependent on geometric factors such as the solid angle subtended
by the camera. It is also weighted by the relative length of the “dwell times” of the
camera at the camera angles Θ. It is assumed known.
and
\[
\mu_1(x, y, \theta_j) \;=\; p_{Y|X\Theta}(y \mid x, \theta_j)\; p_{\Theta|X}(\theta_j \mid x)\; \lambda(x)\; \beta(x, y, \theta_j)\,, \tag{5.43}
\]
respectively. These PPPs are independent because they result from an independent
thinning process determined by β(x, y, θ j ) (cf. Examples 2.6 and 2.7). Their sum
satisfies (5.40). The identity
\[
\sum_{j=0}^{n_\Theta - 1} \int_{\mathbb{R}^3 \times F(\theta_j)} \mu(x, y, \theta_j)\, dx\, dy \;=\; \int_{\mathbb{R}^3} \lambda(x)\, dx \tag{5.44}
\]
is used shortly.
The j-th snapshot S j is a realization of a PPP on the two dimensional camera
face, that is, on the plane of the camera face at view angle θ j . The photons arriving
These intensities are defined only for values of y ∈ F(θ j ) ⊂ R3 . In light of the
relationship (5.39), the function
\[
\begin{aligned}
L_{S_j}(\lambda) &= \exp\!\left\{ -\int_{F(\theta_j)} \mu(y, \theta_j)\, dy \right\} \prod_{r=1}^{n_j} \mu(y_{jr}, \theta_j) \\
&= \exp\!\left\{ -\int_{\mathbb{R}^3 \times F(\theta_j)} \mu(x, y, \theta_j)\, dx\, dy \right\} \prod_{r=1}^{n_j} \int_{\mathbb{R}^3} \mu(x, y_{jr}, \theta_j)\, dx
\end{aligned} \tag{5.46}
\]
is the likelihood function for λ(x) given the data S j . The PPPs for different camera
view angles are independent, so the product
\[
\begin{aligned}
L_{S}(\lambda) &= \prod_{j=0}^{n_\Theta - 1} L_{S_j}(\lambda) \\
&= \exp\!\left\{ -\sum_{j=0}^{n_\Theta - 1} \int_{\mathbb{R}^3 \times F(\theta_j)} \mu(x, y, \theta_j)\, dx\, dy \right\} \prod_{j=0}^{n_\Theta - 1} \prod_{r=1}^{n_j} \mu(y_{jr}, \theta_j) \\
&= \exp\!\left\{ -\int_{\mathbb{R}^3} \lambda(x)\, dx \right\} \prod_{j=0}^{n_\Theta - 1} \prod_{r=1}^{n_j} \int_{\mathbb{R}^3} \mu(x, y_{jr}, \theta_j)\, dx
\end{aligned} \tag{5.47}
\]
• The number N j (0) ≥ 0 of gamma photons that reach the camera but are not
detected;
• The locations {y jr : r = n j + 1, . . . , n j + N j (0)} at which the undetected
gamma photons exit the crystal face of the camera;
• The locations {x jr : r = 1, . . . , n j + N j (0)} of the decays that generated the
detected and undetected gamma photons.
The complete data (in the EM sense) for the j-th snapshot are
\[
S_j^{com} \;=\; \bigl\{\, (x_{j1}, y_{j1}, \theta_j),\; (x_{j2}, y_{j2}, \theta_j),\; \ldots,\; \bigl(x_{j, n_j+N_j(0)},\, y_{j, n_j+N_j(0)},\, \theta_j\bigr) \,\bigr\}\,, \tag{5.48}
\]
The logarithm is
\[
\log L_{S}^{com}(\lambda) \;=\; -\int_{\mathbb{R}^3} \lambda(x)\, dx
\;+\; \sum_{j=0}^{n_\Theta - 1} \Biggl\{\, \sum_{r=1}^{n_j} \log \mu_1(x_{jr}, y_{jr}, \theta_j)
\;+\; \sum_{r'=n_j+1}^{n_j+N_j(0)} \log \mu_0(x_{jr'}, y_{jr'}, \theta_j) \Biggr\}\,. \tag{5.50}
\]
\[
\log L_{S}^{com}(\lambda) \;=\; c \;-\; \int_{\mathbb{R}^3} \lambda(x)\, dx
\;+\; \sum_{j=0}^{n_\Theta - 1} \Biggl\{\, \sum_{r=1}^{n_j} \log \lambda(x_{jr})
\;+\; \sum_{r'=n_j+1}^{n_j+N_j(0)} \log \lambda(x_{jr'}) \Biggr\}\,, \tag{5.51}
\]
5.4.2.5 E-step
Let m ≥ 0 denote the EM iteration index, and let λ^{(0)}(x) > 0 be specified. The auxiliary function of the EM method is defined as the conditional expectation of (5.50):
\[
Q\bigl(\lambda \,\big|\, \lambda^{(m)}\bigr) \;=\; E_{\lambda^{(m)}}\bigl[\log L_{S}^{com}(\lambda)\bigr]
\;=\; c \;-\; \int_{\mathbb{R}^3} \lambda(x)\, dx \;+\; A \;+\; B\,, \tag{5.52}
\]
where
\[
A \;=\; E_{\lambda^{(m)}}\!\left[\, \sum_{j=0}^{n_\Theta - 1} \sum_{r'=n_j+1}^{n_j+N_j(0)} \log \lambda(x_{jr'}) \right] \tag{5.53}
\]
and
\[
B \;=\; E_{\lambda^{(m)}}\!\left[\, \sum_{j=0}^{n_\Theta - 1} \sum_{r=1}^{n_j} \log \lambda(x_{jr}) \right]. \tag{5.54}
\]
The expectations A and B are evaluated differently because the number of terms in the r' sum is both random and missing, while the number of terms in the r sum is specified.
To evaluate A, note that the j-th expectation in (5.53) is the expectation of a random sum, all of whose summands are log λ(x), with respect to the undetected target PPP on the camera face with view angle θ_j. The number of terms in the sum is Poisson distributed; therefore, replacing λ(x) in (5.42) with λ^{(m)}(x) and using Campbell's Theorem gives
\[
\begin{aligned}
A &= \sum_{j=0}^{n_\Theta - 1} \int_{\mathbb{R}^3 \times F(\theta_j)} \log \lambda(x)\;
p_{Y|X\Theta}(y \mid x, \theta_j)\; p_{\Theta|X}(\theta_j \mid x)\; \lambda^{(m)}(x)\, \bigl(1 - \beta(x, y, \theta_j)\bigr)\, dx\, dy \\
&= \int_{\mathbb{R}^3} \lambda^{(m)}(x)\, \log \lambda(x)
\left\{\, \sum_{j=0}^{n_\Theta - 1} \int_{F(\theta_j)} p_{Y|X\Theta}(y \mid x, \theta_j)\; p_{\Theta|X}(\theta_j \mid x)\, \bigl(1 - \beta(x, y, \theta_j)\bigr)\, dy \right\} dx \\
&= \int_{\mathbb{R}^3} \bigl(1 - \bar\beta(x)\bigr)\, \lambda^{(m)}(x)\, \log \lambda(x)\, dx\,,
\end{aligned} \tag{5.55}
\]
where
\[
\bar\beta(x) \;=\; \sum_{j=0}^{n_\Theta - 1} \int_{F(\theta_j)} p_{Y|X\Theta}(y \mid x, \theta_j)\; p_{\Theta|X}(\theta_j \mid x)\; \beta(x, y, \theta_j)\, dy \tag{5.56}
\]
is the mean survival probability of detected gamma photons originating from decays
at x. Equivalently,
\[
\bar\beta(x) \;=\; \sum_{j=0}^{n_\Theta - 1} p_{\Theta|X}(\theta_j \mid x)\; \beta_j(x)\,, \tag{5.57}
\]
where
\[
\beta_j(x) \;=\; \int_{F(\theta_j)} p_{Y|X\Theta}(y \mid x, \theta_j)\; \beta(x, y, \theta_j)\, dy \tag{5.58}
\]
is the probability that gamma photons generated from decays at x are detected in
the camera at view angle θ j .
Evaluating B requires using methods akin to those used in Chapter 3. From
(5.47) and (5.49), the conditional pdf of the missing points x jr for r ≤ n j and
all j is
\[
\prod_{j=0}^{n_\Theta - 1} \prod_{r=1}^{n_j} w_j(x_{jr} \mid y_{jr}, \theta_j)\,, \tag{5.59}
\]
where
\[
w_j(x \mid y, \theta_j) \;=\;
\frac{p_{Y|X\Theta}(y \mid x, \theta_j)\; p_{\Theta|X}(\theta_j \mid x)\; \beta(x, y, \theta_j)\; \lambda^{(m)}(x)}
{\int_{\mathbb{R}^3} p_{Y|X\Theta}(y \mid x', \theta_j)\; p_{\Theta|X}(\theta_j \mid x')\; \beta(x', y, \theta_j)\; \lambda^{(m)}(x')\, dx'}\,,
\qquad y \in F(\theta_j)\,. \tag{5.60}
\]
\[
\begin{aligned}
B &= \sum_{j=0}^{n_\Theta - 1} \int_{\mathbb{R}^3} \!\cdots\! \int_{\mathbb{R}^3}
\left(\, \sum_{r=1}^{n_j} \log \lambda(x_{jr}) \right)
\left(\, \prod_{r=1}^{n_j} w_j(x_{jr} \mid y_{jr}, \theta_j) \right) dx_{j1} \cdots dx_{j, n_j} \\
&= \sum_{j=0}^{n_\Theta - 1} \sum_{r=1}^{n_j} \int_{\mathbb{R}^3} w_j(x \mid y_{jr}, \theta_j)\, \log \lambda(x)\, dx \\
&= \int_{\mathbb{R}^3} f^{(m)}(x)\, \log \lambda(x)\, dx\,,
\end{aligned} \tag{5.61}
\]
where
\[
f^{(m)}(x) \;=\; \sum_{j=0}^{n_\Theta - 1} \sum_{r=1}^{n_j} w_j(x \mid y_{jr}, \theta_j)\,. \tag{5.62}
\]
The intuitive interpretation of f (m) (x) is that it is the expected number of decays at
the point x given the data from all camera view angles.
Substituting the expressions for A and B into (5.52) gives the final form of the
auxiliary function as
\[
Q\bigl(\lambda \,\big|\, \lambda^{(m)}\bigr) \;=\; c \;+\; \int_{\mathbb{R}^3}
\Bigl\{ -\lambda(x) + \Bigl[\, \bigl(1 - \bar\beta(x)\bigr)\, \lambda^{(m)}(x) + f^{(m)}(x) \Bigr] \log \lambda(x) \Bigr\}\, dx\,. \tag{5.63}
\]
5.4.2.6 M-step
The EM update is the solution of the maximization problem defined by
\[
\lambda^{(m+1)}(x) \;=\; \arg\max_{\lambda > 0}\; Q\bigl(\lambda \,\big|\, \lambda^{(m)}\bigr)\,. \tag{5.64}
\]
For a specified variation δ(x), the maximum of (5.65) with respect to ε must occur at ε = 0, for otherwise λ^{(m+1)}(x) is not the solution of the M-step. Evaluating the derivative of (5.65) with respect to ε at zero gives
\[
\int_{\mathbb{R}^3} \delta(x) \left[ -1 + \frac{\bigl(1 - \bar\beta(x)\bigr)\, \lambda^{(m)}(x) + f^{(m)}(x)}{\lambda^{(m+1)}(x)} \right] dx \;=\; 0\,. \tag{5.66}
\]
Since (5.66) must hold for all variations δ(x), it follows that
\[
-1 + \frac{\bigl(1 - \bar\beta(x)\bigr)\, \lambda^{(m)}(x) + f^{(m)}(x)}{\lambda^{(m+1)}(x)} \;\equiv\; 0 \quad \text{for all } x\,. \tag{5.67}
\]
Solving (5.67) for λ^{(m+1)}(x) gives the EM recursion
\[
\lambda^{(m+1)}(x) \;=\; \bigl(1 - \bar\beta(x)\bigr)\, \lambda^{(m)}(x) \;+\; f^{(m)}(x)\,. \tag{5.68}
\]
To facilitate later discussion (in Section 6.5), the recursion (5.68) is rewritten as a sum over the camera view angles:
\[
\lambda^{(m+1)}(x) \;=\; \lambda^{(m)}(x) \sum_{j=0}^{n_\Theta - 1} p_{\Theta|X}(\theta_j \mid x)
\Biggl[ \bigl(1 - \beta_j(x)\bigr)
\;+\; \sum_{r=1}^{n_j}
\frac{p_{Y|X\Theta}(y_{jr} \mid x, \theta_j)\; \beta(x, y_{jr}, \theta_j)}
{\int_{\mathbb{R}^3} p_{Y|X\Theta}(y_{jr} \mid x', \theta_j)\; p_{\Theta|X}(\theta_j \mid x')\; \beta(x', y_{jr}, \theta_j)\; \lambda^{(m)}(x')\, dx'} \Biggr]. \tag{5.69}
\]
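A discretized sketch of the update (5.68) on a voxel grid is given below. The per-view kernels, survival probabilities, and dwell fractions are supplied as arrays; all names and the grid discretization are illustrative assumptions, not the book's notation.

```python
import numpy as np

def spect_em_update(lam, kernels, p_theta, beta_bar, voxel_vol):
    """One EM update (5.68) for SPECT on a voxel grid.
    lam:       (V,) current intensity lambda^(m) at the V voxel centers
    kernels:   list over views j of (n_j, V) arrays K_j[r, i] =
               p_{Y|X,Theta}(y_jr | x_i, theta_j) * beta(x_i, y_jr, theta_j)
    p_theta:   (n_views, V) dwell fractions p_{Theta|X}(theta_j | x_i)
    beta_bar:  (V,) mean detection probability beta_bar(x_i), eq. (5.56)
    voxel_vol: voxel volume used to approximate the spatial integrals"""
    f = np.zeros_like(lam)                               # f^(m)(x), eq. (5.62)
    for j, K in enumerate(kernels):
        Kw = K * p_theta[j][None, :]                     # include p_{Theta|X} in the kernel
        denom = (Kw * lam[None, :]).sum(axis=1) * voxel_vol   # one value per detected photon
        f += (Kw * lam[None, :] / denom[:, None]).sum(axis=0)
    return (1.0 - beta_bar) * lam + f                    # eq. (5.68)
```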
5.5.1 Background
In transmission tomography the photon source is external to the tissue or other object
being imaged, so the photon count can be much higher than in PET. It is desirable
to minimize patient radiation exposure, so there are limits on how much the photon
count can be increased.
The estimated parameters are no longer the PPP intensities in a pixel grid, but
rather the attenuation coefficients of the pixels in the grid. PET measures radioac-
tivity concentration and, thus, preferentially images tissues with higher metabolic
rates. Transmission tomography measures x-ray absorption, which is related to
the average electron density (or effective atomic number) of the tissue. Readable
accounts of the physics involved are found in the book by Ter-Pogossian [131]. A
more recent discussion of effective atomic numbers and electron density is given in
[141].
As in PET, the image pixels are denoted by Rr , r = 1, . . . , K . Let μr be
the attenuation coefficient in Rr . The probability that a photon is absorbed while
traversing a line segment of length l lying inside Rr is 1 − e−μr l , so μr has units
of inverse length. Hence, the probability that a photon is not absorbed while travers-
ing an interval of length l inside Rr is e− μr l . Imaging in transmission tomogra-
phy is equivalent to estimating the attenuation coefficients μ = (μ1 , . . . , μ K ).
The EM method is used to estimate the attenuation coefficients. For various
reasons this problem is significantly more difficult than the PET algorithm. The
approach given here parallels that of Lange and Carson [65], who were the first to use the EM method for transmission tomography. In any event, the method here is very
interesting and potentially of utility in applications other than transmission tomog-
raphy. The word photon is used throughout the discussion because it matches the
attenuation assumptions; however, photons are only placeholders for any quantity
of interest that attenuates linearly.
d j = α j t,
so α j has units of number per unit time. The number of photons emitted by source
j is assumed to be Poisson distributed with parameter d j .
The number of photons arriving at the detectors in the array is denoted by
m 1:L = (m 1 , . . . , m L ).
\[
\prod_{r \in I_j} e^{-l_{jr}\, \mu_r} \;=\; \exp\!\Bigl( -\sum_{r \in I_j} l_{jr}\, \mu_r \Bigr).
\]
The expected number of photons arriving at detector j is therefore
\[
d_j \exp\!\Bigl( -\sum_{r \in I_j} l_{jr}\, \mu_r \Bigr). \tag{5.70}
\]
where the constants m_j log d_j − log m_j! are omitted because they do not depend on μ.
5.5.2.2 EM Convergence
The loglikelihood function (5.71) is strictly concave. To see this, differentiate to find
the (u, v)-th entry of the K × K Hessian matrix:
\[
\frac{\partial^2}{\partial \mu_u\, \partial \mu_v} \log p(m_{1:L}\,;\mu) \;=\; -\sum_{j=1}^{L} d_j\, l_{ju}\, l_{jv}\, \exp\!\Bigl(-\sum_{r \in I_j} l_{jr}\, \mu_r\Bigr).
\]
The Hessian is negative definite, and the loglikelihood strictly concave, if the L × K traversal length matrix [l_{jr}] is full rank. Many entries of the traversal length matrix are zero, since the indices of
the nonzero entries of row j are identical to the indices in I j . The matrix is full rank
if the configuration of pixels and source-detector pairs is properly designed. The full
rank condition is assumed to be incorporated into the system. Consequently, the EM
method is guaranteed to converge to the global ML estimate.
\[
\frac{d_j^{\,m_j(1)}}{m_j(1)!}\; e^{-d_j}\,.
\]
The likelihood function of the missing data of the j-th source-detector pair is therefore
The source-detector pairs are independent, so the joint likelihood function is the
product:
\[
p \;=\; \prod_{j=1}^{L} p_j\,. \tag{5.77}
\]
Retaining only terms that depend on the attenuation coefficients μ gives the com-
plete data loglikelihood function in the form
For source-detector pair j, the expected number of photons entering the t-th pixel along the traversal ordering is
\[
E[m_j(t)\,;\mu] \;=\; d_j \exp\!\left( -\sum_{t'=1}^{t-1} l_{j, I_j(t')}\, \mu_{I_j(t')} \right)
\;=\; \gamma_{jt} \;\equiv\; \gamma_{jt}(\mu)\,. \tag{5.80}
\]
is the probability of a photon successfully traversing pixel RI j (t) and also all pixels
it encounters on the way to the j-th detector. The joint pdf of m j (T j ) and m j (t) is,
from (5.79),
\[
\Pr[m_j(t),\, m_j(T_j)\,;\mu] \;=\; e^{-\gamma_{jt}}\, \frac{\gamma_{jt}^{\,m_j(t)}}{m_j(t)!}\;
\binom{m_j(t)}{m_j(T_j)} \left(\frac{\gamma_{j T_j}}{\gamma_{jt}}\right)^{m_j(T_j)} \left(1 - \frac{\gamma_{j T_j}}{\gamma_{jt}}\right)^{m_j(t) - m_j(T_j)}. \tag{5.82}
\]
\[
\begin{aligned}
\Pr[m_j(t) \mid m_j(T_j) = m_j\,;\mu]
&= \frac{\Pr[m_j(t),\, m_j(T_j) = m_j\,;\mu]}{\Pr[m_j(T_j) = m_j\,;\mu]} \\
&= \frac{\displaystyle e^{-\gamma_{jt}}\, \frac{\gamma_{jt}^{\,m_j(t)}}{m_j(t)!}\,
\frac{m_j(t)!}{m_j!\,\bigl(m_j(t)-m_j\bigr)!}
\left(\frac{\gamma_{jT_j}}{\gamma_{jt}}\right)^{m_j}
\left(1 - \frac{\gamma_{jT_j}}{\gamma_{jt}}\right)^{m_j(t)-m_j}}
{\displaystyle e^{-\gamma_{jT_j}}\, \frac{\gamma_{jT_j}^{\,m_j}}{m_j!}} \\
&= e^{-(\gamma_{jt} - \gamma_{jT_j})}\; \frac{\bigl(\gamma_{jt} - \gamma_{jT_j}\bigr)^{m_j(t)-m_j}}{\bigl(m_j(t)-m_j\bigr)!}\,.
\end{aligned} \tag{5.83}
\]
\[
E\bigl[m_j(t) \mid m_j\,;\mu\bigr] \;=\; \sum_{m_j(t)=m_j}^{\infty} m_j(t)\; \Pr[m_j(t) \mid m_j\,;\mu]
\;=\; m_j \;+\; \gamma_{jt} \;-\; \gamma_{jT_j}\,.
\]
To see this, substitute (5.83) and simplify. The details are straightforward and are
omitted.
gives
\[
Q\bigl(\mu\,;\mu^{(n)}\bigr) \;=\; \sum_{j=1}^{L} \sum_{t=1}^{T_j - 1}
\Bigl\{ \widetilde N_{jt}\bigl(\mu^{(n)}\bigr) \log e^{-l_{jt}\mu_t}
\;+\; \Bigl[ \widetilde M_{jt}\bigl(\mu^{(n)}\bigr) - \widetilde N_{jt}\bigl(\mu^{(n)}\bigr) \Bigr] \log\bigl(1 - e^{-l_{jt}\mu_t}\bigr) \Bigr\}\,. \tag{5.85}
\]
This form is awkward because the traversal ordering (5.74) does not naturally
aggregate the parameter μt into separate parts of the expression. Rewriting
Q(μ ; μ^{(n)}) in the original pixel ordering is more convenient for this purpose, and is straightforward.
For r = 1, . . . , K, let M_{jr}(μ^{(n)}) and N_{jr}(μ^{(n)}) denote the expected numbers of photons entering and exiting pixel r for source-detector pair j, conditioned on the measurements m_{1:L} and given the parameter vector μ^{(n)}. These expectations are evaluated by appropriately permuting the expectations M̃_{jt}(μ^{(n)}) and Ñ_{jt}(μ^{(n)}).
Now let Jr denote the indices j of the source-detector pairs whose photons
traverse pixel r . Then (5.85) becomes
\[
Q\bigl(\mu\,;\mu^{(n)}\bigr) \;=\; \sum_{r=1}^{K} \sum_{j \in J_r}
\Bigl\{ N_{jr}\bigl(\mu^{(n)}\bigr) \log e^{-l_{jr}\mu_r}
\;+\; \Bigl[ M_{jr}\bigl(\mu^{(n)}\bigr) - N_{jr}\bigl(\mu^{(n)}\bigr) \Bigr] \log\bigl(1 - e^{-l_{jr}\mu_r}\bigr) \Bigr\}\,. \tag{5.86}
\]
5.5.2.6 M-step
The EM recursive update is found by setting the gradient of (5.86) with respect
to μ equal to zero. EM works like magic here—the gradient equations decouple
completely into K equations, one for each attenuation coefficient. The equation for
the update of μr is
\[
\sum_{j \in J_r} \Bigl[ M_{jr}\bigl(\mu^{(n)}\bigr) - N_{jr}\bigl(\mu^{(n)}\bigr) \Bigr] \frac{l_{jr}}{e^{\,l_{jr}\mu_r} - 1}
\;=\; \sum_{j \in J_r} N_{jr}\bigl(\mu^{(n)}\bigr)\, l_{jr}\,. \tag{5.87}
\]
The solution of this transcendental equation is the EM update μ_r^{(n+1)}. The solution of this equation is unique. To see this, observe that the left hand side of (5.87) decreases monotonically from +∞ to zero as μ_r goes from zero to +∞. Since the right hand side is a positive constant for every EM iteration n, the equation (5.87) has a unique solution.
An explicit analytical solution of (5.87) is not available, but an explicit solution is
unnecessary for EM theory to work. Given the monotonicity of the left hand side of
the equation, any number of numerical methods will provide the solution to arbitrary
accuracy.
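As an illustration, the sketch below solves (5.87) for a single pixel by bracketing and root finding, given the conditioned expectations M_jr, N_jr and traversal lengths l_jr for the pairs j ∈ J_r. It assumes at least some absorption (M_jr > N_jr for some j) and a positive right hand side; names are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def em_update_mu_r(M, N, l, mu_max=1e3):
    """Solve eq. (5.87) for the attenuation coefficient of one pixel.
    M, N, l: arrays over the source-detector pairs j in J_r containing
    M_jr(mu^(n)), N_jr(mu^(n)), and the traversal lengths l_jr."""
    rhs = np.sum(N * l)                                   # right hand side of (5.87)
    def g(mu):                                            # LHS(mu) - RHS, decreasing in mu
        return np.sum((M - N) * l / np.expm1(l * mu)) - rhs
    # LHS decreases from +inf to 0, so a root is bracketed in (0, mu_max)
    return brentq(g, 1e-12, mu_max)
```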
General EM theory guarantees that the iterates converge to a stationary point of
the likelihood function. Because of the strict concavity of the loglikelihood function,
this stationary point is the global ML estimate, μ̂ M L .
Solving the M-step numerically to arbitrary precision makes the Lange-Carson
algorithm an exact EM algorithm. Lange and Carson in their original 1984 paper
5.5 Transmission Tomography 141
[65] introduce an approximate solution to (5.87), but there is no longer any practical
reason to use it.
Improvements to the Lange-Carson algorithm are discussed in [91] and reviewed
in [34]. These include numerical algorithms to solve (5.87), methods to increase the
convergence rate, and regularization to improve image quality.
A summary of the Lange-Carson algorithm is given in Table 5.3.
Table 5.3 Lange-Carson algorithm for transmission tomography

• FOR j = 1 : L, determine the traversal ordering I_j and the lengths l_{jr}
• END FOR
• FOR j = 1 : L and r = 1 : K: M(j, r) = N(j, r) = 0
• END FOR
• Initialize the attenuation image: μ(0) = (μ_1(0), . . . , μ_K(0))
• FOR EM iteration index n = 0, 1, 2, . . . until convergence:
  – FOR j = 1 : L
    • Compute γ_{j, I_j(t)} = γ_{jt}(μ(n)) from (5.80), t = 1, . . . , T_j
    • FOR t = 1 : T_j − 1
        M(j, I_j(t)) = m_j − γ_{j, I_j(T_j)} + γ_{j, I_j(t)}
        N(j, I_j(t)) = m_j − γ_{j, I_j(T_j)} + γ_{j, I_j(t + 1)}
    • END FOR
  – END FOR
  – FOR r = 1 : K, solve for the update μ_r(n + 1) of the r-th attenuation coefficient:
      Σ_{j=1}^{L} (M(j, r) − N(j, r)) l_{jr} / (e^{l_{jr} μ_r} − 1) = Σ_{j=1}^{L} N(j, r) l_{jr}
  – END FOR
  – Update vectorized attenuation image: μ(n + 1) = (μ_1(n + 1), . . . , μ_K(n + 1))
• END FOR EM iteration (Test for convergence)
• If converged: Estimated attenuation coefficient image is μ̂_ML = (μ_1(n_last), . . . , μ_K(n_last))
The function f_r(y) is given by (5.4). The FIM is determined by the likelihood function ℓ(·) and by the sizes and locations of the pixels. Similarly, for histogram data, the piecewise constant intensity is such that
\[
\int_{R_j} \lambda(s)\, ds \;=\; \sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r\,. \tag{5.89}
\]
\[
J_{hist}(\Lambda) \;=\; \sum_{j=1}^{L} \frac{1}{\sum_{r=1}^{K} \ell(j \mid r)\, \lambda_r}
\begin{bmatrix} \ell(j \mid 1) \\ \vdots \\ \ell(j \mid K) \end{bmatrix}
\bigl[\, \ell(j \mid 1)\; \cdots\; \ell(j \mid K) \,\bigr]\,. \tag{5.90}
\]
The diagonal entries of the CRB can be displayed as an image; this diagonal image reveals which pixels may have unreliable estimates. The off-
diagonal entries of the CRB are of interest, but are not as easily displayed intu-
itively. Such entries are potentially useful because they may reveal undesirable long
distance spatial correlations between pixel estimates.
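For a modest number of pixels, the histogram-data FIM (5.90) and its inverse can be formed directly, as in the sketch below; the direct inversion is exactly the step that becomes infeasible for large K, as discussed next. Names are illustrative.

```python
import numpy as np

def crb_diagonal_image(Lmat, lam, shape):
    """Histogram-data FIM (5.90) and the diagonal of its inverse (CRB),
    reshaped to the pixel grid.  Only feasible for modest K.
    Lmat: (L, K) likelihood matrix l(j | r); lam: (K,) true intensities."""
    expected = Lmat @ lam                                  # E[m_j], j = 1..L
    J = (Lmat / expected[:, None]).T @ Lmat                # sum_j l_j l_j^T / E[m_j]
    crb = np.linalg.inv(J)                                 # requires J nonsingular
    return np.diag(crb).reshape(shape)
```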
The drawback to the FIM for tomography is that practical imaging problems
require high resolution. If the field of view is large and the scale of the objects of
interest within the image is small, the number of pixels K rapidly becomes very
large. For example, a 100 × 150 pixelated image gives K = 15,000 pixels. Large K
causes two difficulties in applications. The first is that numerically evaluating any
one of the entries of the FIM is straightforward, but evaluating all K (K + 1)/2 of
them can be time consuming and require more care than is desirable for a devel-
opment effort. In many applications this problem can probably be circumvented in
various ways. The other is more daunting—inverting the full FIM to find the CRB is
infeasible for sufficiently large K. A more practical alternative is provided by Hero
and Fessler [50, 51], who propose a recursive algorithm to compute the CRB for a
subset of the intensities in a region of interest within the larger image.
5.7 Regularization
Tomographic imaging is known to require regularization to reduce noise artifacts
that are visually similar to the Gibbs phenomenon. These artifacts are not caused by
the EM method, but arise in any algorithm which estimates an infinite dimensional
quantity, in this case the intensity, from finite dimensional measured data.
Regularization methods for transmission tomography are reviewed in [34]. Sev-
eral interesting methods for regularizing PET are described in detail in [119].
These methods differ in important ways, but they have much in common. Perhaps
the simplest to use that preserves the EM form of the intensity estimator is Grenan-
der’s method.
The kernel and the space Z are arbitrary and can be chosen in any convenient
manner. For example, the kernel is often taken to be an appropriately dimensioned
Gaussian pdf. The sieve restricts λ(x) to the collection of all functions of the form
\[
\Lambda \;\equiv\; \left\{\, \lambda(x) \;=\; \int_Z k(x \mid z)\, \zeta(z)\, dz\,, \;\; \text{for some } \zeta(z) \,\right\}, \tag{5.92}
\]
where ζ(z) ≥ 0 and ∫_Z ζ(z) dz < ∞. In effect, the integral (5.92) is a low pass filter
applied to the intensity ζ(z) to produce a smoothed estimate of λ(x).
The basic idea is to compute the ML estimate ζ̂(z) from the data, and subsequently compute the ML intensity
\[
\hat\lambda_{ML}(x) \;=\; \int_Z k(x \mid z)\, \hat\zeta(z)\, dz\,. \tag{5.93}
\]
\[
p(Y\,;\Lambda) \;=\; e^{-\int \mu(y)\, dy}\; \prod_{j=1}^{m} \mu(y_j)\,.
\]
Keywords Intensity filter · PHD filter · Derivation via Bayes Theorem · Derivation
via expected target count · Derivation via Shepp-Vardi · Marked multisensor inten-
sity filter · Particle implementation · Gaussian sum implementation · Mean-shift
algorithm · Surrogate CRB · Regularization · Sources of target count error · Mul-
tisensor intensity filter · Variance reduction · Heterogeneous multisensor intensity
filter
to the target physics and to the sensor signal processors of most radar and sonar
systems. Unfortunately, exact MHT algorithms are intractable because the num-
ber of measurement assignment hypotheses grows exponentially with the number
of measurements. These problems are aggravated when multiple sensors are used.
Circumventing the computational difficulties of MHT requires approximation.
Approximate tracking methods based on PPP models are the topics of this chap-
ter. They show much promise in difficult problems with high target and clutter densi-
ties. The key insight is to model the distribution of targets in state space as a PPP, and
then use a filter to update the defining parameter of the PPP—its intensity. To update
the intensity is to update the PPP. The intensity function of the PPP approximation
characterizes the multiple target tracking model. This important point is discussed
further in Section 6.1.1.
The PPP intensity model uses an augmented state space, S + . This enables it
to estimate target birth and measurement clutter processes on-line as part of the
filtering algorithm.
Three approaches to the intensity filter are provided. The first is a Bayesian
derivation given in Appendix D. This approach relies on a “mean field” approxima-
tion of the posterior point process. The relationship between the mean field approach
and the “first moment intensity” approximation of the posterior point process used
by Mahler [74] is discussed. The second approach is a short but extraordinarily
insightful derivation that is ideal for readers who wish to avoid the Bayesian analy-
sis, at least on a first reading. The third approach is based on the connection to PET
and the Shepp-Vardi algorithm. The PET interpretation contributes significantly to
understanding the PPP target model. A special case of the intensity filter is the well
known PHD filter. It is obtained by assuming a known target birth-death process, a
known measurement clutter process, and restricting the intensity filter to the non-
augmented target state space S.
Implementation issues are discussed in Section 6.3. Current approaches use
either particle or Gaussian sum methods. An image processing method called the
mean shift algorithm is well suited to point target estimation, especially for particle
methods. Observed information matrices (cf. Section 4.7) are proposed as surrogates
for the error covariance matrices widely used in single target Bayesian filters. The
underlying statistical meaning of OIM estimates is as yet unresolved.
Other topics discussed include a Gaussian sum PPP filter that enables heteroge-
neous target motion and sensor measurement models to be used in an intensity filter
setting. See Section 6.2.2 for details. Sources of error in the estimated target count
are discussed in Section 6.4. Target count estimates are sensitive to the probability
of target detection. This function, PkD (x), depends on target state x at measure-
ment time tk . It varies over time because of slowly changing sensor characteristics
and environmental conditions. Monitoring these and other factors that affect target
detection probability is not typically considered part of the tracking problem.
Another topic is the multiple sensor intensity filter described in Section 6.5. This
topic is the subject of some debate. The filter presented here relies on the validity of
the target PPP model for every sensor. It is also closely related to the imaging prob-
lem SPECT, just as the intensity and PHD filters are related to PET. The variance
of the target count is reduced by averaging over the sensors, so that the variance of
estimated target count decreases with the number of sensors.
Several areas of on-going research are not discussed here. Much recent work in
the area is focused on the cardinalized PHD (CPHD) filter recently proposed by
Mahler [41]. This interesting filter does not assume a Poisson distributed number
of targets, so that the posterior finite point process is not an independent scattering
process (see Section 2.9.3). This contributes to what has been appropriately called
[37] “spooky action at a distance.” An even more recent topic of interest is that of
smoothing PHD filters [47, 86, 91]. The intention is to reduce variance by introduc-
ing time lag into the intensity function estimates.
The points of a realization of a PPP on the target state space are a poor representation
of the physical reality of a multiple target state. This is especially easy to see when
exactly one target is present, for then ideally
\[
\int_S \lambda(x)\, dx \;=\; 1\,. \tag{6.1}
\]
From (2.4), the probability that a realization of the PPP has exactly one point target is
p N (n = 1) = e−1 ≈ 37% .
Hence, 63% of all realizations have either no target or two or more targets. Evi-
dently, realizations of the PPP seriously mismodel this simple tracking problem.
The problem worsens with increasing the target count: if exactly n targets are
present, then the probability that a realization has exactly n points is e−n n n / n! ≈
(2 π n)−1/2 → 0 as n → ∞. Evidently, PPP realizations are poor models of real
targets.
One interpretation of the PPP approximation is that the critical element of the
multitarget model is the intensity function, not the PPP realizations. The shift of
perspective means that the integral (6.1) is the more physically meaningful quantity.
Said another way, the concept of expectation, or ensemble average over realiza-
tions, corresponds more closely to the physical target reality than do the realizations
themselves.
A huge benefit comes from accepting the PPP approximation to the multiple tar-
get state—exponential numbers of assignments are completely eliminated. The PPP
approximation finesses the data assignment problem by replacing it with a stochas-
tic imaging problem, and the imaging problem is easier to solve. It is fortuitous
that the imaging problem is mathematically the same problem that arises in PET; see Section 5.2. The "at most one measurement per target" rule for tracking corresponds
6.1.2.1 Formulation
Standard filtering notation is adopted, but modified to accommodate PPPs. The general Bayesian filtering problem is reviewed in Appendix C. Let $S = \mathbb{R}^{n_x}$ denote the $n_x$-dimensional single target state space. The augmented space is $S^+ \equiv S \cup \phi$, where $\phi$ represents the “target absent” hypothesis.
The single target transition function from time $t_{k-1}$ to time $t_k$, denoted by $\Psi_{k-1}(y\,|\,x) \equiv p_{\Xi_k|\Xi_{k-1}}(y\,|\,x)$, is assumed known for all $x, y \in S^+$. The augmented state space enables both target initiation and termination to be incorporated directly into $\Psi_{k-1}$ as specialized kinds of state transitions. Novel aspects of the transition function are:
$$\upsilon_k = \left(m, \{z_1, \ldots, z_m\}\right) \in \mathcal{E}(T)$$
Fig. 6.1 Block diagram of the Bayes update of the intensity filter on the augmented target state
space S + . Because the null state φ is part of the state space, target birth and measurement clutter
estimates are intrinsic to the predicted target and predicted measurement steps. The same block
diagram holds for the PHD filter on the nonaugmented space S
Let $f_{k|k-1}(x)$ denote the intensity of the output PPP. Adapting (2.83) to $S^+$ gives
$$f_{k|k-1}(x) = \int_{S^+} \Psi_{k-1}(x\,|\,y)\, f_{k-1|k-1}(y)\, dy . \qquad (6.2)$$
The predicted detected and undetected target intensities are
$$f^D_{k|k-1}(x) = P_k^D(x)\, f_{k|k-1}(x)$$
and
$$f^U_{k|k-1}(x) = \left(1 - P_k^D(x)\right) f_{k|k-1}(x) ,$$
as is seen from (2.86). The measurement PPP is a critical component of the intensity
filter because, as is seen in (6.9), it weights the individual terms in the sum that
comprises the filter.
Another way to see the importance of λk|k−1 (z) is to recall the classical single-
target Bayesian tracking problem. The standard Bayesian formulation gives
$$p(x\,|\,z) = \frac{p(z\,|\,x)\, p(x)}{p(z)} , \qquad (6.4)$$
where the denominator is a scale factor that makes the left hand side a true pdf. It is very easy to ignore $p(z)$ in practice because the numerator is computed by multiplication and the product is then scaled so that it is a pdf. When multiple conditionally
independent measurements are available, the conditional likelihood is a product and
it is the same story again for the scale factor. However, if the pdfs are summed, not
multiplied, the scale factor must be included for the individual terms to be compara-
ble. Such is the case with the intensity filter: the PPP model justifies adding Bayesian
pdfs instead of multiplying them, and the scale factors are crucial to making the sum
meaningful.
The scale factor clearly deserves a respectable name, and it has one. It is called
the partition function in statistical physics and the machine learning communities.
The undetected target process is unaffected by the measured data, so its information update is simply
$$f^U_{k|k}(x) = f^U_{k|k-1}(x) = \left(1 - P_k^D(x)\right) f_{k|k-1}(x) . \qquad (6.5)$$
This brings the right hand branch of Fig. 6.1 to the superposition stage, i.e., the
block that says “Add Intensities”.
The left hand branch is more difficult because, as it turns out, the information
updated detected target point process is not a PPP. This is a serious dilemma since
it is highly desirable both theoretically and computationally for the filter recur-
sion to remain a closed loop. The posterior point process of the detected targets
is therefore approximated by a PPP. Three methods are given for obtaining this
approximation.
The first method is a Bayesian derivation of the posterior density of the point pro-
cess on the event space E (S + ) followed by a “mean field” approximation. Details
about Bayesian filters in general are given in Appendix C. The Bayesian derivation
is mathematically rigorous, but not particularly insightful. The situation is improved
considerably by showing the close connections between the mean field approxima-
tion and the “first moment” approximation of the posterior point process.
To gain intuition, one need look no further than the second method. While not rigorous, it is intuitively appealing and very convincing. The perspective is further
enriched by the third method. It shows a direct connection between the information
update of the Bayesian filter and the Shepp-Vardi algorithm for PET imaging. This
third method also poses an interesting question about iterative updates of a Bayesian
posterior density.
The special case $p_k(z\,|\,\phi)$ is the pdf of a data point $z$ conditioned on the absent target state $\phi$, that is, the pdf of $z$ given that it is clutter. The predicted detected target process at time $t_k$ is a PPP with intensity $P_k^D(x)\, f_{k|k-1}(x)$. The intensity $f^D_{k|k}(x)$ is the intensity of a PPP that approximates the information updated, or Bayes posterior, detected target process.
The measured data at time $t_k$ are $m_k$ points in a measurement space $T$. Denote these data points by $Z_k = (z_1, \ldots, z_{m_k})$.
The information update of the detected target PPP is obtained intuitively as follows. The best current estimate of the probability that the point measurement $z_j$ originated from a physical target with state $x \in S$ in the infinitesimal $dx$ is (see Section 3.2.2 and also (5.19))
where the denominator is found using (6.3). Similarly, the probability that $z_j$ originated from a target with state $\phi$ is
Because of the “at most one measurement per target” rule, the sum of the ratios over all measurements $z_j$ is the estimated number of targets at $x$, or targets in $\phi$, that generated a measurement.
The estimated number of targets at $x \in S$ is set equal to the expected number of targets conditioned on the data $\upsilon_k$, namely $f^D_{k|k}(x)\,|dx|$. Cancelling $|dx|$ gives
$$f^D_{k|k}(x) = \sum_{j=1}^{m_k} \frac{p_k(z_j\,|\,x)\, P_k^D(x)\, f_{k|k-1}(x)}{\lambda_{k|k-1}(z_j)} . \qquad (6.9)$$
Similarly, the expected number of targets in state $\phi$ is the posterior intensity evaluated at $\phi$, namely, $f^D_{k|k}(\phi)$. The predicted measurement process with intensity
• The target state space S + corresponds to the space in which the radioisotope is
absorbed.
• The measured data Z k ⊂ T correspond to the measured locations of the annihi-
lation events. As noted in the derivation of Shepp-Vardi, the measurement space
T need not be the same as the state space.
• The posterior target intensity $f^D_{k|k}(x)$ corresponds to the annihilation event intensity.
$$f^D_{k|k}(x)^{(n+1)} = f^D_{k|k}(x)^{(n)} \sum_{j=1}^{m_k} \frac{p_k(z_j\,|\,x)}{\int_{S^+} p_k(z_j\,|\,s)\, f^D_{k|k}(s)^{(n)}\, ds} , \qquad (6.11)$$
where the predicted intensity $f^D_{k|k}(x)^{(0)} \equiv f^D_{k|k-1}(x) = P_k^D(x)\, f_{k|k-1}(x)$ initializes the algorithm. The first iteration of this version of the Shepp-Vardi algorithm
is clearly identical to the Bayesian information update (6.9) of the detected target
process.
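To make the correspondence concrete, here is a minimal sketch, in Python, of the iteration (6.11) on a one-dimensional gridded state space. The grid, the Gaussian measurement likelihood, and all numerical values are illustrative assumptions, not part of the original development; with n_iter=1 the function reproduces the information update (6.9) when the denominators are computed from the predicted detected-target intensity.

```python
import numpy as np

def shepp_vardi_update(f_pred_D, likelihood, cell_volume, n_iter=1):
    """Shepp-Vardi iterations (6.11) on a gridded state space (sketch).

    f_pred_D    : (N,) predicted detected-target intensity P_k^D(x) f_{k|k-1}(x) at grid points
    likelihood  : (m_k, N) array with likelihood[j, i] = p_k(z_j | x_i)
    cell_volume : volume of one grid cell, used to approximate the integrals in (6.11)
    """
    f = f_pred_D.copy()
    for _ in range(n_iter):
        denom = likelihood @ (f * cell_volume)             # int p_k(z_j | s) f(s) ds, one per z_j
        f = f * (likelihood / denom[:, None]).sum(axis=0)  # sum over measurements of p_k / denom
    return f

# Tiny 1-D illustration with a Gaussian measurement likelihood (all values illustrative)
grid = np.linspace(0.0, 10.0, 101)
dx = grid[1] - grid[0]
f_pred = np.full_like(grid, 0.02)                          # diffuse predicted intensity
meas = np.array([3.0, 7.0])
lik = np.exp(-0.5 * (meas[:, None] - grid[None, :]) ** 2) / np.sqrt(2.0 * np.pi)
f_post = shepp_vardi_update(f_pred, lik, dx, n_iter=1)     # first pass reproduces (6.9)
```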
The Shepp-Vardi iteration converges to an ML estimate of the target state inten-
sity given only data at time tk . It is independent of the data at times t1 , . . . , tk−1
except insofar as the initialization influences the ML estimate. In other words, the
iteration leads to an ML estimate of an intensity that does not include the effect of
a Bayesian prior. The problem lies not in the PET interpretation but in the pdf of
the data. To see this it suffices to observe that the parameters of the pdf (5.7) are
not constrained by a Bayesian prior and, consequently, the Shepp-Vardi algorithm
converges to an estimate that is similarly unconstrained. It is, moreover, not obvious
how to impose a Bayesian prior on the PET parameters that does not disappear in
the small cell limit.
Superposing the PPP approximation of the detected target process and the unde-
tected target PPP gives
where
is the predicted measurement clutter intensity. The probability $P_k^D(\phi)$ in (6.16) is the probability that a $\phi$ hypothesis generates a measurement at time $t_k$.
The computational parts of the intensity filter are outlined in Table 6.1. The
table clarifies certain interpretive issues that are glossed over in the discussion.
Implementation methods are discussed elsewhere in this chapter.
$$p(\eta_k) = e^{-\int_{\mathbb{R}^{n_z}} \lambda_{k|k}(z)\, dz} \prod_{j=1}^{m_k} \lambda_{k|k}(z_j) = e^{-N_{k|k}} \prod_{j=1}^{m_k} \lambda_{k|k}(z_j) , \qquad (6.18)$$
where
$$N_{k|k} = \int_{\mathbb{R}^{n_z}} \lambda_{k|k}(z)\, dz = \int_{S^+} \left( \int_{\mathbb{R}^{n_z}} p_k(z\,|\,x)\, dz \right) f_{k|k}(x)\, dx .$$
The modeling assumptions of the intensity filter are very general, and specializa-
tions are possible. The most important is the PHD filter discussed in Section 6.2.1.
By assuming certain kinds of a priori knowledge concerning target birth and mea-
surement clutter, and adjusting the filter appropriately, the intensity filter reduces to
the PHD filter. The differences between the intensity and PHD filters are nearly all
attributable to the augmented state space S + . That is, the intensity filter uses the
augmented single target state space S + = S ∪ φ, while the PHD filter uses only
the single target space S. Using S practically forces the PHD filter to employ target
birth and death processes to model initiation and termination of targets.
A different kind of specialization is the marked multitarget intensity filter. This
is a parameterized linear Gaussian sum intensity filter that interprets measurements
as target marks. This interpretation is interesting in the context of PPP target models
because it implies that the joint measurement-target point process is a PPP. Details are
discussed in Section 6.2.2.
The state $\phi$ is the basis for the on-line estimates of the intensities of the target birth and measurement clutter PPPs given by (6.13) and (6.15), respectively. If, however,
the birth and clutter intensities are known a priori to be bk (x) and λk (z), then the
predictions b̂k (x) and λ̂k (z) can be replaced by bk (x) and λk (z). This is the basic
strategy taken by the PHD filter.
The use of a posteriori methods makes good sense in many applications. For
example, they can help regularize parameter estimates. These methods can also
incorporate information not included a priori in the Bayes filter. For example,
Jazwinski [54] uses an a posteriori method to derive the Schmidt-Kalman filter for
bias compensation. These methods may improve performance: if the a priori birth and clutter intensities are more accurate or stable than their on-line estimated counterparts, the PHD filter may provide better tracking performance.
Given these substitutions, the augmented space is no longer needed and can be
eliminated. This requires some care. If the recursion is simply restricted to S and no
other changes are made, the filter will not be able to discard targets and the target
count may balloon out of control. To balance the target birth process, the PHD filter
uses a death probability before propagating the multitarget intensity f k−1|k−1 (x).
This probability was intentionally omitted from the intensity filter because transition
into φ is target death, and it is redundant to have two death models.
The death process is a Bernoulli thinning process applied to the PPP at time tk−1
before targets transition and are possibly detected. Let dk−1 (x) denote the prob-
ability that a target at time tk−1 dies before transitioning to time tk . The surviving
target point process is a PPP and its intensity is (1 − dk−1 (x)) f k−1|k−1 (x). Adding
Bernoulli death and restricting the recursion to S reduces the intensity filter to the
PHD filter.
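For concreteness, the following is a minimal sketch, assuming a gridded state space, of the prediction step just described: Bernoulli survival thinning, transport by a known transition density, and addition of an a priori birth intensity. All names, and the quadrature approximation of the integral, are illustrative assumptions rather than the book's implementation.

```python
import numpy as np

def phd_predict(f_prev, death_prob, transition, birth, cell_volume):
    """PHD prediction on a grid: thin by survival, transport by the transition density, add births.

    f_prev      : (N,) posterior intensity f_{k-1|k-1} at the grid points
    death_prob  : (N,) death probabilities d_{k-1}(x)
    transition  : (N, N) array, transition[i, j] ~ Psi_{k-1}(x_i | x_j)
    birth       : (N,) a priori birth intensity b_k(x)
    """
    survived = (1.0 - death_prob) * f_prev             # Bernoulli thinning of the prior PPP
    predicted = transition @ (survived * cell_volume)  # int Psi(x | y) (1 - d(y)) f(y) dy
    return birth + predicted
```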
The intensity filter assumes targets have the same motion model, and that the sen-
sor measurement likelihood function is the same for all targets and data. Such
assumptions are idealized at best. An alternative approach is to develop a param-
eterized intensity filter that accommodates heterogeneous target motion models and
measurement pdfs by using target-specific parameterizations. The notion of target-
specific parameterizations in the context of PPP target modeling seems inevitably to
lead to the idea of modeling individual targets as a PPP, and then using superposi-
tion to obtain the aggregate PPP target model. Parameter estimation using the EM
method is natural to superposition problems, as shown in Chapter 3. The marked
multisensor intensity filter (MMIF) is one instance of such an approach.
The MMIF builds on the basic idea that a target at state x is “marked” with
a measurement z. If the target is modeled as a PPP, then the joint measurement-
target vector (z, x) is a PPP on the Cartesian product of the measurement and target
spaces. This is an intuitively reasonable result, but the details needed to see that it is
true are postponed to Section 8.1. The MMIF uses a linear Gaussian target motion
and measurement model for each target and superposes them against a background
clutter model. Since the Gaussian components correspond to different targets, they
need not have the same motion model. Similarly, different sensor measurement
models are possible. Superposition therefore leads to an affine Gaussian sum inten-
sity function on the joint measurement-target space. The details of the EM method
and the final MMIF recursion are given in Appendix E.
The MMIF adheres to the “at most one measurement per target” rule, but only in the mean, or on average. It does this by reinterpreting the single target pdf as a
PPP intensity function, and by interpreting measurements as the target marks. The
expected number of targets that the PPP on the joint measurement-target space pro-
duces is one.
Another feature of the MMIF is that the EM weights depend on the Kalman
filter innovations. The weights in other Gaussian sum filters often involve scaled
multiples of the measurement variances, resulting in filters that are somewhat akin
to “nearest neighbor” tracking filters.
The limitation of MMIF and other parameterized sum approaches is the require-
ment to use a fixed number of terms in the sum. This strongly affects its ability to
model the number of targets. In practice, various devices can compensate for this
limitation, but they are not intrinsic to the filter.
6.3 Implementation
Simply put, targets correspond to the local peaks of the intensity function and the
areas of uncertainty correspond to the contours, or isopleths, of the intensity. Very
often in practice, isopleths are approximated by ellipsoids in target state space cor-
responding to error covariance matrices. Methods for locating the local peak con-
centrations of intensity and finding appropriate covariance matrices to measure the
width of the peaks are discussed in this section.
Implementation of intensity filters therefore involves two issues. Firstly,
it is necessary to develop a computationally viable representation of the informa-
tion updated intensity function of the filter. Two basic representations are proposed,
one based on particles and the other on Gaussian sums. Secondly, postprocessing
procedures are applied to the intensity function representation to extract the num-
ber of detected targets, together with their estimated states and corresponding error
covariance matrices. Analogous versions of both issues arise in classical single tar-
get Bayesian filters.
The fact remains, however, that a proper statistical interpretation of target point
estimates and their putative error covariances is lacking for intensity filters. The
concern may be dismissed in practice because they are intuitively meaningful and
closely resemble their single target Bayesian analogs. The concern is nonetheless
worrisome and merits further study.
The most common and by far the easiest implementation of nonlinear filters is by
particle, or sequential Monte Carlo (SMC), methods. In such methods the poste-
rior pdf is represented nonparametrically by a set of particles in target state space,
together with a set of associated weights, and estimated target count. Typically these
weights are uniform, so the spatial distribution of particles represents the variabil-
ity of the posterior density. An excellent discussion of SMC methods for Bayesian
single target tracking applications is found in the first four chapters of [104].
Published particle methods for the general intensity filter are limited to date to
the PHD filter. Extensions to the intensity filter are not reported here. An early and
well described particle methodology (as well as an interesting example for tracking
on roads) for PHD filters is given in [111]. Particle methods and their convergence
properties for the PHD filter are discussed in detail in a series of papers by Vo et al.
[137]. Interested readers are urged to consult them for specifics.
Tracking in a surveillance region R using SMC methods starts with an initial set
of particles and weights at time tk−1 together with the estimated number of targets
in R:
$$\left\{ x_{k-1|k-1}(\ell),\, w_{k-1|k-1}(\ell) : \ell = 1, \ldots, L_{SMC} \right\} \quad \text{and} \quad N_{k-1|k-1} ,$$
where $w_{k-1|k-1}(\ell) = 1/L_{SMC}$ for all $\ell$. For PHD filters the particle method proceeds in several steps that mimic the procedure outlined in Table 6.2:
$$\left( 1 - P_k^D(x) + \sum_{j=1}^{m} \frac{p_k(z_k(j)\,|\,x)\, P_k^D(x)}{\lambda_{k|k-1}(z_k(j))} \right) , \qquad (6.20)$$
where $z_k(1), \ldots, z_k(m)$ are the measurements at time $t_k$. The updated particle weights are nonuniform.
• Normalization. Compute the scale factor, call it Nk|k , of the sum of the updated
particle weights. Divide all the particle weights by Nk|k to normalize the weights.
• Resampling. Particles are resampled by choosing i.i.d. samples from the discrete
pdf defined by the normalized weights. Resampling restores the particle weights
to uniformity.
If the resampling step is omitted, the SMC method leads to particle weight distributions that rapidly concentrate on a small handful of particles and therefore poorly represent the posterior intensity. There are many ways to do the resampling in
practice.
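The weight update (6.20), the normalization, and the resampling step can be sketched as follows. The constant detection probability, the a priori clutter intensity, and the likelihood callback are simplifying assumptions made only for this illustration.

```python
import numpy as np

def phd_particle_update(particles, weights, meas, p_det, lik_fn, clutter_intensity, rng):
    """Sketch of the SMC information update (6.20), normalization, and resampling.

    particles         : (L, d) predicted particle states
    weights           : (L,) predicted weights; their sum approximates the predicted target count
    meas              : iterable of measurements z_k(1), ..., z_k(m)
    p_det             : detection probability, assumed constant for this illustration
    lik_fn            : lik_fn(z, particles) -> (L,) values of p_k(z | x) at the particles
    clutter_intensity : a priori clutter intensity at z, assumed constant
    """
    L = len(weights)
    multiplier = np.full(L, 1.0 - p_det)
    for z in meas:
        lik = lik_fn(z, particles)                                # p_k(z | x_l) for each particle
        lam = clutter_intensity + np.sum(p_det * lik * weights)   # predicted measurement intensity
        multiplier += p_det * lik / lam
    new_weights = weights * multiplier                            # Eq. (6.20) applied per particle
    N_kk = new_weights.sum()                                      # scale factor, see (6.21)
    idx = rng.choice(L, size=L, p=new_weights / N_kk)             # resample from normalized weights
    return particles[idx], np.full(L, N_kk / L), N_kk
```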
By computing $N_{k|k}$ before resampling, it is easy to see that
$$N_{k|k} \approx \int_R f_{k|k}(x)\, dx = E\left[\,\text{Number of targets in } R\,\right] . \qquad (6.21)$$
The estimator is poor for sets R0 that are only a small fraction of the total volume
of R.
The primary limitations of particle approaches in many applications are due to
the so-called Curse of Dimensionality1 : the number of particles needed to repre-
sent the intensity function grows exponentially as the dimension of the state space
increases. Most applications to date seem to be limited to four or five dimensions.
The curse is so wicked that Moore’s Law (the doubling of computational capability
every 18 months) by itself will do little to increase the effective dimensional limit
over a human lifetime. Moore’s Law and improved methods together will undoubt-
edly increase the number of dimensions for which particle filters are practical, but
it remains to be seen if general filters of dimension much larger than say six can be
treated directly.
1 The name was first used in 1961 by Richard E. Bellman [9]. The name is apt in very many
problems; however, some modern methods in machine learning actually exploit high dimensional
embeddings.
$$\left\{ x_{k|k}(\ell) : \ell = 1, \ldots, L_{SMC} \right\} .$$
$$\lambda_k(x) = I_k \sum_{\ell=1}^{L_{SMC}} N\left(x\,;\,x_{k|k}(\ell),\, \Sigma_{ker}\right) , \qquad (6.23)$$
where $N(x\,;\,x_{k|k}(\ell), \Sigma_{ker})$ is the kernel. The covariance matrix $\Sigma_{ker}$ is specified, not estimated. Intuitively, the larger $\Sigma_{ker}$, the fewer the number of local maxima in the intensity (6.23), and conversely. The scale factor $I_k > 0$ is estimated by the particle filter and is taken as known here.
The form (6.23) has no parameters to estimate, so extend it by defining
$$\lambda_k(x\,;\,\mu) = I_k \sum_{\ell=1}^{L_{SMC}} N\left(x\,;\,x_{k|k}(\ell) - \mu,\, \Sigma_{ker}\right) , \qquad (6.24)$$
where μ is an unknown rigid translation of the intensity (6.23). It is not hard to see
that the ML estimate of μ is a local maximum of the kernel estimate, that is, a point
estimate for a target. The vector μ is estimated from data using the EM method. The
clever part is using an artificial data set with only one point in it, namely, the origin.
Let $r = 0, 1, \ldots$ denote the EM iteration index, and let $\mu^{(0)}$ be a specified initial value for the mean. The auxiliary function is given by (3.20) with $m = 1$, $x_1 = 0$, $L = L_{SMC}$, $\theta = \mu$, and
$$\lambda_\ell(x\,;\,\mu) \equiv I_k\, N\left(x\,;\,x_{k|k}(\ell) - \mu,\, \Sigma_{ker}\right) .$$
$$w_\ell\left(\mu^{(r)}\right) = \frac{N\left(0\,;\,x_{k|k}(\ell) - \mu^{(r)},\, \Sigma_{ker}\right)}{\sum_{\ell'=1}^{L_{SMC}} N\left(0\,;\,x_{k|k}(\ell') - \mu^{(r)},\, \Sigma_{ker}\right)} = \frac{N\left(x_{k|k}(\ell)\,;\,\mu^{(r)},\, \Sigma_{ker}\right)}{\sum_{\ell'=1}^{L_{SMC}} N\left(x_{k|k}(\ell')\,;\,\mu^{(r)},\, \Sigma_{ker}\right)} . \qquad (6.25)$$
The auxiliary function in the present case requires no sum over $j$ as done in (3.20), so
$$Q\left(\mu\,;\,\mu^{(r)}\right) = -I_k + \sum_{\ell=1}^{L_{SMC}} w_\ell\left(\mu^{(r)}\right) \log\left[ I_k\, N\left(0\,;\,x_{k|k}(\ell) - \mu,\, \Sigma_{ker}\right) \right] . \qquad (6.26)$$
Substituting (6.25) and canceling the common factor gives the classical mean-shift iteration:
$$\mu^{(r+1)} = \frac{\sum_{\ell=1}^{L_{SMC}} N\left(x_{k|k}(\ell)\,;\,\mu^{(r)},\, \Sigma_{ker}\right) x_{k|k}(\ell)}{\sum_{\ell=1}^{L_{SMC}} N\left(x_{k|k}(\ell)\,;\,\mu^{(r)},\, \Sigma_{ker}\right)} . \qquad (6.28)$$
The update of the mean is a convex combination of the particle set. Convergence to a local maximum, $\mu^{(r)} \to \hat{x}_{k|k}$, is guaranteed as $r \to \infty$.
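A minimal sketch of the mean-shift iteration (6.28), assuming an isotropic Gaussian kernel and a particle cloud stored as an array (both illustrative choices, not the book's implementation):

```python
import numpy as np

def mean_shift(particles, sigma_ker, mu0, n_iter=100, tol=1e-8):
    """Mean-shift iteration (6.28) with an isotropic Gaussian kernel.

    particles : (L, d) particle locations x_{k|k}(l)
    sigma_ker : kernel standard deviation (scalar stand-in for Sigma_ker)
    mu0       : (d,) starting point; different starts converge to different intensity peaks
    """
    mu = np.asarray(mu0, dtype=float)
    for _ in range(n_iter):
        d2 = np.sum((particles - mu) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / sigma_ker ** 2)       # unnormalized Gaussian kernel weights
        mu_new = w @ particles / w.sum()             # convex combination of the particles
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```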
Different initializations are needed for different targets, so the mean shift algorithm needs a preliminary clustering method to initialize it, as well as to determine the number of peaks in the data that correspond to targets. Also, the size of the kernel depends somewhat on the number of particles and may need to be adjusted to smooth the intensity surface appropriately.
Identifiability remains a problem with the mean shift algorithm, that is, there is no association of a point estimate with a particular target except through the choice of the starting point of the iteration. This may cause problems when targets are in close
proximity.
One way to try to resolve the problem is to use the particles themselves as points
to feed into another tracking algorithm. This method exploits serial structure in the
filter estimates and may disambiguate closely spaced targets. However, particles are
serially correlated and do not satisfy the conditional independence assumptions of
measurement data, so the resulting track estimates may be biased.
$$J(\mu) = I_k\, \Sigma_{ker}^{-1}\, \Sigma(\mu)\, \Sigma_{ker}^{-1} , \qquad (6.29)$$
$$\Sigma(\mu) = \sum_{\ell=1}^{L} \sum_{\ell'=1}^{L} \int_{\mathbb{R}^{n_x}} w(x, \mu\,;\,\ell, \ell') \left( x - x_{k|k}(\ell) + \mu \right) \left( x - x_{k|k}(\ell') + \mu \right)^T dx . \qquad (6.30)$$
The CRB is $J^{-1}(\mu)$ evaluated at the true value of $\mu$. Because the integral is over all of $\mathbb{R}^{n_x}$, a change of variables shows that the information matrix $J(\mu)$ is independent of the true value of $\mu$. This means that the FIM is not target specific.
A local bound is desired, since the mean shift algorithm converges to a local peak of the intensity. By restricting the intensity model to a specified bounded gate $G \subset \mathbb{R}^{n_x}$, the integral in (6.30) is similarly restricted. The matrix $\Sigma(\mu)$ is thus a function of $G$. The gated CRB is local to the gate, i.e., it is a function of the target within the gate.
$$\log p(\mu) = -I_k\, L_{SMC} + \log\left[ \sum_{\ell=1}^{L_{SMC}} N\left(x_{k|k}(\ell)\,;\,\mu,\, \Sigma_{ker}\right) \right] .$$
$$-\nabla_\mu \nabla_\mu^T \log p(\mu) = \Sigma_{ker}^{-1} + \frac{1}{\kappa^2} \left[ \sum_{\ell=1}^{L_{SMC}} N\left(\mu\,;\,x_{k|k}(\ell), \Sigma_{ker}\right) \Sigma_{ker}^{-1} \left(\mu - x_{k|k}(\ell)\right) \right] \left[ \sum_{\ell=1}^{L_{SMC}} N\left(\mu\,;\,x_{k|k}(\ell), \Sigma_{ker}\right) \Sigma_{ker}^{-1} \left(\mu - x_{k|k}(\ell)\right) \right]^T - \frac{1}{\kappa}\, \Sigma_{ker}^{-1} \left[ \sum_{\ell=1}^{L_{SMC}} N\left(\mu\,;\,x_{k|k}(\ell), \Sigma_{ker}\right) \left(\mu - x_{k|k}(\ell)\right) \left(\mu - x_{k|k}(\ell)\right)^T \right] \Sigma_{ker}^{-1} ,$$
where
$$\kappa = \sum_{\ell=1}^{L_{SMC}} N\left(\mu\,;\,x_{k|k}(\ell), \Sigma_{ker}\right) .$$
The observed information matrix is evaluated at the MAP estimate $\mu = \hat{x}_{k|k}$. The middle term is proportional to $\nabla_\mu p(\mu)\, \left[\nabla_\mu p(\mu)\right]^T$ and so is zero at any stationary point of $p(\mu)$, e.g., at the MAP estimate $\mu = \hat{x}_{k|k}$. The OIM is therefore
$$OIM\left(\hat{x}_{k|k}\right) = \Sigma_{ker}^{-1} - \Sigma_{ker}^{-1} \left[ \frac{\sum_{\ell=1}^{L_{SMC}} N\left(\hat{x}_{k|k}\,;\,x_{k|k}(\ell), \Sigma_{ker}\right) \left(\hat{x}_{k|k} - x_{k|k}(\ell)\right) \left(\hat{x}_{k|k} - x_{k|k}(\ell)\right)^T}{\sum_{\ell=1}^{L_{SMC}} N\left(\hat{x}_{k|k}\,;\,x_{k|k}(\ell), \Sigma_{ker}\right)} \right] \Sigma_{ker}^{-1} . \qquad (6.32)$$
The CRB surrogate is $OIM^{-1}\left(\hat{x}_{k|k}\right)$. The inverse exists because the OIM is positive definite at $\hat{x}_{k|k}$. This matrix is, in turn, a surrogate for the error covariance matrix.
The OIM for x̂k|k can be computed efficiently in conjunction with any EM
method (see [69] for a general discussion). As noted in Section 4.7, the statistical
interpretation of the OIM is unresolved in statistical circles. Its utility should be
carefully investigated in applications.
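As an illustration, the OIM (6.32) can be evaluated directly from the particle cloud. The isotropic kernel and the helper name below are assumptions made only for this sketch; its inverse is the covariance surrogate discussed above.

```python
import numpy as np

def observed_information(x_hat, particles, sigma_ker):
    """Observed information matrix (6.32) at a point estimate, isotropic Gaussian kernel assumed."""
    d = particles.shape[1]
    Sinv = np.eye(d) / sigma_ker ** 2               # Sigma_ker^{-1} for an isotropic kernel
    diffs = x_hat - particles                       # rows are x_hat - x_{k|k}(l)
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / sigma_ker ** 2)
    outer = diffs[:, :, None] * diffs[:, None, :]   # (L, d, d) outer products
    S = (w[:, None, None] * outer).sum(axis=0) / w.sum()
    return Sinv - Sinv @ S @ Sinv                   # inverse is the error covariance surrogate
```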
covariance matrices are extracted from the means and variances of the Gaussian
sum instead of a myriad of particles. Gaussian sum methods are also potentially
more computationally practical than particle methods for very large numbers of
targets.
The Gaussian sum approach is especially attractive when constant survival and
detection functions are assumed. This assumption means that the PPP thinning func-
tions are independent of state. In this case, assuming also linear Gaussian target
motion and measurement models, the prediction and information update steps are
closed form.
Gaussian sum implementations of the PHD filter are carefully discussed by Vo
and his colleagues [139]. An unnormalized Gaussian sum is used to approximate
the intensity function. These methods are important because they have the potential
to be useful in higher dimensions than particle filters. Nonetheless, despite some
comments to the contrary, Gaussian sum intensity filters do not escape the curse of
target state space dimensionality.
Gaussian sum methods for intensity estimation comprise several steps:
• Prediction. The target intensity at time tk−1 is a Gaussian sum, to which is added
a target birth process that is modeled by a Gaussian sum. The prediction equation
for every component in the Gaussian sum is identical to a Kalman filter prediction
equation.
• Component Update. For each point measurement, the predicted Gaussian compo-
nents are updated using the usual Kalman update equations. The update therefore
increases the number of terms in the Gaussian sum if there is more than one mea-
surement. This step has two parts. In the first, the means and covariance matrices
are evaluated. In the second, the coefficients of the Gaussian sum are updated by
a multiplicative procedure.
• Merging and Pruning. The components of the Gaussian sum are merged and
pruned to obtain a “nominal” number of terms. Various reasonable strategies
are available for such purposes, as detailed in [139]. This step is the analog of
resampling in the particle method.
Some form of pruning is necessary to keep the size of the Gaussian sum bounded
over time, so the last—and most heuristic—step cannot be omitted.
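The following sketch illustrates the prediction and per-measurement Kalman component update steps listed above for linear Gaussian models with constant detection probability and a constant a priori clutter intensity. The function names, and the use of scipy for the Gaussian pdf, are illustrative assumptions; merging and pruning are omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gm_predict(comps, F, Q, birth_comps):
    """Kalman prediction of each Gaussian component (w, m, P); birth components are appended."""
    predicted = [(w, F @ m, F @ P @ F.T + Q) for (w, m, P) in comps]
    return predicted + list(birth_comps)

def gm_update(comps, meas, H, R, p_det, clutter_intensity):
    """Per-measurement Kalman updates with multiplicative weight update (Gaussian sum sketch)."""
    updated = [(w * (1.0 - p_det), m, P) for (w, m, P) in comps]   # missed-detection terms
    for z in meas:
        cand = []
        for (w, m, P) in comps:
            S = H @ P @ H.T + R                                    # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)
            q = multivariate_normal.pdf(z, mean=H @ m, cov=S)      # predicted measurement pdf
            cand.append((p_det * w * q,
                         m + K @ (z - H @ m),
                         (np.eye(len(m)) - K @ H) @ P))
        norm = clutter_intensity + sum(c[0] for c in cand)
        updated.extend((cw / norm, cm, cP) for (cw, cm, cP) in cand)
    return updated   # merging and pruning would follow to bound the number of components
```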
Left out of this discussion are details that relate the weights of the Gaussian
components to the estimated target count. These details can be found in [139]. For
nonlinear target motion and measurement models, [139] proposes both the extended
and the unscented Kalman filters. Vo and his colleagues also present Gaussian sum
implementations of the CPHD filter in [138, 139].
6.3.6 Regularization
Intensity filters are in the same class of stochastic inverse problems as image recon-
struction in emission tomography—the sequence t0 , t1 , . . . , tk of intensity filter
estimates f k|k (x) is essentially a movie (in dimension n x ) in target state space. As
discussed in Section 5.7, such problems suffer from serious noise and numerical arti-
facts. The high dimensionality of the PPP parameter, i.e., the number of voxels of the
intensity function, makes regularization a priority in all applications. Regularization
for intensity filters is a relatively new subject. Methods such as cardinalization are
inherently regularizing.
Grenander’s method of sieves used in Section 5.7.1 for regularizing PET adapts
to the intensity filter, but requires some additional structure. The sieve kernel
$k_0(x\,|\,u)$ is a pdf on $S^+$, so that
$$\int_{S^+} k_0(x\,|\,u)\, dx = 1 . \qquad (6.33)$$
The kernel k0 can be a function of time tk if desired. The restriction (6.34) is also
imposed on the predicted target intensity:
$$f_{k|k-1}(x) = \int_{U^+} k_0(x\,|\,u)\, \zeta_{k|k-1}(u)\, du \quad \text{for some } \zeta_{k|k-1}(u) > 0 . \qquad (6.35)$$
where
$$\tilde{p}_k(z\,|\,u) = \int_{S^+} p_k(z\,|\,x)\, k_0(x\,|\,u)\, dx . \qquad (6.37)$$
where
$$\tilde{\Psi}_{k-1}(x\,|\,u) = \int_{S^+} \Psi_{k-1}(x\,|\,y)\, k_0(y\,|\,u)\, dy . \qquad (6.38)$$
for all points x ∈ S + . Like the sieve kernel k0 ( · ), the Bayesian kernel k1 (v | x) is
very flexible. It is easily verified that the function
$$\Phi_{k-1}(v\,|\,u) = \int_{S^+} k_1(v\,|\,x)\, \tilde{\Psi}_{k-1}(x\,|\,u)\, dx \qquad (6.40)$$
The information updated intensity $\hat{\zeta}_{k|k}(u)$ is evaluated via the intensity filter using the regularized measurement pdf (6.37) and the predicted measurement intensity (6.36). The regularized target state intensity at time $t_k$ is the integral
$$f_{k|k}(x) = \int_{U^+} k_0(x\,|\,u)\, \hat{\zeta}_{k|k}(u)\, du . \qquad (6.41)$$
The regularized intensity $f_{k|k}(x)$ depends on the sieve and Bayesian kernels $k_0(\cdot)$ and $k_1(\cdot)$.
The question of how best to define the k0 ( · ) and k1 ( · ) kernels depends on the
application. It is common practice to define kernels using Gaussian pdfs. As men-
tioned in Section 5.7, the sieve kernel is a kind of measurement smoothing kernel.
If dim(U) < dim(S), the Bayesian kernel disguises observability issues, that is,
many points x ∈ S map with the same probability to a given point u ∈ U. This
provides a mechanism for target state space smoothing.
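As a simple illustration of the smoothing role of the sieve kernel, the sketch below evaluates an integral of the form (6.41) by quadrature on a one-dimensional grid with a Gaussian kernel. The grid, kernel width, and intensity are illustrative assumptions made only for this example.

```python
import numpy as np

def sieve_smooth(zeta, grid, kernel_sigma):
    """Evaluate f(x) = int k0(x | u) zeta(u) du on a 1-D grid with a Gaussian kernel k0."""
    du = grid[1] - grid[0]
    diff = grid[:, None] - grid[None, :]                           # x_i - u_j
    k0 = np.exp(-0.5 * (diff / kernel_sigma) ** 2) / (np.sqrt(2 * np.pi) * kernel_sigma)
    return k0 @ (zeta * du)                                        # quadrature over u

grid = np.linspace(0.0, 10.0, 201)
zeta = 5.0 * np.exp(-0.5 * ((grid - 4.0) / 0.2) ** 2)              # sharply peaked intensity
f_reg = sieve_smooth(zeta, grid, kernel_sigma=0.5)                 # regularized, smoother intensity
```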
6.4 Estimated Target Count
Accurate knowledge of the target detection probability function $P_k^D(x)$ is crucial to correctly estimating target count. An incorrect value of $P_k^D(x)$ is a source of systematic error. For example, if the filter uses the value $P_k^D(x) = 0.5$ but in fact all targets always show up in the measured data, the estimated mean target count will be high by a factor of two. This example is somewhat extreme, but it makes the point that correctly setting the detection probability is an important task for the track management function. The task involves executive knowledge about changing sensor performance characteristics, as well as executive decisions external to the tracking algorithm about the number of targets actually present—decisions that feed back to validate estimates of $P_k^D(x)$. Henceforth, the probability of detection function $P_k^D(x)$ is assumed accurate.
There are other possible sources of error in target count estimates. Birth-death
processes can be difficult to tune in practice, regardless of whether they are modeled
implicitly as transitions into and out of a state φ in the intensity filter, or explicitly
as in the PHD filter. If in an effort to detect new targets early and hold track on them
as long as possible, births are too spontaneous and deaths are too infrequent, the
target count will be too high on average. Conversely, it will be too low with delayed
initiation and early termination. Critically damped designs, however that concept
is properly defined in this context, would seem desirable in practice. In any event,
tuning is a function of the track management system.
Under the PPP model, the estimated expected number of targets in a given region,
A, is the integral over A of the estimated multitarget intensity. Because the number
is Poisson distributed, the variance of the estimated number of targets is equal to
the mean number. This large variance is an unhappy fact of life. For example, if 10 targets are present, the standard deviation of the estimated number is $\sqrt{10} \approx 3$. It is therefore foolhardy in practice to assume that the estimate of the number of targets is the number actually present. Variance reduction in the target
count estimate is a high priority from the track management point of view for both
intensity and PHD filters.
tribute to the filter. Consequently, if the individual sensors estimate target count
correctly, so does the multisensor intensity filter.
Moreover, and just as importantly, the variance of the target count estimate of the
multisensor intensity filter is reduced by a factor of M compared to that of a single
sensor, where M is the number of sensors, assuming for simplicity that the sensor
variances are identical.
This important variance reduction property is analogous to estimators in other
applications. An especially prominent and well known example is power spectral
estimation of wideband stationary time series. For such signals the output bins of
the DFT of non-overlapped blocks of sampled data are distributed with a mean
level equal to the signal power in the bin, and the variance equal to the mean. This
property of the periodogram is well known, as is the idea of time averaging the peri-
odogram, i.e., the non-overlapped DFT outputs, to reduce the variance of spectral
estimates.2 The Wiener-Khinchin theorem justifies averaging the short term Fourier
transforms of nonoverlapped data records as a way to estimate the power spectrum
with reduced variance. In practice, the number of DFT records averaged is often
about 25.
The multisensor intensity filter has low computational complexity and is applicable to distributed heterogeneous sensor networks. It is thus practical and widely useful.
Speculating now for the sheer fun of it, if the number of data records in a power
spectral average carries over to multisensor multitarget tracking problems, then the
multisensor intensity filter achieves satisfactory performance for many practical pur-
poses with about 25 sensors.
To motivate the discussion, consider the SPECT imaging application of Section 5.4.
In SPECT, a single gamma camera is moved to several view angles and a snapshot is
taken of light observed emanating from gamma photon absorption events. The EM
recursion given by (5.69) is the superposition of the intensity functions estimated
by each of the camera view angles. Intuitively, different snapshots cannot contain
data from the same absorption event, so the natural way to fuse the multiple cam-
era images into one image is to add them, after first weighting by the fraction of
radioisotope that is potentially visible in each. The theoretical justification is that,
since the number of decays is unknown and Poisson distributed, the estimates of
the spatial distribution of the radioisotope obtained from different view angles are
independent, not conditionally independent, so the intensity functions (images) are
superposed.
The general multisensor multitarget filtering problem is not concerned with
radioisotope decays, but rather with physical entities (aircraft, ships, etc.) that per-
sist over long periods of time—target physics are very different from the physics
2 Averaging trades off variance reduction and spectral resolution. It was first proposed by
M. S. Bartlett [6] in 1948.
In homogeneous problems the sensor coverages are identical, i.e., $C_k(\ell) \equiv C_k$ for all $\ell$. Heterogeneous problems are those that are not homogeneous.
Two sensors with the same coverage need not have the same, or even closely
related, probability of detection functions. As time passes, homogeneous problems
may turn into heterogeneous ones, and vice versa. In practice, it is probably desir-
able to set a small threshold to avoid issues with very small probabilities of detec-
tion. Homogeneous and heterogeneous problems are discussed separately.
where $f^{Fused}_{k|k-1}(x)$ is the predicted target intensity based on the fused estimate $f^{Fused}_{k-1|k-1}(x)$ at time $t_{k-1}$. The measured data from sensor $\ell$ is denoted $\xi_k(\ell)$, and the corresponding sensor likelihood function is
$$L_k(\xi_k(\ell)\,|\,x\,;\,\ell) = 1 - P_k^D(x\,;\,\ell) + \sum_{j=1}^{m_k(\ell)} \frac{p_k(z_k(j\,;\,\ell)\,|\,x\,;\,\ell)\, P_k^D(x\,;\,\ell)}{\lambda_{k|k-1}(z_k(j\,;\,\ell)\,;\,\ell)} . \qquad (6.46)$$
$$f^{Fused}_{k|k}(x) = \frac{1}{M} \sum_{\ell=1}^{M} f_{k|k}(x\,;\,\ell) = \left( \frac{1}{M} \sum_{\ell=1}^{M} L_k(\xi_k(\ell)\,|\,x\,;\,\ell) \right) f^{Fused}_{k|k-1}(x) . \qquad (6.47)$$
If the sensor-level intensity filters are maintained by particles, and the number of particles is the same for all sensors, the multisensor averaging filter is implemented merely by pooling all the particles (and randomly downsampling to the desired number of particles, if necessary).
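A minimal sketch of these two fusion options, assuming a common state-space grid in the first case and per-sensor particle clouds in the second (function names and the downsampling strategy are illustrative assumptions):

```python
import numpy as np

def fuse_sensor_intensities(intensities):
    """Average the single-sensor posterior intensities on a common state-space grid, Eq. (6.47)."""
    return np.mean(np.stack(intensities, axis=0), axis=0)

def fuse_particle_clouds(clouds, counts, n_out, rng):
    """Particle version: pool all sensor particle clouds and downsample to n_out particles.

    clouds : list of (L_l, d) particle arrays, one per sensor
    counts : list of single-sensor target-count estimates N_{k|k}(l)
    The fused count is the average of the sensor counts, as in Eq. (6.48).
    """
    pooled = np.vstack(clouds)
    idx = rng.choice(len(pooled), size=n_out, replace=True)
    fused_count = float(np.mean(counts))
    return pooled[idx], fused_count
```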
Multisensor fusion methods sometimes rank sensors by some relative quality
measure. This is unnecessary for the multisensor intensity filter. The reason is that
sensor quality, as measured by the probability of detection functions $P_k^D(x\,;\,\ell)$ and the sensor measurement pdfs $p_k(z\,|\,x\,;\,\ell)$, is automatically included in (6.47).
The multisensor intensity filter estimates the number of targets as
$$N^{Fused}_{k|k} = \int_S f^{Fused}_{k|k}(x)\, dx = \frac{1}{M} \sum_{\ell=1}^{M} \int_S f_{k|k}(x\,;\,\ell)\, dx = \frac{1}{M} \sum_{\ell=1}^{M} N_{k|k}(\ell) , \qquad (6.48)$$
where $N_{k|k}(\ell)$ is the number of targets estimated by sensor $\ell$. Taking the expectation of both sides gives
$$E\left[ N^{Fused}_{k|k} \right] = \frac{1}{M} \sum_{\ell=1}^{M} E\left[ N_{k|k}(\ell) \right] . \qquad (6.49)$$
If the individual sensors are unbiased on average, or in the mean, then $E[N_{k|k}(\ell)] = N$ for all $\ell$, where $N$ is the true number of targets present. Consequently, the multisensor intensity filter is also unbiased.
The estimate $N_{k|k}(\ell)$ is Poisson distributed, and the variance of a Poisson distribution is equal to its mean, so
$$\mathrm{Var}\left[ N_{k|k}(\ell) \right] = N , \qquad \ell = 1, \ldots, M .$$
Because the terms in the sum in (6.48) are independent, the variance of the average is
$$\mathrm{Var}\left[ N^{Fused}_{k|k} \right] = \frac{1}{M^2} \sum_{\ell=1}^{M} \mathrm{Var}\left[ N_{k|k}(\ell) \right] = \frac{N}{M} . \qquad (6.50)$$
In words, the standard deviation of the estimated target count in the multisensor intensity filter is smaller than that of individual sensors by a factor of $\sqrt{M}$, where $M$ is the number of fielded sensors. This is an important result for spatially distributed networked sensors.
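A quick Monte Carlo illustration of (6.50), with purely illustrative values of N, M, and the number of trials:

```python
import numpy as np

# Monte Carlo check of (6.50): averaging M independent Poisson(N) count estimates
# reduces the variance of the fused count from N to N / M (illustrative values).
rng = np.random.default_rng(0)
N_true, M, trials = 10, 25, 100_000
counts = rng.poisson(N_true, size=(trials, M))     # per-sensor target count estimates
fused = counts.mean(axis=1)                        # Eq. (6.48)
print(fused.mean(), fused.var())                   # approx. 10 and 10 / 25 = 0.4
```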
The averaging multisensor intensity filter is derived by Bayesian methods in
[124]. It is repeated here in outline. The Bayesian derivation of the single sensor
intensity filter in Appendix D is a good guide to the overall structure of most of the
argument.
The key is to exploit the PPP target model on the augmented space $S^+$. Following the lead of (D.5) in Appendix D, the only PPP realizations with nonzero likelihood have $m_k = \sum_{\ell=1}^{M} m_k(\ell)$ microtargets. The $m_k$ PPP microtargets are
paired with the m k sensor data points, so the overall joint likelihood function is
the product of the sensor data likelihoods given the microtarget assignments. This
product is then summed over all partitions of the m k microtargets into parts of size
m k (1), . . . , m k (M).
The sum over all partitions is the Bayes posterior pdf on the event space E(S + ). It
is a very complex sum, but it has important structure. In particular, the single target
marginal pdfs are identical, that is, the integrals over all but one microtarget state
are all the same. After tedious algebraic manipulation, the single target marginal pdf
is seen to be
$$p^{Fused}_X(x) = \frac{1}{m_k} \sum_{\ell=1}^{M} L_k(\xi_k(\ell)\,|\,x\,;\,\ell)\, f^{Fused}_{k|k-1}(x) , \qquad x \in S^+ . \qquad (6.51)$$
The mean field approximation is now invoked as in (D.13). Under this approximation, $f^{Fused}_{k|k}(x) = c\, p^{Fused}_X(x)$, where the constant $c > 0$ is estimated. From (6.17) and (6.18), the measurement intensity is
$$\lambda^{Fused}_{k|k}(z) = c \int_{S^+} p_k(z\,|\,x)\, p^{Fused}_X(x)\, dx , \qquad (6.52)$$
$$L(c\,;\,\xi_k(1), \ldots, \xi_k(M)) = \prod_{\ell=1}^{M} \left\{ e^{-\int_{S^+} c\, p^{Fused}_X(x)\, dx} \prod_{j=1}^{m_k(\ell)} \lambda^{Fused}_{k|k}(z_k(j\,;\,\ell)) \right\} \propto e^{-cM}\, c^{m_k} . \qquad (6.53)$$
Setting the derivative with respect to $c$ to zero and solving gives the ML estimate $\hat{c}_{ML} = m_k / M$. The multisensor intensity filter is $\hat{c}_{ML}\, p^{Fused}_X(x)$. Further purely technical details of the Bayesian derivation provide little additional insight, so they are omitted.
The multiplication of the conditional likelihoods of the sensor data happens at the PPP event level, where the correct associations of sensor data to targets are assumed
aged, not multiplied. The multisensor intensity filter therefore cannot reduce the
area of uncertainty of the extracted target point estimates. In other words, the mul-
tisensor intensity averaging filter cannot improve spatial resolution. Intuitively, the
multisensor filter achieves variance reduction in the target count by foregoing spatial
resolution of the target point estimates.
When the probability of detection functions are not identical, the multisensor inten-
sity filter description is somewhat more involved. At each target state x the only
sensors that are averaged are those whose detection functions are nonzero at x. This
leads to a “quilt-like” fused intensity that may have discontinuities at the boundaries
of sensor detection coverages.
The Bayesian derivation of (6.47) outlined above assumes that all the microtar-
gets of the PPP realizations can be associated to any of the M sensors. If, however,
any of these microtargets fall outside the coverage set of a sensor, then the assign-
ment is not valid. The way around the problem is to partition the target state space
appropriately.
The set
$$C = \cup_{\ell=1}^{M} C(\ell) \qquad (6.54)$$
contains points in target state space that are covered by at least one sensor. Partition
C into disjoint, nonoverlapping sets $B_\rho$ that comprise points covered by exactly $\rho$ sensors, $\rho = 1, \ldots, M$. Now partition $B_\rho$ into subsets $B_{\rho,1}, \ldots, B_{\rho,j_\rho}$ that are covered by different combinations of $\rho$ sensors. To simplify notation, denote the sets $B_{\rho j}$ by $\{A_\omega\}$, $\omega = 1, 2, \ldots, \Omega$. The sets are disjoint and their union is all of $C$:
$$C = \cup_{\omega=1}^{\Omega} A_\omega , \qquad A_i \cap A_j = \emptyset \ \text{for}\ i \neq j . \qquad (6.55)$$
No smaller number of sets satisfies (6.55) and also has the property that each set Aω
in the partition is covered by the same subset of sensors.
The overall multisensor intensity filter operates on the partition {Aω }. The assign-
ment assumptions of the multisensor intensity filter are satisfied in each of the sets
Aω . Thus, the overall multisensor filter is
$$f^{Fused}_{k|k}(x) = \frac{1}{|A_\omega|} \left( \sum_{\ell \in I(A_\omega)} L_k(\xi_k(\ell)\,|\,x\,;\,\ell) \right) f^{Fused}_{k|k-1}(x) , \qquad x \in A_\omega , \qquad (6.56)$$
where $I(A_\omega)$ are the indices of the sensors that contribute to the coverage of $A_\omega$, and $|A_\omega|$ is the number of sensors that do so.
The multisensor intensity filter is thus a kind of “patchwork” with the pieces
being the sets Aω of the partition. The variance of the multisensor filter is not the
same throughout C—the more sensors contribute to the coverage of a set in the
partition, the smaller the variance in that set.
A simple way to write the multisensor filter in the general case is
$$f^{Fused}_{k|k}(x) = \frac{\sum_{\ell=1}^{M} w_k(x\,;\,\ell)\, L_k(\xi_k(\ell)\,|\,x\,;\,\ell)}{\sum_{\ell=1}^{M} w_k(x\,;\,\ell)}\, f^{Fused}_{k|k-1}(x) , \qquad (6.57)$$
where
$$w_k(x\,;\,\ell) = \begin{cases} 1, & \text{if } P_k^D(x\,;\,\ell) > 0 , \\ 0, & \text{if } P_k^D(x\,;\,\ell) = 0 , \end{cases} \qquad (6.58)$$
7 Distributed Sensing

Two central problems of distributed sensor fields are sensor communication and
target detection. Both problems are discussed in this chapter. Sensor communication
problems deal with connectivity issues. Two aspects of connectivity are discussed,
one local and the other global. Local issues are addressed by the distribution of
distances from a target to the nearest sensor in the field. This distance relates to target
detection capability of the fielded sensors. A different but closely related notion is
the distribution of distances between sensors in the field. This distance relates to the
capability of two sensors to communicate with each other. These distributions are
obtained for sensors that are located at the points of a nonhomogeneous PPP. The
distinction between these distributions is highlighted by Slivnyak’s Theorem.
Communication issues for the sensor field as a whole are not addressed directly
by distance distributions. Global connectivity issues are discussed in terms of the
statistical properties of the sensor connectivity graph. This work relates concepts of
communication channel diversity to that of k-connectivity of a geometric random
graph. Communication diversity is one example of a threshold phenomenon, that is,
of a property of a random graph that abruptly takes hold as some appropriate scale
is increased. The recognition that threshold phenomena are ubiquitous in geometric
random graphs is an exciting area of new research.
Target detection by a fielded multisensor array also has local and global aspects.
The local issue deals with the ability of a single sensor to detect a target. The
target to nearest sensor distance distributions address this issue. The probability
that a fielded sensor array will detect a target is a global, or field-level, detection
issue. Field level detection capability is modeled as a coverage problem. Homoge-
neous sensor fields with isotropic detection, that is, fields of identical sensors with
the same detection range regardless of sensor location, are the easiest to analyze.
Extensions to fields with directional sensors with uniformly distributed orientations
are also straightforward. Further extensions to anisotropic problems with preferred
directional orientations are also tractable.
Detection problems bring ideas from stochastic geometry into play naturally.
There are six million stories in stochastic geometry. Only a few are told here. Further
details of the beautiful connections to integral geometry are banished (cruelly) to the
references. (Fortunately, [108] is a delight to read.)
A word of caution is in order. Sensor fields are often deployed in a systematic
manner that, while it may have random aspects, is not amenable to exact or approx-
imate PPP modeling. In such cases, the distance distributions and detection proba-
bilities of PPP models are low fidelity, or even inappropriate. Model limitations can
be mitigated in some applications by using the extensions and variations of PPPs
given in Chapter 8. For example, the points in realizations of the Matérn hard core
point process are not closer together than a specified distance, h. This gives a more
efficient, i.e., nonoverlapping, sensor spatial distribution than does a PPP; however,
they are also harder to analyze because the hard core constraint means that the points
are not i.i.d. conditioned on their number. In other words, the minimum separation
constraint reduces the spatial variability of the points. A different example is that
of a cluster process. These processes increase the spatial variability of the points.
Distance distributions for cluster point processes are given in [140]; see also [8].
For the moment, suppose the target is located at the origin. Let D1 ≤ D2 ≤ · · ·
denote the distances arranged in increasing order from the origin to the points of a
PPP with intensity $\lambda(s)$ on $S \subset \mathbb{R}^m$. The pdfs of these distances are easily computed. The approach here is based on the essentially geometric method of [132]. More accessible references are [16, 43, 52]. It is assumed that $\int_S \lambda(s)\, ds = \infty$, so there are infinitely many points in the realization.
The distances $D_n$ are random variables, since they depend on the realization of the PPP. Let $r_n$ be a realization of $D_n$, and let $r_0 = 0$. Let
Thus, for $a < b$, $S(b) - S(a)$ is the shell centered at the origin of $\mathbb{R}^m$ with inner and outer radii $a$ and $b$, respectively, intersected with $S$. Let $N_A$ denote the number of points of the PPP in $A \subset S$.
The event $0 < r_1 < r_2 < \cdots < r_n$ comprises several events: no points are in $S(r_1) - S(0)$, one point is in $S(r_1 + \Delta r_1) - S(r_1)$, no points are in $S(r_2) - S(r_1 + \Delta r_1)$, one point is in $S(r_2 + \Delta r_2) - S(r_2)$, etc. These shells are nested and not overlapped, so their probabilities are independent. Hence, setting $r_0 + \Delta r_0 \equiv 0$,
Let
$$\mu(r) = \int_{S(r)} \lambda(s)\, ds .$$
Then
$$\Pr\left[ N_{S(r_j) - S(r_{j-1} + \Delta r_{j-1})} = 0 \right] = e^{-\mu(r_j) + \mu(r_{j-1} + \Delta r_{j-1})} \qquad (7.3)$$
and
$$\Pr\left[ N_{S(r_j + \Delta r_j) - S(r_j)} = 1 \right] = e^{-\mu(r_j + \Delta r_j) + \mu(r_j)} \left[ \mu(r_j + \Delta r_j) - \mu(r_j) \right] .$$
Multiplying these probabilities over $j = 1, \ldots, n$ gives
$$\Pr\left[ 0 < r_1 < r_2 < \cdots < r_n \right] = e^{-\mu(r_n + \Delta r_n)} \prod_{j=1}^{n} \mu'(\tilde{r}_j)\, \Delta r_j .$$
Dividing by $\Delta r_1 \cdots \Delta r_n$ and taking the limits as $\Delta r_j \to 0$ gives the pdf of the ordered event $0 < r_1 < r_2 < \cdots < r_n$:
$$p(r_1, \ldots, r_n) = e^{-\mu(r_n)} \prod_{j=1}^{n} \mu'(r_j) . \qquad (7.5)$$
Integrating over $r_1, \ldots, r_{n-1}$ gives $p_{D_n}(r_n)$, the pdf of $D_n$. The required integrals are consistent with the event order $0 < r_1 < r_2 < \cdots < r_n$, so that
$$p_{D_n}(r_n) = \int_0^{r_n} \cdots \int_0^{r_2} p(r_1, \ldots, r_n)\, dr_1 \cdots dr_{n-1} .$$
The result is
$$p_{D_n}(r) = \frac{\mu^{n-1}(r)}{(n-1)!}\, e^{-\mu(r)}\, \frac{d}{dr}\mu(r) . \qquad (7.6)$$
and set
$$\mu(r\,;\,a) = \int_{S(r;\,a)} \lambda(s)\, ds . \qquad (7.8)$$
Proceeding as before leads to densities (7.6) that depend parametrically on the point
a for nonhomogeneous PPPs. This dependence can be quite dramatic. Consider,
e.g., a strongly peaked unimodal intensity. The distance distributions for a point a
far removed from the mode of the PPP intensity will have large means compared to
the means of the distance distributions obtained from a point b that is near the mode
of the intensity.
Example 7.1 Homogeneous PPPs. For homogeneous PPPs the densities (7.6) are
independent of the choice of a. In this case,
$$\mu(r) \equiv \lambda\, c_m\, r^m , \qquad (7.9)$$
where
$$c_m = \frac{\pi^{m/2}}{\Gamma\left(\frac{m}{2} + 1\right)} , \qquad (7.10)$$
and, from (7.6),
$$p_{D_n}(r) = \frac{(\lambda c_m)^n\, m\, r^{mn-1}}{(n-1)!}\, e^{-\lambda c_m r^m} . \qquad (7.11)$$
Example 7.2 Distances in the plane. In the plane, the pdf of the distance from a
target to the nearest sensor is
$$p_{D_1}(r) = 2\, \lambda\, \pi\, r\, e^{-\lambda \pi r^2} .$$
Its mean is $E[D_1] = 1/(2\sqrt{\lambda})$, and its variance is
$$\mathrm{Var}[D_1] = E\left[D_1^2\right] - E[D_1]^2 = \frac{4 - \pi}{4\, \lambda\, \pi} .$$
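The mean and variance in Example 7.2 are easy to check by simulation; the window size, intensity, and number of trials below are illustrative assumptions, and a large window stands in for the plane:

```python
import numpy as np

# Monte Carlo check of Example 7.2: nearest-sensor distance from the origin for a
# homogeneous planar PPP of intensity lam (illustrative values).
rng = np.random.default_rng(1)
lam, half_width, trials = 2.0, 10.0, 20_000
dists = []
for _ in range(trials):
    n = rng.poisson(lam * (2 * half_width) ** 2)          # number of sensors in the window
    pts = rng.uniform(-half_width, half_width, size=(n, 2))
    dists.append(np.min(np.linalg.norm(pts, axis=1)))
d = np.array(dists)
print(d.mean(), 1 / (2 * np.sqrt(lam)))                   # sample mean vs. 1/(2 sqrt(lambda))
print(d.var(), (4 - np.pi) / (4 * lam * np.pi))           # sample variance vs. (4 - pi)/(4 lambda pi)
```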
Then, replacing μ by μ̃ in (7.6) gives the pdf of the n-th ordered value of {h(·)}
evaluated at the points of the PPP realizations. Examples include:
• $h_\Sigma(x) = \left( x^T \Sigma\, x \right)^{1/2}$, where $\Sigma$ is a positive definite matrix, and
• $h_p(x) = \left( \sum_{i=1}^{m} |x_i|^p \right)^{1/p}$, $p > 0$.
For $0 < p < 1$, the function $h_p(x)$ is not a generalization of the concept of distance since it is not convex. The functions $h_p(x)$ are of great interest in compressive sensing.
Example 7.4 Connection to Extreme Value Distributions. For n = 1, the nearest
neighbor distribution in $\mathbb{R}^m$ is the integral of the pdf of the distance $D_1$:
$$F(x) = \int_0^x p_{D_1}(r)\, dr = \int_0^x m\, \lambda\, c_m\, r^{m-1}\, e^{-\lambda c_m r^m}\, dr = 1 - e^{-\lambda c_m x^m} , \qquad x > 0 . \qquad (7.13)$$
The rate of decay of this function increases very rapidly as the dimension m
increases, yet another symptom of the Curse of Dimensionality.
The function F(x) is the famous Weibull distribution. It is one of the three stable
laws of extreme value statistics that are allowed by the so-called Trinity Theorem.
The other two stable laws are named for Fréchet and Gumbel. The Trinity Theo-
rem holds whenever the cumulative distribution function of the underlying sample
distribution is continuous and has an inverse.
The nearest neighbor is the minimum of a large number of realizations of i.i.d.
random variables, so the Trinity Theorem must hold for nonhomogeneous intensi-
ties. The limiting distribution for nonhomogeneous PPPs is currently unavailable,
but it is likely that it is Weibull with an intensity equal to the local intensity λ(a)
at the point a from which nearest neighbor distances are centered. Higher order
correction terms would need to be determined by asymptotic methods.
The distances between a sensor and all the other sensors in a given realization pose a conceptually more difficult problem. For one thing there is no legitimate concept of
a reference sensor since different realizations comprise different points.
One way to approach the problem is to average over the sensors in a given real-
ization, and then to seek the expectation of this sum over all PPP realizations. For
example, the distance between a point in a realization and its nearest neighbor can
be averaged over all the points of the realization. Such sums are random sums, but
of a more specialized kind than are used in Campbell’s Theorem. Evaluating the
expectation of these random sums is the goal of Slivnyak’s Theorem.
The event space used here is identical to the event space $\mathcal{E}(S)$, but with the integer counts removed. An example of such a function is the nearest neighbor distance from an
arbitrary point x ∈ S to the points in ξ :
with $f_{NN}(x, \{\emptyset\}) = 0$. The pdf of $f_{NN}$ is given in the previous section when $\xi$ is a realization of a PPP.
For distributed sensors a more interesting example is the nearest neighbor dis-
tance from an arbitrary vertex x j in ξ to the other points in the same realization:
$$f_{VV}\left(x_j, \{x_1, \ldots, x_n\} \setminus x_j\right) \equiv f_{VV}\left(x_j, \{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n\}\right) = \min_{\substack{1 \le i \le n \\ i \ne j}} \left\| x_i - x_j \right\| . \qquad (7.16)$$
The sum of the nearest neighbor distances of the vertices in the PPP realization $\xi$ is
$$F(\xi) = \sum_{j=1}^{n} f_{VV}\left(x_j, \{x_1, \ldots, x_n\} \setminus x_j\right) . \qquad (7.17)$$
The second step moves the integrals of the expectation inside the sum over j. The
last step follows from recognizing that the n-fold integrals over S are identical.
Now, for every n, the integral over xn is unchanged by replacing xn with the dummy
variable, x. Therefore,
$$E[F] = \int_S \left\{ \sum_{n=1}^{\infty} \frac{e^{-\int_S \lambda(x)\, dx}}{(n-1)!} \int_S \cdots \int_S f(x, \{x_1, \ldots, x_{n-1}\}) \prod_{j=1}^{n-1} \lambda(x_j)\, dx_1 \cdots dx_{n-1} \right\} \lambda(x)\, dx = \int_S E\left[ f(x, \{x_1, \ldots, x_n\}) \right] \lambda(x)\, dx ,$$
where the last step is merely shifting the index n ← n − 1 so that the infinite sum
starts at n = 0. This gives Slivnyak’s Theorem:
$$E\left[ \sum_{j=1}^{n} f\left(x_j, \{x_1, \ldots, x_n\} \setminus x_j\right) \right] = \int_S E\left[ f(x, \{x_1, \ldots, x_n\}) \right] \lambda(x)\, dx . \qquad (7.18)$$
The result relates two different kinds of averages. One is how a point in a realization
relates to other points in the very same realization. The other is how a given point
relates to the points of an arbitrary PPP realization. The latter average can be easier
to evaluate analytically.
where $N[\,\cdot\,]$ is a set function that counts the number of points in its argument. In terms of the function $f_r$, the degree of vertex $x_j$ is
$$f_r\left(x_j, \{x_1, \ldots, x_n\} \setminus x_j\right) , \qquad (7.20)$$
so the aggregate vertex degree, summed over all vertices in a given realization of
the graph G r , is the random sum
$$\sum_{j=1}^{n} f_r\left(x_j, \{x_1, \ldots, x_n\} \setminus x_j\right) . \qquad (7.21)$$
where S(r ) is given by (7.1) and S(r ; x) by (7.7). Hence, from (7.18), the expected
total vertex degree is
$$d_\Sigma(r) = \int_S \left( \int_{S(r)} \lambda(y + x)\, dy \right) \lambda(x)\, dx . \qquad (7.22)$$
The mean vertex degree is the ratio of $d_\Sigma(r)$ and the expected number of vertices. This ratio, written in a form that emphasizes convexity, is
$$d(r) = \int_{S(r)} \frac{\int_S \lambda(y + x)\, \lambda(x)\, dx}{\int_S \lambda(x)\, dx}\, dy . \qquad (7.24)$$
$$D_1^V = \frac{|S|\, \lambda \,/\, \left(2\sqrt{\lambda}\right)}{E[\text{Number of vertices}]} = \frac{1}{2\sqrt{\lambda}} . \qquad (7.25)$$
The average nearest vertex distance is identical to the mean nearest neighbor dis-
tance for homogeneous PPPs, a result that accords well with intuition.
The question considered in this section is “How many communication paths are
there between pairs of sensors in the network?” The question is answered with high
probability, a phrase that is commonplace in the theory of random graphs. As is
implicitly clear, the answer provided uses only the i.i.d. property of PPPs.
Several standard definitions from graph theory are useful. Let G denote a given
(nonrandom) graph.
• The minimum degree of any vertex in G is denoted by δ(G), that is,
Let |A| denote the area of the set A. The expected coverage is E[ |C| ], where the
expectation is over the disc/grain centers. The probability that any given point x in
R is covered by (contained in) C is defined by
$$\Pr[x \in C] = \frac{E\left[\,|C|\,\right]}{|R|} . \qquad (7.28)$$
The coordinate system is chosen so that the double integral over the point set G on
the right hand side of (7.29) does not change when the lines in L are subjected to a
rigid motion in the plane.1 It can be verified that the natural coordinate system with
this invariance property is the so-called normal form of the line:
where $\rho \ge 0$ is the distance from the origin to the foot of the perpendicular to the line $\ell$, and $\theta$, $0 \le \theta < 2\pi$, is the angle measured counterclockwise from the positive x-axis to the perpendicular line. The angle $\theta$ is not limited to $[0, \pi)$ because the perpendicular is a line segment with one endpoint always at the origin.
See Fig. 7.1. The double integral over the point set G in (7.29) has units of length.
1 The importance of invariance was not recognized in early discussions of the concept of a random
line. Bertrand’s paradox (1888) involves three different answers to a question about the length of a
random chord of a circle. The history is reviewed in [61, Chapter 1].
Fig. 7.1 Depiction of coordinate system and the line (ρ, θ). The support function p(θ) and sup-
port line for a convex set K is also shown. The origin O is any specified point interior to K . The
line (ρ, θ) intersects K whenever 0 ≤ ρ ≤ p(θ). The thickness T (θ) is defined later in (7.45)
Let K be a bounded convex set in the plane, and take as the origin any point
interior to K . The double integral over all lines that intersect K is very simple:
$$\int_{G \cap K \neq \emptyset} d\rho\, d\theta = L , \qquad (7.31)$$
where p(θ ) is the distance from the origin at angle θ to the tangent line to K . The
function p(θ ) is called the support function of K , and the tangent line is called a
support line. The support function and line are depicted in Fig. 7.1. It can be shown
that $p(\theta) + p''(\theta) > 0$ and that the infinitesimal of arclength of $K$ is
$$ds = \left( p(\theta) + p''(\theta) \right) d\theta . \qquad (7.33)$$
Thus,
$$L = \int_0^L ds = \int_0^{2\pi} \left( p(\theta) + p''(\theta) \right) d\theta = \int_0^{2\pi} p(\theta)\, d\theta , \qquad (7.34)$$
where the integral over $p''(\theta)$ is zero because $p'(\theta)$ is periodic. Comparing (7.32) and (7.34) gives (7.31).
$$\Pr\left[\, G \cap K_1 \neq \emptyset \;\middle|\; G \cap K \neq \emptyset \,\right] = \frac{|K_1|}{|K|} . \qquad (7.36)$$
$$E[\text{chord length}] = \frac{\pi\, |K|}{L} , \qquad (7.37)$$
where $L$ is the perimeter of $K$. To see this, let the line $\ell(\rho, \theta)$ intersect $K$. Denote by $\sigma(\rho, \theta)$ the length of the line segment, or chord, subtended by $K$. The mean chord length of lines that intersect $K$ is the ratio
$$E[\text{chord length}] = \frac{\int_{G \cap K \neq \emptyset} \sigma(\rho, \theta)\, d\rho\, d\theta}{\int_{G \cap K \neq \emptyset} d\rho\, d\theta} .$$
For any angle θ , the infinitesimal σ (ρ, θ ) dρ is an element of area for K . One part
of K is covered by area elements with angle θ , and the other part is covered by
elements with angle θ + π . The area elements are not overlapped, so integrating
over all area elements gives the numerator as
$$\int_0^\pi \left[ \int_0^{p(\theta)} \sigma(\rho, \theta)\, d\rho + \int_0^{p(\theta + \pi)} \sigma(\rho, \theta + \pi)\, d\rho \right] d\theta = \int_0^\pi |K|\, d\theta = \pi\, |K| . \qquad (7.38)$$
Hence
$$E[\text{chord length}] = \frac{\pi\, |K|}{\int_{G \cap K \neq \emptyset} d\rho\, d\theta} , \qquad (7.39)$$
and (7.31) then gives (7.37).
In the general case, chord length is replaced by the sum of the subtended line seg-
ments [2].
A field of $n$ sensors with congruent detection regions, denoted $K_1, \ldots, K_n$, is dropped at random so that for every $j$ the region $K_j$ intersects the surveillance region $K_0$. The orientation of the sensors is also random, an assumption that matters only when the sensor detection capability is not circular.
Let $I_k$ be the part of R that is covered by exactly $k$ sensors. Let $F_0$ and $L_0$ denote the area and perimeter of $K_0$, respectively. Similarly, let $F$ and $L$ denote the area and perimeter of the sets $K_j$. The expected value of the area of $I_k$ is
n (2 π F)k (2 π F0 + L 0 L)n − k F0
E [ |Ik | ] = . (7.40)
k (2 π (F + F0 ) + L 0 L)n
It is no accident that this expression resembles the binomial theorem; see [108, pp. 98–99]. To see this result it is necessary to introduce the concept of mixed areas (also called quermassintegrals), first studied by Minkowski (c. 1903). The product L₀L in (7.40) is a mixed area. This endeavor is left to the references.
Dividing (7.40) by the area of the surveillance region gives the probability that a
target is detected by precisely k sensors:
Pr[ k-coverage ] = E[ |I_k| ] / F₀
                = C(n, k) (2πF)^k (2πF₀ + L₀L)^{n−k} / ( 2π(F + F₀) + L₀L )^n .   (7.41)
For k = 0 the result gives the probability that a target is not detected by even one
sensor. The set I0 is called the vacancy, and it is of much interest in modern coverage
theory [44].
Now suppose the number of sensors n and the surveillance area F₀ grow together so that the sensor density λ is held fixed:

n / F₀ = λ   ⇔   F / F₀ = λF / n .

Regardless of the shape of K₀, the ratio of its perimeter to its area, L₀/F₀, goes to zero in this limit. Manipulating (7.41) and taking the limit as n → ∞ gives
Pr[ k-coverage ] = (λF)^k e^{−λF} / k! .   (7.42)
To see this, note that

2πF / ( 2πF + 2πF₀ + L₀L ) = κ λF / n ,   (7.43)

where

κ = ( λF/n + 1 + L₀L/(2πF₀) )^{−1} .
In the limit as n → ∞, the middle terms both go to one, and the last term goes to
the exponential. This establishes (7.42).
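A simple way to see (7.42) numerically is to scatter sensors with density λ over a region much larger than one detection disk and count how many disks cover randomly chosen probe points; the counts should be approximately Poisson with mean λF. The sketch below (Python; all numerical values are assumptions chosen only for illustration) does exactly this, keeping the probe points away from the boundary to suppress edge effects.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
lam, r, A = 2.0, 0.5, 10.0                 # sensor density, detection radius, square side
F = math.pi * r**2                         # area of one sensor's detection disk

n_sensors = rng.poisson(lam * A * A)
sensors = rng.uniform(0.0, A, size=(n_sensors, 2))

# Probe points kept a distance r from the boundary to avoid edge effects.
probes = rng.uniform(r, A - r, size=(4000, 2))
d2 = ((probes[:, None, :] - sensors[None, :, :])**2).sum(axis=2)
counts = (d2 <= r * r).sum(axis=1)         # number of sensors covering each probe point

for k in range(5):
    poisson_pk = (lam * F)**k * math.exp(-lam * F) / math.factorial(k)
    print(k, round((counts == k).mean(), 4), round(poisson_pk, 4))
```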
Fig. 7.2 Depiction of the pill-shaped detection region of a sensor with a circular detection region with detection range R_D drifting for a known time period Δt = t − t₀ and known constant velocity v. The point P(t₀) is the initial sensor location, and P(t) is its location at time t
Fig. 7.3 Depiction of the varying coverage of 10 drifting sensors' pill-shaped detection regions. The length of the rectangular section of the pill is proportional to drift speed and duration of drift; the width is the detection range of the sensor. (a) All sensors drift to the east. (b) If, instead, all sensors drift to the southeast, the total coverage and sensor overlap change. (c) All 10 drift to the east on average, but each with a slightly different angle. (d) Random orientations as required by the coverage theory of Section 7.3.1
mean drift direction. As depicted in Fig. 7.3d, pills scattered randomly in angle and
location do not exhibit a preferred orientation. The coverage results of the previous
section are only valid for the situation depicted in Fig. 7.3d. Therefore, they are either inappropriate or, at best, an approximation that is reasonable only if the anisotropy is small, that is, if the pill-shaped regions are nearly circular because v ≈ 0.
A different kind of example of the need for anisotropy arises in barrier problems.
In these problems the convex surveillance set K is typically long and narrow, and
long traverses are unrealistic. These infrequent, but long, traverses make the average
chord length longer than may be reasonable in some applications. For example, the
average chord length for a rectangular barrier region K of width ℓ and length D is, by Crofton's Theorem (7.37),

E[chord length] = π ℓ D / (2ℓ + 2D) ≈ π ℓ / 2 ≈ 1.57 ℓ ,   (7.44)

where the approximation holds when D ≫ ℓ.
Long transit lengths, while infrequent, nonetheless make the average transit length
significantly larger than the shortest possible transit ℓ. If target trajectories are more
likely to be roughly perpendicular to the long side of K , the isotropic model is of low
fidelity and needs modification. Given a pdf of the angle θ of transit, an expression
for the expected transit length can be developed using the methods discussed below.
Anisotropic problems are studied by Dufour [25] in R^κ, κ = 2, 3. Much of this
work seems to be unpublished, but accessible versions of some of the results are
given in [120, pp. 55–74] and also [108, p. 104]. The former discusses anisotropy
generally and gives an interesting application of anisotropy to motor vehicle traffic
flow analysis. Most if not all of Dufour’s results are analytically explicit in the sense
that they are amenable to numerical calculation. Of the many results in [25], two are
discussed here to illustrate how anisotropy enters the problem.
The distribution dF(θ) of the orientation angle is assumed known. This notation accommodates diverse angle distributions, including the case when there is only one possible orientation. If F is differentiable, then F′(θ) is the pdf of the orientation angle.
Let K ⊥ (θ ) denote the orthogonal projection of a bounded set K ⊂ R2 onto a line
through the origin with angle θ . The set K ⊥ (θ ) is an interval on the line if K is
connected (i.e., if K is not the union of two or more disjoint sets). The thickness of
K is defined by
T(θ) = ∫_{K⊥(θ)} dρ .   (7.45)
Pr[ line segment intersects K₁ | line segment intersects K₀ ] = E[T₁] / E[T₀] ,

where T₀ and T₁ denote the thicknesses of K₀ and K₁.
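The thickness (7.45) of a convex set is the length of its orthogonal projection, so the expectations E[T₀] and E[T₁] reduce to one-dimensional integrals over the angle distribution dF(θ). The sketch below (Python; the rectangle dimensions and the orientation pdf are assumptions chosen only to make the calculation concrete) evaluates the ratio for two nested axis-aligned rectangles.

```python
import numpy as np

def thickness_rect(theta, a, b):
    # Thickness T(theta) of an axis-aligned a-by-b rectangle: the length of
    # its orthogonal projection onto a line at angle theta, as in (7.45).
    return a * np.abs(np.cos(theta)) + b * np.abs(np.sin(theta))

theta = np.linspace(0.0, np.pi, 2000, endpoint=False)
dtheta = theta[1] - theta[0]

# Assumed anisotropic orientation pdf concentrated near theta = pi/2.
pdf = np.sin(theta) ** 4
pdf /= pdf.sum() * dtheta

ET0 = (thickness_rect(theta, 10.0, 1.0) * pdf).sum() * dtheta   # barrier K0
ET1 = (thickness_rect(theta,  4.0, 1.0) * pdf).sum() * dtheta   # subset K1
print("E[T0] =", ET0, " E[T1] =", ET1, " ratio =", ET1 / ET0)
```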
7.4 Stereology
Stereology is most often defined as the study of three dimensional objects by means
of plane cross-sections through the object. A 3-D object in R3 can be studied by
2-D and 1-D sections, or by combination of them. More generally, it is the study
of the structure of objects in Rm that are known only via measurements of lower
dimensional linear sections. Stereology is of great practical importance in many
fields ranging from geology (e.g., core samples) to biology (e.g., microscope slide
smears, tissue sections, needle biopsies).
Tomography might be seen as a specialized form of stereology because it
involves (nondestructive) reconstruction of 3-D properties from thin 2-D sections.
However, tomography computes as many plane sections as are needed to image an
object fully, and is therefore typically data rich compared to stereology. Only data
starved tomographic applications (e.g., acoustic tomography of the ocean volume)
might be considered comparable to the problems of stereology.
Distributed multiple sensor problems are not currently thought of as problems of
stereology. The field of view (FOV) of any given sensor in a distributed sensor field
is a limited cross-section of the medium in which the sensors reside. Interpreting
this limited data to understand and estimate statistical properties of the medium as
a whole is a stereological interpretation of the problem. Admittedly, the connection
seems tenuous at best, but the potential of stereology to contribute to understanding
the problems involved is real enough to justify inclusion of a brief mention of the
topic here.
Problems in stereology are difficult, but sometimes have surprising and pleasing
solutions. An excellent example of this is Delesse’s principle from mineralogy, and
it dates to 1848. For concreteness, suppose a rock sample is sliced by a saw and the
cross-section polished. The cross-section is examined for the presence of a specified
mineral. The ratio of the area of the cross-section that intersected the mineral to the
total area of the cross-section is called the area fraction, denoted A_A. Similarly, the ratio of the total volume of the mineral to the total volume of the rock sample is the volume fraction, denoted by V_V. Assuming the rock sample is a representative specimen from a much larger homogeneous volume, Delesse's principle says that the expected volume fraction equals the expected area fraction:

A_A = V_V .

The expectation is over all possible two-dimensional slices through the rock. The area fraction is an unbiased estimator of the volume fraction. The variance of the estimate is much harder to evaluate. Delesse's principle is very practical since the area fractions of several rock samples are much easier to measure than volume fractions.
Rosiwal extended the result in 1898 to line fractions. Draw an equispaced grid of parallel lines on the polished rock surface. Measure the length of the line segments that overlay the mineral of interest, and divide by the total length of the line segments. This ratio is the line fraction L_L. Rosiwal's principle says that

L_L = A_A = V_V ,

provided, as before, the rock is a sample from a larger homogeneous volume. Line fractions are even easier to measure than area fractions.
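Delesse's principle is easy to verify numerically for a simple geometry. In the sketch below (Python; the geometry is an assumption made for illustration), the "mineral" is a ball of radius R centered in a unit cube, and the cube is sliced by random planes z = const. The average area fraction converges to the volume fraction (4/3)πR³.

```python
import numpy as np

rng = np.random.default_rng(2)
R, z0 = 0.3, 0.5                      # mineral: a ball of radius R centered in a unit cube
V_V = 4.0 / 3.0 * np.pi * R**3        # true volume fraction (cube volume = 1)

# Delesse: slice the cube with random planes z = const and average the area fraction.
z = rng.uniform(0.0, 1.0, 100_000)
h2 = R**2 - (z - z0)**2
A_A = np.where(h2 > 0.0, np.pi * h2, 0.0)   # mineral area fraction in each slice

print("mean area fraction:", A_A.mean())
print("volume fraction   :", V_V)
```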
Similar applications in microscopy are complicated by the simple fact that even
the sharpest knife may not cut a cell, but simply push it aside, unless the tissue is
frozen first. Freezing is not always an option. There are other practical issues as
well.
These methods do not provide estimates of the number of objects in a vol-
ume. In the biological application, for example, the number of cells cut by a two-
dimensional tissue slice does not in general indicate the number of cells in the tissue
volume.
Modern stereology is a highly interdisciplinary field with many connections to
stochastic geometry. An excellent modern book on the subject is [3], which also
gives references to the original papers of Delesse and Rosiwal. The relationships
between stereology and integral geometry, convexity, and Crofton-style formulae
are discussed in [2] and the references cited therein. An informative overview is
also given in [44, Section 1.9].
Example 7.6 Connections. An old puzzle related to Delesse’s principle goes as fol-
lows: a region in the plane is such that every line through the origin intersects it in
a line segment of length 2. Is the region a unit radius circle centered at the origin?
The answer is no, and the cardioid r = 1 − cos θ is a nice counter-example: the chord along the line through the origin at angle θ has length (1 − cos θ) + (1 − cos(θ + π)) = 2. If, however, the region is centrally symmetric, the answer is yes. Extended to three
dimensions, the problem is: an object in R3 is such that the area of its intersection
with every plane through the origin is π . Is the object the unit sphere centered at the
origin? The answer [32] in this case is the same as in the plane, but the solution is
also much deeper mathematically. Counter-examples are found by using spherical
harmonics. This problem is a special case of the Funk-Hecke theorem [29, 109] for
finding the spherical harmonic expansion of a function knowing only its integrals
on all the great circles. While seeking the analog of this theorem in the plane (given
integrals of a function on all straight lines), Radon found in 1917 what is now called
the Radon transform, which is recognized as the mathematical basis of tomography.
Part III
Beyond the Poisson Point Process
Chapter 8
A Profusion of Point Processes
This chapter discusses only a few of the bewilderingly large variety of point pro-
cesses that are not PPPs and are also useful in applications. The MCMC revolution,
as it has been called, will undoubtedly increase the variety and number of non-PPPs
that find successful application in real world problems.
Many useful point processes are built upon a PPP foundation, which assists in
their simulation and aids in their theoretical development. Marked PPPs are dis-
cussed first. Despite appearances, marked PPPs turn out in the end to be equivalent
to PPPs.
Hard core processes are discussed next. They have less spatial variability than
PPPs because points are separated by a specified minimum distance. Loosely
speaking, the points are “mutually repelled” from each other. The Matérn hard core
processes use dependent thinning to enforce point separation.
Cluster processes are presented next. They have greater spatial variability than
PPPs because, as the name suggests, points tend to gather closely together—points
are in a loose sense “mutually attractive.” Poisson and Neyman-Scott cluster pro-
cesses are discussed. A special case of the latter is the Matérn cluster process, which
uses a resampling procedure to encourage point proximity.
The Cox, or doubly stochastic, processes are discussed next. These are complex
processes that push the concept of the ensemble to include the intensity function. In
other words, the intensity function is itself a random variable. It is shown that a Cox
process whose intensity function is a random sum is a Neyman-Scott process.
Many useful point processes are not directly related to PPPs. One of these—the
Gibbs, or Markov, point process—is discussed briefly here.
• Let ξ = (m, {x₁, ..., x_m}) be a realization of a PPP with intensity λ(s) on the space S. Given m, the points x_j are i.i.d. samples of the random variable X whose pdf is p_X(x) = λ(x) / ∫_S λ(s) ds.
• The mark is a random variable U on a mark space U with conditional pdf p_{U|X}(u | x). Given the PPP realization ξ, generate the marks {u₁, ..., u_m} ⊂ U as independent realizations of p_{U|X}( · | x_j), j = 1, ..., m.
• Pair the points with their corresponding marks to obtain the marked realization (m, {(x₁, u₁), ..., (x_m, u_m)}). A simulation sketch is given below.
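The following minimal simulation sketch (Python) follows the three steps above. The intensity λ(s) on S = [0, 10] and the Gaussian conditional mark pdf are assumptions chosen only to make the example concrete.

```python
import numpy as np

rng = np.random.default_rng(3)

# Step 1: realization of a PPP on S = [0, 10] with assumed intensity lambda(s) = 2 + s/5,
# generated by thinning a homogeneous PPP with intensity lam_max.
lam = lambda s: 2.0 + s / 5.0
lam_max, S_len = 4.0, 10.0
m = rng.poisson(lam_max * S_len)
cand = rng.uniform(0.0, S_len, m)
x = cand[rng.uniform(0.0, lam_max, m) < lam(cand)]   # retained points are the PPP points

# Step 2: attach a mark to each point; the assumed conditional mark pdf
# p_{U|X}(u | x) is Gaussian with mean x and unit variance.
u = x + rng.standard_normal(x.size)

# Step 3: pair points with marks.
marked = list(zip(x, u))
print(len(marked), "marked points; first few:", marked[:3])
```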
The mark space U can be very general, but it is typically either a discrete set or
a subset of the Euclidean space Rκ . An example is the MMIF of Section 6.2.2 and
Appendix E. In this application, the targets are the points of a PPP, the measurements
are the marks, and the measurement likelihood function is the conditional mark pdf.
In the simplest case, the marks are independent of ξ , so that
pU | X (u | x j ) ≡ pU (u) .
The marks in this case are independent of the locations of the points in ξ as well as
the number m. This kind of marked PPP is called a compound PPP by Snyder, and
its theory is well developed in [118, Chapter 3] and [119, Chapter 4].
Several compound PPPs are used earlier in the book, but without comment. The
“Coloring Theorem” of Example 2.7 in Chapter 2 is an example of a marked PPP.
The marks are the colors assigned to the points. In Chapter 3, marks are intro-
duced as part of the complete data; that is, the marks are the missing data of the
EM method. A specific example is the complete data (3.12) for estimating super-
posed PPPs, which is a realization of a compound PPP in which the mark space is
U = {1, . . . , L}.
Yet another example, one that could have been discussed as a marked PPP but
was not, is the intensity filter of Chapter 6 when the target PPP is split during the
prediction step into the detected and undetected target PPPs. In this case, the detec-
tion process is equivalent to a marking procedure with marks U = {0, 1}, where
zero/one denotes target non-detection/detection.
Let f(x, u) be a real valued function on S × U, and define the sum

F = Σ_{j=1}^{m} f(x_j, u_j) .

Then

E[ e^{−F} ] = exp( ∫_S ∫_U ( e^{−f(x, u)} − 1 ) p_{U|X}(u | x) λ(x) dx du )
           = exp( ∫_{S × U} ( e^{−f(x, u)} − 1 ) μ(x, u) dx du ) .   (8.3)

Applying Campbell's Theorem gives the expectation of the expression (8.4) with respect to the PPP ξ with intensity function λ(x):

E[ e^{−F} ] = E_Ξ[ e^{−Σ_{j=1}^{M} g(X_j)} ]
           = exp( ∫_S ( e^{−g(x)} − 1 ) λ(x) dx ) .

Substituting e^{−g(x)} = ∫_U e^{−f(x, u)} p_{U|X}(u | x) du gives

E[ e^{−F} ] = exp( ∫_S ( ∫_U e^{−f(x, u)} p_{U|X}(u | x) du − 1 ) λ(x) dx )
           = exp( ∫_S ∫_U ( e^{−f(x, u)} − 1 ) p_{U|X}(u | x) du λ(x) dx ) .
A filtered Poisson process is the output of a function that is similar to (2.30). Let
h(x, y ; u) be a real valued function defined for all x, y ∈ S and u ∈ U. Given
the realization ξ ≡ (m, {(x₁, u₁), ..., (x_m, u_m)}) of the marked PPP, define the random sum

F(y) = 0 ,                             if m = 0 ,
F(y) = Σ_{j=1}^{m} h(y, x_j ; u_j) ,   if m ≥ 1 .   (8.6)
pU (u) ≡ pU (u ; θ ) ,
if the sphere of radius h centered at x j contains no points with marks smaller than
u j . Points are deleted from the realization only after all points are determined to
be either thinned or retained. This kind of thinning is not Bernoulli independent
thinning.
The intensity function of the resulting Matérn hard core process is

λ_Matérn(II) = ( 1 − e^{−λ₀ c_m h^m} ) / ( c_m h^m ) ,   (8.9)

where c_m is the volume of the unit radius sphere in R^m (see (7.9)). To see this, follow the method of [123]: the points with marks smaller than t form a thinned PPP whose intensity is λ₀ t. Hence, r(t) = exp(−λ₀ c_m h^m t) is the probability that this thinned PPP has no points in the sphere of radius h. Equivalently, r(t) is the probability that a point at x with mark t is retained. Hence,

p = ∫_0^1 r(t) dt = ( 1 − e^{−λ₀ c_m h^m} ) / ( λ₀ c_m h^m )
is the probability that the point x is retained. The intensity λMatérn(II) is the product
of p and the PPP intensity λ0 .
The intensity of the Matérn process increases with increasing initial intensity λ₀. From (8.9), the limiting intensity is

λ_MaxMatérn(II) = lim_{λ₀ → ∞} λ_Matérn(II) = 1 / ( c_m h^m ) .

The limiting intensity is one point per sphere of radius h. For m = 2 and h = 1,

λ_MaxMatérn(II) = 1/π ≈ 0.318 .

For comparison, regular hexagonal packing gives the maximum possible intensity of one point per hexagon (inscribed in the unit circle), or λ_Hex = 2√3/9 ≈ 0.385 with h = 1.
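The retention argument above is easy to check by simulation. The sketch below (Python, with assumed parameter values) applies Matérn type II thinning on a square with toroidal distances and compares the empirical intensity of the retained points with (8.9) for m = 2.

```python
import numpy as np

rng = np.random.default_rng(4)
lam0, h, A = 5.0, 0.3, 10.0                # parent intensity, hard-core distance, square side

n = rng.poisson(lam0 * A * A)
pts = rng.uniform(0.0, A, size=(n, 2))
marks = rng.uniform(0.0, 1.0, n)           # i.i.d. marks used for dependent thinning

# Toroidal distances suppress edge effects on the square.
d = np.abs(pts[:, None, :] - pts[None, :, :])
d = np.minimum(d, A - d)
dist = np.sqrt((d**2).sum(axis=2))
close = (dist < h) & ~np.eye(n, dtype=bool)

# Matern type II rule: keep a point only if no point within distance h has a smaller mark.
keep = np.array([not np.any(close[i] & (marks < marks[i])) for i in range(n)])

lam_emp = keep.sum() / (A * A)
lam_thy = (1.0 - np.exp(-lam0 * np.pi * h**2)) / (np.pi * h**2)   # Eq. (8.9) with m = 2
print("empirical intensity:", lam_emp, "  theory (8.9):", lam_thy)
```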
Example 8.2 Matérn Method I. This method starts with the same realization ξ as in the previous example, but more aggressively thins the points of the PPP. Here, pairs of points in the realization ξ that are separated by a distance h or less from each other are removed from ξ. Pairs of points are removed only after first identifying all pairs to be removed. The intensity function of the resulting hard core process is

λ_Matérn(I) = λ₀ e^{−λ₀ c_m h^m} .   (8.10)

To see this it is only necessary to see that the probability that a given point is retained is e^{−λ₀ c_m h^m}, and to multiply this probability by λ₀.
Variable size hard core models are defined by specifying a mark that denotes, say,
the radius of the hard core. Soft core models are defined in [123, p. 163]. Hard core
processes are very difficult to analyze theoretically, so in most applications numerical modeling and simulation are necessary in practice. As pointed out by Diaconis
[24], MCMC methods play an important role in the modern developments of these
problems.
ξ_C = ∪_{j=1}^{n} ξ(j)
is a realization of the cluster process ΞC . The parent points x j are not in the realiza-
tion ξC . (In some applications, parent points are retained [103].) Cluster processes
are very general, and it is desirable to specialize them further.
Poisson cluster processes are cluster processes in which the parent process is a non-
homogeneous PPP. The family of daughter processes remain general finite point
processes.
The Neyman-Scott process is a Poisson cluster process in which the daughter
processes take a special form; realizations are generated via the following proce-
dure. The first step yields a realization ξ = (n, {x1 , . . . , xn }) of the parent PPP
Ξ whose intensity function is λ(x), x ∈ S. The second step draws i.i.d. samples
k j , j = 1, . . . , n, from a discrete random variable K on the nonnegative integers
with specified probabilities

p_k ≡ Pr[ K = k ] ,   k = 0, 1, 2, ... .   (8.11)

This step is equivalent to that of a marked PPP. Let h(x), x ∈ S, be a specified pdf. The final step draws k_j i.i.d. samples from the pdf h(x). Denote these samples by x_{ij}, i = 1, ..., k_j. The i-th child of x_j is x_j + x_{ij}, which therefore has the shifted pdf h(x − x_j). Let k = Σ_{j=1}^{n} k_j. The realization of the Neyman-Scott cluster process is

ξ_C = ( k, { x_j + x_{ij} : i = 1, ..., k_j and j = 1, ..., n } ) ∈ E(S) .   (8.12)
The defining parameters of the Neyman-Scott cluster process are the PPP parent
intensity function λ(x), the distribution of the number K , and the daughter pdf h(x).
The probability generating functional of the general Neyman-Scott process is given
in the next section.
A special case of the Neyman-Scott process is the Matérn cluster process. In this case, the clusters are homogeneous PPPs with intensity λ₀ on the sphere of radius R. The discrete random variable K in the simulation is in this case Poisson distributed with parameter c_κ R^κ λ₀, where c_κ is the volume of the unit sphere in R^κ. The points x_{ij} of the simulation are i.i.d. and uniformly distributed in the sphere of radius R. The defining parameters of the Matérn cluster process are the PPP intensity function λ(x), the homogeneous PPP intensity λ₀, and the radius R.
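A realization of a Matérn cluster process can be generated directly from the recipe above. The sketch below (Python; the parent intensity, cluster intensity, radius, and window are assumed values) places a Poisson number of daughters uniformly in a disk of radius R about each parent of a homogeneous parent PPP; only the daughters are retained.

```python
import numpy as np

rng = np.random.default_rng(5)
lam_parent, lam0, R, A = 0.2, 8.0, 0.5, 10.0   # parent intensity, cluster intensity, radius, square side

n_par = rng.poisson(lam_parent * A * A)
parents = rng.uniform(0.0, A, size=(n_par, 2))

daughters = []
for p in parents:
    k = rng.poisson(lam0 * np.pi * R**2)         # Poisson number of daughters per parent
    r = R * np.sqrt(rng.uniform(0.0, 1.0, k))    # uniform in the disk of radius R
    ang = rng.uniform(0.0, 2.0 * np.pi, k)
    daughters.append(p + np.column_stack((r * np.cos(ang), r * np.sin(ang))))

pts = np.vstack(daughters) if daughters else np.empty((0, 2))
print(n_par, "parents,", len(pts), "daughter points in the realization")
```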
Ξ_parent = ( N, X|N ) ,   where   X|N = { X₁, ..., X_N }
are the conditionally i.i.d. points of Ξparent . Denote the number of daughter points
generated by the parent point X j by the discrete random variable K j . By definition
of the Neyman-Scott process, the variables K 1 , . . . , K N are i.i.d. with pdf given by
(8.11). Let η(t) denote the probability generating function of the discrete random
variable K :
η(t) = Σ_{k=0}^{∞} p_k t^k .
The daughter points of the parent point X j are the random variables
X_{1j}, ..., X_{K_j, j} .
By definition of the Neyman-Scott process, they are i.i.d. with conditional pdf
p_{X_{ij} | X_j}( x | X_j = x_j ) = h( x − x_j ) ,   i = 1, ..., K_j ,   (8.14)

where the cluster pdf h(x) is given. The characteristic functional of Ξ evaluated for a given function f is, by definition,

G_Ξ( f ) = E_Ξ[ Π_{j=1}^{N} Π_{i=1}^{K_j} f( X_{ij} ) ] .
The random variables X j of the parent points do not appear in the product because
they are not points of the output process. The expectation is evaluated in the nested
form:
G_Ξ( f ) = E_{Ξ_parent}[ Π_{j=1}^{N} E_{K_j | X_j}[ E_{X_{1j} X_{2j} ··· X_{K_j, j} | K_j, X_j}[ Π_{i=1}^{K_j} f( X_{ij} ) ] ] ] .   (8.15)

Because the daughter points are conditionally i.i.d., the innermost expectation factors into the product of expectations:

E_{X_{1j} X_{2j} ··· X_{K_j, j} | K_j, X_j}[ Π_{i=1}^{K_j} f( X_{ij} ) ] = Π_{i=1}^{K_j} E_{X_{ij} | X_j}[ f( X_{ij} ) ]
   = Π_{i=1}^{K_j} ∫_S f(x) h( x − X_j ) dx = Π_{i=1}^{K_j} ∫_S f( x + X_j ) h(x) dx
   = ( ∫_S f( x + X_j ) h(x) dx )^{K_j} .   (8.16)

Averaging over K_j then gives

E_{K_j | X_j}[ · ] = Σ_{k_j = 0}^{∞} ( ∫_S f( x + X_j ) h(x) dx )^{k_j} p_{k_j}
                  = η( ∫_S f( x + X_j ) h(x) dx ) .   (8.17)
Using the generating functional of the parent PPP (see Eqn. (2.53)) gives the char-
acteristic functional of the Neyman-Scott cluster process as
G_Ξ( f ) = exp( ∫_S [ η( ∫_S f( x + s ) h(x) dx ) − 1 ] λ(s) ds ) .   (8.18)
A Cox process is a PPP in which the intensity function is randomly selected from a
well defined space of possible intensities, say Λ. Thus, realizations of a Cox process
are the result of sampling first from the space Λ to obtain the intensity function λ(x),
and then finding a realization of the PPP with intensity λ(x) via the two step pro-
cedure of Section 2.3. The idea is that the intensity space Λ characterizes possible
environments in which a PPP might be a good model—provided the right intensity
is used.
To set ideas, consider a homogeneous PPP whose intensity λ is chosen so that the mean number of points μ = λ |R| in the window R is drawn from an exponential pdf:

p_M(μ) = (1/μ₀) exp( −μ/μ₀ ) ,   (8.19)

where μ₀ > 0 is the specified mean.
The one dimensional version of this example is used in [83] to model neural spike
trains. Carrying the general notion over to PPPs gives the Cox process.
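Mixing the Poisson count over the exponential pdf (8.19) produces a geometric distribution for the number of points in R, a fact that is easy to confirm by simulation. The sketch below (Python; μ₀ and the number of trials are assumed) draws the random mean from (8.19) and then the Poisson count.

```python
import numpy as np

rng = np.random.default_rng(6)
mu0 = 3.0
mu = rng.exponential(mu0, 200_000)   # random mean number of points, pdf (8.19)
N = rng.poisson(mu)                  # number of points of the Cox process in R

p = mu0 / (1.0 + mu0)
for n in range(5):
    # Mixed Poisson counts follow a geometric law: Pr[N = n] = (1 - p) p^n.
    print(n, round((N == n).mean(), 4), round((1.0 - p) * p**n, 4))
```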
Λ( x ; N, X|N ) = μ Σ_{j=1}^{N} h( x − X_j ) ,   (8.21)
where μ > 0 is a known scale constant, h(x) is a specified pdf on S, and where
the points X = {X 1 , . . . , X N } are the random points of a realization of a PPP
with intensity function λ(x) on S. Bartlett (1964) showed that this Cox process is a
Neyman-Scott cluster process with a Poisson distributed number of daughter points,
i.e., the random variable K in (8.11) is Poisson distributed.
To see Bartlett’s result, evaluate the characteristic functional of the Cox process
Ξ . Let {U1 , . . . , U N } denote N points of a realization of Ξ . The required expecta-
tion is equivalent to the nested expectations
G_Ξ( f ) = E_Λ[ E_{Ξ|Λ}[ Π_{j=1}^{N} f( U_j ) ] ] .
The inner expectation is the characteristic functional of the PPP with intensity
Λ(x ; N , X | N ); explicitly,
E_{Ξ|Λ}[ · ] = exp( ∫_S ( f(u) − 1 ) Λ( u ; N, X|N ) du )
            = exp( ∫_S ( f(u) − 1 ) μ Σ_{j=1}^{N} h( u − X_j ) du )
            = Π_{j=1}^{N} Ψ( X_j ) ,
where
Ψ( X_j ) = exp( μ [ ∫_S f( u + X_j ) h(u) du − 1 ] ) .
Since the points X j are the points of a PPP with intensity λ(x), the characteristic
functional of Ξ is
G_Ξ( f ) = E_Λ[ Π_{j=1}^{N} Ψ( X_j ) ]
        = exp( ∫_S [ Ψ(x) − 1 ] λ(x) dx ) .   (8.22)
Comparing (8.22) with (8.18) shows that

η(s) = e^{μ (s − 1)} ,   (8.23)

where

s = ∫_S f( u + x ) h(u) du .   (8.24)
The right hand side of (8.23) is the probability generating function of the discrete Poisson
distribution with mean μ. Therefore, the Cox process has a Poisson distributed
number of points that are distributed around parent points with pdf h(u), u ∈ S.
Since probability generating functionals characterize finite orderly point processes
([16, p. 625]), the Cox process with random intensity function given by (8.21) is a
Neyman-Scott process with a Poisson distributed number of points. (General condi-
tions under which a Poisson cluster process is a Cox process are not known. Further
details are given in [16, pp. 663–664].)
Cox processes are useful in applications in which the parameter vector of the inten-
sity function is the solution of a stochastic differential equation (SDE). An example
is the intensity function (4.38) when the mean μt of the Gaussian component is time
dependent and satisfies the Ito diffusion equation [90, p. 104]
closely related to renewal processes (see [119, Chapters 6 and 7]). These processes
are well discussed elsewhere.
and the initial state probability mass function by π( · ) on C. The initial time is
t0 ≡ t (0). Homogeneous PPPs are defined for each state c( j) with intensity λc( j) .
(Nonhomogeneous PPPs can also be used.)
A realization of a MMPP on the interval [t (0), T ] is obtained from a two stage
procedure. The first stage generates a realization of A on the time interval [t (0), T ]
(see [100]). The initial state c(t (0)) at time t (0) is a realization of π( · ). Including
t(0), the switching times and states of the Markov chain are {t(0), t(1), ..., t(ℓ)} and {c(t(0)), c(t(1)), c(t(2)), ..., c(t(ℓ))}, respectively. Let t(ℓ + 1) = T. The second stage generates a realization of the PPP with intensity λ_{c(t(j))} on the time interval [t(j), t(j + 1)), j = 0, 1, ..., ℓ. The concatenation of the realizations of these Markov switched PPPs is a realization of the MMPP.
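The two-stage procedure is straightforward to simulate. The sketch below (Python) uses an assumed two-state Markov chain with exponential holding times; each holding interval contributes a homogeneous PPP with the rate of the current state.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 100.0
rates = {0: 1.0, 1: 8.0}        # per-state PPP intensities lambda_c (assumed values)
switch = {0: 0.2, 1: 0.5}       # exponential switching rates of the two-state chain
pi0 = 0.5                       # initial probability of state 0

t, state = 0.0, (0 if rng.uniform() < pi0 else 1)
events = []
while t < T:
    t_next = min(t + rng.exponential(1.0 / switch[state]), T)   # next switching time
    n = rng.poisson(rates[state] * (t_next - t))                # PPP arrivals on [t, t_next)
    events.extend(np.sort(rng.uniform(t, t_next, n)))
    t, state = t_next, 1 - state

print(len(events), "arrivals on [0, T]; empirical mean rate:", len(events) / T)
```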
MMPPs are stochastic processes often used to model bursty phenomena in vari-
ous applications, especially in telecommunications. Other applications include load
modeling that changes abruptly depending on the outcome of certain events, say, an
alternating hypothesis SPRT (sequential probability ratio test) that controls arrival
rates in a queue. The superposition of MMPPs is an MMPP, a fact that facilitates
applications. The transition function of the superposed MMPP in terms of the com-
ponent MMPPs can be found in [95].
p( x₁, ..., xₙ | n, λ ) = ( 1 / ( Zₙ n! ) ) exp( −E( x₁, ..., xₙ ) ) ,   (8.27)
Chapter 9
The Cutting Room Floor
Abstract Several topics of interest not discussed elsewhere in this book are
mentioned here.
The main properties of PPPs that seem (to the author) insightful and useful in appli-
cations are reviewed in this book. They do not appear to be gathered into one place
elsewhere. The applications to medical imaging and tomography, multiple target
tracking, and distributed sensor detection are intended to provide insight into meth-
ods and models of PPPs and related point processes, and to serve as an entry point
to exciting new applications of PPPs that are of active research interest.
Many interesting topics are naturally omitted from the book. For reasons already
mentioned in the introductory Chapter 1, nearly all discussion of one dimensional
point processes is omitted. Other omissions are detailed below in a chapter by chap-
ter review. A brief section on possible directions for further work follows.
Chapter 2 on the basics of PPPs omits many topics simply because they are
not used in the applications presented later in the book. Many of these topics have
diverse applications in one dimensional processes, and they are also interesting
in themselves. The connections with Poisson random measures are not mentioned.
More generally, the connections between stochastic processes and point processes
are only briefly mentioned. Markov point processes are omitted. This is especially
unfortunate because it means that physically important point processes with spatial
point to point correlations or interactions are treated only via example. The Matérn
hard core processes of Section 8.2 are excellent examples of Markov point pro-
cesses.
Chapter 3 on estimation is heavily weighted toward superposition because of
the needs of applications. Superposition leads rather naturally to an emphasis on
the method of EM. The dependence of the convergence rate of the EM method on
the OIM is not discussed. Since the convergence rate of EM based algorithms is
ultimately only linear, other algorithms merit consideration. Some methods use the
OIM to accelerate EM algorithm convergence; others use hybrid techniques that
switch from EM to other more rapidly convergent algorithms. MCMC methods are
also omitted from the discussion.
Chapter 4 discussion of the CRB is fairly complete for the purposes of PPPs. The
Bayesian CRB, or posterior CRB (PCRB), is not treated even though it is useful
in some applications, e.g., tracking. The notion of a lower bound for parameters
that are inherently discrete is available, but not discussed here, even though such
bounds are potentially useful in applications. The Hammersley-Chapman-Robbins
(HCR) bound holds for discrete parameters [14, 45]. It is a special case of the earlier
Barankin bound [5, 134]. When the parameter is continuously differentiable, the
HCR bound becomes the CRB.
Chapter 5 on PET, SPECT, and transmission tomography only scratches the sur-
face. Absent from the chapter are details about specific technical issues involved in
practical applications of the Shepp-Vardi and related algorithms. These details are
important to specialists and not without interest to others. Overcoming these issues
often involves exploiting the enormous flexibility of the algorithm. Examples that
illustrate the current state of the art would entice readers to learn more about these
topics, without delving too deeply into the intricacies of the methods. Happily, the
CRB is already available for PET and methods are available for computing the CRB
for subsets of the full PET image [49].
Chapter 6 on multitarget tracking applications of PPPs is an active area of
research. It is reasonable to ask about the CRB of the intensity filter, since the
intensity is the defining parameter. In this case, however, the target motion model is a
Bayesian prior and the CRBs of Chapter 4 must be extended to include this case. The
PCRB for the intensity filter is not available, but it is important for understanding
the quality of the intensity filter estimate. Computing it will surely be a complicated
affair because the intensity function is the small cell limit of a sequence of step
functions in the single target state space. Presumably, after first finding the PCRB
for a finite number of cells, taking the small cell limit of a suitably normalized
version yields a function on the product space S × S.
9.2 Possible Trends
Some argue that theoretical analysis of point processes is of little value in real world
applications because only extensive high fidelity simulations can give quantitative
understanding of the many variables of practical interest. This argument is easily
refuted in applications such as PET imaging, but more difficult to refute fully in
applications such as distributed sensing that use point processes to gain insight
rather than as an exact model.
The debate between theory and simulation will undoubtedly continue for years,
but it is a healthy debate reminiscent of the debate between theoretical and exper-
imental physicists. Practical problems will undoubtedly challenge theory, and the
explanatory power of theoretical methods will grow. Because of their extraordinary flexibility, MCMC methods will become a progressively more important tool
as demands on model sophistication and fidelity increase. The idea of perfect
simulation via the “coupling from the past” (CFTP) technique of Propp and Wilson
[96] will undoubtedly enrich applications. These methods will blur the distinction
between theory and simulation.
Geometrically distributed sensors and social networks (e.g., the Internet) are
both graphs whose vertices (points, sensors, agents, etc.) are connected by edges.
Edges represent connectivity that arises from geometric proximity, or from some
non-physical social communication link. The empirical evidence shows that the
vertex connectivity of geometric random graphs is very different from that of social
network graphs—complex social networks typically have a power law distribution
on connectivity, whereas geometric random graphs do not. Detecting subgraphs that
are proximate both geometrically and socially is an important problem. Adding
the element of time makes the problems dynamic and even more realistic. Inte-
gral geometry and its methods, especially Boolean models, may eventually be seen
as a key component of a mathematical foundation of detection in these kinds of
problems. It will be interesting to see what manner of contribution MCMC methods
make to the subject.
Appendix A
Expectation-Maximization (EM) Method
A.1 Formulation
The observed data z are called the incomplete data. As is seen shortly, the name
makes sense in the context of the EM method. The measurements z are a realization
of the random variable Z , and Z takes values in the space Z. The pdf of the data
z is specified parametrically by p Z (z ; θ ), where θ ∈ Θ is an unknown parameter
vector. Here, Θ is the set of all valid parameter values.
The likelihood function of θ is the pdf p Z (z ; θ ) thought of as a function of θ for
a given z. It is assumed that the likelihood of the data for any θ ∈ Θ is finite. It is
also assumed that the likelihood function of θ is uniformly bounded above, that is,
p Z (z ; θ ) ≤ B < ∞ for all θ ∈ Θ. The latter assumption is important.
The maximum likelihood (ML) estimate is

θ̂_ML = arg max_{θ ∈ Θ} p_Z( z ; θ ) .
For pdfs differentiable with respect to θ, the natural way to compute θ̂_ML is to solve the so-called necessary conditions

∇_θ p_Z( z ; θ ) = 0 .
p_{K|Z}( k | z ; θ ) = p_{ZK}( z, k ; θ ) / p_Z( z ; θ ) = p_{ZK}( z, k ; θ ) / ∫_K p_{ZK}( z, k ; θ ) dk .   (A.2)
Let n ≥ 0 denote the EM iteration index. Let θ (0) ∈ Θ be an initial (valid) value of
the parameter.
A.1.1 E-step

Given the current iterate θ^{(n−1)}, the E-step evaluates the EM auxiliary function

Q( θ ; θ^{(n−1)} ) = ∫_K ( log p_{ZK}( z, k ; θ ) ) p_{K|Z}( k | z ; θ^{(n−1)} ) dk .
A.1.2 M-step

The M-step chooses the update θ^{(n)} ∈ Θ to maximize Q( θ ; θ^{(n−1)} ). It is enough that the update merely increase the auxiliary function, that is, Q( θ^{(n)} ; θ^{(n−1)} ) ≥ Q( θ^{(n−1)} ; θ^{(n−1)} ).
If the update θ (n) is chosen in this way, the method is called the generalized EM
(GEM) method. The GEM method is very useful in many problems. It is the starting
point of other closely related EM-based methods. A prominent example is the SAGE
(Space Alternating Generalized EM) algorithm [35].
A.1.3 Convergence
Under mild conditions, convergence is guaranteed to a critical point of the likeli-
hood function, that is, θ (n) → θ ∗ such that
∇_θ p_Z( z | θ ) |_{θ = θ*} = 0 .
Experience shows that in practice, θ ∗ is almost certainly a local maximum and not
a saddle point, so that θ ∗ = θ̂ M L .
One of the mild conditions that cannot be overlooked is the assumption that
the likelihood function of θ is uniformly bounded above. If it is not, then the EM
iteration will converge only if the initial value of θ is by chance in the domain of
attraction of a point of local maximum likelihood. Otherwise, it will diverge, mean-
ing that the likelihood function of the iterates will grow unbounded. Unboundedness
would not be an issue were it not for the fact that the only valid values of θ have
finite likelihoods, so the iterates—if they converge to a point—converge to an invalid
parameter value, that is, to a point not in Θ. Estimation of heteroscedastic Gaussian
sums is notorious for this behavior (due to covariance matrix collapse). See Section
3.4 for further discussion.
The details of the EM convergence proof are given in too many places to repeat
all the details here. The book [80] gives a full and careful treatment of convergence,
as well as examples where convergence fails in various ways.
For present purposes, it is enough to establish two facts. One is that if the update
θ (n) satisfies (A.5), then it also increases the likelihood function. The other is that
a critical point of the auxiliary function is also a critical point of the likelihood
function, and conversely.
To see that each EM step monotonically increases the likelihood function, recall
that log x ≤ x − 1 for all x > 0 with equality if and only if x = 1. Then, using
only the above definitions,
0 < Q( θ ; θ^{(n−1)} ) − Q( θ^{(n−1)} ; θ^{(n−1)} )
  = ∫_K ( log p_{ZK}( z, k ; θ ) ) p_{K|Z}( k | z ; θ^{(n−1)} ) dk − ∫_K ( log p_{ZK}( z, k ; θ^{(n−1)} ) ) p_{K|Z}( k | z ; θ^{(n−1)} ) dk
  = ∫_K log[ p_{ZK}( z, k ; θ ) / p_{ZK}( z, k ; θ^{(n−1)} ) ] p_{K|Z}( k | z ; θ^{(n−1)} ) dk
  ≤ ∫_K [ p_{ZK}( z, k ; θ ) / p_{ZK}( z, k ; θ^{(n−1)} ) − 1 ] ( p_{ZK}( z, k ; θ^{(n−1)} ) / p_Z( z ; θ^{(n−1)} ) ) dk
  = ( 1 / p_Z( z ; θ^{(n−1)} ) ) ∫_K [ p_{ZK}( z, k ; θ ) − p_{ZK}( z, k ; θ^{(n−1)} ) ] dk
  = ( 1 / p_Z( z ; θ^{(n−1)} ) ) [ p_Z( z ; θ ) − p_Z( z ; θ^{(n−1)} ) ] .

Hence p_Z( z ; θ ) > p_Z( z ; θ^{(n−1)} ), that is, the likelihood increases.
To see the second fact, let θ₀ be a critical point of the likelihood function. Then

0 = ∇_{θ₀} p_Z( z ; θ₀ )
  = ∫_K ∇_{θ₀} p_{ZK}( z, k ; θ₀ ) dk
  = ∫_K p_{ZK}( z, k ; θ₀ ) ∇_{θ₀} log p_{ZK}( z, k ; θ₀ ) dk
  = p_Z( z ; θ₀ ) ∫_K ( p_{ZK}( z, k ; θ₀ ) / p_Z( z ; θ₀ ) ) ∇_{θ₀} log p_{ZK}( z, k ; θ₀ ) dk
  = p_Z( z ; θ₀ ) ∫_K p_{K|Z}( k | z ; θ₀ ) [ ∇_θ log p_{ZK}( z, k ; θ ) ]_{θ = θ₀} dk
  = p_Z( z ; θ₀ ) [ ∇_θ Q( θ ; θ₀ ) ]_{θ = θ₀} .
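The monotone increase of the likelihood is easy to observe numerically. The sketch below (Python) runs EM on a two-component Gaussian mixture with known weights and common variance, estimating only the two means; all numerical values are assumptions made for illustration. The printed log-likelihood never decreases from iteration to iteration.

```python
import numpy as np

rng = np.random.default_rng(8)
z = np.concatenate((rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)))  # observed data

w, sig = np.array([0.3, 0.7]), 1.0      # known mixture weights and common std (assumed)
mu = np.array([0.0, 1.0])               # initial parameter theta^(0)

def loglik(mu):
    comp = w * np.exp(-0.5 * (z[:, None] - mu)**2) / np.sqrt(2 * np.pi * sig**2)
    return np.log(comp.sum(axis=1)).sum()

for it in range(10):
    # E-step: posterior probability that each z came from each component (the missing data).
    comp = w * np.exp(-0.5 * (z[:, None] - mu)**2) / np.sqrt(2 * np.pi * sig**2)
    resp = comp / comp.sum(axis=1, keepdims=True)
    # M-step: maximize the auxiliary function -> responsibility-weighted means.
    mu = (resp * z[:, None]).sum(axis=0) / resp.sum(axis=0)
    print(f"iter {it}: log-likelihood = {loglik(mu):.4f}, mu = {mu}")
```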
The widely referenced paper [23] is the synthesis of several earlier discoveries of
EM in a statistical setting. However, EM is not essentially statistical in character, but
is rather only one member of a larger class of strictly numerical methods called iter-
ative majorization [19–21]. This connection is not often mentioned in the literature,
but the insight was noticed almost immediately after the publication of [23].
The observed data likelihood function p_Z( z ; θ ) is the envelope of a two-parameter family of functions. This family is defined in the EM method by the auxiliary function, Q(θ ; φ). As seen from the results in [23], the data likelihood function majorizes every function in this family. For each specified parameter φ, the function Q(θ ; φ) is tangent to p_Z( z ; θ ) at the point φ. This situation is depicted
in Fig. A.1 for the sequence φ = θ0 , θ1 , θ2 , . . . . It is now intuitively clear that
EM based algorithms monotonically increase the data likelihood function, and that
they converge with high probability to a local maximum of the likelihood function
p Z (z ; θ ) that depends on the starting point.
It is very often observed in practice that EM based algorithms make large strides
toward the solution in the early iterations, but that progress toward the solution
Fig. A.1 Iterative majorization interpretation of the observed (incomplete) data likelihood function as the envelope of the EM auxiliary function, Q(θ, φ)
where
x̄ ≡ (1/m) Σ_{j=1}^{m} x_j ∈ R^{n_x} .
The j-th component of x̄ is denoted by x̄ j , which should not be confused with the
data point x j ∈ Rn x . The solution to (B.1) is unique and straightforward to compute
numerically for rectangular multidimensional regions
R = [a₁, b₁] × ··· × [a_{n_x}, b_{n_x}]

and diagonal covariance matrices Σ = Diag( σ₁², ..., σ_{n_x}² ).
The conditional mean of a univariate Gaussian distributed random variable con-
ditioned on realizations in the interval [−1, 1] is defined by
M[ μ, σ² ] ≡ ∫_{−1}^{1} s N( s ; μ, σ² ) ds / ∫_{−1}^{1} N( s ; μ, σ² ) ds ,   (B.2)
and
lim_{μ → −∞} M[ μ, σ² ] = −1 .
The most important fact about the function M is that it is strictly monotone increasing as a function of μ. Consequently, for any number c such that −1 < c < 1, and variance σ², the solution μ of the general equation

M[ μ, σ² ] = c   (B.3)

exists and is unique. To see this, it is only necessary to verify that the derivative M′[ μ, σ² ] ≡ (∂/∂μ) M[ μ, σ² ] > 0 for all μ. The inequality is intuitively obvious from its definition as a conditional mean. The function M[ μ, σ² ] is plotted for several values of σ in Fig. B.1. Evaluating the inverse function M⁻¹[ μ, σ² ] efficiently is left as an exercise.
Fig. B.1 Plots of M[μ, σ 2 ] from σ = 0.1 to σ = 1 in steps of 0.1. The monotonicity of
M[μ, σ 2 ] is self evident. The steepness of the transition from −1 to +1 increases steadily with
decreasing σ
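Although the exercise is left to the reader, one workable approach is sketched below (Python): evaluate M[μ, σ²] with the closed-form truncated-normal mean and invert it by bisection, relying on the strict monotonicity established above. The bracket is an assumption and must contain the root.

```python
import math

def M(mu, sigma):
    # Conditional mean (B.2) of N(mu, sigma^2) restricted to [-1, 1],
    # via the standard truncated-normal mean formula.
    a, b = (-1.0 - mu) / sigma, (1.0 - mu) / sigma
    phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return mu + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))

def M_inverse(c, sigma, lo=-6.0, hi=6.0, tol=1e-12):
    # Solve M(mu, sigma^2) = c for -1 < c < 1 by bisection; M is strictly
    # increasing in mu, so the root is unique (bracket assumed to contain it).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if M(mid, sigma) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = M_inverse(0.4, 0.5)
print("mu =", mu, "  check M(mu) =", M(mu, 0.5))
```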
In the multidimensional problem, the conditional mean equations are, for j = 1, ..., n_x,

∫_{a₁}^{b₁} ··· ∫_{a_{n_x}}^{b_{n_x}} s_j Π_{i=1}^{n_x} N( s_i ; μ_i, σ_i² ) ds₁ ··· ds_{n_x}
  / ∫_{a₁}^{b₁} ··· ∫_{a_{n_x}}^{b_{n_x}} Π_{i=1}^{n_x} N( s_i ; μ_i, σ_i² ) ds₁ ··· ds_{n_x} = x̄_j .   (B.4)

The integrals over the variables other than s_j cancel from the ratio, so (B.4) reduces to the univariate equation

∫_{a_j}^{b_j} s_j N( s_j ; μ_j, σ_j² ) ds_j / ∫_{a_j}^{b_j} N( s_j ; μ_j, σ_j² ) ds_j = x̄_j .   (B.5)
Substituting

s_j = ( (b_j − a_j)/2 ) x + ( a_j + b_j )/2

maps the interval [a_j, b_j] onto [−1, 1] and puts (B.5) into the form (B.3). Solving

M[ μ̃_j , ( 2σ_j / (b_j − a_j) )² ] = ( 2 / (b_j − a_j) ) ( x̄_j − ( a_j + b_j )/2 )   (B.7)

for the transformed mean μ̃_j, and then inverting the substitution, gives μ_j.
The expression (B.4) holds with obvious modifications for more general regions
R. However, the multidimensional integral over all the variables except x j is a func-
tion of x j in general and does not cancel from the ratio as it does in (B.5).
Appendix C
Bayesian Filtering
A brief review of general Bayesian filtering is given in this appendix. The discussion
sets the conceptual and notational foundation for Bayesian filtering on PPP event
spaces, all without mentioning PPPs until the very end. Gentler presentations that
readers may find helpful are widely available (e.g., [4, 54, 104, 122]).
The notation used in this appendix is used in Section 6.1 and also in the alterna-
tive derivation of Appendix D.
The denominator is the pdf of the measurement z k given that it is generated by the
target with pdf pk|k−1 (xk ):
p_{k|k}( x_k ) = p_{Υ_k|Ξ_k}( z_k | x_k ) p_{k|k−1}( x_k ) / π_{k|k−1}( z_k ) .   (C.5)
and
z j = H j (x j ) + w j , (C.7)
The joint pdf is found by substituting (C.8a)–(C.8c) into (C.1). The recursion (C.3)–(C.5) gives the posterior pdf on S = R^{n_x}.
The linear Gaussian Kalman filter assumes that

p_{k|k}( x_k ) = N( x_k | x̂_{k|k}, P_{k|k} ) ,   (C.10)

where x̂_{k|k} is the point estimate of the target at time t_k and P_{k|k} is the associated error covariance matrix. Explicitly, for j = 0, ..., k − 1,
P_{j+1|j} = F_j P_{j|j} F_j^T + Q_j   (C.11a)
W_{j+1} = P_{j+1|j} H_{j+1}^T [ H_{j+1} P_{j+1|j} H_{j+1}^T + R_{j+1} ]^{−1}   (C.11b)
P_{j+1|j+1} = P_{j+1|j} − W_{j+1} H_{j+1} P_{j+1|j}   (C.11c)
x̂_{j+1|j+1} = F_j x̂_{j|j} + W_{j+1} ( z_{j+1} − H_{j+1} F_j x̂_{j|j} ) .   (C.11d)
These equations are not necessarily good for numerical purposes, especially when
observability is an issue. In practice, the information form of the Kalman filter is
always preferable.
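A direct transcription of (C.11a)–(C.11d) is given below (Python) for a small assumed constant-velocity example. It is only a sketch of the covariance form, not the numerically preferred information form mentioned above.

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle implementing (C.11a)-(C.11d)."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q                              # (C.11a)
    S = H @ P_pred @ H.T + R
    W = P_pred @ H.T @ np.linalg.inv(S)                   # (C.11b), the gain
    P_new = P_pred - W @ H @ P_pred                       # (C.11c)
    x_new = x_pred + W @ (z - H @ x_pred)                 # (C.11d), innovation form
    return x_new, P_new

# Assumed one-dimensional nearly-constant-velocity model.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])

x, P = np.zeros(2), np.eye(2)
for z in [0.9, 2.1, 2.8, 4.2]:                            # assumed position measurements
    x, P = kalman_step(x, P, np.array([z]), F, Q, H, R)
print("state estimate:", x)
```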
The Kalman filter is often written in terms of measurement innovations when the measurement and target motion models are linear. The predicted target state at time t_j is

x̂_{j|j−1} = F_{j−1} x̂_{j−1|j−1} .

The innovation at time t_j is the difference between the actual measurement and the predicted measurement:

z_j − H_j x̂_{j|j−1} .

The reason for the name innovation is now self-evident. The information updated target state is

x̂_{j|j} = x̂_{j|j−1} + W_j ( z_j − H_j x̂_{j|j−1} ) .   (C.15)
The update (C.15) is perhaps more intuitive, but it is the same as before.
For completeness, the smoothing (or, lagged) Kalman filter is given here. The
posterior pdf is denoted by
p_{j|k}( x_j ) = N( x_j | x̂_{j|k}, Σ_{j|k} ) ,   (C.16)
where x̂ j|k is the point estimate of the target at time t j given all the data {z 1 , . . . , z k }
up to and including time tk , and Σ j|k is the associated error covariance matrix. These
quantities are computed by the backward recursion: For j = k −1, . . . , 0, the point
estimates are
x̂_{j|k} = x̂_{j|j} + P_{j|j} F_j^T P_{j+1|j}^{−1} ( x̂_{j+1|k} − F_j x̂_{j|j} ) .   (C.17)

The innovation form of the filter, if it is desirable to think of the smoothing filter in such terms, is written in terms of the state innovations, x̂_{j+1|k} − F_j x̂_{j|j}. The corresponding error covariance matrices are

Σ_{j|k} = P_{j|j} + P_{j|j} F_j^T P_{j+1|j}^{−1} ( Σ_{j+1|k} − P_{j+1|j} ) P_{j+1|j}^{−1} F_j P_{j|j} .   (C.18)
The smoothing recursions (C.17)–(C.18) were first derived in 1965 by Rauch, Tung,
and Striebel [98].
The multitarget intensity filter is derived by Bayesian methods in this appendix. The
posterior point process is developed first, and then the posterior point process is
approximated by a PPP. Finally, the last section discusses the relationship between
this method and the “first moment” approximation of the posterior point process.
The steps of the intensity filter are outlined in Fig. 6.1. The PPP interpretations of
these steps are thinning, approximating the Bayes update with a PPP, and superpo-
sition. The PPP at time tk is first thinned by detection. The two branches of the thin-
ning are the detected and undetected target PPPs. Both branches are important. Their
information updates are different. The undetected target PPP is the lesser branch. Its
information update is a PPP. The detected target branch is the main branch, and
its information update comprises two key steps. Firstly, the Bayes update of the
posterior point process of Ξk on E(S + ) given data up to and including time tk is
obtained. The posterior is not a PPP, as is seen below from the form of its pdf in
(D.10). Secondly, the posterior point process is approximated by a PPP, and a low
computational complexity expression for the intensity of the approximating PPP is
obtained. The two branches of detection thinning are recombined by superposition
to obtain the intensity filter update.
The random variables Ξk−1|k−1 , Ξk|k−1 , and Υk|k−1 are defined as in Appendix C.
The state space of Ξk−1|k−1 and Ξk|k−1 is E(S + ), where E(S + ) is a union of sets
defined as in (2.1). Similarly, the event space of Υk|k−1 is E(T ), not T .
The process Ξk−1|k−1 is assumed to be a PPP, so it is parameterized by its inten-
sity f k−1|k−1 (s), s ∈ S + . A realization ξk ∈ E(S + ) of Ξk−1|k−1 is transitioned
to time tk via the single target transition function Ψk−1 (y | x). Its intensity is, using
(2.83),
f_{k|k−1}(x) = ∫_{S⁺} Ψ_{k−1}( x | s ) f_{k−1|k−1}(s) ds .   (D.1)
The point process Ξ_{k|k} is the sum of detected and undetected target processes, denoted by Ξ_{k|k}^D and Ξ_{k|k}^U, respectively. They are obtained from the same realizations of Ξ_{k|k−1}, so they would seem to be highly correlated. However, the number of
of Ξk|k−1 , so they would seem to be highly correlated. However, the number of
points in the realization is Poisson distributed, so they are actually independent. See
Section 2.9.
The undetected target process Ξ_{k|k}^U is the predicted target PPP Ξ_{k|k−1} thinned by 1 − P_k^D(s), where P_k^D(s) is the probability of detecting a target at s. Thus Ξ_{k|k}^U is a PPP, and

f_{k|k}^U(x) = ( 1 − P_k^D(x) ) f_{k|k−1}(x)   (D.2)
is its intensity.
The detected target process Ξ_{k|k}^D is the predicted target PPP Ξ_{k|k−1} that is thinned by P_k^D(s) and subsequently updated by Bayesian filtering. Thinning yields the predicted detected target PPP, and

f_{k|k−1}^D(x) = P_k^D(x) f_{k|k−1}(x)   (D.3)
is its intensity.
The predicted measurement process Υ_{k|k−1} is obtained from Ξ_{k|k−1}^D via the pdf of a single point measurement z ∈ T conditioned on a target located at s ∈ S⁺. The quantity p_k(z | φ) is the likelihood of z if it is a false alarm. See Section 2.12. Thus, Υ_{k|k−1} is a PPP on T and

λ_{k|k−1}(z) = ∫_{S⁺} p_k( z | s ) P_k^D(s) f_{k|k−1}(s) ds ,   (D.4)
is its intensity.
The measurement set is υ_k = (m, {z₁, ..., z_m}), where z_j ∈ T. The conditional
pdf of υk is defined for arbitrary target realizations ξk = (n, {x1 , . . . , xn }) ∈
E(S + ). All the points x j of ξk , whether they are a true target (x j ∈ Rn x ) or are clutter
(x j = φ), generate a measurement so that only when m = n is the measurement
likelihood non-zero. The correct assignment of point measurements to targets in ξk
is unknown. All such assignments are equally probable, so the pdf averages over all
possible assignments of data to false alarms and targets. Because φ is a target state,
the measurement pdf is
p_{Υ_k|Ξ_k}( υ_k | ξ_k ) = (1/m!) Σ_{σ ∈ Sym(m)} Π_{j=1}^{m} p_k( z_{σ(j)} | x_j ) ,   if m = n ,
                        = 0 ,   if m ≠ n ,   (D.5)
where Sym(m) is the set of all permutations on the integers {1, 2, . . . , m}.
The lower branch of (D.5) is a consequence of the “at most one measurement
per target” rule together with the augmented target state space S + . To elaborate, the
points in a realization ξ of the detected target PPP are targets, some of which have
p_{k|k}( ξ_k ) = p_{Υ_k|Ξ_k}( υ_k | ξ_k ) p_{k|k−1}( ξ_k ) / π_{k|k−1}( υ_k ) .   (D.6)
Substituting (D.7), (D.8), and (D.5) into (D.6) and using obvious properties of per-
mutations gives the posterior pdf of Ξ_{k|k}^D:

p_{k|k}( ξ_k ) = (1/m!) Σ_{σ ∈ Sym(m)} Π_{j=1}^{m} p_k( z_{σ(j)} | x_j ) P_k^D( x_j ) f_{k|k−1}( x_j ) / λ_{k|k−1}( z_{σ(j)} ) .   (D.10)
If ξ_k does not contain exactly m points, then p_{k|k}( ξ_k ) = 0. Conditioning Ξ_{k|k}^D on m points gives

p_{k|k}( x₁, ..., x_m ) = (1/m!) Σ_{σ ∈ Sym(m)} Π_{j=1}^{m} p_k( z_{σ(j)} | x_j ) P_k^D( x_j ) f_{k|k−1}( x_j ) / λ_{k|k−1}( z_{σ(j)} ) .   (D.11)
a problem for the recursion. One way around it is to approximate Ξ_{k|k}^D by a PPP and
The pdf p_{k|k}( x₁, ..., x_m ) = p_{k|k}( x_{σ(1)}, ..., x_{σ(m)} ) for all σ ∈ Sym(m); therefore, integrating it over all of its arguments except, say, the ℓth argument gives the same result regardless of the choice of ℓ. The form of the "single target marginal" is, using (D.4),

p_{k|k}( x_ℓ ) ≡ ∫_{S⁺} ··· ∫_{S⁺} p_{k|k}( x₁, ..., x_m ) Π_{i=1, i≠ℓ}^{m} dx_i
  = (1/m!) Σ_{σ ∈ Sym(m)} ∫_{(S⁺)^{m−1}} Π_{j=1}^{m} [ p_k( z_{σ(j)} | x_j ) P_k^D( x_j ) f_{k|k−1}( x_j ) / λ_{k|k−1}( z_{σ(j)} ) ] Π_{i=1, i≠ℓ}^{m} dx_i
  = Σ_{r=1}^{m} Σ_{σ ∈ Sym(m), σ(ℓ)=r} (1/m!) p_k( z_{σ(ℓ)} | x_ℓ ) P_k^D( x_ℓ ) f_{k|k−1}( x_ℓ ) / λ_{k|k−1}( z_{σ(ℓ)} )
  = (1/m) Σ_{r=1}^{m} p_k( z_r | x_ℓ ) P_k^D( x_ℓ ) f_{k|k−1}( x_ℓ ) / λ_{k|k−1}( z_r ) .   (D.12)
p_{k|k}( x₁, ..., x_m ) ≈ Π_{j=1}^{m} p_{k|k}( x_j ) .   (D.13)
L( c | ξ_k ) = e^{−∫_{S⁺} c p_{k|k}(s) ds} (1/m!) Π_{j=1}^{m} c p_{k|k}( x_j ) ∝ e^{−c} c^m .
f_{k|k}^D(x) = Σ_{r=1}^{m} p_k( z_r | x ) P_k^D(x) f_{k|k−1}(x) / λ_{k|k−1}( z_r ) ,   (D.14)
where λ_{k|k−1}( z_r ) is given by (D.4).
The posterior point process Ξ_{k|k}^D is a finite point process whose realizations contain exactly m points.
dates to the 1950s. (An excellent reference is [17, Chapter 5].) This theory is now
applied to Ξ_{k|k}^D.
and X|N is the point set. From [17, Section 5.3], the Janossy probability density of a finite point process is defined by

j_n( x₁, ..., x_n ) = n! Pr[ N = n ] p_{X|N}( x₁, ..., x_n | n ) .

Janossy densities were encountered (but left unnamed) early in Chapter 2, (2.10). Using the ordered argument list as in (2.13) gives

j_m( x₁, ..., x_m ) = m! p_{k|k}( x₁, ..., x_m ) ,   and   j_n ≡ 0 for n ≠ m ,

where p_{k|k}( x₁, ..., x_m ) is the posterior pdf given by (D.11). The first moment
intensity is denoted in [17] by m 1 (x). From [17, Lemma 5.4.III], it is given in terms
of the Janossy density functions by
m₁(x) = Σ_{n=0}^{∞} (1/n!) ∫_{S⁺} ··· ∫_{S⁺} j_{n+1}( x, x₁, ..., x_n ) dx₁ ··· dx_n .   (D.24)
Only the n = m − 1 term of (D.24) is nonzero, so

m₁(x) = ( 1/(m−1)! ) ∫_{(S⁺)^{m−1}} j_m( x, x₁, ..., x_{m−1} ) dx₁ ··· dx_{m−1}
      = m ∫_{(S⁺)^{m−1}} p_{k|k}( x, x₁, ..., x_{m−1} ) dx₁ ··· dx_{m−1} .   (D.25)

The integral (D.25) is exactly m times the integral in (D.12), so the first moment approximation to Ξ_{k|k}^D is identical to the intensity (D.14).
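To make the update concrete, the following sketch (Python) evaluates (D.2), (D.4), and (D.14) on a one-dimensional grid. The Gaussian measurement likelihood, constant detection probability, and the constant stand-in for the clutter contribution to the predicted measurement intensity are all assumptions made only for illustration.

```python
import numpy as np

# Minimal 1-D grid sketch of the intensity-filter measurement update (D.14).
x = np.linspace(0.0, 10.0, 501)
dx = x[1] - x[0]
f_pred = 0.4 * np.exp(-0.5 * (x - 3.0)**2) + 0.6 * np.exp(-0.5 * (x - 7.0)**2)  # f_{k|k-1}
Pd = 0.9                                   # detection probability, assumed constant
sig_z = 0.5                                # measurement noise std (assumed)
clutter = 0.05                             # assumed constant clutter intensity in z
meas = [2.8, 7.3]                          # measurement set at time t_k (assumed)

def lik(z):                                # p_k(z | x): assumed Gaussian measurement model
    return np.exp(-0.5 * ((z - x) / sig_z)**2) / (sig_z * np.sqrt(2 * np.pi))

f_undet = (1.0 - Pd) * f_pred              # (D.2): undetected-target intensity
f_det = np.zeros_like(x)
for z in meas:
    lam_z = clutter + np.sum(lik(z) * Pd * f_pred) * dx   # predicted measurement intensity (D.4)
    f_det += lik(z) * Pd * f_pred / lam_z                  # one term of the sum in (D.14)

f_post = f_undet + f_det                   # superposition of the two branches
print("expected number of targets:", f_post.sum() * dx)
```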
Appendix E
MMIF: Marked Multitarget Intensity Filter
This appendix derives the marked multitarget intensity filter (MMIF) recursion for
linear Gaussian target and measurement models via the EM method. Targets are
modeled as PPPs that are “marked” with measurements.
As seen from Section 8.1, measurement-marked target PPPs are equivalent to
ordinary PPPs on the Cartesian product of the measurement and target spaces. These
joint PPPs are superposed, and the target states estimated via the EM method. As
mentioned in Section 6.2.2, the MMIF satisfies the “at most one measurement per
target rule” in the mean.
Ψ_{k−1}( x_k(ℓ) | x_{k−1}(ℓ) ) = N( x_k(ℓ) ; F_{k−1}(ℓ) x_{k−1}(ℓ), Q_{k−1}(ℓ) ) ,   (E.2)

where the system matrix F_{k−1}(ℓ) ∈ R^{n_x × n_x} and the process noise covariance matrix Q_{k−1}(ℓ) ∈ R^{n_x × n_x} are specified. The target motion model is equivalent to x_k(ℓ) = F_{k−1}(ℓ) x_{k−1}(ℓ) + u_{k−1}(ℓ), where the process noise u_{k−1}(ℓ) ∈ R^{n_x} is zero mean Gaussian distributed with covariance matrix Q_{k−1}(ℓ). The process noises are assumed independent from target to target.
Each target is modeled as a PPP. It is assumed, recursively, that the intensity
function of target ℓ at time t_{k−1} is
f^ℓ_{k−1|k−1}(x) = Î_{k−1|k−1}(ℓ) N( x ; x̂_{k−1|k−1}(ℓ), P_{k−1|k−1}(ℓ) ) ,   (E.3)

where the MAP estimate x̂_{k−1|k−1}(ℓ), its covariance matrix P_{k−1|k−1}(ℓ), and intensity Î_{k−1|k−1}(ℓ) are known. Under the target motion model (E.2), the predicted
detected target intensity function at time tk is
f^ℓ_{k|k−1}(x) = P_k^D(ℓ) I_k(ℓ) N( x ; x̂_{k|k−1}(ℓ), P_{k|k−1}(ℓ) ) ,   (E.4)

where P_k^D(ℓ) is the probability of detecting target ℓ at time t_k and is assumed independent of target state x. Also, the predicted state and covariance matrix of target ℓ are

x̂_{k|k−1}(ℓ) = F_{k−1}(ℓ) x̂_{k−1|k−1}(ℓ)   (E.5)
P_{k|k−1}(ℓ) = F_{k−1}(ℓ) P_{k−1|k−1}(ℓ) F_{k−1}^T(ℓ) + Q_{k−1}(ℓ) .   (E.6)
The coefficient I_k(ℓ) is estimated from data at time t_k as part of the MMIF recursion.
where the measurement matrix H_k(ℓ) ∈ R^{n_z × n_x} and the measurement noise covariance matrix R_k(ℓ) ∈ R^{n_z × n_z} are both specified. The measurement model is equivalent to z = H_k(ℓ) x + v_k(ℓ), where the measurement noise v_k(ℓ) ∈ R^{n_z} is zero mean Gaussian distributed with covariance matrix R_k(ℓ). The measurement and target process noises are assumed independent.
Measurements are modeled as marks that are associated with targets that are
realizations of a target PPP. Marked processes are described in a general setting in
Chapter 8. As seen from the Marking Theorem of Section 8.1, a measurement-
marked target PPP is equivalent to a PPP on the Cartesian product of the mea-
surement and target spaces, that is, on Rn z × Rn x . The measurement process is not
assumed to be a PPP.
The intensity function of the joint measurement-target PPP of target ℓ in state x ∈ R^{n_x} at time t_k is, from the expression (8.2),

λ^ℓ_{k|k}(z, x) = f^ℓ_{k|k−1}(x) N( z ; H_k(ℓ) x, R_k(ℓ) ) .   (E.8)
From the basic property of PPPs, the expected number of marked detected targets, that is, the number of targets with a measurement, is the multiple integral over R^{n_z} × R^{n_x}:

∫_{R^{n_z} × R^{n_x}} λ^ℓ_{k|k}(z, x) dz dx = ∫_{R^{n_x}} f^ℓ_{k|k−1}(x) dx = P_k^D(ℓ) I_k(ℓ) .   (E.9)
This statement is equivalent to the “at most one measurement per target rule,” but
only in the mean.
Substituting (E.4) and (E.7) gives the joint measurement-target intensity function
λ^ℓ_{k|k}(z, x) = P_k^D(ℓ) I_k(ℓ) N( x ; x̂_{k|k−1}(ℓ), P_{k|k−1}(ℓ) ) N( z ; H_k(ℓ) x, R_k(ℓ) )
             = P_k^D(ℓ) I_k(ℓ) N( x ; x̂_{k|k}(z ; ℓ), P_{k|k}(ℓ) ) N( z ; ẑ_{k|k−1}(ℓ), S_{k|k}(ℓ) ) ,   (E.10)
where, using x̂_{k|k−1}(ℓ) and P_{k|k−1}(ℓ) above, the usual Kalman filter equations give

ẑ_{k|k−1}(ℓ) = H_k(ℓ) x̂_{k|k−1}(ℓ)
S_{k|k}(ℓ) = H_k(ℓ) P_{k|k−1}(ℓ) H_k^T(ℓ) + R_k(ℓ)
W_k(ℓ) = P_{k|k−1}(ℓ) H_k^T(ℓ) [ H_k(ℓ) P_{k|k−1}(ℓ) H_k^T(ℓ) + R_k(ℓ) ]^{−1}
P_{k|k}(ℓ) = P_{k|k−1}(ℓ) − W_k(ℓ) H_k(ℓ) P_{k|k−1}(ℓ)
x̂_{k|k}(z ; ℓ) = F_{k−1}(ℓ) x̂_{k−1|k−1}(ℓ) + W_k(ℓ) ( z − ẑ_{k|k−1}(ℓ) ) .   (E.11)
λ_{k|k}(z, x) = λ⁰_{k|k}(z) + Σ_{ℓ=1}^{L} λ^ℓ_{k|k}(z, x)
            = I_k(0) q_k(z) + Σ_{ℓ=1}^{L} P_k^D(ℓ) I_k(ℓ) N( x ; x̂_{k|k}(z ; ℓ), P_{k|k}(ℓ) ) N( z ; ẑ_{k|k−1}(ℓ), S_{k|k}(ℓ) ) .   (E.13)
This sum parameterizes the likelihood function of the MMIF filter. The EM method
uses it in the next section to derive a recursion for estimating target states and the
intensity coefficients.
z k (1 : m k ) = {z k (1), . . . , z k (m k )} ,
xk (1 : m k ) = {xk (1), . . . , xk (m k )} ,
Because the target model is a PPP, the data Zk are a realization of the measurement-
target PPP with intensity (E.13). Its likelihood function is
p( Z_k ) = e^{−∫_{R^{n_z} × R^{n_x}} λ_{k|k}(z, x) dz dx} Π_{j=1}^{m_k} λ_{k|k}( z_k(j), x_k(j) )
        = e^{−I_k(0) − Σ_{ℓ=1}^{L} P_k^D(ℓ) I_k(ℓ)} Π_{j=1}^{m_k} [ λ⁰_{k|k}( z_k(j) ) + Σ_{ℓ=1}^{L} λ^ℓ_{k|k}( z_k(j), x_k(j) ) ] ,   (E.14)
xk ( j) = χk (σ j ), j = 1, . . . , m k . (E.16)
In other words, measurements that arise from the same mode have exactly the same
target state. The constraints (E.16) violate the exact form of the "at most one mea-
surement per target rule”, but it is not violated in the mean. The target states to be
estimated are χk (1 : L).
The superposition in (E.14) is a clear indication of the utility of the EM method
for computing MAP estimates. In EM parlance, (E.14) is the incomplete data pdf.
It is natural (indeed, other choices seem contrived here) to let the indices σ ≡
{σ1 , . . . , σm k } denote the missing data. The complete data pdf is defined by
p( Z_k, σ ) = e^{−I_k(0) − Σ_{ℓ=1}^{L} P_k^D(ℓ) I_k(ℓ)} Π_{j=1}^{m_k} λ^{σ_j}_{k|k}( z_k(j), χ_k(σ_j) ) .   (E.17)
Let Ik (0 : L) ≡ (Ik (0), Ik (1), . . . , Ik (L)). The posterior pdf of σ is, by the defi-
nition of conditioning,
p( σ | χ_k(1 : L), I_k(0 : L) ) = p( Z_k, σ ) / p( Z_k )
                                = Π_{j=1}^{m_k} w_{σ_j}( z_k(j) ; χ_k(1 : L), I_k(0 : L) ) ,   (E.18)
where, for ℓ = 1, ..., L,

w_ℓ( z ; χ_k(1 : L), I_k(0 : L) )
   = P_k^D(ℓ) I_k(ℓ) N( χ_k(ℓ) ; x̂_{k|k}(z ; ℓ), P_{k|k}(ℓ) ) N( z ; ẑ_{k|k−1}(ℓ), S_{k|k}(ℓ) )
     / [ I_k(0) q_k(z) + Σ_{ℓ′=1}^{L} P_k^D(ℓ′) I_k(ℓ′) N( χ_k(ℓ′) ; x̂_{k|k}(z ; ℓ′), P_{k|k}(ℓ′) ) N( z ; ẑ_{k|k−1}(ℓ′), S_{k|k}(ℓ′) ) ] ,   (E.19)

and, for clutter,

w₀( z ; χ_k(1 : L), I_k(0 : L) )
   = I_k(0) q_k(z)
     / [ I_k(0) q_k(z) + Σ_{ℓ′=1}^{L} P_k^D(ℓ′) I_k(ℓ′) N( χ_k(ℓ′) ; x̂_{k|k}(z ; ℓ′), P_{k|k}(ℓ′) ) N( z ; ẑ_{k|k−1}(ℓ′), S_{k|k}(ℓ′) ) ] .   (E.20)
The coefficient e^{−I_k(0) − Σ_{ℓ=1}^{L} P_k^D(ℓ) I_k(ℓ)} cancels out in the weight calculation. The weights are ratios of intensities. They are the probabilities that the measurement z is generated by target ℓ, or by clutter if ℓ = 0.
Let r = 0, 1, . . . be the EM iteration index, and let χk(0) (1 : L) and Ik(0) (0 : L)
be specified initial values of the target states and their intensity coefficients. The EM
auxiliary function is the conditional expectation
Q( χ_k(1 : L), I_k(0 : L) | χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) )
   = Σ_σ { log p( Z_k, σ ) } Π_{j=1}^{m_k} w_{σ_j}( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) .   (E.21)
Carrying out the sum over σ gives

Q( χ_k(1 : L), I_k(0 : L) | χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) = −I_k(0) − Σ_{ℓ=1}^{L} P_k^D(ℓ) I_k(ℓ)
   + Σ_{ℓ=0}^{L} Σ_{j=1}^{m_k} w_ℓ( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) log λ^ℓ_{k|k}( z_k(j), χ_k(ℓ) ) .   (E.22)
Using (3.38) gives the EM update for the intensity coefficient of the ℓ-th target as

I_k^{(r+1)}(ℓ) = ( 1 / P_k^D(ℓ) ) Σ_{j=1}^{m_k} w_ℓ( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) .   (E.23)
The factor P_k^D(ℓ) cancels the same factor in the weights (E.19). For clutter, ℓ = 0, the updated intensity coefficient is

I_k^{(r+1)}(0) = Σ_{j=1}^{m_k} w₀( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) .   (E.24)
The EM update for the state of the ℓ-th target is the weighted mean

χ_k^{(r+1)}(ℓ) = Σ_{j=1}^{m_k} w_ℓ( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) x̂_{k|k}( z_k(j) ; ℓ )
              / Σ_{j=1}^{m_k} w_ℓ( z_k(j) ; χ_k^{(r)}(1 : L), I_k^{(r)}(0 : L) ) .   (E.25)
A more intuitive way to write the result is to substitute for $\hat{x}_{k|k}(z_k(j)\,;\ell)$ using (E.11). By linearity, the updated state is given by the Kalman filter
$\chi_k^{(r+1)}(\ell) = F_{k-1}(\ell)\, \hat{x}_{k-1|k-1}(\ell) + W_k(\ell)\bigl(\bar{z}_{k|k}^{\,(r+1)}(\ell) - \hat{z}_{k|k-1}(\ell)\bigr)$,   (E.26)
where $\bar{z}_{k|k}^{\,(r+1)}(\ell)$ is the centroid of the measurements $z_k(1\!:\!m_k)$ under the weights appearing in (E.25).
When the EM iterations terminate, say at iteration $r_{\mathrm{last}}$, the MMIF outputs at time $t_k$ are
$\hat{I}_{k|k}(\ell) = I_k^{(r_{\mathrm{last}})}(\ell), \qquad 0 \le \ell \le L,$   (E.28)
$\hat{x}_{k|k}(\ell) = \chi_k^{(r_{\mathrm{last}})}(\ell), \qquad 1 \le \ell \le L.$   (E.29)
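For readers who want to trace the recursion numerically, the following is a minimal sketch of one EM pass of (E.19)–(E.25) in Python/NumPy for scalar states and measurements. It is not the MMIF implementation itself: the Kalman quantities $\hat{x}_{k|k}(z;\ell)$, $\hat{z}_{k|k-1}(\ell)$, $P_{k|k}(\ell)$, $S_{k|k}(\ell)$ and the clutter density $q_k$ are simply supplied as inputs, and all function and variable names are illustrative assumptions.

import numpy as np

def gauss(x, mean, var):
    # Scalar Gaussian density N(x; mean, var).
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mmif_em_step(z, chi, I, pd, qk, xhat, Pkk, zhat, Skk):
    # One EM pass of (E.19)-(E.25), scalar case.
    #   z    : measurements z_k(1:m_k)
    #   chi  : current target state estimates chi_k^{(r)}(1:L)
    #   I    : current intensity coefficients I_k^{(r)}(0:L); I[0] is clutter
    #   pd   : detection probabilities P_k^D(l), length L
    #   qk   : clutter measurement density, a callable of z
    #   xhat : callable (z, l) -> Kalman-updated state, as in (E.11)
    #   Pkk, zhat, Skk : posterior variances, predicted measurements,
    #                    and innovation variances for each target l
    m, L = len(z), len(chi)
    w = np.zeros((L + 1, m))
    for j, zj in enumerate(z):
        num = np.array([I[0] * qk(zj)] + [
            pd[l] * I[l + 1]
            * gauss(chi[l], xhat(zj, l), Pkk[l])   # N(chi_k(l); xhat_{k|k}(z;l), P_{k|k}(l))
            * gauss(zj, zhat[l], Skk[l])           # N(z; zhat_{k|k-1}(l), S_{k|k}(l))
            for l in range(L)])
        w[:, j] = num / num.sum()                  # weights (E.19)-(E.20)
    I_new = np.empty(L + 1)
    I_new[0] = w[0].sum()                          # clutter intensity update (E.24)
    I_new[1:] = w[1:].sum(axis=1) / pd             # target intensity updates (E.23)
    chi_new = np.array([np.dot(w[l + 1], [xhat(zj, l) for zj in z]) / w[l + 1].sum()
                        for l in range(L)])        # state updates (E.25)
    return chi_new, I_new

Iterating this step until the estimates stabilize reproduces the recursion summarized in (E.28) and (E.29).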
Appendix F: Linear Filter Model

The instantaneous output power of a linear filter with a stationary Gaussian input signal is exponentially distributed when the passband of the filter is a subset of the input signal band. This appendix derives this result as the limit of a PPP model. The points of the PPP realizations are in the spectral (frequency) domain.
The frequency domain output of a linear filter with a classical stationary Gaussian input signal is equivalent to a compound PPP, assuming the input signal bandwidth is larger than the filter bandwidth. In this case the pdf of the instantaneous filter output power, $a^2$, in the filter passband is
$p(a^2\,; S^2) = \dfrac{1}{S^2} \exp\Bigl(-\dfrac{a^2}{S^2}\Bigr).$   (F.1)
The parameter of this exponential distribution is equal to the signal power, $S^2 < \infty$, in the filter passband.
The input signal is modeled in the frequency domain, not the time domain. (This
contrasts sharply with the time domain model mentioned in Section 8.1.2.) The
frequency domain PPP “emits” points, or shots, across the entire signal band. The
only shots that pass through the filter are those in the filter passband. The shots that
pass through the filter comprise a noncausal PPP process in a frequency domain
equal to that of the filter passband. In essence, the filter acts as a thinning process
on the input PPP. The filtered, or thinned, points are not observed directly in the
filter output. Instead, each point carries a complex-valued mark, or phase, and the
measurement is the coherent sum u of the marks of the points that pass through the
filter. Thus, the measurement u is known, but the number n of shots in the filter
output and the individual marks comprising the coherent sum are all unknown.
Let λ(ω) be the filtered PPP intensity—it is defined in the frequency domain
over the filter passband, B. It is shown in this appendix that the function λ(ω) is the
signal power spectrum.
Let $\nu \equiv \int_B \lambda(x)\,\mathrm{d}x$ be the mean number of points in $B$. Now, suppose there are $n$ points with marks $u_k = (c_k, s_k)^{\mathsf T}$, where $c_k$ and $s_k$ are the in-phase and quadrature components of the $k$-th mark, respectively. The marks are assumed i.i.d. with pdf
$p(u_k) \equiv \mathcal{N}\!\left( \begin{pmatrix} c_k \\ s_k \end{pmatrix} ;\, \begin{pmatrix} 0 \\ 0 \end{pmatrix},\, \dfrac{\hbar^2}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) = \dfrac{1}{\pi \hbar^2} \exp\Bigl(-\dfrac{c_k^2 + s_k^2}{\hbar^2}\Bigr).$   (F.2)
The mark power, the expected value of its squared magnitude, is $\hbar^2$. (The choice of the symbol $\hbar^2$ to represent mark power is intended to suggest that the limit $\hbar \to 0$ will eventually be taken.) The pdf of the coherent mark sum $u = \sum_{k=1}^{n} u_k \equiv [c, s]^{\mathsf T}$ is
$p(u \mid n) = \dfrac{1}{\pi n \hbar^2} \exp\Bigl(-\dfrac{u^{\mathsf T} u}{n \hbar^2}\Bigr).$   (F.3)
The pdf of the joint event $(u, n)$ is, from (2.4) and (F.3),
$p(u) = \sum_{n=0}^{\infty} p(u, n) = \sum_{n=0}^{\infty} p(n)\, p(u \mid n) = e^{-\nu} \left[\, 1 + \dfrac{1}{\pi \hbar^2} \sum_{n=1}^{\infty} \dfrac{\nu^n}{n!\, n} \exp\Bigl(-\dfrac{u^{\mathsf T} u}{n \hbar^2}\Bigr) \right].$   (F.4)
The mark power is tied to the signal power by requiring
$\nu \hbar^2 = S^2 \;\Longleftrightarrow\; \hbar^2 = \nu^{-1} S^2.$   (F.5)
The interchange of summation and double integral in the first step is justified
because the series is absolutely convergent. Substituting (F.5) and taking the limit
gives
The limit (F.7) follows immediately from the Fourier inversion formula.
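The limiting exponential law can also be checked by simulation. The sketch below (Python/NumPy; not from the text, and with arbitrarily chosen values of $S^2$ and $\nu$) draws the shot count $n$ from a Poisson distribution with mean $\nu$, forms the coherent sum of $n$ i.i.d. complex marks with per-mark power $\hbar^2 = S^2/\nu$ as in (F.2), (F.3), and (F.5), and compares the resulting power samples with the exponential law (F.1).

import numpy as np

rng = np.random.default_rng(0)
S2, nu, trials = 2.0, 400.0, 20_000      # signal power and mean shot count (arbitrary)
hbar2 = S2 / nu                          # per-mark power fixed by (F.5)

power = np.empty(trials)
for t in range(trials):
    n = rng.poisson(nu)                                    # shots in the passband
    marks = rng.normal(0.0, np.sqrt(hbar2 / 2.0), (n, 2))  # i.i.d. marks, pdf (F.2)
    u = marks.sum(axis=0)                                   # coherent mark sum
    power[t] = u @ u                                        # instantaneous power |u|^2

print("mean power   :", power.mean())          # should approach S2
print("P(power > S2):", np.mean(power > S2))   # exponential law gives exp(-1), about 0.368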
F.2.1 Utility
Frequency domain PPPs that model the outputs of filters with nonoverlapped pass-
bands are independent. This follows from the interpretation of the PPPs as thinned
versions of the same PPP. (See the end of Section 2.8.) In contrast, for the usual
time domain model, the complex-valued outputs of the cells of a discrete Fourier
transform are asymptotically independent if the windows are not overlapped, and if
the input is a wideband stationary Gaussian signal.
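This independence can be illustrated directly. In the Python sketch below (an arbitrary homogeneous intensity and two arbitrary disjoint passbands, not taken from the text), the points of one frequency-domain PPP realization are split between two non-overlapping bands, and the resulting counts come out essentially uncorrelated.

import numpy as np

rng = np.random.default_rng(1)
rate, W, trials = 5.0, 10.0, 50_000        # intensity per unit frequency, total band width
counts = np.empty((trials, 2))
for t in range(trials):
    pts = rng.uniform(0.0, W, rng.poisson(rate * W))   # one homogeneous PPP realization
    counts[t] = [np.sum(pts < 3.0),                    # points in passband [0, 3)
                 np.sum((pts >= 6.0) & (pts < 9.0))]   # points in passband [6, 9)
print("sample correlation of the two passband counts:", np.corrcoef(counts.T)[0, 1])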
The marked PPP model of signal power spectrum supports the development of
algorithms based on the method of EM.
Glossary
Affine Sum Used in parameter estimation problems in which the intensity function
is of the form f 0 (x) + Σi f i (x ; θi ), where the estimated parameters are {θi }. The
word affine refers to the fact that no parameters are estimated for the term f 0 (x).
Augmented Target State Space Typically, a target state space S + = S ∪ φ com-
prising a continuous component S, such as S ⊂ Rn , and a discrete component φ
not in S. The points in S + represent mutually exclusive and exhaustive statistical
hypotheses about the target. The state φ is interpreted as the hypothesis that a target
generates measurements statistically indistinguishable from clutter, i.e., the target is
a “clutter target”. More generally, a finite or countable number of discrete hypothe-
ses can be used.
Bayes-Markov Filter A sequential estimation method that recursively determines the posterior density, or pdf, of the target state conditioned on the available data. Point estimates and some measure of the area of uncertainty (AOU) are extracted from the posterior density in different ways, depending on the application; however, point estimates and their AOUs characterize the posterior density only for the linear-Gaussian Kalman filter.
Binomial Point Process An i.i.d. point process in which the number of points in a
specified set R ⊂ S is binomially distributed with parameter equal to the integral
over R of a specified pdf on S. It provides an interesting contrast to the Poisson
point process.
Campbell’s Theorem A classic theorem (1909) that gives an explicit form for the
expected value of the random sum Σi f (xi ), where {xi } are the points of a realiza-
tion of a PPP. It is a special case of Slivnyak’s Theorem.
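The explicit form is $\mathrm{E}\bigl[\Sigma_i f(x_i)\bigr] = \int f(x)\,\lambda(x)\,\mathrm{d}x$ for a PPP with intensity $\lambda(x)$. A quick Monte Carlo check (a Python sketch; the intensity, test function, and interval are arbitrary illustrations, not from the text) is:

import numpy as np

rng = np.random.default_rng(2)
lam = lambda x: 3.0 * np.exp(-x)     # intensity on [0, 4]  (arbitrary choice)
f = lambda x: x ** 2                 # test function         (arbitrary choice)
a, b, lam_max, trials = 0.0, 4.0, 3.0, 50_000

sums = np.empty(trials)
for t in range(trials):
    # Sample the PPP on [a, b] by thinning a homogeneous PPP of rate lam_max.
    n = rng.poisson(lam_max * (b - a))
    x = rng.uniform(a, b, n)
    keep = rng.uniform(0.0, 1.0, n) < lam(x) / lam_max
    sums[t] = np.sum(f(x[keep]))     # the random sum over the retained points

grid = np.linspace(a, b, 100_001)
print("Monte Carlo E[sum f(x_i)] :", sums.mean())
print("Campbell integral of f*lam:", np.sum(f(grid) * lam(grid)) * (grid[1] - grid[0]))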
Clutter Point measurements that do not originate from any physical target under
track. Clutter can be persistent, as in ground clutter, and it can be statistical in nature,
that is, arise from the locations of threshold crossings of fluctuations in some ambi-
ent noise background.
Cramér-Rao Bound (CRB) A lower bound on the variance of any unbiased
parameter estimate. It is derived directly from the likelihood function of the data.
Data to Target Assignment Problem A problem that arises with tracking single
targets in clutter, and multiple targets either with or without clutter.
Dirac Delta Function Not really a function at all, but an operator that performs a
point evaluation of the integrand of an integral. Often defined as a limit of a sequence
of test functions.
Expectation-Maximization A method used to obtain ML and MAP parameter
estimation algorithms. It is especially well suited to likelihood functions of pro-
cesses that are sums or superpositions of other, simpler processes. Under broad con-
ditions, it is guaranteed convergent to a local maximum of the likelihood function.
Finite Point Process A geometrical distribution of random occurrences of finitely
many points, the number and locations of which are typically random. In general,
finite point processes are not Poisson point processes.
Fisher Information Matrix (FIM) The inverse of the Cramér-Rao Bound matrix.
Gaussian Mixture A Gaussian sum whose integral is one. Equivalently, a Gaussian
sum whose weights sum to one. Every Gaussian mixture is a pdf.
Gaussian Sum A weighted sum of multivariate Gaussian probability density func-
tions, where the weights are non-negative. Gaussian sums are not, in general, pdfs.
Generalized Functions These are not truly functions at all, but operators. The
concept of point masses leads to the need to define integrals over discrete sets
in a nontrivial way. One way to do this is measure theory, another is generalized
functions. The classic example is the Dirac delta function δ(x); see Test Functions
below. The classic book [71] by Lighthill is charming, short, and very readable.
Homogeneous Poisson Point Process A PPP whose intensity is a (non-negative)
constant.
Independent Increments A concept defined for stochastic processes X (t) which
states that the random variables X (b) − X (a) and X (d) − X (c) are independent
if the intervals (a, b) and (c, d) are disjoint.
Independent Scattering A concept defined for point processes which states that
the numbers and locations of points of a point process in two different sets are inde-
pendent if the sets are disjoint. (Sometimes called independent increments, despite
the confusion in terminology with stochastic processes.)
Intensity The defining parameter of a PPP. In general, intensity is the sum of a
nonnegative ordinary function and at most countably many Dirac delta functions.
Intensity Filter A multi-target tracking filter that recursively updates the intensity
function of a PPP approximation to the target state point process.
Intensity Function The intensity for orderly PPPs is an ordinary function, typically denoted by λ(x). Intuitively, λ(x) dx specifies the expected number of points of the process in the infinitesimal region dx centered at x.
processes need to be vigilant when reading to avoid confusing the Poisson distribu-
tion with the Poisson point process.)
Poisson’s Gambit A term that describes the act of modeling the number of trials
in a sequence of Bernoulli trials as Poisson distributed. The name pertains to
situations wherein the Poisson assumption is imposed by the modeler rather than
arising naturally in the application. The advantage of the assumption is, e.g., that the
numbers of occurrences of heads and tails in a series of coin flips are independent
under Poisson’s gambit.
Poisson Point Process (PPP) A special kind of point process characterized (param-
eterized) by an intensity function λ(x) that specifies both the number and probability
density of the locations of points. The number of points in a bounded set R is Poisson distributed with parameter $\mu = \int_R \lambda(x)\,\mathrm{d}x$. Conditioned on $n$, the points are i.i.d. in R with pdf $\lambda(x)/\mu$. Compare to Binomial point processes (BPP).
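This two-step characterization translates directly into a sampling recipe. The minimal Python sketch below (the intensity, interval, and bound are arbitrary illustrations, not from the text) draws one PPP realization on a bounded interval by first drawing the Poisson-distributed number of points and then placing them i.i.d. with pdf λ(x)/μ via acceptance-rejection.

import numpy as np

def sample_ppp(lam, a, b, lam_max, mu, rng):
    # One PPP realization on R = [a, b]:
    # (1) draw the number of points n ~ Poisson(mu), mu = integral of lam over R;
    # (2) place the n points i.i.d. on R with pdf lam(x)/mu (acceptance-rejection).
    n = rng.poisson(mu)
    pts = []
    while len(pts) < n:
        x = rng.uniform(a, b)
        if rng.uniform(0.0, lam_max) < lam(x):   # accept with probability lam(x)/lam_max
            pts.append(x)
    return np.array(pts)

rng = np.random.default_rng(3)
lam = lambda x: 2.0 + np.sin(x)                  # intensity on [0, 10] (arbitrary choice)
mu = 20.0 + (1.0 - np.cos(10.0))                 # integral of lam over [0, 10]
pts = sample_ppp(lam, 0.0, 10.0, 3.0, mu, rng)
print(len(pts), "points drawn; expected count is", mu)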
Positron Emission Tomography (PET) A widely used medical imaging method-
ology that is based on estimating the spatial intensity of positron decay. The well known
Shepp-Vardi algorithm (1982) is the basis of most, if not all, intensity estimation
algorithms.
Posterior Cramér-Rao Bound (PCRB) A lower bound on the variance of an unbi-
ased nonlinear tracking filter.
Probability Density Function (pdf) A function that represents the probability that
a realization x of a random variable X falls in the infinitesimal region dx.
Probability Hypothesis Density (PHD) An additive theory of Bayesian evidence
accrual first proposed by Stein and Winter [121].
PHD Filter A multitarget sequential tracking filter which avoids the data to tar-
get assignment problem by using a Poisson point process (PPP) to approximate
the multitarget state. The correspondence of data to individual target tracks is not
maintained. Target birth and clutter processes are assumed known a priori.
Radon-Nikodym Derivative A measure theoretic term that, with appropriate qual-
ifications and restricted to Rn , is another name for the likelihood ratio of the data
under two different hypotheses.
Random Sum See Campbell’s Theorem and Slivnyak’s Theorem.
Sequential Monte Carlo A generic name for particle methods.
Slivnyak’s Theorem An important theorem (1962) about the expected value of
a random sum that depends on how a point in a PPP realization relates to other
points in the same realization. The random sum is Σi f (xi , {x1 , . . . , xn } \ xi ).
Campbell’s Theorem is the special case f (xi , · ) ≡ f (xi ).
Single Photon Emission Computed Tomography (SPECT) A widely used
medical imaging technique for estimating the spatial intensity of radioisotope decay
based on multiple gamma (Anger) camera snapshots. The physics differs from PET
in that only one gamma photon arises from each decay.
Target An entity characterized by a point in a target state space. Typically, the
target state evolves sequentially, that is, its state changes over time (or other time-
like variable).
Target State Space A set whose elements completely characterize the properties
of a target that are of interest in an application. Typically, this set is the vector
space Rn , or some subset thereof, that represents target kinematic properties or other
properties, e.g., radar cross section. This space is sometimes augmented by discrete
attributes that represent specific categorical properties, e.g., target identity.
Test Functions A sequence of infinitely differentiable functions used in proofs involving generalized functions. For example, for the Dirac delta function δ(x), the test sequence is often taken to be the sequence of Gaussian pdfs $\mathcal{N}(x\,; 0, \sigma_n^2)$, where $\sigma_n \to 0$, so that for any continuous function f(x), $\int_{\mathbb{R}} f(x)\,\delta(x)\,\mathrm{d}x \equiv \lim_{\sigma_n \to 0} \int_{\mathbb{R}} f(x)\,\mathcal{N}(x\,; 0, \sigma_n^2)\,\mathrm{d}x = f(0)$.
Test Sets A collection of “simple” sets, e.g., intervals, that generate the Borel sets,
which in turn are used to define measurable sets. A common style of proof is to
demonstrate a result on a sufficiently rich class of test sets, and then invoke appro-
priate general limit theorems to extend the result to measurable sets.
Transmission Tomography A method of imaging the variation in spatial density
of an object known only from measurements of the line integrals of the density
function for a collection of straight lines.
List of Acronyms
References
20. J. de Leeuw and W. J. Heiser. Convergence of correction matrix algorithms for multidimen-
sional scaling. In J. C. Lingoes, editor, Geometric Representations of Relational Data, pages
735–752. Mathesis Press, Ann Arbor, MI, 1977.
21. J. de Leeuw and W. J. Heiser. Multidimensional scaling with restrictions on the configuration.
In P. R. Krishnaiah, editor, Multivariate Analysis, volume 5, pages 501–522. North-Holland,
Amsterdam, 1980.
22. A. H. Delaney and Y. Bresler. A fast and accurate iterative reconstruction algorithm for
parallel-beam tomography. IEEE Transactions on Image Processing, IP-5(5):740–753, 1996.
23. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society, B, 39:1–39, 1977.
24. P. Diaconis. The Markov chain Monte Carlo revolution. Bulletin (New Series) of the
American Mathematical Society, 46:179–205, 2009.
25. S. W. Dufour. Intersections of Random Convex Regions. PhD thesis, Department of Statistics,
Stanford University, Stanford, CA, 1972.
26. B. Efron and D. V. Hinkley. Assessing the accuracy of the maximum likelihood estimator:
Observed versus expected Fisher information (with discussion). Biometrika, 65(3):457–487,
1978.
27. A. Einstein. On the method of theoretical physics. Philosophy of Science, 1(2):163–169,
1934.
28. C. L. Epstein. Introduction to the Mathematics of Medical Imaging. SIAM Press, Philadel-
phia, PA, second edition, 2007.
29. A. Erdélyi, editor. Higher Transcendental Functions, volume 2. Bateman Manuscript Project,
New York, 1953.
30. O. Erdinc, P. Willett, and Y. Bar-Shalom. A physical space approach for the probability
hypothesis density and cardinalized probability hypothesis density filters. In Proceedings
of the SPIE Conference on Signal Processing of Small Targets, Orlando, FL, volume 6236,
April 2006.
31. O. Erdinc, P. Willett, and Y. Bar-Shalom. The bin-occupancy filter and its connection to the
PHD filters. IEEE Transactions on Signal Processing, 57: 4232–4246, 2009.
32. K. J. Falconer. Applications of a result on spherical integration to the theory of convex sets.
The American Mathematical Monthly, 90(10):690–693, 1983.
33. P. Faure. Theoretical model of reverberation noise. Journal of the Acoustical Society of
America, 36(2):259–266, 1964.
34. J. A. Fessler. Statistical image reconstruction methods for transmission tomography. In
M. Sonka and J. M. Fitzpatrick, editors, Handbook of Medical Imaging, SPIE, Bellingham,
Washington, volume 2, pages 1–70, 2000.
35. J. A. Fessler and A. O. Hero. Space-alternating generalized expectation- maximization algo-
rithm. IEEE Transactions on Signal Processing, SP-42(10):2664–2677, 1994.
36. P. M. Fishman and D. L. Snyder. The statistical analysis of space-time point processes. IEEE
Transactions on Information Theory, IT-22:257–274, 1976.
37. D. Fränken, M. Schmidt, and M. Ulmke. “Spooky action at a distance” in the cardinalized
probability hypothesis density filter. IEEE Transactions on Aerospace and Electronic Sys-
tems, AES-45(4):1657–1664, October 2009.
38. K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function,
with application in pattern recognition. IEEE Transactions on Information Theory, IT-21(1):
32–40, January 1975.
39. I. I. Gikhman and A. V. Skorokhod. Introduction to the Theory of Random Processes. Dover,
Mineola, NY, Unabridged republication of 1969 edition, 1994.
40. J. R. Goldman. Stochastic point processes: Limit theorems. The Annals of Mathematical
Statistics, 38(3):771–779, 1967.
41. I. R. Goodman, R. P. S. Mahler, and H. T. Nguyen. Mathematics of Data Fusion. Kluwer,
Dordrecht, 1997.
42. G. R. Grimmett and D. D. Stirzaker. Probability and Random Processes. Oxford University
Press, Oxford, Third edition, 2001.
68. K. Lewin. The research center for group dynamics at Massachusetts Institute of Technology.
Sociometry, 8: 126–136, 1945.
69. T. A. Lewis. Finding the observed information matrix when using the EM algorithm. Journal
of the Royal Statistical Society, Series B (Methodological), 44(2):226–233, 1982.
70. R. M. Lewitt and S. Matej. Overview of methods for image reconstruction from projections
in emission computed tomography. Proceedings of the IEEE, 91(10):1588–1611, 2003.
71. M. J. Lighthill. Introduction to Fourier Analysis and Generalized Functions. Cambridge
University Press, London, 1958.
72. L. B. Lucy. An iterative technique for the rectification of observed distributions. The Astro-
nomical Journal, 79:745–754, 1974.
73. T. E. Luginbuhl. Estimation of General, Discrete-Time FM Signals. PhD thesis, Department
of Electrical Engineering, University of Connecticut, Storrs, CT, 1999.
74. R. P. S. Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Trans-
actions on Aerospace and Electronic Systems, AES-39:1152–1178, 2003.
75. R. P. S. Mahler. PHD filters of higher order in target number. IEEE Transactions on
Aerospace and Electronic Systems, AES-43:1523–1543, 2007.
76. R. P. S. Mahler. Statistical Multisource-Multitarget Information Fusion. Artech House,
Boston, MA, 2007.
77. B. Matérn. Spatial variation. Meddelanden fran Statens Skogsforskningsinstitut (Communi-
cations of the State Forest Research Institute), 49(5):163–169, 1960.
78. B. Matérn. Spatial Variation. Number 36 in Lecture Notes in Statistics. Springer, New York,
second edition, 1986.
79. G. Matheron. Random Sets and Integral Geometry. John Wiley & Sons, New York, 1975.
80. G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, New York,
1997.
81. G. J. McLachlan and D. Peel. Finite Mixture Models. Wiley, New York, 2000.
82. M. I. Miller, D. L. Snyder, and T. R. Miller. Maximum-likelihood reconstruction for
single-photon emission computed-tomography. IEEE Transactions on Nuclear Science,
NS-32(1):769–778, 1985.
83. P. Mitra. Spectral analysis: Point processes, August 16, 2006. http://wiki.neufo.org/
neufo/jsp/Wiki?ParthaMitra.
84. J. Møller and R. P. Waagepetersen. Statistical Inference and Simulation for Spatial Point
Processes. Chapman & Hall/CRC, Boca Raton, FL, 2004.
85. M. R. Morelande, C. M. Kreucher, and K. Kastella. A Bayesian approach to multiple target
detection and tracking. IEEE Transactions on Signal Processing, SP-55:1589–1604, 2007.
86. N. Nandakumaran, T. Kirubarajan, T. Lang, and M. McDonald. Gaussian mixture probability
hypothesis density smoothing with multiple sensors. IEEE Transactions on Aerospace and
Electronic Systems. Accepted for publication, 2010.
87. N. Nandakumaran, T. Kirubarajan, T. Lang, M. McDonald, and K. Punithakumar. Multitarget
tracking using probability hypothesis density smoothing. IEEE Transactions on Aerospace
and Electronic Systems. submitted, March 2008.
88. J. K. Nelson, E. G. Rowe, and G. C. Carter. Detection capabilities of randomly-deployed
sensor fields. International Journal of Distributed Sensor Networks, 5(6):708 – 728, 2009.
89. R. Niu, P. Willett, and Y. Bar-Shalom. Matrix CRLB scaling due to measurements of uncer-
tain origin. IEEE Transactions on Signal Processing, SP-49:1325–1335, 2001.
90. B. Øksendal. Stochastic Differential Equations, An Introduction with Applications. Springer,
Berlin, Fourth edition, 1995.
91. J. A. O’Sullivan and J. Benac. Alternating minimization algorithms for transmission tomog-
raphy. IEEE Transactions on Medical Imaging, MI-26:283–297, 2007.
92. C. Palm. Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics, 44:1–189, 1943.
93. A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New
York, 1965.
94. M. D. Penrose. On k-connectivity for a geometric random graph. Random Structures and
Algorithms, 15(2):145–164, 1999.
121. M. C. Stein and C. L. Winter. An additive theory of probabilistic evidence accrual, 1993.
Report LA-UR-93-3336, Los Alamos National Laboratories.
122. L. D. Stone, T. L. Corwin, and C. A. Barlow. Bayesian Multiple Target Tracking. Artech
House, Inc., Norwood, MA, 1999.
123. D. Stoyan, W. S. Kendall, and Joseph Mecke. Stochastic Geometry and its Applications.
Wiley, Chichester, second edition, 1995.
124. R. L. Streit. Multisensor multitarget intensity filter. In Proceedings of the International
Conference on Information Fusion, Cologne, Germany. ISIF, pp 1694–1701 30 June–3 July
2008.
125. R. L. Streit. PHD intensity filtering is one step of a MAP estimation algorithm for positron
emission tomography. In Proceedings of the International Conference on Information
Fusion, Seattle. ISIF, pp 308–315 6 July – 9 July 2009.
126. R. L. Streit and T. E. Luginbuhl. Maximum likelihood method for probabilistic multi-
hypothesis tracking. In Proceedings of the SPIE Conference on Signal and Data Processing
of Small Targets, volume 2235, pages 394–405, Orlando, FL, 1991.
127. R. L. Streit and T. E. Luginbuhl. A probabilistic multi-hypothesis tracking algorithm without
enumeration and pruning. In Proceedings of the Sixth Joint Service Data Fusion Symposium,
pages 1015–1024, Laurel, Maryland, 1993.
128. R. L. Streit and T. E. Luginbuhl. Probabilistic multi-hypothesis tracking, 1995. Technical
Report 10,428, Naval Undersea Warfare Center, Newport, RI.
129. R. L. Streit and Tod E. Luginbuhl. Estimation of Gaussian mixtures with rotationally invari-
ant covariance matrices. Communications in Statistics: Theory and Methods, 26:2927–2944,
1997.
130. R. L. Streit and L. D. Stone. Bayes derivation of multitarget intensity filters. In Proceedings
of the International Conference on Information Fusion, Cologne, Germany. ISIF, pp.1686–
1693 30 June–3 July 2008.
131. M. Ter-Pogossian. The Physical Aspects of Diagnostic Radiology. Hoeber Medical Division,
Harper and Rowe, New York, 1967.
132. H. R. Thompson. Distribution of distance to Nth neighbour in a population of randomly
distributed individuals. Ecology, 37:391–394, 1956.
133. P. Tichavský, C. H. Muravchic, and A. Nehorai. Posterior Cramér-Rao bounds for discrete-
time nonlinear filtering. IEEE Transactions on Signal Processing, SP-46:1386–1396, 1998.
134. H. L. Van Trees. Detection, Estimation, and Modulation Theory—Part I. Wiley, New York,
1968.
135. H. L. Van Trees and Kristine L. Bell, editors. Bayesian Bounds for Parameter Estimation
and Nonlinear Filtering and Tracking. Wiley, 2007.
136. M. N. M. van Lieshout. Markov Point Processes and Their Applications. Imperial College
Press, London, 2000.
137. B.-N. Vo, S. Singh, and A. Doucet. Sequential Monte Carlo methods for multi-target filtering
with random finite sets. IEEE Transactions on Aerospace and Electronic Systems, AES-
41:1224–1245, 2005.
138. B.-T. Vo, B.-N. Vo, and A. Cantoni. The cardinalized probability hypothesis density filter
for linear Gaussian multi-target models. In Proceedings of the 40th Annual Conference on
Information Sciences and Systems, Princeton, NJ, pp.681–686 March 22–24 2006.
139. B.-T. Vo, B.-N. Vo, and A. Cantoni. Analytic implementations of the cardinalized probability
hypothesis density filter. IEEE Transactions on Signal Processing, SP-55:3553–3567, 2007.
140. W. G. Warren. The center-satellite concept as a basis for ecological sampling (with discus-
sion). In G. P. Patil, E. C. Pielou, and W. E. Waters, editors, Statistical Ecology, volume 2,
pages 87–118. Pennsylvania State University Press, University Park, PA, 1971. (ISBN 0-271-
00112-7).
141. Y. Watanabe. Derivation of linear attenuation coefficients from CT numbers for low-energy
photons. Physics in Medicine Biology, 44:2201–2211, 1999.
142. T. A. Wettergren and M. J. Walsh. Localization accuracy of track-before-detect search strate-
gies for distributed sensor networks. EURASIP Journal on Advances in Signal Processing,
Article ID 264638:15, 2008.
Index

A
Acceptance-rejection procedure, 14, 31
Affine Gaussian sums, 72, 78, 103
Ambient noise, 221
Anisotropy, 195
Attenuation, 135
Augmented space, 50, 53

B
Barrier problems, 196
Bayes-Markov filter, 235
Bayesian data splitting, 70
Bayesian filtering, 233
Bayesian method, 80
Bernoulli thinning, 30
Bernoulli trial, 36
Bertrand's paradox, 191
Binomial point process, 7, 20
Boolean process, 6

C
Calculus of variations, 126
Campbell's Theorem, 23
Cauchy-Schwarz, 84
Central Limit Theorem, 29
Characteristic function, 24
Cluster process, 210
Coloring Theorem, 38, 205
Complete graph, 190
Compton effect, 124
Conditional mean equation, 229
Convex combination, 165
Count record data, 3
Coupling from the past, 222
Coverage, 6, 190
Cox process, 213
Cramér-Rao bound, 3, 81, 142, 220
Crofton's Theorem, 196

D
Delesse's principle, 198
Dirac delta function, 13, 45, 94, 258
Dirichlet density, 80
Discrete spaces, 50
Discrete-continuous integral, definition, 53
Discrete-continuous spaces, 50
Distance distributions, 180

E
Ensemble average, 19, 149
Erdös-Rényi, 187
Expectation, 18
Expectation of a random sum, 21
Expectation of outer product of random sums, 23
Expectation-Maximization, 3, 63, 112, 164, 223
Extreme value distributions, 184

F
Field of view, 198
Filtered process, 207
Finite point processes, 7
First moment intensity, 148, 244
Fisher information matrix, 142, 207
Fourier reconstruction, 112
Fourier transform, 24
Funk-Hecke Theorem, 112

G
Gamma (Anger) camera, 124
Gating, 82
Gaussian mixtures, 3, 80
Gaussian sum, 3, 69, 168
Generalized EM, 225
Generalized functions, 13, 44, 258
Geometric random graph, 4, 187
Geometry, stochastic, 6
Germ-grain model, 6
Gibbs phenomenon, 143
Gibbs process, 216
Grand canonical ensemble, 216
Grenander's Method of Sieves, 143, 144, 169

H
Hard core process, 208
Harmonic spectrum, 79
Hausdorff space, 50
Heteroscedastic sums, 78
Histogram data, 18, 35
Histograms, 52
Homogeneous PPP, 13, 189
Homoscedastic sums, 78
Homothetic sum, 79
Hyperparameters, 80

I
Importance function, 15
Independent increments, 8, 41
Independent scattering, 8, 33
Inevitability of Poisson distribution, 38
Innovations, 236
Intensity, 12, 209
Inverse probability mapping on the real line, 43
Isopleth, 161
Iterative majorization, 5, 227
Ito differential equation, 215

J
Janossy density, 154
Janossy density function, 244
Joint detection and tracking, 50

K
k-connectivity, 189
k-coverage, 194
K-distribution, 221
Kalman filter, 235
Kernel estimator, 164

L
Laplace functional, 27
Lattices, 50
Level curves, 184
Likelihood function, histogram data, 35
Likelihood function, ordered data, 18
Likelihood function, unordered data, 17
Luginbuhl's harmonic spectrum, 79

M
Machine learning, 234
Marked processes, 204
Marking Theorem, 205
Markov chain, 216
Markov Chain Monte Carlo, 203, 221
Markov modulated Poisson process, 216
Markov point process, 216
Markov point processes, 220
Markov transition function, 46
Matérn hard core process, 209
Maximum likelihood, 3, 57
Maximum a posteriori, 3
Mean shift algorithm, 164
Measurement process, 47
Microstates, 150
Microtargets, 150, 175
Moments of a random sum, 24
Moore's Law, 163
Multi-sensor intensity filter, 172
Multinomial thinning (coloring), 38
Multinomial trials, 38
Multisensor intensity filter, 134, 173
Multisets, 8, 12
Multitarget tracking, 237
Multivariate Gaussian density, expression for, 15

N
Nearest neighbor graph, 187
Nearest neighbor tracking, 160
Negative binomial distribution, 221
Neural spike trains, 213
Neyman-Scott cluster process, 210
Nonhomogeneous PPP, definition, 13
Nonlinear transformation, 42

O
Observed information matrix, 87, 100, 228
Olber's paradox, 45

P
Parameter tying, 78
Pareto optimization, 190
Particle method, 161
Partition function, 234
Percolation, 221
Photo-multiplier tube, 124
Photoelectric effect, 60, 124
Photoemission, 60
Poisson approximation to binomial distribution, 195
Poisson cluster process, 210
Poisson gambit, 37