Lecture 1: Statistical Signal Processing

University of Michigan Laura Balzano
EECS 564: Estimation, Filtering, and Detection
Lecture 1: Statistical Signal Processing∗

Statistical inference is the process of drawing conclusions from data that are subject to uncertainty or random variation, for example, observational errors or sampling variation¹. Classically, in Statistical Signal Processing, the data we focus on come from communications systems, radar, sonar, or imaging systems: measurements of time-varying phenomena that result in spatial or temporal signals that we would like to use to gather or share information. Estimation and Detection are also taught in statistics classes, with a focus on inferring a parameter of a population (e.g., average life expectancy) and on hypothesis testing. More recently, the statistics and signal processing versions have been merging, as data analysis problems arise that combine characteristics of both classical statistics and classical signal processing.
We’d like to use what we know about the signals and measurement devices to do the best job possible of extracting information from the signals. Statistical Signal Processing models measurement or observation error (noise), as well as model error (bias), probabilistically.
We will denote our data or signal by $x$ or $\mathbf{x}$. The signal could be a discrete-time speech signal, a digital image, a time series from a discrete process such as the day-end prices of the Dow Jones index, a collection of various users’ book ratings on goodreads.com, or a list of medical tests and their outcomes for a group of patients. Usually, $x$ will denote a vector of $n$ scalar measurements, where $n$ is a positive integer, i.e., $x \in \mathbb{R}^n$ or $x \in \mathbb{C}^n$. This is why we will sometimes write $\mathbf{x}$ (the notation used in the textbook), the usual notation for a vector. This course will focus on four fundamental inference problems in statistical signal processing².
• Detection: Suppose our signal $x$ is a realization of a random variable $X$ drawn from one of two possible probability distributions, $f_1(x)$ or $f_2(x)$. When we measure $x$, we need to decide which model is better, $f_1$ or $f_2$. This is called binary hypothesis testing. More generally, we may have $M$ different models to choose from; we call this $M$-ary hypothesis testing.
Example: Decoding. When a signal is communicated as a binary sequence, at each point during the signal we wish to decode whether the transmitted symbol was a 0 or a 1. For communication systems that use more than two symbols, this becomes $M$-ary hypothesis testing.
Example: Solar flares. Astrophysicists would like to develop better algorithms to detect when a solar flare explodes on the sun. Based on image frames from a video of the sun, we may ask whether or not a solar flare is occurring.
∗ Last updated January 6, 2016.
¹ http://en.wikipedia.org/wiki/Statistical_inference
² This and the following material are heavily borrowed from notes by Professor Robert Nowak for his statistical signal processing course at the University of Wisconsin.
• Parameter Estimation: Suppose now that our signal $x$ is a realization of a random variable $X$ drawn from one of an infinite collection of probability distributions, but we know that this collection is parameterized by $\theta$. For example, we may model life expectancy as a Gaussian (normally distributed) variable with unknown mean $\mu$ and variance $\sigma^2$, and so we let
$$\theta = \begin{bmatrix} \mu \\ \sigma^2 \end{bmatrix}.$$
Since $\theta$ can take infinitely many values, estimation amounts to choosing among these infinitely many candidate values of $\theta$ based on the observed data.
Example: sinusoidal parameter estimation for radar. Suppose we work in air traffic control at DTW, and we send a sinusoidal signal (an electromagnetic wave) to detect whether there are airplanes in the vicinity. That wave will bounce off an airplane and come back to our antenna; we then receive a signal $s(n) = A\cos(2\pi f n + \phi)$. We can estimate the distance to the plane from the speed of our wave, but to do so we must know precisely when the wave arrives back. It is crucial to estimate the phase $\phi$ of the received signal to know exactly when the front edge arrived at the receiver. In addition, the amplitude $A$ is unknown because of attenuation, and the frequency $f$ is unknown because of the Doppler effect; the latter can also be used to estimate the plane’s speed. We would like to estimate all of these unknowns from the received signal.
Example: Estimating wildlife populations. Suppose now we work with the Michigan DEQ (Department of Environmental Quality) and we’d like to understand the health of the fish populations in the Great Lakes. To do that, we would capture some fish, tag them, and release them into the lakes. Then, over time, we would capture more fish, see how many were already tagged, and tag the others. We continue this sampling over time, and from these data we would like to estimate the total fish population.
• Signal Estimation or Prediction: In many problems we wish to predict the value of a signal $y$ given an observation of another, related signal $x$. We can model the relationship between $x$ and $y$ using a joint probability distribution $f(x, y)$. If we don’t know this distribution but have some examples from it (i.e., pairs of $x$ and $y$ that go together), then this problem is called learning.
Example: human health. We often try to predict a person’s short-term or long-term health using proxies like weight, blood pressure, cholesterol levels, hormone levels, etc. Those are all things we can measure, but health is the signal we really care about, and we often detect the “unhealthy” state only once the person has already gotten sick.
Example: image recognition. A canonical problem in computer vision and machine learning is having the computer automatically identify objects or scenery in images based only on the pixel values of the image itself. This once seemed an almost impossible problem, and we had only very weak recognition systems until about five years ago, when deep learning came on the scene. These algorithms train a model using hundreds of millions of labeled images, where our signal $x$ is the image pixels and $y$ is a vector of labels of what objects are in the scene. These models have proven to be very powerful, and yet we still don’t deeply understand why they work.
• Filtering: Here we consider the case where the number $n$ of measurements increases over time. This situation arises whenever “streaming data” are being collected and need to be processed in real time, or at least online. We may model these data as a random process, and filtering is the act of estimating parameters online from a realization of the random process as it reveals itself.
Example: speech separation. On Skype or Google+ Hangouts, the software must try to separate your voice out of the signal picked up by your friend’s microphone so that you don’t hear feedback.
Example: target tracking. We may wish to track a moving target, or to track markings from a moving vehicle. For example, ecologists who are interested in animals’ movements may be able to collect measurements like bird songs or chipmunk chirps, and from those they can track the animals. Driverless cars need to track the dotted lines on either side of them, as well as the vehicle in front, in order to stay in the lane at a safe distance.
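To make the four problems above concrete, here is a minimal Python sketch on synthetic data. Every modelling choice is an assumption for illustration, not taken from the notes: Gaussian noise throughout, bits sent as levels 0 and 1 with noise standard deviation 0.5, a detector that picks whichever of $f_1$ or $f_2$ has the larger likelihood (equal priors), the Gaussian maximum-likelihood estimate of $\theta = (\mu, \sigma^2)$, a least-squares line as the simplest "learned" predictor, and a recursive sample mean as the simplest online filter. All function names are hypothetical.

```python
import math
import random

def log_gauss(x, mean, sigma):
    """Log of the N(mean, sigma^2) density evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mean) ** 2 / (2 * sigma**2)

# Detection (assumed model): f1 = N(0, sigma^2) means bit 0, f2 = N(1, sigma^2) means bit 1.
def detect_bit(x, sigma=0.5):
    """Pick the hypothesis whose density is larger at x (equal priors assumed)."""
    log_ratio = log_gauss(x, 1.0, sigma) - log_gauss(x, 0.0, sigma)
    return 1 if log_ratio > 0 else 0

# Parameter estimation: maximum-likelihood estimate of theta = (mu, sigma^2).
def gaussian_mle(xs):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # the ML estimate divides by n, not n - 1
    return mu, var

# Prediction/learning: fit y ~ a*x + b from example pairs by least squares.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

# Filtering: refine an estimate of the mean each time a new sample arrives.
def running_mean(stream):
    estimate = 0.0
    for k, x in enumerate(stream, start=1):
        estimate += (x - estimate) / k  # recursive form of the sample mean
        yield estimate

if __name__ == "__main__":
    random.seed(0)
    bits = [random.randint(0, 1) for _ in range(1000)]
    received = [b + random.gauss(0.0, 0.5) for b in bits]
    errors = sum(detect_bit(r) != b for r, b in zip(received, bits))
    print("bit errors out of 1000:", errors)

    ages = [random.gauss(70.0, 5.0) for _ in range(5000)]
    print("theta estimate (mu, sigma^2):", gaussian_mle(ages))

    xs = [random.uniform(0.0, 10.0) for _ in range(200)]
    ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]
    print("fitted (a, b):", fit_line(xs, ys))

    last = list(running_mean(ages))[-1]
    print("online mean after all samples:", last)
```

With equal noise variances under both hypotheses, the likelihood comparison reduces to thresholding the received value at 0.5, the midpoint between the two levels. The online mean ends at the same value as the batch sample mean, up to floating-point rounding, but it never has to store the whole data set.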
1 Sufficient Statistics
Definition 1. A statistic is a function of observed data, and may be scalar or vector valued.
Suppose we observe $n$ scalar values $x_1, \ldots, x_n$. Consider some examples of statistics:
• Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• The data itself: $[x_1 \ \cdots \ x_n]^T$
• An order statistic: $x_{(1)} = \min\{x_1, \ldots, x_n\}$
• An arbitrary function: $[x_1^2 - x_2\sin(x_3) \quad e^{-x_1 x_2}]^T$
• Likelihood ratios (which we will learn about later) are statistics; they are central to the detection problem, where they are sufficient statistics.
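Each statistic in the list is simply a function of the observed values. A minimal Python sketch of the first four, with hypothetical function names and with the arbitrary function read here as the two-component vector $[x_1^2 - x_2\sin(x_3),\ e^{-x_1 x_2}]^T$:

```python
import math

def sample_mean(x):
    """x-bar = (1/n) * sum of x_i."""
    return sum(x) / len(x)

def data_itself(x):
    """The trivial statistic: the whole data vector."""
    return list(x)

def min_order_statistic(x):
    """x_(1), the smallest observation."""
    return min(x)

def arbitrary_statistic(x):
    """[x1^2 - x2*sin(x3), exp(-x1*x2)]; Python indexing starts at 0."""
    x1, x2, x3 = x[0], x[1], x[2]
    return [x1**2 - x2 * math.sin(x3), math.exp(-x1 * x2)]

data = [1.0, 2.0, 0.0, 4.0]
print(sample_mean(data), data_itself(data), min_order_statistic(data), arbitrary_statistic(data))
```

None of these functions depends on any unknown parameter; that is what makes them statistics. Whether a given statistic is sufficient depends on the model, which is the subject of the next definition.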
Definition 2. Let $X$ denote a random vector whose distribution is parameterized by $\theta$, i.e.,
$$X \sim f_\theta(x).$$
The statistic $T = \tau(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of $X$ given $T$ does not depend on $\theta$.
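A small exact check of Definition 2, under an illustrative model that is our assumption rather than one the notes use here: $X_1, \ldots, X_n$ i.i.d. Bernoulli($\theta$) with $T = \sum_i X_i$. Because $P(X = x) = \theta^t (1-\theta)^{n-t}$ depends on $x$ only through $t = \sum_i x_i$, each sequence with sum $t$ should have conditional probability $1/\binom{n}{t}$ given $T = t$, whatever $\theta$ is. The sketch enumerates all sequences with exact fractions to confirm this; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(x, theta):
    """P(X = x) for i.i.d. Bernoulli(theta) samples; x is a tuple of 0s and 1s."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)

def conditional_given_t(n, t, theta):
    """P(X = x | T = t), with T = sum(X), for every length-n binary x with sum t."""
    seqs = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    p_t = sum(joint_pmf(x, theta) for x in seqs)  # P(T = t)
    return {x: joint_pmf(x, theta) / p_t for x in seqs}

n, t = 4, 2
for theta in (Fraction(1, 5), Fraction(2, 3)):
    cond = conditional_given_t(n, t, theta)
    # Same uniform distribution, 1 / C(n, t), no matter which theta generated the data.
    assert set(cond.values()) == {Fraction(1, comb(n, t))}
    print("theta =", theta, "->", sorted(set(cond.values())))
```

By contrast, $T = X_1$ alone is not sufficient for $n \ge 2$: given $X_1$, the remaining samples are still Bernoulli($\theta$), so their conditional distribution still depends on $\theta$.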
