1 Coherent and Incoherent Modulation in OFDM

1.1 Review of Differential Modulation
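In differential modulation each transmitted symbol is built from the previous one,

$$z_k = \frac{z_{k-1}\, x_k}{\sqrt{E_x}},$$

where $x_k$ comes from an M-PSK modulation and we normalize to ensure that the transmit power is $E_x$.

The constellations have values

$$\sqrt{E_x}\,(\cos\theta_m, \sin\theta_m), \qquad \theta_m = \frac{\pi}{M} + (m-1)\frac{2\pi}{M}, \qquad m = 0, \ldots, M-1.$$

PSK modulation has approximate

$$P_e \approx 2Q\!\left(\sqrt{\frac{E_x \sin^2(\pi/M)}{N_0/2}}\right) = 2Q\!\left(\sqrt{\frac{E_x}{N_0/2}}\,\sin(\pi/M)\right).$$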
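The PSK approximation above is easy to evaluate numerically. The following is a minimal Python sketch, assuming the SNR is specified as $E_x/N_0$ in dB; the function names `q_function` and `psk_symbol_error_approx` are illustrative and do not come from the notes.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def psk_symbol_error_approx(ex_over_n0_db: float, M: int) -> float:
    """Approximate M-PSK symbol error rate 2Q(sqrt(Ex / (N0/2)) sin(pi/M))."""
    ex_over_n0 = 10.0 ** (ex_over_n0_db / 10.0)
    return 2.0 * q_function(math.sqrt(2.0 * ex_over_n0) * math.sin(math.pi / M))


if __name__ == "__main__":
    for snr_db in (0, 4, 8, 12):
        print(snr_db, psk_symbol_error_approx(snr_db, M=4))
```

The approximation is a union-type bound on the two nearest neighbours, so it is most trustworthy at moderate to high SNR.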
Typically differential modulation is based on PSK constellations. The idea
is that if the amplitude is the same for every point in the constellation, then we
can encode the information in the difference between successive symbols. This
eliminates the need for any channel estimation. OFDM is ideal for differential
modulation, as differential modulation requires flat-fading channels (no equalization).
We receive $y_k = \alpha z_k + n_k$.
We differentially demodulate by finding the phase of $y_k y_{k-1}^*$.
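How does this work? Suppose $\alpha$ is a constant value, then

$$\begin{aligned}
y_k y_{k-1}^* &= (\alpha z_k + n_k)(\alpha^* z_{k-1}^* + n_{k-1}^*) \\
&= |\alpha|^2 |z_{k-1}|^2 \frac{x_k}{\sqrt{E_x}} + \alpha z_k n_{k-1}^* + \alpha^* z_{k-1}^* n_k + n_k n_{k-1}^*
\end{aligned}$$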
[Figure 1: Probability of error for 4-PSK. Probability of symbol error versus SNR (dB); curves: PSK (from simulation), DPSK (from simulation), PSK (approximation), DPSK (approximation).]
So our signal is $|\alpha|^2 |z_{k-1}|^2 \frac{x_k}{\sqrt{E_x}} = |\alpha|^2 \sqrt{E_x}\, x_k$
and the noise is $\alpha z_k n_{k-1}^* + \alpha^* z_{k-1}^* n_k + n_k n_{k-1}^*$.
If the SNR is high enough, we ignore the contribution of $n_k n_{k-1}^*$.
(If the SNR is so low that this becomes important, we're in trouble anyway!)
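A minimal Python sketch of the encode/decode chain can make the cancellation of the unknown gain concrete. It assumes a noiseless channel, a reference symbol $z_0 = \sqrt{E_x}$, and PSK phases $2\pi m/M$ (the constant offset in $\theta_m$ is dropped for simplicity); `dpsk_encode` and `dpsk_decode` are illustrative names.

```python
import cmath
import math


def dpsk_encode(data_symbols, M, Ex=1.0):
    """Differentially encode indices m_k via z_k = z_{k-1} * x_k / sqrt(Ex)."""
    amp = math.sqrt(Ex)
    z = [amp + 0j]  # assumed reference symbol z_0
    for m in data_symbols:
        x = amp * cmath.exp(2j * math.pi * m / M)  # PSK point x_k with |x_k|^2 = Ex
        z.append(z[-1] * x / amp)
    return z


def dpsk_decode(y, M):
    """Recover the indices from the phase of y_k * conj(y_{k-1})."""
    decided = []
    for prev, cur in zip(y, y[1:]):
        phase = cmath.phase(cur * prev.conjugate())
        decided.append(round(phase * M / (2.0 * math.pi)) % M)
    return decided


data = [0, 3, 1, 2, 2, 0, 1]
alpha = 0.7 * cmath.exp(2.1j)  # unknown constant channel gain, never estimated
received = [alpha * z for z in dpsk_encode(data, M=4)]
decoded = dpsk_decode(received, M=4)
print(decoded)
```

Because $|\alpha|^2$ is real and positive, the product $y_k y_{k-1}^*$ carries only the phase of $x_k$, so the decoder never needs $\alpha$.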
Ignoring the cross-term, the noise is Gaussian (the sum of two Gaussian
random variables) and, assuming $n_k$ and $n_{k-1}$ are independent, the variance
per dimension is $2 E_x |\alpha|^2 N_0/2$. So the SNR is

$$\frac{|\alpha|^4 E_x^2}{2 E_x |\alpha|^2 N_0/2} = \frac{|\alpha|^2 E_x}{N_0}$$

and $P_e \approx 2Q\!\left(\sqrt{\frac{|\alpha|^2 E_x}{N_0}}\,\sin(\pi/M)\right)$.
Note that if we had coherent demodulation with perfect knowledge of the channel,
the demodulated signal would be $\hat{y}_k = |\alpha|^2 x_k + \alpha^* n_k$ and the
probability of error for the demodulated signals would be
$P_e \approx 2Q\!\left(\sqrt{\frac{|\alpha|^2 E_x}{N_0/2}}\,\sin(\pi/M)\right)$,
i.e. the SNR inside the Q-function is 3 dB larger. This is where people say
that you lose 3 dB with differential modulation. What do you gain? No need
to estimate the channel: no need for additional pilots, training data, etc. for
channel estimation.
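The 3 dB statement can be checked directly from the two approximations. The sketch below assumes the SNR is defined as $|\alpha|^2 E_x/N_0$; the function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_coherent(snr_db: float, M: int) -> float:
    """2Q(sqrt(|alpha|^2 Ex / (N0/2)) sin(pi/M)) with snr = |alpha|^2 Ex / N0."""
    snr = 10.0 ** (snr_db / 10.0)
    return 2.0 * q_function(math.sqrt(2.0 * snr) * math.sin(math.pi / M))


def pe_differential(snr_db: float, M: int) -> float:
    """2Q(sqrt(|alpha|^2 Ex / N0) sin(pi/M)) with the same SNR definition."""
    snr = 10.0 ** (snr_db / 10.0)
    return 2.0 * q_function(math.sqrt(snr) * math.sin(math.pi / M))


shift_db = 10.0 * math.log10(2.0)  # about 3.01 dB
print(pe_coherent(10.0, 4), pe_differential(10.0 + shift_db, 4))
```

Under these approximations the differential curve is simply the coherent curve shifted right by $10\log_{10}2$ dB.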
[Figure 2: Probability of error for 8-PSK. Probability of symbol error versus SNR (dB); curves: PSK (from simulation), DPSK (from simulation), PSK (approximation), DPSK (approximation).]

[Figure 3: Probability of error for 16-PSK. Probability of symbol error versus SNR (dB); curves: PSK (from simulation), DPSK (from simulation), PSK (approximation), DPSK (approximation).]
Because we have a time-frequency grid, we can either differentially modulate
from time sample to time sample OR from subcarrier to subcarrier.
Both methods have pros and cons.
First let's consider differential modulation from time sample to time sample.
1.2 Differential Modulation from time sample to time sample
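Suppose the OFDM symbol is operating in a fading channel

$$h(t,\tau) = \sum_{l=0}^{L-1} \alpha_l(t)\,\delta(\tau - \tau_l)$$

where $E[\alpha_l(t)\,\alpha_l^*(t+s)] = \sigma_{a,l}^2\, J_0(2\pi f_d s)$, $f_d = f_c v / c$ is the
maximum Doppler frequency, and we assume that the taps $\alpha_l(t)$ are independent,
zero-mean complex Gaussian random variables, so that
$E[\alpha_l(t)\,\alpha_k^*(s)] = \delta(k-l)\,\sigma_{a,l}^2\, J_0(2\pi f_d (t-s))$.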
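If we differentially modulate from time symbol to time symbol, then, writing
$\nu$ for the duration of the cyclic prefix,

$$x_1(t) = \sum_{k=0}^{N-1} z_{k,1}\, e^{j2\pi kt/(NT)}$$

$$x_2(t) = \sum_{k=0}^{N-1} \frac{z_{k,1}\, c_{k,2}}{\sqrt{E_x}}\, e^{j2\pi k(t-(NT+\nu))/(NT)}$$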
We receive

$$y_1(t) = \sum_{k=0}^{N-1} H_k(t)\, z_{k,1}\, e^{j2\pi kt/(NT)} + n_1(t)$$

$$y_2(t) = \sum_{k=0}^{N-1} H_k(t+NT+\nu)\, \frac{z_{k,1}\, c_{k,2}}{\sqrt{E_x}}\, e^{j2\pi k(t-(NT+\nu))/(NT)} + n_2(t)$$
After the DFT on both symbols we have

$$H_k(t)\, z_{k,1} + n_{k,1}$$

$$H_k(t+NT+\nu)\, \frac{z_{k,1}\, c_{k,2}}{\sqrt{E_x}} + n_{k,2}$$

on each tone.
If the relative Doppler $f_d(NT+\nu)$ is small, so that the correlation
between $H_k(t)$ and $H_k(t+NT+\nu)$ is very high, then the differential detection
will work as we hope, with only about a 3 dB difference between coherent
and differential detection. In a well-designed, slow-fading OFDM system,
the relative Doppler should be less than 1% anyway to avoid intercarrier
interference. So let's find out how much additional error we should expect
to get.
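First, let's find the correlation between $H_k(t)$ and $H_k(t+NT+\nu)$.

$$\begin{aligned}
E[H_k(t)\,H_k^*(t+NT+\nu)] &= E\!\left[\sum_{l=0}^{L-1} \alpha_l(t)\, e^{-j2\pi k\tau_l/(NT)} \sum_{m=0}^{L-1} \alpha_m^*(t+NT+\nu)\, e^{j2\pi k\tau_m/(NT)}\right] \\
&= E\!\left[\sum_{l=0}^{L-1} \alpha_l(t)\,\alpha_l^*(t+NT+\nu)\right] \\
&= J_0(2\pi f_d(NT+\nu)) \sum_{l=0}^{L-1} \sigma_{a,l}^2
\end{aligned}$$

$$\rho = \frac{E[H_k(t)\,H_k^*(t+NT+\nu)]}{E[H_k(t)\,H_k^*(t)]} = J_0(2\pi f_d(NT+\nu))$$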
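To get a feel for the size of $\rho$, the Bessel function can be evaluated for a given relative Doppler. This is a minimal sketch using only the standard library: `bessel_j0` sums the power series of $J_0$, which is adequate for the small arguments of interest here, and `spacing` is an illustrative parameter for the number of OFDM symbols between the two channel samples.

```python
import math


def bessel_j0(x: float, terms: int = 30) -> float:
    """J_0(x) from its power series sum_k (-1)^k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


def symbol_correlation(relative_doppler: float, spacing: int = 1) -> float:
    """rho = J_0(2 pi f_d (NT + nu) * spacing), relative_doppler = f_d (NT + nu)."""
    return bessel_j0(2.0 * math.pi * relative_doppler * spacing)


print(symbol_correlation(0.01))             # adjacent OFDM symbols
print(symbol_correlation(0.01, spacing=2))  # symbols two apart
```

At a relative Doppler of 1% the correlation stays within about half a percent of 1 even for symbols two apart.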
In differential demodulation, the assumption is that the gain and phase
associated with two successive random variables are the same. What happens
when they are slightly different?
So suppose $a_1$ and $a_2$ are correlated complex Gaussian random variables
with correlation $\rho$. Then we can write $a_2 = \rho a_1 + \sqrt{1-\rho^2}\, u_2$
where $a_1$ and $u_2$ are uncorrelated.
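Then if we do differential detection on $y_2$ given $y_1$, with
$y_1 = a_1 z_1 + n_1$, $y_2 = a_2 z_2 + n_2$ and $z_2 = z_1 c_2/\sqrt{E_x}$, we have

$$\begin{aligned}
y_2 y_1^* &= \left((\rho a_1 + \sqrt{1-\rho^2}\, u_2)\frac{z_1 c_2}{\sqrt{E_x}} + n_2\right)(a_1 z_1 + n_1)^* \\
&= |z_1|^2 \left(\rho |a_1|^2 + \sqrt{1-\rho^2}\, u_2 a_1^*\right)\frac{c_2}{\sqrt{E_x}} + n_2 (a_1 z_1)^* + n_1^* (\rho a_1 + \sqrt{1-\rho^2}\, u_2)\, z_2 + n_2 n_1^*
\end{aligned}$$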
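Treating the $\sqrt{1-\rho^2}\,u_2$ term as additional noise, this means the probability of error is approximately

$$\begin{aligned}
P_e &\approx 2Q\!\left(\sqrt{\frac{E_x^2 |a_1|^4 \rho^2}{(1-\rho^2)|a_1|^4 E_x^2 + 2E_x|a_1|^2 N_0/2}}\,\sin\!\left(\frac{\pi}{M}\right)\right) \\
&= 2Q\!\left(\sqrt{\frac{E_x |a_1|^2 \rho^2}{(1-\rho^2)|a_1|^2 E_x + 2N_0/2}}\,\sin\!\left(\frac{\pi}{M}\right)\right) \\
&= 2Q\!\left(\sqrt{\frac{\rho^2}{(1-\rho^2) + 1/\mathrm{SNR}_{MFB}}}\,\sin\!\left(\frac{\pi}{M}\right)\right)
\end{aligned}$$

where $\mathrm{SNR}_{MFB} = |a_1|^2 E_x / N_0$.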
Notice that as $\mathrm{SNR}_{MFB} \to \infty$, the approximate
$P_e \to 2Q\!\left(\sqrt{\frac{\rho^2}{1-\rho^2}}\,\sin\!\left(\frac{\pi}{M}\right)\right)$,
i.e. there is an error floor. However, for a well-designed OFDM system, the
correlation between successive OFDM symbols should be very close to 1, so
this floor is relatively low.
Figure ?? shows the probability of error for a DPSK system with a correlation
of $\rho = 1$ and a correlation of $\rho = J_0(2\pi \cdot 2 f_d(NT+\nu))$ where
$f_d(NT+\nu) = 0.01$. Note that we base the correlation $\rho$ on symbols two
apart to take the worst-case correlation.
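The error floor can be explored numerically with a short sketch of the approximation $P_e \approx 2Q(\sqrt{\rho^2/((1-\rho^2) + 1/\mathrm{SNR}_{MFB})}\,\sin(\pi/M))$. The function names below are illustrative, and the value $\rho = 0.99$ is only an example, not a figure from the notes.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def dpsk_pe_decorrelated(snr_mfb_db: float, rho: float, M: int) -> float:
    """Approximate DPSK symbol error rate when successive gains have correlation rho."""
    snr = 10.0 ** (snr_mfb_db / 10.0)
    effective_snr = rho ** 2 / ((1.0 - rho ** 2) + 1.0 / snr)
    return 2.0 * q_function(math.sqrt(effective_snr) * math.sin(math.pi / M))


def dpsk_error_floor(rho: float, M: int) -> float:
    """Limit of the expression above as SNR_MFB grows without bound (needs |rho| < 1)."""
    return 2.0 * q_function(math.sqrt(rho ** 2 / (1.0 - rho ** 2)) * math.sin(math.pi / M))


for snr_db in (10, 20, 30, 40):
    print(snr_db, dpsk_pe_decorrelated(snr_db, rho=0.99, M=4))
print("floor", dpsk_error_floor(0.99, 4))
```

With $\rho = 1$ the expression reduces to the ordinary DPSK approximation, and for $\rho < 1$ the curve flattens once $1/\mathrm{SNR}_{MFB}$ drops below $1-\rho^2$.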
1.3 Differential Modulation from tone to tone
There are two problems with differential modulation from tone to tone. The
first is that, though adjacent tones are correlated, they are not identical. However,
we saw from the previous section that if the correlation is high enough, the
reduction in quality is not significant. The more important problem has to do
with synchronization. First let's address the issue of tone correlation and
then discuss the synchronization issue.
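Given an $L$-tap channel, the correlation between successive tones is:

$$\rho = \frac{E\!\left[\sum_{l=0}^{L-1} \alpha_l\, e^{-j2\pi \tau_l k/(NT)} \sum_{m=0}^{L-1} \alpha_m^*\, e^{j2\pi \tau_m (k+1)/(NT)}\right]}{\sum_{l=0}^{L-1} \sigma_{a,l}^2}
= \frac{\sum_{l=0}^{L-1} \sigma_{a,l}^2\, e^{j2\pi \tau_l/(NT)}}{\sum_{l=0}^{L-1} \sigma_{a,l}^2}$$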
Let's first take the case where $\sigma_{a,l}^2 = \sigma_h^2/L$ for all $l$ and the taps are
sample spaced, i.e. $\tau_l = Tl$. Then

$$\rho = \frac{\sum_{l=0}^{L-1} e^{j2\pi lT/(NT)}}{L}
= \frac{1 - \exp\!\left(\frac{j2\pi L}{N}\right)}{L\left(1 - \exp\!\left(\frac{j2\pi}{N}\right)\right)}
= \exp\!\left(\frac{j\pi(L-1)}{N}\right)\frac{\sin(\pi L/N)}{L\sin(\pi/N)}$$

[Figure 4: Correlation of adjacent tones given a 64-sample FFT and a cyclic prefix of length up to 8 samples. Correlation versus number of taps.]
This represents a worst-case correlation in the sense that all the taps are
equally strong. Figure 4 shows the correlation for a 64-sample OFDM symbol
with a length-8 cyclic prefix. The correlation is still high enough that the
error floor is
$2Q\!\left(\sqrt{\frac{\rho^2}{1-\rho^2}}\,\sin(\pi/M)\right) \approx 2Q\!\left(\sqrt{10^{1.5}}\,\sin(\pi/M)\right)$.
That is, the error floor due to correlation corresponds to an effective SNR of
about 15 dB. So the correlation is high enough for several applications that
it is not an issue in this case.
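The closed form for the uniform, sample-spaced case is simple to check against the direct sum. The sketch below is illustrative: it takes the positive exponent sign (the magnitude does not depend on that choice), and `floor_snr_db` converts a correlation magnitude into the effective SNR $\rho^2/(1-\rho^2)$ of the resulting error floor.

```python
import cmath
import math


def adjacent_tone_correlation(L: int, N: int) -> complex:
    """rho = (1/L) * sum_{l=0}^{L-1} exp(j 2 pi l / N) for equal-power, sample-spaced taps."""
    return sum(cmath.exp(2j * math.pi * l / N) for l in range(L)) / L


def correlation_magnitude(L: int, N: int) -> float:
    """|rho| = sin(pi L / N) / (L sin(pi / N))."""
    return math.sin(math.pi * L / N) / (L * math.sin(math.pi / N))


def floor_snr_db(rho_magnitude: float) -> float:
    """Effective SNR rho^2 / (1 - rho^2) of the error floor, in dB."""
    return 10.0 * math.log10(rho_magnitude ** 2 / (1.0 - rho_magnitude ** 2))


for taps in range(1, 9):
    mag = correlation_magnitude(taps, 64)
    print(taps, round(mag, 4), "inf" if taps == 1 else round(floor_snr_db(mag), 1))
```

A single tap gives $\rho = 1$ and no floor at all; the floor drops as more equal-power taps are added.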
1.3.1 Receiving the OFDM signal
The receiver filters then samples the OFDM signal, throws away the cyclic
prefix, then takes the DFT. Throwing away the cyclic prefix really means
determining where in the received signal to start the FFT.
Given

$$y(t) = \sum_{k=0}^{N-1} H_k c_k\, e^{j2\pi (t-\nu)k/(NT)} + n(t), \qquad \text{length of } h(\tau) \le t \le (NT+\nu)$$
This means we can place the FFT window (of length $NT$) anywhere between the
length of $h(\tau)$ and $(NT+\nu)$ without losing the Fourier-transform properties.
In determining the start of the OFDM signal, we can be a little sloppy about
where we start. See Figure 5 for an illustration of this. However, there is a
penalty. If we don't start at $t = \nu$ precisely, there will be a phase rotation.
That is, starting the FFT at $t = \nu - \delta$ ($\delta > 0$), then

$$\begin{aligned}
y(n) &= \sum_{k=0}^{N-1} H_k c_k\, e^{j2\pi (nT-\delta)k/(NT)} + n(nT) \\
&= \sum_{k=0}^{N-1} H_k\, e^{-j2\pi \delta k/(NT)}\, c_k\, e^{j2\pi kn/N} + n(nT)
\end{aligned}$$

and after the DFT

$$x_k = H_k\, e^{-j2\pi \delta k/(NT)}\, c_k + n(k)$$
Due to not starting the FFT exactly at the right position, we have a phase
rotation. If we are doing coherent detection, estimating the channel after
the DFT, the phase rotation becomes part of the channel and part of the
estimation process. However, if we are doing differential detection, we are not
counting on a phase rotation, and the differentially detected signal becomes,
assuming the correlation between adjacent tones is 1:
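$$\begin{aligned}
&|H_k|^2\, e^{-j2\pi\delta/(NT)}\, c_k + n_k H_{k-1}^* + n_{k-1}^* H_k + n_k n_{k-1}^* \\
&\quad = |H_k|^2\, e^{j(\mathrm{phase}_k - 2\pi\delta/(NT))} + n_k H_{k-1}^* + n_{k-1}^* H_k + n_k n_{k-1}^*
\end{aligned}$$

where $\mathrm{phase}_k$ is the phase of the PSK symbol $c_k$.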
In other words, the undetected phase rotation causes a rotation of the
PSK symbol, which will lead to an error floor.
For example, suppose $\delta = T$ and the FFT size is 64; then Figure 6 shows the
probability of error when the DFT is started one sample too early.
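The phase rotation caused by an early FFT start can be reproduced with a few lines of NumPy. This is a sketch under simplifying assumptions: an ideal channel ($H_k = 1$), no noise, QPSK symbols on every tone, and a timing error of `d` whole samples (so $\delta = dT$). Starting `d` samples inside the cyclic prefix is equivalent to reading a circularly delayed copy of the symbol, which `np.roll` models.

```python
import numpy as np

N, d = 64, 1  # FFT size and timing error in samples (illustrative values)
rng = np.random.default_rng(0)
c = np.exp(2j * np.pi * rng.integers(0, 4, N) / 4)  # QPSK symbol on each tone

x = np.fft.ifft(c)   # time-domain OFDM symbol (cyclic prefix omitted)
y = np.roll(x, d)    # window that starts d samples early, inside the prefix
Y = np.fft.fft(y)

k = np.arange(N)
rotation = np.exp(-2j * np.pi * k * d / N)  # per-tone rotation e^{-j 2 pi k delta / (NT)}

# Tone-to-tone differential detection sees a constant extra phase of -2 pi d / N.
detected = Y[1:] * np.conj(Y[:-1])
ideal = c[1:] * np.conj(c[:-1])
extra_phase = np.angle(detected * np.conj(ideal))
print(extra_phase[:4], -2 * np.pi * d / N)
```

For coherent detection the factor `rotation` is simply absorbed into the channel estimate, whereas in tone-to-tone differential detection the constant extra phase eats directly into the decision margin $\pi/M$.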
2 Coherent Demodulation
One of the nice things about OFDM is that it doesn't require equalization, so
it lends itself to coherent demodulation.
[Figure 5: Safe region where the DFT can start without losing Fourier transform properties. Diagram labels: cyclic extension, safe region for DFT.]

[Figure 6: Probability of symbol error versus SNR (dB) when the DFT is started one sample too early; curves: offset, differential.]
Given $y_n = H_n c_n + n_n$, we demodulate by finding
$\frac{y_n}{H_n} = c_n + \frac{n_n}{H_n}$. Note that
any sort of MMSE filtering doesn't make sense here as we have $N$ orthogonal
channels. However, we do need to estimate the channel.
When estimating a channel, we use combinations of the following items:
1. Training data
2. Decision-directed data
3. Things we know about the symbol and/or the channel.
2.1 Training data
Training data or pilot symbols are data known at the receiver by
prior agreement. For example, a standard will dictate that certain symbols
in certain parts of the data packet are known and have certain values. The
receiver uses these known values to help estimate the channel. Training
symbols are constellation symbols that are known. Pilot symbols can take
on a different modulated form.
In channel estimation, often the goal is to first get a gross estimate of the
channel, then to track the channel as it slowly changes. For that
reason, often there is an initial large burst of training followed by intermediate
training spread out through the remaining information symbols.
A pilot symbol could take the form of a narrow pulse in time (wideband
in frequency) that is transmitted periodically to estimate the channel. For
example, suppose you could send an impulse every 10 OFDM symbols;
one could then measure the pulse response at the receiver to estimate the
channel.
However, an impulse is difficult to generate and/or fit into a
given bandwidth. So a modified impulse can be sent.
A training symbol (an OFDM symbol comprised entirely of known symbols)
can be transmitted, say, every $N$th OFDM symbol for channel estimation.
The difficulty with training data is that it occupies bandwidth and diminishes
capacity. However, too little training data may result in bad channel estimation,
which will impact the performance and/or rate. (Bad performance can
be improved by error-control coding or increased power, but all of these cost rate
or power.)
The question is: how much training is required?
Suppose you have a training symbol every $N$ OFDM symbols. If the
fading is such that $f_d T_s < 0.01$ (where $T_s$ is the width of the OFDM symbol),
how much will the channel change between channel updates? Well, if we wait
$N$ symbols to update the training, then the relative Doppler is $N f_d T_s$. For
example, if $N = 10$, then 1/10th of the data is used for training, but by the
10th symbol the channel will have changed enough that some impairment
may result: $10 \times 0.01 = 0.1$. So one solution is to use a training symbol every
10th symbol, say, and also have some training within each symbol.
How much training do you need in a symbol to estimate the channel?
Suppose you have an $L$-tap channel $\sum_{l=0}^{L-1} \alpha_l\, \delta(t - \tau_l)$. Then the frequency
response $H_k = \sum_{l=0}^{L-1} \alpha_l \exp\!\left(-j2\pi \frac{\tau_l k}{NT}\right)$ has $L$ separate parameters
(not counting the delays $\tau_l$), so the rank of this channel is on the order of $L$
and you need at least $L$ training symbols or pilots to estimate the complete
frequency response. (Either you estimate selected $H_k$ and interpolate, or you
estimate the $\alpha_l$ and find $H_k$ from that.)
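As an illustration of the second option (estimate the $\alpha_l$, then rebuild $H_k$), here is a minimal NumPy sketch. It assumes a noiseless, sample-spaced channel whose delays are known at the receiver, and $L$ equally spaced pilot tones; the values of `N` and `L` and all variable names are illustrative.

```python
import numpy as np

N, L = 64, 4  # number of tones and channel taps (illustrative values)
rng = np.random.default_rng(1)
alpha = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)

delays = np.arange(L)  # tau_l = l*T, assumed known at the receiver
tones = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(tones, delays) / N)  # H_k = sum_l alpha_l e^{-j 2 pi l k / N}
H = F @ alpha

pilots = np.arange(0, N, N // L)  # L equally spaced pilot tones
H_pilot = H[pilots]               # what y_k / x_k gives on the pilots without noise

# Least-squares fit of the L tap gains from the L pilot observations,
# then interpolation to all N tones.
alpha_hat, *_ = np.linalg.lstsq(F[pilots], H_pilot, rcond=None)
H_hat = F @ alpha_hat
print(np.max(np.abs(H_hat - H)))
```

With noise, using more than $L$ pilots makes the least-squares system overdetermined and averages the noise down.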
2.2 Decision-directed data
Sometimes you can mitigate the need for training data by also using
your information-bearing data values. The idea is that if you receive
$y_k = H_k x_k + n_k$ and you estimate $\hat{x}_k$, you can estimate
$\hat{H}_k = \frac{y_k}{\hat{x}_k}$. The problem
with this is that the data decisions are unreliable, which can lead to bursts of
error in the channel estimate. However, it is a thrifty solution in the sense that
you are not spending too much of your precious bandwidth on training symbols.
What happens when your training data is bad? You get a bad channel
estimate, which in turn leads to bad decisions. However, often the system can
self-correct by stumbling onto the correct decision.
Ways to mitigate problems with error propagation? If the system has an
error-correction code, one can use the remodulated output from the decoded
data for decision direction. This has the advantage of being more reliable
but also has more delay.
Another way to mitigate the problem with decision-directed data is to
use properties of the channel to spot inconsistencies. That is, if you estimate
$\hat{H}_0, \hat{H}_1, \ldots, \hat{H}_{N-1}$ and you notice that the gain and phase of one of the
estimated frequency responses is very different from the others, you know that
there is probably something wrong (because the frequency response is highly
correlated) and you can use that information.
This leads to the question: what do you know about the symbol and the
channel?
2.3 Properties of the channel
Suppose you had an OFDM symbol of training data, and you estimate
$\hat{H}_k = \frac{y_k}{x_k} = H_k + \frac{n_k}{x_k}$. This is a relatively reliable estimate of $H_k$
(in the sense that the mean of $\hat{H}_k$ given $H_k$ is indeed $H_k$), but it's quite noisy.
Can you improve the estimate of the channel? Yes!
Even though the subcarriers are $N$ independent channels, the frequency
response on each channel is correlated. We can use that correlation to smooth
the channel estimate.
Consider a well-designed OFDM symbol. First, you know that the length
of the channel impulse response must be less than the size of the cyclic prefix
$\nu$. So this is information you can use in refining your estimate of the channel.
2.3.1 DFT filtering
Suppose you have a sample-spaced channel $h(t) = \sum_{l=0}^{L-1} \alpha_l\, \delta(t - Tl)$. You
find an estimate $\hat{H}_k$. You know that if you inverse discrete Fourier transform
$\hat{H}_k$, there should be zeros outside the length of the cyclic prefix, $\nu \ge LT$.
So one way you can filter the data is to inverse FFT your estimate of the
frequency response, then time-pass (rather than bandpass) filter the signal,
and then go back to the frequency domain. This will reduce the noise in the
estimate of $H_k$ by a factor of $\frac{\nu}{NT}$!
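The time-pass filter described above takes only a few lines of NumPy. This is a sketch under stated assumptions: a sample-spaced channel of `L` taps, a prefix of exactly `L` samples, white noise on the per-tone estimates, and illustrative parameter values.

```python
import numpy as np

N, L, noise_var = 64, 8, 0.1  # FFT size, taps (= prefix length in samples), noise variance
rng = np.random.default_rng(2)

h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)
H = np.fft.fft(h, N)  # true frequency response of the sample-spaced channel

noise = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
H_raw = H + noise     # noisy per-tone estimate, e.g. y_k / x_k on a training symbol

h_est = np.fft.ifft(H_raw)  # back to the time domain
h_est[L:] = 0.0             # "time-pass" filter: keep only the taps inside the prefix
H_filt = np.fft.fft(h_est)  # return to the frequency domain

mse_raw = np.mean(np.abs(H_raw - H) ** 2)
mse_filt = np.mean(np.abs(H_filt - H) ** 2)
print(mse_raw, mse_filt, mse_filt / mse_raw)
```

Only the noise that falls inside the first `L` time-domain samples survives, so on average the noise power in the estimate shrinks by about `L / N`.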
This is a very nice idea, but there is one problem: not all analog channels
are sample-spaced!
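Suppose your channel is $h(t) = \sum_{l=0}^{L-1} \alpha_l\, \delta(t - \tau_l)$. Then
$H_k = \sum_{l=0}^{L-1} \alpha_l \exp\!\left(-j2\pi \frac{k\tau_l}{NT}\right)$.
If we take the inverse discrete Fourier transform we have

$$\begin{aligned}
\frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{L-1} \alpha_l \exp\!\left(-j2\pi \frac{k\tau_l}{NT}\right)\exp\!\left(j2\pi \frac{kn}{N}\right)
&= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{L-1} \alpha_l \exp\!\left(-j2\pi k\,\frac{\tau_l - nT}{NT}\right) \\
&= \frac{1}{N}\sum_{l=0}^{L-1} \alpha_l\, \frac{1 - \exp\!\left(-j2\pi \frac{\tau_l - nT}{T}\right)}{1 - \exp\!\left(-j2\pi \frac{\tau_l - nT}{NT}\right)} \\
&= \frac{1}{N}\sum_{l=0}^{L-1} \alpha_l \exp\!\left(-j\pi \frac{(\tau_l - nT)(N-1)}{NT}\right) \frac{\sin\!\left(\pi \frac{\tau_l - nT}{T}\right)}{\sin\!\left(\pi \frac{\tau_l - nT}{NT}\right)}
\end{aligned}$$

Because $\tau_l$ is not necessarily an integer multiple of $T$, the term

$$\frac{\sin\!\left(\pi \frac{\tau_l - nT}{T}\right)}{\sin\!\left(\pi \frac{\tau_l - nT}{NT}\right)}$$

does not go to zero outside the cyclic prefix, but leaks out. So, if you were
to cut off the channel outside the cyclic prefix, you're actually getting rid of
important information. So DFT filtering is a nice concept, but it has difficulties.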
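The leakage is easy to see numerically for a single unit-gain tap. The sketch below assumes a 64-point FFT and an 8-sample prefix (illustrative values) and compares a tap at exactly $2T$ with one at $2.5T$; `energy_outside_prefix` is an illustrative helper name.

```python
import numpy as np

N, prefix = 64, 8  # FFT size and cyclic-prefix length in samples (illustrative)
k = np.arange(N)


def time_domain_taps(delay_in_samples: float) -> np.ndarray:
    """IDFT of H_k = exp(-j 2 pi k tau / (NT)) for one unit-gain tap at tau = delay * T."""
    H = np.exp(-2j * np.pi * k * delay_in_samples / N)
    return np.fft.ifft(H)


def energy_outside_prefix(delay_in_samples: float) -> float:
    """Energy of the IDFT samples that a prefix-length time window would discard."""
    h = time_domain_taps(delay_in_samples)
    return float(np.sum(np.abs(h[prefix:]) ** 2))


print(energy_outside_prefix(2.0))  # sample-spaced tap: essentially zero
print(energy_outside_prefix(2.5))  # fractional delay: a few percent of the energy leaks out
```

The total energy is 1 in both cases, so the second number is directly the fraction of the tap's energy that the time-pass filter would throw away.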
2.3.2 Filtering based on the general knowledge of the channel
The time-pass filtering of the IDFT of the frequency response is an approximation
to a Wiener filter for a set of sample-spaced channels with a uniform power
delay profile.
The goal is to find a filter, using knowledge of the cyclic prefix, that will
not hurt your estimate.
The ideal Wiener filter is $(R_{hh} + \sigma^2 I)^{-1} R_{hh}$.
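So let's find

$$\begin{aligned}
R_{hh} &= E\!\left[\begin{bmatrix} H_0 \\ H_1 \\ \vdots \\ H_{N-1} \end{bmatrix}\begin{bmatrix} H_0^* & H_1^* & \cdots & H_{N-1}^* \end{bmatrix}\right] \\
&= E\!\left[\mathrm{Exp}(\tau_1, \ldots, \tau_L)\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_L \end{bmatrix}\begin{bmatrix} \alpha_1^* & \cdots & \alpha_L^* \end{bmatrix}\mathrm{Exp}(\tau_1, \ldots, \tau_L)^H\right] \\
&= E\!\left[\mathrm{Exp}(\tau_1, \ldots, \tau_L)\begin{bmatrix} A_1 & 0 & \cdots & 0 \\ & \ddots & & \\ 0 & \cdots & 0 & A_L \end{bmatrix}\mathrm{Exp}(\tau_1, \ldots, \tau_L)^H\right]
\end{aligned}$$

where

$$\mathrm{Exp}(\tau_1, \ldots, \tau_L)_{n,l} = \exp\!\left(-j2\pi \frac{n\tau_l}{NT}\right),$$

$A_l = E[|\alpha_l|^2]$ is the power of tap $l$, and the values $\tau_l$ are uniformly
distributed between 0 and $\nu$.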
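$$\begin{aligned}
&E\!\left[\mathrm{Exp}(\tau_1, \ldots, \tau_L)\begin{bmatrix} A_1 & 0 & \cdots & 0 \\ & \ddots & & \\ 0 & \cdots & 0 & A_L \end{bmatrix}\mathrm{Exp}(\tau_1, \ldots, \tau_L)^H\right] \\
&\quad = E\!\left[\begin{bmatrix}
\sum_{l=1}^{L} A_l & \sum_{l=1}^{L} A_l\, e^{j2\pi \tau_l/(NT)} & \cdots & \sum_{l=1}^{L} A_l\, e^{j2\pi (N-1)\tau_l/(NT)} \\
\vdots & & & \vdots \\
\sum_{l=1}^{L} A_l\, e^{-j2\pi \tau_l (N-1)/(NT)} & \sum_{l=1}^{L} A_l\, e^{-j2\pi \tau_l (N-2)/(NT)} & \cdots & \sum_{l=1}^{L} A_l
\end{bmatrix}\right] \\
&\quad = \sum_{l=1}^{L} A_l \begin{bmatrix}
1 & \dfrac{NT\left(e^{j2\pi\nu/(NT)} - 1\right)}{j2\pi\nu} & \cdots & \dfrac{NT\left(e^{j2\pi(N-1)\nu/(NT)} - 1\right)}{j2\pi(N-1)\nu} \\
\vdots & & & \vdots \\
\dfrac{NT\left(1 - e^{-j2\pi(N-1)\nu/(NT)}\right)}{j2\pi(N-1)\nu} & \dfrac{NT\left(1 - e^{-j2\pi(N-2)\nu/(NT)}\right)}{j2\pi(N-2)\nu} & \cdots & 1
\end{bmatrix}
\end{aligned}$$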
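The closed-form entries can be assembled into $R_{hh}$ and the Wiener smoother with NumPy. This is a sketch, assuming the delays are uniform on $[0, \nu]$ and taking the total tap power $\sum_l A_l$ as a single parameter; `prefix_over_NT` stands for $\nu/(NT)$, and both function names are illustrative.

```python
import numpy as np


def rhh_uniform_delay(N: int, prefix_over_NT: float, total_power: float = 1.0) -> np.ndarray:
    """R_hh[m, n] = total_power * E[exp(-j 2 pi (m - n) tau / (NT))], tau ~ U[0, nu]."""
    d = np.subtract.outer(np.arange(N), np.arange(N)).astype(float)  # m - n
    phase = 2.0 * np.pi * d * prefix_over_NT
    R = np.ones((N, N), dtype=complex)  # diagonal entries equal 1
    off_diagonal = d != 0
    R[off_diagonal] = (1.0 - np.exp(-1j * phase[off_diagonal])) / (1j * phase[off_diagonal])
    return total_power * R


def wiener_smoother(R: np.ndarray, noise_var: float) -> np.ndarray:
    """W = (R_hh + sigma^2 I)^{-1} R_hh, the form quoted in the text."""
    return np.linalg.solve(R + noise_var * np.eye(R.shape[0]), R)


R = rhh_uniform_delay(N=8, prefix_over_NT=8 / 64)
W = wiener_smoother(R, noise_var=0.1)
print(np.round(np.abs(R[0]), 3))
```

A noisy per-tone estimate $\hat{H}$ would then be smoothed as `W @ H_hat`; because the matrix depends only on $\nu/(NT)$ and the noise level, it can be computed once in advance.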