Message Passing Based Detection For Orthogonal Time Frequency Space Modulation

YUAN Zhengdao1, LIU Fei2, GUO Qinghua3, WANG Zhongyong2

(1. The Open University of Henan, Zhengzhou 450000, China;
2. Zhengzhou University, Zhengzhou 450001, China;
3. University of Wollongong, Wollongong NSW 2522, Australia)

Abstract: The orthogonal time frequency space (OTFS) modulation has emerged as a promising modulation scheme for wireless communications in high-mobility scenarios. An efficient detector is of paramount importance to harvesting the time and frequency diversities promised by OTFS. Recently, some message passing based detectors have been developed by exploiting the features of the OTFS channel matrices. In this paper, we provide an overview of some re[…] on potential research on the design of message passing based OTFS receivers.

DOI: 10.12142/ZTECOM.202104004

Online: November 23, 2021

Manuscript received: 2021-10-10

Keywords: OTFS; detection; message passing; belief propagation; approximate message passing (AMP); unitary AMP (UAMP)

Citation (IEEE Format): Z. D. Yuan, F. Liu, Q. H. Guo, et al., “Message passing based detection for orthogonal time frequency space modulation,” ZTE Communications, vol. 19, no. 4, pp. 34–44, Dec. 2021. doi: 10.12142/ZTECOM.202104004.

1 Introduction

Recently the orthogonal time frequency space (OTFS) modulation has attracted much attention due to its capability of achieving reliable communications in high mobility scenarios[1–6]. OTFS is a two-dimensional modulation scheme in which the information is modulated in the delay Doppler (DD) domain, in contrast to the time frequency (TF) domain modulation in orthogonal frequency division multiplexing (OFDM). In OTFS, each symbol spreads over the time and frequency domains through the two-dimensional (2D) inverse symplectic finite Fourier transform (SFFT), leading to both time and frequency diversities[1–2]. It has been shown that OTFS can significantly outperform OFDM in high mobility scenarios[7].

To harvest the diversities promised by OTFS, the design of a powerful detector is paramount. The optimal maximum a posteriori (MAP) detector is impractical because its complexity grows exponentially with the length of the OTFS block. Recently, significant efforts have been devoted to the design of more efficient detectors. In Ref. [8], an effective channel matrix in the DD domain was derived, based on which a low-complexity two-stage detector was proposed. The first-order Neumann series was used in Ref. [9] to approximately solve the matrix inverse problem involved in the linear minimum mean squared error (MMSE) estimation based detection. A detection scheme was developed in Ref. [10], where the MMSE equalization was used in the first iteration, followed by parallel interference cancellation with a soft-output sphere decoder in subsequent iterations. A rectangular waveform was considered in Ref. [11], where the sparsity and quasi-banded structure of channel matrices without fractional Doppler shifts were exploited to reduce the detection complexity. The linear equalizers were extended to the multiple input and multiple output (MIMO)-OTFS systems in Ref. [12]. A cross-domain method was proposed in Ref. [13], where a conventional linear MMSE estimator is adopted for equalization in the time domain and a low-complexity symbol-by-symbol detection is utilized in the DD domain. A low-complexity iterative rake decision feedback detector was proposed in Ref. [14], which extracts and coherently combines the multiple copies of the symbols (due to multipath propagation) in the DD grid using maximal ratio combining (MRC).

Another line of OTFS detector design is based on factor graphs and message passing techniques[15, 23]. When the number of channel paths is small, the effective channel matrix in the DD domain is sparse, which allows efficient detection using the message passing algorithm (MPA)[2]. An expectation propagation (EP) algorithm was proposed in Ref. [16], where EP is used for message update with Gaussian approximation. A variational Bayes (VB) based detector was proposed in Ref. [17] to achieve better convergence. Studying the matched filtering processing, the authors in Ref. [18] proposed a message passing detector, which is combined with a probability clipping solution. The detectors in Refs. [2, 17, 19] take advantage of the sparsity of the channel matrix in the DD domain, and their complexity depends on the number of nonzero elements in each row of the channel matrix, which is denoted by $S$. Without considering fractional Doppler shifts, $S$ is equal to the number of channel paths. In general, a wideband system can provide sufficient delay resolution. The Doppler resolution depends on the time duration of the OTFS block. To fulfill the low latency requirement in wireless communications, the time duration of an OTFS block should be relatively small, in which case it is necessary to consider fractional Doppler shifts[2, 20]. The value of $S$ can then be significantly larger than the number of channel paths. In rich scattering environments, the complexity of these detectors can be a concern, and the short loops in the corresponding system graph model may result in significant performance loss. To overcome the above issues, the design of OTFS detectors based on approximate message passing (AMP)[21–22] was investigated in Ref. [25]. AMP works well for independent and identically distributed (sub-)Gaussian system transfer matrices, but it suffers from performance loss or even diverges for a general system transfer matrix[27–29]. Instead, the works in Refs. [25–26] resort to the unitary AMP (UAMP)[27–29], which is a variant of AMP and was formerly called UTAMP[27]. In UAMP, a unitary transformation of the original model is used, where the unitary matrix for transformation can be the conjugate transpose of the left singular matrix of the general system transfer matrix[27] obtained through singular value decomposition (SVD). It is shown in Ref. [25] that UAMP is well suited to OTFS because the DD domain channel matrices are block circulant matrices with circulant blocks (BCCB), so the 2D discrete Fourier transform can be used for the unitary transformation, leading to a very efficient implementation using the 2D fast Fourier transform (FFT) algorithm. In addition, as the noise variance is normally unknown, the noise variance estimation is also incorporated into the UAMP-based detector in Ref. [25].

In this paper, we provide an overview of the message passing based detectors, provide some comparison results, and discuss potential research on the design of message passing based OTFS receivers. The notations used in this paper are as follows. Boldface lower-case and upper-case letters denote vectors and matrices, respectively. We use $(\cdot)^H$ and $(\cdot)^T$ to denote the conjugate transpose and the transpose, respectively. The superscript $*$ denotes the conjugate operation. We define $[\cdot]_M$ as the mod $M$ operation. We use $\mathcal{N}(x \mid \hat{x}, \nu_x)$ to denote the probability density function of a complex Gaussian variable with mean $\hat{x}$ and variance $\nu_x$. The notation $\langle f(x) \rangle_{q(x)}$ denotes the expectation of the function $f(x)$ with respect to the distribution $q(x)$. The relation $f(x) = c\,g(x)$ for some positive constant $c$ is written as $f(x) \propto g(x)$. The notation $\otimes$ represents the Kronecker product, and $\mathbf{a} \cdot \mathbf{b}$ and $\mathbf{a} \cdot/ \mathbf{b}$ represent the component-wise product and division between vectors $\mathbf{a}$ and $\mathbf{b}$, respectively. We use $\mathbf{X} = \mathrm{reshape}_M(\mathbf{x})$ to denote that the vector $\mathbf{x}$ is reshaped as an $M \times N$ matrix $\mathbf{X}$ column by column, where the length of $\mathbf{x}$ is $MN$, and use $\mathbf{x} = \mathrm{vec}(\mathbf{X})$ to represent vectorization of matrix $\mathbf{X}$ column by column. The notation $\mathrm{diag}(\mathbf{a})$ represents a diagonal matrix with the elements of $\mathbf{a}$ as its diagonal. We use $|\mathbf{A}|^2$ to denote the element-wise magnitude squared operation for matrix $\mathbf{A}$. The notations $\mathbf{1}$ and $\mathbf{0}$ are used to denote an all-ones vector and an all-zeros vector with a proper length, respectively. The $j$-th entry of $\mathbf{q}$ is denoted by $q_j$. The superscript $t$ of $s^t$ denotes the iteration index of the variable $s$ involved in an iterative algorithm.

This work was supported by the National Natural Science Foundation of China (61901417, U1804152, 61801434) and Science and Technology Research Project of Henan Province (212102210556, 212102210566, 212400410179).

2 System Model

[Figure 1 omitted: block diagram with the signals x[k, l], x[n, m], s(t), r(t), y[n, m] and y[k, l], the ISFFT block and the channel h(τ, ν), spanning the delay-Doppler domain and the time-frequency domain.]
ISFFT: inverse symplectic finite Fourier transform; SFFT: symplectic finite Fourier transform
▲ Figure 1. Modulation and demodulation in an orthogonal time frequency space (OTFS) system[2]

The modulation and demodulation for OTFS are illustrated in Fig. 1, which are implemented with the 2D inverse SFFT (ISFFT) and SFFT at the transmitter and receiver, respectively[1, 24]. Before the OTFS modulation, a (coded) bit sequence is mapped to symbols $x[k,l]$, $k = 0, \ldots, N-1$, $l = 0, \ldots, M-1$ in the DD domain, where $x[k,l] \in \mathcal{A} = \{\alpha_1, \ldots, \alpha_{|\mathcal{A}|}\}$, $|\mathcal{A}|$ is the cardinality of $\mathcal{A}$, $l$ and $k$ denote the indices of the delay and Doppler shifts, respectively, and $N$ and $M$ are the number of grids of the DD plane.


At the transmitter side, ISFFT is performed to convert the DD domain symbols to signals in the time-frequency (TF) domain:

$$X_{tf}[n,m] = \frac{1}{\sqrt{MN}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} x[k,l]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{1}$$

After that, the signals $X_{tf}[n,m]$ in the TF domain are converted to a continuous-time waveform $s(t)$ using the Heisenberg transform with a transmit waveform $g_{tx}(t)$[2], i.e.,

$$s(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} X_{tf}[n,m]\, g_{tx}(t - nT)\, e^{j2\pi m \Delta f (t - nT)}, \tag{2}$$

where $\Delta f$ is the subcarrier spacing and $T = 1/\Delta f$. Then the signal $s(t)$ is transmitted over a time-varying channel and the received signal in the time domain is given as:

$$r(t) = \iint h(\tau,\nu)\, s(t-\tau)\, e^{j2\pi\nu(t-\tau)}\, d\tau\, d\nu, \tag{3}$$

where $h(\tau,\nu)$ is the channel impulse response in the continuous DD domain, and it can be expressed as[1]:

$$h(\tau,\nu) = \sum_{i=0}^{P-1} h_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{4}$$

with $\delta(\cdot)$ being the Dirac delta function, $P$ being the number of channel paths, and $h_i$, $\tau_i$ and $\nu_i$ being the gain, delay and Doppler shift associated with the $i$-th path, respectively. The delay and Doppler-shift taps for the $i$-th path are given by

$$\tau_i = \frac{l_i}{M\Delta f}, \quad \nu_i = \frac{k_i + \kappa_i}{NT}, \tag{5}$$

where $l_i$ and $k_i$ are the delay and Doppler indices of the $i$-th path, and $\kappa_i \in [-1/2, 1/2]$ is a fractional Doppler associated with the $i$-th path. In the above equation, $M\Delta f$ is the system bandwidth and $NT$ is the duration of an OTFS block.

At the receiver, a receive waveform $g_{rx}(t)$ is used to transform the received signal $r(t)$ to the TF domain, i.e.,

$$Y(t,f) = \int g_{rx}^{*}(t' - t)\, r(t')\, e^{-j2\pi f (t' - t)}\, dt', \tag{6}$$

which is then sampled at $t = nT$ and $f = m\Delta f$, yielding $Y[n,m]$. Then SFFT is applied to $Y[n,m]$ to generate the DD domain signal $y[k,l]$, i.e.,

$$y[k,l] = \frac{1}{\sqrt{MN}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} Y[n,m]\, e^{-j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{7}$$

Assuming that the transmit waveform and the receive waveform satisfy the bi-orthogonal property[1], in the DD domain we have the input-output relationship[2]:

$$y[k,l] = \sum_{i=0}^{P-1} \sum_{c=-N_i}^{N_i} h_i\, x\big[[k - k_i + c]_N, [l - l_i]_M\big]\, \frac{1}{N}\, \frac{1 - e^{-j2\pi(-c-\kappa_i)}}{1 - e^{-j2\pi\frac{-c-\kappa_i}{N}}}\, e^{-j2\pi\frac{l_i(k_i+\kappa_i)}{MN}} + \omega[k,l], \tag{8}$$

where $N_i < N$ is an integer, and $\omega[k,l]$ is the noise in the DD domain. We can see that for each path, the transmitted signal is circularly shifted and scaled by a corresponding channel gain. We arrange $\{x[k,l]\}$ as a vector $\mathbf{x} \in \mathbb{C}^{MN \times 1}$, where the $j$-th element $x_j$ is $x[k,l]$ with $j = kM + l$. Similarly, a vector $\mathbf{y} \in \mathbb{C}^{MN \times 1}$ can be constructed based on $y[k,l]$. Then Eq. (8) can be rewritten in a vector form as:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \boldsymbol{\omega}, \tag{9}$$

where $\mathbf{H} \in \mathbb{C}^{MN \times MN}$ is the effective channel matrix in the DD domain, and $\boldsymbol{\omega}$ denotes a white Gaussian noise with mean $\mathbf{0}$ and variance $\epsilon^{-1}$ (or precision $\epsilon$). The channel matrix $\mathbf{H}$ in Eq. (9) can be represented as[25]:

$$\mathbf{H} = \sum_{i=0}^{P-1} \sum_{c=-N_i}^{N_i} \mathbf{I}_N(-[c - k_i]_N) \otimes \left[ \mathbf{I}_M(l_i)\, h_i \times \frac{1 - e^{-j2\pi(-c-\kappa_i)}}{N - N e^{-j2\pi\frac{-c-\kappa_i}{N}}}\, e^{-j2\pi\frac{l_i(k_i+\kappa_i)}{MN}} \right], \tag{10}$$

where $\mathbf{I}_N(-[c - k_i]_N)$ denotes an $N \times N$ matrix obtained by circularly shifting the rows of the identity matrix by $-[c - k_i]_N$, and $\mathbf{I}_M(l_i)$ is obtained similarly. Without fractional Doppler, i.e., $\kappa_i = 0$, the channel matrix $\mathbf{H}$ is reduced to

$$\mathbf{H} = \sum_{i=0}^{P-1} \mathbf{I}_N(k_i) \otimes \left[ \mathbf{I}_M(l_i)\, h_i\, e^{-j2\pi\frac{l_i k_i}{MN}} \right]. \tag{11}$$

3 Message Passing (MP) Based Detectors

Based on the model (9) in the DD domain, several detectors have been proposed using the message passing techniques.

3.1 MP Detector in Ref. [2]

In model (9), the $MN \times MN$ DD domain complex channel matrix $\mathbf{H}$ is sparse (especially in the case without fractional Doppler shifts), which makes belief propagation suitable for implementing the OTFS detectors. Here, $\mathbf{y}$ and $\boldsymbol{\omega}$ are length-$MN$ complex vectors with elements denoted by $y[d]$ and $\omega[d]$, $1 \le d \le MN$, the element of $\mathbf{H}$ is denoted by $H[d,c]$, $1 \le d, c \le MN$, $\mathbf{x}$ is a length-$MN$ symbol vector with elements $x[c] \in \mathcal{A}$, $1 \le c \le MN$, and $\mathcal{A}$ denotes the modulation alphabet.

Thanks to the sparsity of $\mathbf{H}$, the joint distribution of the random variables in model (9) can be represented with a sparsely-connected factor graph with $MN$ variable nodes corresponding to $\mathbf{x}$ and $MN$ observation nodes corresponding to $\mathbf{y}$.
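To make the sparsity of $\mathbf{H}$ concrete, the following sketch builds the channel matrix of Eq. (11) for integer Doppler shifts and counts the nonzero entries per row. It is a minimal illustration in plain Python, not the implementation used in the cited works: the function name `build_dd_channel`, the toy path parameters, and the shift convention (chosen so that row $(k,l)$ collects $x[[k-k_i]_N, [l-l_i]_M]$, as in Eq. (8) with $c = 0$) are assumptions made for this example.

```python
import cmath

def build_dd_channel(M, N, paths):
    """Build H of Eq. (11) as a nested list (integer Doppler, kappa_i = 0).

    paths: list of (h_i, l_i, k_i) = (gain, delay index, Doppler index).
    Symbols are vectorized with j = k*M + l, as in the text.
    """
    size = M * N
    H = [[0j] * size for _ in range(size)]
    for h_i, l_i, k_i in paths:
        phase = cmath.exp(-2j * cmath.pi * l_i * k_i / (M * N))
        for k in range(N):
            for l in range(M):
                row = k * M + l
                # Assumed convention: y[k, l] collects x[(k - k_i) mod N, (l - l_i) mod M].
                col = ((k - k_i) % N) * M + (l - l_i) % M
                H[row][col] += h_i * phase
    return H

# Hypothetical toy setup: M = 4 delay bins, N = 3 Doppler bins, P = 3 paths.
M, N = 4, 3
paths = [(1.0 + 0.0j, 0, 0), (0.5 - 0.2j, 1, 1), (0.3j, 2, -1)]
H = build_dd_channel(M, N, paths)

# Number of nonzero entries in each row, i.e., the degree of each observation node.
nonzeros_per_row = [sum(1 for v in row if abs(v) > 0) for row in H]
print(nonzeros_per_row)
```

With distinct $(l_i, k_i)$ pairs, every row and every column of this toy matrix holds exactly $P$ nonzero entries, which is the $S = P$ case without fractional Doppler shifts.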


As shown in Fig. 2, each observation node $y[d]$ is connected to a set of variable nodes $\{x[e_s], e_s \in I(d)\}$, and similarly, each variable node $x[c]$ is connected to a set of observation nodes $\{y[e_s], e_s \in J(c)\}$, where $I(d)$ and $J(c)$ respectively denote the sets of indexes of non-zero elements in the $d$-th row and $c$-th column of $\mathbf{H}$, $|I(d)| = |J(c)| = S$ and $1 \le s \le S$. The probability mass function (PMF) $\mathbf{p}_{c,e_s} = \{p_{c,e_s}(a_j) \mid a_j \in \mathcal{A}\}$ represents the messages from variable nodes $x[c]$ to factor nodes $y[e_s]$.

[Figure 2 omitted: observation node y[d] sends the messages (μ_{d,e_1}, σ²_{d,e_1}), ..., (μ_{d,e_S}, σ²_{d,e_S}) to the variable nodes x[e_1], ..., x[e_S] with {e_1, e_2, ..., e_S} = I(d); variable node x[c] sends the messages p_{c,e_1}, ..., p_{c,e_S} to the observation nodes y[e_1], ..., y[e_S] with {e_1, e_2, ..., e_S} = J(c).]
▲ Figure 2. Graph representation used to derive the message passing (MP) detector in Ref. [2]

Based on the factor graph in Fig. 2, a message passing algorithm was proposed in Ref. [2], and the detector is called the MP detector in this paper. The message computations in the $i$-th iteration are briefly described as follows.

1) Messages passing from observation node $y[d]$ to variable node $x[e_s]$

The message is approximated to be Gaussian, and the mean $\mu^i_{d,e_s}$ and variance $(\sigma^i_{d,e_s})^2$ are computed as

$$\mu^i_{d,e_s} = \sum_{e \in I(d), e \ne e_s} \sum_{j=1}^{|\mathcal{A}|} p^{i-1}_{e,d}(a_j)\, a_j\, H[d,e], \tag{12}$$

$$(\sigma^i_{d,e_s})^2 = \sum_{e \in I(d), e \ne e_s} \left( \sum_{j=1}^{|\mathcal{A}|} p^{i-1}_{e,d}(a_j)\, |a_j|^2\, |H[d,e]|^2 - \left| \sum_{j=1}^{|\mathcal{A}|} p^{i-1}_{e,d}(a_j)\, a_j\, H[d,e] \right|^2 \right) + \epsilon^{-1}. \tag{13}$$

2) Messages passing from variable node $x[c]$ to observation node $y[e_s]$

The PMF $\mathbf{p}^i_{c,e_s}$ can be updated as

$$p^i_{c,e_s}(a_j) = \Delta \cdot \tilde{p}^i_{c,e_s}(a_j) + (1 - \Delta) \cdot p^{i-1}_{c,e_s}(a_j), \tag{14}$$

where $\Delta \in [0,1]$ is the damping factor and

$$\tilde{p}^i_{c,e_s}(a_j) \propto \prod_{e \in J(c), e \ne e_s} \Pr\big(y[e] \mid x[c] = a_j, \mathbf{H}\big) = \prod_{e \in J(c), e \ne e_s} \frac{\varsigma^i(e,c,j)}{\sum_{k=1}^{|\mathcal{A}|} \varsigma^i(e,c,k)}, \tag{15}$$

with

$$\varsigma^i(e,c,k) = \exp\left( -\frac{\left| y[e] - \mu^i_{e,c} - H[e,c]\, a_k \right|^2}{(\sigma^i_{e,c})^2} \right). \tag{16}$$

After a certain number of iterations by repeating 1) and 2), the decision on the transmitted symbol can be obtained, i.e.,

$$\hat{x}[c] = \arg\max_{a_j \in \mathcal{A}} p^i_c(a_j), \quad c = 1, \ldots, MN, \tag{17}$$

where

$$p^i_c(a_j) = \prod_{e \in J(c)} \frac{\varsigma^i(e,c,j)}{\sum_{k=1}^{|\mathcal{A}|} \varsigma^i(e,c,k)}. \tag{18}$$

The MP detector is summarized in Algorithm 1.

Algorithm 1. MP detector in Ref. [2]
Input: $\mathbf{y}$, $\mathbf{H}$
Initialize: $p^0_{c,e_s}(a_j) = 1/|\mathcal{A}|$, $c = 1, \ldots, MN$, $e_s \in J(c)$, $i = 1$
1: Repeat
2: $\forall d$: update $\mu^i_{d,e_s}$ and $(\sigma^i_{d,e_s})^2$ with Eqs. (12) and (13)
3: $\forall c$: update $\mathbf{p}^i_{c,e_s}$ with Eq. (14)
4: $i = i + 1$
5: Until terminated
Output: The decision on transmitted symbols $\hat{x}[c]$ using Eq. (17)

The MP algorithm shown above is an approximation to loopy belief propagation since it approximates the interference to be Gaussian to achieve lower complexity. The complexity of the algorithm is $O(MNS|\mathcal{A}|)$ per iteration, which depends on the sparsity of the channel, i.e., the value of $S$. When $S$ is small, the detector is very attractive because it has low complexity and delivers good performance, as the factor graph model has no short loops. However, in the case of rich-scattering environments and fractional Doppler shifts, the value of $S$ can be large, leading to a denser factor graph model, which can affect the performance of the MP detector and result in a significant increase in computational complexity.

3.2 VB Detector

The VB detector was proposed in Ref. [17] to guarantee the convergence of the iterative detector, which can be implemented with variational message passing. With model (9), the optimal MAP detection can be formulated as:

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}). \tag{19}$$

However, the complexity of solving the above optimization problem increases exponentially with the size of $\mathbf{x}$. VB is adopted to achieve low complexity approximate detection. In this method, a distribution $q(\mathbf{x})$ from a tractable distribution family $\mathcal{Q}$ is found as an approximation to the a posteriori distribution $p(\mathbf{x} \mid \mathbf{y})$. The trial distribution $q(\mathbf{x})$ can be obtained by minimizing the Kullback-Leibler divergence $D(q \| p)$, i.e.,

$$q^*(\mathbf{x}) = \arg\min_{q \in \mathcal{Q}} D(q \| p) = \arg\max_{q \in \mathcal{Q}} \underbrace{E_q\left[ -\ln q(\mathbf{x}) + \ln p(\mathbf{x} \mid \mathbf{y}) \right]}_{\mathcal{L}}, \tag{20}$$

where the expectation is taken over $\mathbf{x}$ according to the trial distribution $q(\mathbf{x})$.

To simplify the optimization problem, $q(\mathbf{x})$ is assumed to be fully factorized, i.e.,

$$q(\mathbf{x}) = \prod_{k,l} q_{k,l}(x_{k,l}), \tag{21}$$

where $k \in [0, N-1]$, $l \in [0, M-1]$ and $x_{k,l}$ denotes the $(kM + l)$-th entry of $\mathbf{x}$. With this assumption, $q(\mathbf{x})$ can be updated iteratively by maximizing $\mathcal{L}$. Since the noise sample $\omega_{k,l}$ and data symbol $x_{k,l}$, $\forall k,l$ are independent, and $\omega_{k,l} \sim \mathcal{CN}(\omega_{k,l}; 0, \epsilon^{-1})$, $p(\mathbf{x} \mid \mathbf{y})$ can be rewritten as:

$$p(\mathbf{x} \mid \mathbf{y}) \propto \prod_{k,l} p(x_{k,l})\, p(y_{k,l} \mid \mathbf{x}), \tag{22}$$

where $y_{k,l} = \mathbf{h}_{k,l}^T \mathbf{x} + \omega_{k,l}$, and $\mathbf{h}_{k,l}$ denotes the equivalent channel vector whose $(kM + l)$-th entry is $h_{k,l}[k,l]$. Then the distribution $p(\mathbf{x} \mid \mathbf{y})$ can be further rewritten as:

$$p(\mathbf{x} \mid \mathbf{y}) \propto \prod_{k,l} \zeta_{k,l}(x_{k,l}) \prod_{k',l'} \psi_{k,l}(x_{k,l}, x_{k',l'}), \tag{23}$$

$$\zeta_{k,l}(x_{k,l}) = p(x_{k,l}) \exp\left( -\frac{\rho_{k,l} |x_{k,l}|^2 - \eta_{k,l}\, x_{k,l}}{\epsilon^{-1}} \right), \tag{24}$$

$$\psi_{k,l}(x_{k,l}, x_{k',l'}) = \exp\left( -\frac{\varrho_{k,l,k',l'}\, x_{k,l}\, x_{k',l'}}{\epsilon^{-1}} \right), \tag{25}$$

with $\rho_{k,l} = \sum_{k',l'} |h_{k',l'}[k,l]|^2$, $\eta_{k,l} = 2\sum_{k',l'} \mathcal{R}\big[ h_{k',l'}[k,l] \cdot y_{k,l'} \big]$, and $\varrho_{k,l,k',l'} = 2\mathcal{R}\big[ h_{k,l}[k,l]\, h[k',l'] \big]$, where $\mathcal{R}[\cdot]$ denotes the real part. Substituting $p(\mathbf{x} \mid \mathbf{y})$ in Eq. (23) and $q(\mathbf{x})$ into $\mathcal{L}$ yields

$$\mathcal{L} = E_q\left[ \sum_{k,l} \ln \psi_{k,l}(x_{k,l}, x_{k',l'}) - \sum_{k,l} \ln \frac{q_{k,l}(x_{k,l})}{\zeta_{k,l}(x_{k,l})} \right] = E_q\left[ -\frac{\sum_{k,l} \varrho_{k,l,k',l'}\, x_{k,l}\, x_{k',l'}}{\epsilon^{-1}} - \sum_{k,l} \ln \frac{q_{k,l}(x_{k,l})}{\zeta_{k,l}(x_{k,l})} \right]. \tag{26}$$

To find a stationary point of $\mathcal{L}$, the partial derivatives of $\mathcal{L}$ with respect to all local functions $q_{k,l}(x_{k,l})$, $\forall k,l$ need to be zero. Take the latent variable $x_{k,l}$ as an example. Setting the partial derivative $\partial \mathcal{L} / \partial q_{k,l}(x_{k,l})$ to zero leads to:

$$E_{q_{\backslash k,l}}\left[ -\frac{\sum_{k',l'} \varrho_{k,l,k',l'}\, x_{k,l}\, x_{k',l'}}{\epsilon^{-1}} \right] + \ln \zeta_{k,l}(x_{k,l}) - \ln q_{k,l}(x_{k,l}) + C = 0, \tag{27}$$

where $q_{\backslash k,l} = \prod_{(k',l') \ne (k,l)} q^{iter-1}(x_{k',l'})$, $q^{iter-1}(x_{k',l'})$ is obtained in the $(iter - 1)$-th iteration and $C$ denotes a constant. Then, solving Eq. (27) for $q_{k,l}(x_{k,l})$ results in the local distribution, which can be expressed as:

$$q_{k,l}(x_{k,l}) \propto \zeta_{k,l}(x_{k,l}) \exp\left( E_{q_{\backslash k,l}}\left[ -\frac{\sum_{k',l'} \varrho_{k,l,k',l'}\, x_{k,l}\, x_{k',l'}}{\epsilon^{-1}} \right] \right) \propto p(x_{k,l}) \exp\left( -\frac{\rho_{k,l} |x_{k,l}|^2 - m_{k,l}\, x_{k,l}}{\epsilon^{-1}} \right), \tag{28}$$

where $m_{k,l} = \eta_{k,l} - \sum_{(k',l') \ne (k,l)} \varrho_{k,l,k',l'}\, E_{q^{iter-1}}\big[ x[k',l'] \big]$.

It is noted that the variance of $x_{k,l}$ is underestimated, as only the noise variance is considered in Eq. (28). To fix the underestimation, a practical solution is to repeat the above procedure to approximate the a posteriori distribution for all the data symbols iteratively, resulting in the approximate marginal $q^*_{k,l}(x_{k,l})$, $\forall k,l$. Then, the decision on the symbols can be made by maximizing the approximate marginal distribution $q^*_{k,l}(x_{k,l})$,

$$\hat{x}_{k,l} = \arg\max_{x_{k,l} \in \mathcal{A}} q^*_{k,l}(x_{k,l}). \tag{29}$$

The complexity of the algorithm per iteration is $O(MNS|\mathcal{A}|)$.

3.3 UAMP Detector

Leveraging the UAMP algorithm, the UAMP detector was developed in Ref. [25], where the BCCB structure of the DD domain channel matrix is exploited, leading to a highly efficient OTFS detector with 2D FFT. It can be seen from Eqs. (10) and (11) that the DD domain channel matrix $\mathbf{H}$ has a BCCB structure. A useful property of the BCCB matrix $\mathbf{H}$ is that it can be diagonalized using the 2D discrete Fourier transform (DFT) matrix, i.e.,

$$\mathbf{H} = \mathbf{F}^H \boldsymbol{\Lambda} \mathbf{F}, \tag{30}$$

where $\mathbf{F} = \mathbf{F}_N \otimes \mathbf{F}_M$ with $\mathbf{F}_N$ and $\mathbf{F}_M$ being respectively the normalized $N$-point and $M$-point DFT matrices.
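The diagonalization in Eq. (30) can be checked numerically on a small example. The sketch below is a plain-Python illustration under stated assumptions: the helper names (`dft_matrix`, `kron`, `matmul`, `conj_transpose`), the toy sizes and the tap values are hypothetical, and the eigenvalues are obtained by a direct 2D DFT of the circular taps (which form the first column of the matrix) rather than by an FFT routine.

```python
import cmath

def dft_matrix(n):
    """Normalized n-point DFT matrix as a nested list."""
    scale = n ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def kron(A, B):
    """Kronecker product of two nested-list matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def conj_transpose(A):
    """Conjugate transpose of a nested-list matrix."""
    return [[v.conjugate() for v in col] for col in zip(*A)]

# Hypothetical toy sizes and taps: taps[k][l] is the gain at Doppler shift k, delay shift l.
M, N = 2, 3
taps = [[1.0 + 0.5j, 0.2j], [0.0j, -0.4 + 0.1j], [0.3 + 0.0j, 0.0j]]

# BCCB matrix with index j = k*M + l: entries depend only on circular index differences.
size = M * N
H = [[taps[(k - kp) % N][(l - lp) % M] for kp in range(N) for lp in range(M)]
     for k in range(N) for l in range(M)]

F = kron(dft_matrix(N), dft_matrix(M))
Lam = matmul(matmul(F, H), conj_transpose(F))  # diagonal if H = F^H Lam F holds

# Eigenvalues from an unnormalized 2D DFT of the taps, ordered as j = a*M + b.
d = [sum(taps[k][l] * cmath.exp(-2j * cmath.pi * (a * k / N + b * l / M))
         for k in range(N) for l in range(M))
     for a in range(N) for b in range(M)]

off_diag = max(abs(Lam[r][c]) for r in range(size) for c in range(size) if r != c)
diag_err = max(abs(Lam[j][j] - d[j]) for j in range(size))
print(off_diag, diag_err)
```

Both printed quantities should be at the level of floating-point rounding. The explicit matrix products are used here only for verification; the point of the BCCB structure is that a detector never needs to form $\mathbf{F}$ or $\mathbf{H}$ explicitly.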


In Eq. (30), matrix $\boldsymbol{\Lambda}$ is a diagonal matrix, i.e., $\boldsymbol{\Lambda} = \mathrm{diag}(\mathbf{d})$, and $\mathbf{d}$ is a length-$MN$ vector that can be computed using 2D FFT:

$$\mathbf{d} = \mathrm{vec}(\mathrm{FFT2}(\mathbf{C})), \tag{31}$$

where $\mathrm{FFT2}(\cdot)$ represents the 2D FFT operation, $\mathbf{C} = \mathrm{reshape}_M(\mathbf{H}(:,1))$ is an $M \times N$ matrix, and $\mathbf{H}(:,1)$ with length $MN$ is the first column of matrix $\mathbf{H}$.

The above property is exploited in the design of the UAMP detector, leading to high computational efficiency together with outstanding performance compared with the existing detectors. Instead of using model (9) directly, the UAMP algorithm[27–29] works with the unitary transform of the model. The channel matrix $\mathbf{H}$ admits the diagonalization in Eq. (30), leading to the following unitary transform of the OTFS system model:

$$\mathbf{r} = \boldsymbol{\Lambda} \mathbf{F} \mathbf{x} + \boldsymbol{\omega}', \tag{32}$$

where $\mathbf{r} = \mathbf{F}\mathbf{y}$, $\boldsymbol{\omega}' = \mathbf{F}\boldsymbol{\omega}$, and the noise $\boldsymbol{\omega}'$ has the same distribution as $\boldsymbol{\omega}$ since $\mathbf{F}$ is a unitary matrix. The precision of the noise is still denoted by $\epsilon$, which needs to be estimated. Define $\boldsymbol{\Phi} = \boldsymbol{\Lambda}\mathbf{F}$ and an auxiliary vector $\mathbf{z} = \boldsymbol{\Phi}\mathbf{x}$. Then we can factorize the joint distribution of the unknown variables $\mathbf{x}, \mathbf{z}, \epsilon$ given $\mathbf{r}$ as

$$p(\mathbf{x}, \mathbf{z}, \epsilon \mid \mathbf{r}) \propto p(\epsilon)\, p(\mathbf{r} \mid \mathbf{z}, \epsilon)\, p(\mathbf{z} \mid \mathbf{x})\, p(\mathbf{x}) = p(\epsilon) \prod_j p(r_j \mid z_j, \epsilon)\, p(z_j \mid \mathbf{x}) \prod_i p(x_i) = f_\epsilon \prod_j f_{r_j}(z_j, \epsilon)\, f_{\delta_j}(z_j, \mathbf{x}) \prod_i f_{x_i}(x_i), \tag{33}$$

where indices $i, j \in [1:MN]$. To facilitate the factor graph representation of the factorization in Eq. (33), the relevant notations are listed in Table 1, which shows the correspondence between the factor nodes and their associated distributions. The factor graph representation for the factorization in Eq. (33) is depicted in Fig. 3.

▼Table 1. Factors, underlying distributions and functional forms associated with Eq. (33)
Factor | Distribution | Functional form
$f_{r_j}$ | $p(r_j \mid z_j, \epsilon)$ | $\mathcal{N}(z_j; r_j, \epsilon^{-1})$
$f_{\delta_j}$ | $p(z_j \mid \mathbf{x})$ | $\delta(z_j - \boldsymbol{\Phi}_j \mathbf{x})$
$f_{x_i}$ | $p(x_i)$ | $(1/|\mathcal{A}|) \sum_{a=1}^{|\mathcal{A}|} \delta(x_i - \alpha_a)$
$f_\epsilon$ | $p(\epsilon)$ | $\propto \epsilon^{-1}$

[Figure 3 omitted: factor graph in which each factor node f_{r_j} connects the variable node z_j to the shared variable node ϵ (with prior factor f_ϵ), each factor node f_{δ_j} connects z_j to the variable nodes x_1, ..., x_MN, and each x_i has a prior factor f_{x_i}.]
▲Figure 3. Factor graph representation of Eq. (33)

Following the UAMP algorithm, a UAMP based iterative detector can be designed, which is summarized in Algorithm 2. According to the derivation of (U)AMP using loopy belief propagation, UAMP provides the message from variable node $z_j$ to function node $f_{r_j}$, which is Gaussian and denoted by $m_{z_j \to f_{r_j}}(z_j) = \mathcal{N}(z_j \mid p_j, \nu_{p_j})$. Here, the mean $p_j$ and the variance $\nu_{p_j}$ are given in Lines 1 and 2 of the algorithm in a vector form. With the mean field rule[23] at the function node $f_{r_j}$, we can compute the message passed from function node $f_{r_j}$ to variable node $\epsilon$, i.e.,

$$m_{f_{r_j} \to \epsilon}(\epsilon) \propto \exp\left\{ \left\langle \log f_{r_j}(r_j \mid z_j, \epsilon) \right\rangle_{b(z_j)} \right\} \propto \epsilon \exp\left\{ -\epsilon \left( |r_j - \hat{z}_j|^2 + \nu_{z_j} \right) \right\}, \tag{34}$$

where $b(z_j)$ is the belief of $z_j$. It turns out that $b(z_j)$ is also Gaussian with its variance and mean given by

$$\nu_{z_j} = 1 / \left( 1/\nu_{p_j} + \hat{\epsilon} \right), \quad \hat{z}_j = \nu_{z_j} \left( p_j / \nu_{p_j} + \hat{\epsilon}\, r_j \right), \tag{35}$$

respectively, where $\hat{\epsilon}$ is the estimate of $\epsilon$ in the last iteration. They can be expressed in a vector form shown in Lines 3 and 4 in Algorithm 2. The estimate of $\epsilon$ can be obtained based on the belief $b(\epsilon)$ at the variable node $\epsilon$ shown in Fig. 3, i.e.,

$$b(\epsilon) \propto f_\epsilon(\epsilon) \prod_{j=1}^{MN} m_{f_{r_j} \to \epsilon}(\epsilon). \tag{36}$$

The estimate is then given as

$$\hat{\epsilon} = \int_0^\infty \epsilon\, b(\epsilon)\, d\epsilon = MN \Big/ \sum_{j=1}^{MN} \left( |r_j - \hat{z}_j|^2 + \nu_{z_j} \right), \tag{37}$$

which can be rewritten in a vector form shown in Line 5 of the algorithm. With the mean field rule at the function node $f_{r_j}$ again, the message passed from the function node $f_{r_j}$ to the variable node $z_j$ can be computed as:

$$m_{f_{r_j} \to z_j}(z_j) \propto \exp\left\{ \left\langle \log f_{r_j}(r_j \mid z_j, \epsilon) \right\rangle_{b(\epsilon)} \right\} \propto \mathcal{N}(z_j \mid r_j, \hat{\epsilon}^{-1}). \tag{38}$$

Then the UAMP algorithm with known noise can be used as if the true noise precision is $\hat{\epsilon}$, leading to Lines 6–15 and Lines 1–2 of Algorithm 2.
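The noise-precision refinement of Eqs. (35) and (37) amounts to a few element-wise operations. The sketch below is a plain-Python illustration under assumed inputs: the function name `refine_noise_precision` and the toy numbers are hypothetical, and `p` and `nu_p` stand for the means and variances of the messages $\mathcal{N}(z_j \mid p_j, \nu_{p_j})$.

```python
def refine_noise_precision(r, p, nu_p, eps_hat):
    """One pass of Eqs. (35) and (37).

    r, p    : lists of complex numbers (transformed observations and message means)
    nu_p    : list of positive message variances
    eps_hat : current estimate of the noise precision
    """
    # Eq. (35): variance and mean of the Gaussian belief of each z_j.
    nu_z = [1.0 / (1.0 / v + eps_hat) for v in nu_p]
    z_hat = [vz * (pj / v + eps_hat * rj)
             for vz, pj, v, rj in zip(nu_z, p, nu_p, r)]
    # Eq. (37): posterior-mean estimate of the noise precision.
    residual = sum(abs(rj - zj) ** 2 + vz for rj, zj, vz in zip(r, z_hat, nu_z))
    return nu_z, z_hat, len(r) / residual

# Hypothetical toy numbers with MN = 2.
nu_z, z_hat, eps_new = refine_noise_precision(
    r=[1 + 0j, 0j], p=[0j, 0j], nu_p=[1.0, 1.0], eps_hat=1.0)
print(nu_z, z_hat, eps_new)
```

The $\nu_{z_j}$ term in the sum accounts for the remaining uncertainty in $z_j$, so the residual is not judged from the point estimates $\hat{z}_j$ alone.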


Lines 1–2 of the Algorithm 2. In Lines 10–13, the Gaussian the OTFS detector can be implemented directly with the
message is combined with the discrete prior to obtain the AMP algorithm. However, due to the deviation of the chan⁃
MMSE estimates of the symbols in terms of their posterior nel matrix from the i. i. d. Gaussian matrix, the AMP detec⁃
means and variances. There is an extra operation in Line 14, tor may perform poorly.
which averages the variances of x j. Thanks to the special form
of the unitary matrix F, 2D FFT is used in the implementa⁃
tions in Lines 2 and 9. It can be seen that the UAMP detector 4 Turbo Processing in Coded Systems
does not require any matrix-vector products, the algorithm re⁃ It is well known that joint decoding and detection can bring
quires only element-wise vector operations or scalar opera⁃ significant system performance improvement, and it can be re⁃
tions, except Lines 2 and 9, which are implemented with FFT. alized in a way that the detector and decoder exchange infor⁃
So the complexity of the UAMP detector is mation iteratively, i. e., the turbo processing[30–31]. The OTFS
O ( MN log ( MN ) ) + O ( MN|A| ) per OTFS block per iteration, detectors can be incorporated into a turbo receiver by endow⁃
which is independent of S. ing the OTFS detectors with the capabilities of taking the out⁃
put log-likelihood ratios (LLRs) of the decoder as (soft) input
Algorithm 2. UAMP detector for OTFS and producing (soft) output in the form of extrinsic LLRs of
Unitary transform:r = Fy = Λ Fx + ω with F = F N ⊗ F M. the coded bits, i.e., the so-called soft input soft output (SISO)
Calculated d with Eq. (29), and define vector Λ = d·d *. detector.
Initialize s-1 = 0, x̂ = 0, ϵ̂ (0) = 1, ν (0)
x = 1, and t = 0.
A typical turbo system is shown in Fig. 4, where Π and Π-1
Input: y, H represent interleaver and de-interleaver, respectively. The in⁃
Repeat formation bits are encoded and interleaved before symbol map⁃
1: ν p = ν tx Λ ping, where each symbol x j ∈ A = { α 1 ,...,α |A| } in the DD do⁃
( (
2: p = d ⋅ vec FFT2 reshape M ( x̂ t ) - ν p ⋅ st - 1 )) main is mapped from a sub-sequence of the coded bit se⁃

3: ν z = 1./ 1./ν p + ϵ̂ )t
quence, which is denoted by c j = c 1j ,...,c log|A|
j [
. Each α acorre⁃ ]
sponds to a length-Alog|A| binary sequence, which is denoted
4: z = ν z ⋅ ( p./ν p + ϵ̂t r ) by {α 1a ,...,α log|A|}. Based on the LLRs provided by the SISO de⁃
5: ϵ̂ t+1
= MN/ ( r - z  2
+ 1T ν z ) coder and the output of the OTFS demodulator as shown in
Fig. 4, the task of the SISO OTFS detector is to compute the
6: ν s = 1./ ν p + 1/ϵt + 11 ) extrinsic LLR for each coded bit, i.e.,
7: s = ν s ⋅ ( r - p̂ )

8: ν q = ΛT ν s / ( MN ) P ( c qj = 0|r )
L e ( c qj ) = ln - L a ( c qj )
9: q = x̂ (t) + ν q vec IFFT2 reshape M ( d ⋅ st ) ( )) P ( c qj = 1|r ) , (39)
10: ∀j:ξ j,a = exp - ν |α a - q j| -1
) where L a ( c qj ) is the output extrinsic LLR of the decoder in the
11: ∀j:β j,a = ξ j,a /∑a = 1 ξ j,a

last iteration. The extrinsic LLR L e ( c qj ) is passed to the decod⁃

12: ∀j:x̂ tj + 1 = ∑a = 1 α a β j,a
er. The extrinsic LLR L e ( c qj ) can be expressed in terms of ex⁃
13: ∀j:ν tx j+ 1 = ∑a = 1 β j,a |α a - x̂ tj + 1 |2
trinsic mean and variance of the symbols[32], i.e.,
MN ∑j = 1 x j
14: ν tx + 1 = νt + 1

|α a - m ej|2
15: t = t + 1 ∑ exp ( - v ej
) ∏P ( c qj ′ = α qa′)
q′ ≠ q
Until terminated
αa ∈ A 0q
L e ( c qj ) = ln
Output: the estimate of x i.e., x̂ |α a - m | e2

∑ exp ( - ) ∏P ( c qj ′ = α qa′)

Compared with the UAMP detector, the MP and VB de⁃ αa ∈ A 1q v ej q′ ≠ q , (40)

tectors have a complexity of O ( MNS|A|) per OTFS block
per iteration, which can be considerably higher than that of where m ej and v ej are the extrinsic mean and variance of x j, and
the UAMP detector in the case of rich scattering environ⁃ A q0 and A q0 represent the subsets of all α a corresponding to
ments and when fractional Doppler shifts have to be consid⁃ c qj = 0 and c qj = 1, respectively. The extrinsic variance and
ered (leading to a large S). Moreover, the UAMP detector mean are defined in Ref. [32].
can deliver much better performance when the number of
paths is relatively large. In particular, the UAMP detector v ej = (1/v pj - 1/v j )-1 ,m ej = v ej (m pj /v pj - m j /v j ), (41)
with estimated noise precision can significantly outperform
other detectors with perfect noise precision. We note that, where m j and v j are the a priori mean and variance of x j calcu⁃

Message Passing Based Detection for Orthogonal Time Frequency Space Modulation Special Topic
YUAN Zhengdao, LIU Fei, GUO Qinghua, WANG Zhongyong

lated based on the output LLRs of the SISO decoder[30], and $m_j^p$ and $v_j^p$ are the a posteriori mean and variance of $x_j$.

▲Figure 4. Iterative joint detection and decoding in a coded OTFS system[25]
OTFS: orthogonal time frequency space; SISO: soft input soft output

Taking the UAMP detector as an example, we show the incorporation of the OTFS detector into a turbo receiver. According to the derivation of the UAMP algorithm, $q$ and $\nu_q$ consist of the extrinsic means and variances of the symbols in $x$, as they are the messages passed from the observation side and do not contain the immediate a priori information about $x$. Hence we have $m_j^e = q_j$ and $v_j^e = \nu_q$. Then Eq. (40) can be readily used to compute the extrinsic LLRs of the coded bits. With the LLRs provided by the SISO decoder, one can compute the probability $p(x_j = \alpha_a)$ for each $x_j$, which is no longer the “non-informative prior” in Algorithm 2. Therefore, $\xi_{j,a}$ in Line 7 of the algorithm is changed to

$$\xi_{j,a} = p(x_j = \alpha_a)\exp\left(-\nu_q^{-1}|\alpha_a - q_j|^2\right). \tag{42}$$

In addition, the iteration of the UAMP detector can be combined with the iteration between the SISO decoder and the detector, which leads to a single-loop iteration (i.e., inner iterations are not required).

The computational complexity of the detectors is summarized in Table 2. In the above discussion, we focus on the bi-orthogonal waveform. The detectors can be extended to OTFS systems with other waveforms, such as the simple rectangular waveform[25].

▼Table 2. Computational complexity of various detectors per iteration
Detectors        Complexity
MP detector      $O(MNS|A|)$
VB detector      $O(MNS|A|)$
UAMP detector    $O(MN\log(MN)) + O(MN|A|)$
MP: message passing; UAMP: unitary approximate message passing; VB: variational Bayes

5 Simulation Results

In this section, we compare the performance of the message passing based detectors. The low complexity MRC detector in Ref. [14] is also included. We set $M = 256$ and $N = 32$, i.e., there are 32 time slots and 256 subcarriers in the TF domain. Both quadrature phase shift keying (QPSK) and 16-quadrature amplitude modulation (16-QAM) are considered. The carrier frequency is 3 GHz, and the subcarrier spacing is 2 kHz. The speed of the mobile user is set to $v = 135$ km/h, leading to a maximum Doppler frequency shift index $k_{\max} = 6$. We assume that the maximum delay index is $l_{\max} = 14$. The Doppler index of the $i$-th path is uniformly drawn from the set $[-k_{\max}, k_{\max}]$, and the delay index is in the range of $[1, l_{\max}]$, excluding the first path ($l_1 = 0$). We assume that the fractional Doppler $\kappa_i$ is uniformly distributed within $[-1/2, 1/2]$, and the channel coefficients $h_i$ are independently drawn from a complex Gaussian distribution with mean 0 and variance $\eta_i$, where the normalized power delay profile is $\eta_i = \exp(-\alpha l_i)/\sum_i \exp(-\alpha l_i)$ with $\alpha$ being 0 or 0.1. The maximum number of iterations is set to 15 for all iterative detectors. We note that all detectors except the MRC detector require the noise variance. The UAMP detector performs noise precision estimation, while the other detectors (except the MRC detector), including the AMP detector, assume perfect knowledge of the noise precision. We evaluate the performance of the detectors in a variety of scenarios, including the bi-orthogonal and rectangular waveforms with integer or fractional Doppler shifts, and QPSK or 16-QAM modulation. In addition, both uncoded and coded systems are evaluated.

Fig. 5 shows the bit error ratio (BER) performance of various detectors in the case of the bi-orthogonal waveform with different numbers of paths, where we assume no fractional Doppler shifts, i.e., $S = P$. We also assume $\alpha = 0$, and QPSK is used. From this figure, we can see that the MP detector performs well when $P = 6$, but its performance degrades as $P$ increases. The VB detector shows a similar trend. The MRC detector performs similarly to the MP and VB detectors when $P = 6$ and delivers better performance than the MP and VB detectors with larger $P$. The AMP and UAMP detectors perform well: they enjoy the diversity gain and achieve better performance as $P$ increases. In all cases, the UAMP based detector delivers the best performance and significantly outperforms the other detectors.

[Figure 5: three panels, (a) P=6, (b) P=12, (c) P=14, plotting BER against SNR.]
AMP: approximate message passing; BER: bit error ratio; MP: message passing; MRC: maximal ratio combining; SNR: signal-noise ratio; UAMP: unitary approximate message passing; VB: variational Bayes
▲Figure 5. BER performance of detectors with bi-orthogonal waveform and integer Doppler shifts (results are based on Ref. [25])

With the rectangular waveform and fractional Doppler shifts, we compare the BER performance of the AMP, UAMP and MRC detectors in Fig. 6, where the number of paths is $P = 9$ and $\alpha = 0.1$ is used for the power delay profile. Both QPSK and 16-QAM are considered. Due to the deviation of the channel matrix from an i.i.d. (sub-)Gaussian matrix, AMP exhibits performance loss, leading to significantly worse performance compared with the UAMP detector. Thanks to the robustness of UAMP against a general matrix, UAMP performs well. We can see that the MRC detector performs better than the AMP detector. The UAMP detector performs the best, and the gaps between the other detectors and the UAMP detector become larger with the higher-order 16-QAM modulation than with QPSK.
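As a rough cross-check of the simulation setup described in this section, the following Python sketch recomputes the maximum Doppler index from the stated speed, carrier frequency, subcarrier spacing and number of time slots, and draws one set of channel parameters. The function names `max_doppler_index` and `draw_channel` are hypothetical and do not come from the paper or Ref. [25]. The sketch assumes that the Doppler resolution is the subcarrier spacing divided by N, that delay indices of the non-first paths are drawn independently and uniformly (repeats allowed), and that the fractional Doppler is simply added to the integer Doppler index.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s


def max_doppler_index(speed_kmh, carrier_hz, subcarrier_spacing_hz, n_slots):
    """Maximum Doppler shift expressed in units of the Doppler resolution.

    Assumption: the Doppler resolution is subcarrier_spacing / n_slots.
    """
    doppler_hz = (speed_kmh / 3.6) * carrier_hz / C_LIGHT
    return doppler_hz * n_slots / subcarrier_spacing_hz


def draw_channel(num_paths, k_max, l_max, alpha, fractional, rng):
    """Draw delays, Doppler shifts, power delay profile and path gains.

    Assumptions (not stated in the source): delay indices of paths 2..P are
    drawn independently with replacement, and the fractional Doppler is
    added to the integer Doppler index.
    """
    # First path has delay index 0, the others lie in [1, l_max].
    delays = np.concatenate(([0], rng.integers(1, l_max + 1, size=num_paths - 1)))
    # Integer Doppler indices in [-k_max, k_max].
    dopplers = rng.integers(-k_max, k_max + 1, size=num_paths).astype(float)
    if fractional:
        dopplers += rng.uniform(-0.5, 0.5, size=num_paths)
    # Normalized exponential power delay profile eta_i.
    pdp = np.exp(-alpha * delays)
    pdp /= pdp.sum()
    # Zero-mean complex Gaussian gains with variance eta_i.
    noise = rng.standard_normal(num_paths) + 1j * rng.standard_normal(num_paths)
    gains = np.sqrt(pdp / 2.0) * noise
    return delays, dopplers, pdp, gains
```

With the values stated above (135 km/h, 3 GHz, 2 kHz, N = 32), `max_doppler_index` evaluates to approximately 6, in line with $k_{\max} = 6$. Setting `alpha = 0` gives a uniform power delay profile, as used for Fig. 5.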
[Figure 6: two panels, (a) QPSK and (b) 16-QAM, plotting BER against SNR/dB.]
AMP: approximate message passing; BER: bit error ratio; MRC: maximal ratio combining; QAM: quadrature amplitude modulation; QPSK: quadrature phase shift keying; SNR: signal-noise ratio; UAMP: unitary approximate message passing
▲Figure 6. BER performance of detectors with the rectangular waveform and fractional Doppler shifts (results are based on Ref. [25])

We then evaluate the performance of the detectors in a coded OTFS system, where the turbo receiver in Fig. 4 is employed. The number of paths is $P = 14$, and a rectangular waveform is used. In Fig. 7(a), we show the performance of the uncoded system with the AMP and UAMP detectors. In Fig. 7(b), we use a rate-1/2 convolutional code with generator $[5, 7]_8$, followed by a random interleaver and QPSK modulation. The length of the codeword is $MN$. The BCJR algorithm is used for the SISO decoder. We can find that the performance gap between the AMP detector and the UAMP detector becomes larger in the coded system. The turbo receiver can achieve much better performance (about 3.5 to 4 dB at the BER of $10^{-4}$)

[Figure 7: three panels, (a) uncoded, (b) convolutional code, (c) LDPC and convolutional code, plotting BER against SNR.]
AMP: approximate message passing; BER: bit error ratio; LDPC: low density parity check code; SNR: signal-noise ratio; UAMP: unitary approximate message passing
▲Figure 7. BER performance comparison of coded and uncoded system with rectangular waveform (part of the results is based on Ref. [25])


thanks to the joint processing of decoding and detection. In

References


