
Soft-Output Sphere Decoding Performance and Implementation Aspects PDF

This document summarizes soft-output sphere decoding algorithms for multiple-input multiple-output (MIMO) wireless systems. It describes how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and error rate performance. Key aspects discussed include single tree search, ordered QR decomposition, channel matrix regularization, and log-likelihood ratio clipping which allow achieving near max-log performance at a complexity close to hard-output sphere decoding. The document also provides a framework for characterizing the resulting complexity-performance trade-offs of soft-output sphere decoding algorithms.


Soft-Output Sphere Decoding: Performance and Implementation Aspects

C. Studer∗, M. Wenk∗, A. Burg∗, and H. Bölcskei†
∗ Integrated Systems Laboratory, ETH Zurich, Switzerland, email: {studer, mawenk, apburg}@iis.ee.ethz.ch
† Communication Technology Laboratory, ETH Zurich, Switzerland, email: [email protected]

Abstract— Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we demonstrate that single tree search, ordered QR decomposition, channel matrix regularization, and log-likelihood ratio clipping are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a computational complexity that is reasonably close to that of hard-output sphere decoding.

This work was supported by the STREP project No. IST-026905 (MASCOT) within the sixth framework programme (FP6) of the European Commission.

I. INTRODUCTION

Multiple-input multiple-output (MIMO) wireless systems employ multiple antennas on both sides of the wireless link and offer increased spectral efficiency (compared to single-antenna systems) by transmitting multiple data streams concurrently and in the same frequency band (spatial multiplexing). MIMO technology constitutes the basis for upcoming wireless communication standards, such as IEEE 802.11n and IEEE 802.16e.

The main challenge in the practical realization of MIMO wireless systems lies in the efficient implementation of the detector which needs to separate the spatially multiplexed data streams. To this end, a wide range of algorithms offering various trade-offs between performance and computational complexity have been developed [1]. Linear detection producing hard-decision outputs constitutes one extreme of the complexity/performance trade-off region, while computationally demanding a posteriori probability (APP) detection algorithms result in the opposite extreme. The computational complexity of a MIMO detection algorithm depends on the symbol constellation size and the number of spatially multiplexed data streams, but often also on the instantaneous MIMO channel realization and the signal-to-noise ratio (SNR). On the other hand, the overall decoding effort is typically constrained by system bandwidth, latency requirements, and limitations on power consumption. Implementing different algorithms, each optimized for a maximum allowed decoding effort and/or a particular system configuration, would entail a considerable hardware overhead and in addition be highly inefficient since large portions of the chip would remain idle most of the time. A practical MIMO receiver design must therefore be able to cover a wide range of complexity/performance trade-offs using a single tunable detection algorithm.

Contributions: In this (predominantly tutorial) paper, we provide a formulation of the sphere decoder [2], [3] as a tunable MIMO detector with performance ranging from that of successive interference cancellation (SIC) to that of max-log APP detection. Tuning of the detector is achieved through log-likelihood ratio (LLR) clipping, preprocessing, and imposing constraints on the maximum computational complexity of the decoder. We formulate a framework for systematically characterizing the resulting complexity/performance trade-offs. Finally, we elaborate on, and provide some refinements of, the tree-search algorithm introduced in [4] and the LLR clipping approach proposed in [5].

Outline: The remainder of this paper is organized as follows. Section II reviews the transformation of the MIMO detection and LLR computation problems into a tree-search problem. Section III reviews max-log APP sphere decoding and proposes some refinements of existing algorithms. In Section IV, we describe methods for reducing the tree-search complexity. A framework for evaluating the complexity/performance trade-offs of the resulting class of detectors is introduced in Section V. We conclude in Section VI.

II. SOFT-OUTPUT SPHERE DECODING

Consider a MIMO system with $M_T$ transmit and $M_R$ receive antennas. The coded bit-stream is mapped to $M_T$-dimensional transmit vector symbols $\mathbf{s} \in \mathcal{O}^{M_T}$, where $\mathcal{O}$ stands for the underlying complex-valued scalar constellation of cardinality $2^Q$. The individual coded bits are denoted by $x_{j,b}$, where the indices $j$ and $b$ refer to the $b$th bit in the binary label of the $j$th entry of $\mathbf{s}$, respectively. The resulting complex baseband input-output relation is given by

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$

where $\mathbf{H}$ denotes the $M_R \times M_T$ channel matrix and $\mathbf{n}$ is an i.i.d. proper complex Gaussian distributed $M_R$-dimensional noise vector with unit variance entries.
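
The input-output relation (1) can be made concrete with a small simulation. The following is a minimal sketch, not taken from the paper: the helper names (`gray_16qam`, `map_bits`, `rayleigh_channel`, `received_vector`), the particular Gray labeling, the unit-energy normalization of the constellation, and the i.i.d. Rayleigh channel are all illustrative assumptions. Only the noise model (proper complex Gaussian entries with unit variance) follows the text directly.

```python
import numpy as np

def gray_16qam():
    """Return (points, labels) for a 16-QAM constellation (2^Q points, Q = 4).

    Assumption: unit average symbol energy and one particular Gray labeling.
    """
    pam = np.array([-3.0, -1.0, 1.0, 3.0])
    gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray code along one axis
    points = np.array([re + 1j * im for re in pam for im in pam]) / np.sqrt(10.0)
    labels = np.array([np.concatenate((gray[a], gray[b]))
                       for a in range(4) for b in range(4)])
    return points, labels

def map_bits(bits, points, labels):
    """Map an (M_T, Q) array of coded bits x_{j,b} to the symbol vector s."""
    weights = 1 << np.arange(labels.shape[1] - 1, -1, -1)  # binary weights of a label
    lookup = np.empty(len(points), dtype=int)
    lookup[labels @ weights] = np.arange(len(points))      # label value -> point index
    return points[lookup[bits @ weights]]

def rayleigh_channel(m_r, m_t, rng):
    """Assumed channel model: i.i.d. complex Gaussian entries (not the TGn model)."""
    return (rng.standard_normal((m_r, m_t))
            + 1j * rng.standard_normal((m_r, m_t))) / np.sqrt(2.0)

def received_vector(H, s, rng):
    """Evaluate y = H s + n as in (1), with unit-variance proper complex noise."""
    m_r = H.shape[0]
    n = (rng.standard_normal(m_r) + 1j * rng.standard_normal(m_r)) / np.sqrt(2.0)
    return H @ s + n
```

With `rng = np.random.default_rng(0)`, a 4 × 4 example is obtained by drawing `bits = rng.integers(0, 2, size=(4, 4))`, mapping them with `map_bits`, and passing the result to `received_vector`. In this sketch the operating SNR would be set by scaling `H`, which is again an assumption rather than something the text specifies.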

1-4244-0785-0/06/$20.00
A. Max-Log Soft-Output Computation

Soft-output MIMO detection requires the computation of LLRs for all coded bits. In order to reduce the corresponding computational complexity, we employ the max-log approximation [6]

$$L(x_{j,b}) = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(0)}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 - \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(1)}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{2}$$

where $\mathcal{X}_{j,b}^{(0)}$ and $\mathcal{X}_{j,b}^{(1)}$ are the disjoint sets of vector symbols that have the $b$th bit in the label of the $j$th scalar symbol equal to 0 and 1, respectively. For each bit, one of the two minima in (2) is given by $\lambda^{\mathrm{ML}} = \|\mathbf{y} - \mathbf{H}\mathbf{s}^{\mathrm{ML}}\|^2$, where

$$\mathbf{s}^{\mathrm{ML}} = \arg\min_{\mathbf{s} \in \mathcal{O}^{M_T}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{3}$$

is the maximum likelihood (ML) solution. The other minimum in (2) is given by

$$\lambda^{\overline{\mathrm{ML}}}_{j,b} = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(\bar{x}^{\mathrm{ML}}_{j,b})}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{4}$$

where the counter-hypothesis $\bar{x}^{\mathrm{ML}}_{j,b}$ denotes the binary complement of the $b$th bit in the binary label of the $j$th entry of $\mathbf{s}^{\mathrm{ML}}$. With (3) and (4) the max-log LLRs can be written as

$$L(x_{j,b}) = \begin{cases} \lambda^{\mathrm{ML}} - \lambda^{\overline{\mathrm{ML}}}_{j,b}, & x^{\mathrm{ML}}_{j,b} = 0 \\ \lambda^{\overline{\mathrm{ML}}}_{j,b} - \lambda^{\mathrm{ML}}, & x^{\mathrm{ML}}_{j,b} = 1. \end{cases} \tag{5}$$

From (5) we can conclude that efficient max-log APP MIMO detection reduces to efficiently identifying $\mathbf{s}^{\mathrm{ML}}$, $\lambda^{\mathrm{ML}}$, and $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ for $j = 1, 2, \ldots, M_T$ and $b = 1, 2, \ldots, Q$ [7].

B. Max-Log APP MIMO Detection as a Tree Search

Transforming (3) and (4) into tree-search problems and using the sphere decoding algorithm [2], [3] allows the LLRs (5) to be computed efficiently. To this end, the channel matrix $\mathbf{H}$ is first QR-decomposed according to $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is unitary and $\mathbf{R}$ is upper-triangular with real-valued positive entries on its main diagonal. Left-multiplying (1) by $\mathbf{Q}^H$ (the superscript $H$ stands for conjugate transposition) leads to the modified input-output relation

$$\tilde{\mathbf{y}} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H \mathbf{n} \quad \text{with} \quad \tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y}$$

and hence, noting that $\mathbf{Q}^H \mathbf{n}$ has the same statistics as $\mathbf{n}$, to the equivalent formulation of $\lambda^{\mathrm{ML}}$ and $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ as

$$\lambda^{\mathrm{ML}} = \min_{\mathbf{s} \in \mathcal{O}^{M_T}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 \tag{6}$$

$$\lambda^{\overline{\mathrm{ML}}}_{j,b} = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(\bar{x}^{\mathrm{ML}}_{j,b})}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2. \tag{7}$$

We next define the partial symbol vectors (PSVs) $\mathbf{s}^{(i)} = [\, s_i \; s_{i+1} \; \cdots \; s_{M_T} \,]^T$ and note that the $\mathbf{s}^{(i)}$ can be arranged in a tree that has its root just above level $i = M_T$ and leaves, which correspond to possible candidate symbol vectors, on level $i = 1$. After initializing $d_{M_T+1} = 0$, the Euclidean distances $d(\mathbf{s}) = \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2$ in (6) and (7) can be computed recursively as $d(\mathbf{s}) = d_1$ with the partial Euclidean distances (PEDs)

$$d_i = d_{i+1} + |e_i|^2, \quad i = M_T, M_T - 1, \ldots, 1 \tag{8}$$

and the distance increments (DIs)

$$|e_i|^2 = \Bigl| \tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j \Bigr|^2. \tag{9}$$

Since the dependence of the PEDs $d_i$ on the symbol vector $\mathbf{s}$ is only through $\mathbf{s}^{(i)}$, we have transformed ML detection and the computation of the max-log LLRs into a weighted tree-search problem: PEDs and PSVs are associated with nodes, branches correspond to DIs. Each path from the root down to a leaf corresponds to a symbol vector $\mathbf{s} \in \mathcal{O}^{M_T}$. The leaf associated with the smallest metric in $\mathcal{O}^{M_T}$ and $\mathcal{X}_{j,b}^{(\bar{x}^{\mathrm{ML}}_{j,b})}$ corresponds to the solution of (6) and (7), respectively. The basic building block underlying the two tree traversal strategies described in the next section is the Schnorr-Euchner sphere decoder (SESD) with radius reduction [8], briefly summarized as follows: The SESD constrains the search to nodes which lie within a radius $r$ around $\tilde{\mathbf{y}}$ and traverses the tree depth-first, visiting the children of a given node in ascending order of their PEDs. The basic idea of radius reduction is to start the algorithm with $r = \infty$ and to update the radius according to $r^2 \leftarrow d(\mathbf{s})$ whenever a leaf $\mathbf{s}$ has been reached. This avoids the problem of selecting a suitable (initial) radius and leads to efficient pruning of the tree.

Throughout this paper, computational complexity is defined as the number of visited nodes. This complexity measure is directly related to the throughput of corresponding VLSI implementations [9].

III. TREE-TRAVERSAL STRATEGIES

Computing the LLRs as in (5) requires determining the metric $\lambda^{\overline{\mathrm{ML}}}_{j,b}$, which is achieved by traversing only those parts of the tree that have leaves in $\mathcal{X}_{j,b}^{(\bar{x}^{\mathrm{ML}}_{j,b})}$. Since this computation has to be carried out for every coded bit, it is immediately obvious that the resulting need for repeated tree traversals can lead to a major computational burden. In the following, we review two alternative tree-traversal strategies, proposed in [7] and [4], respectively, for solving (6) and (7). In addition, we propose some minor refinements of the tree-search algorithm introduced in [4].

A. Repeated Tree Search

An algorithm for computing the LLRs based on repeated tree search (RTS) was described in [7]. The basic idea is to start by solving (6) (using the SESD) and to rerun the SESD to solve (7) for each coded bit (i.e., $Q M_T$ times) in the vector symbol. When rerunning the SESD to determine $\lambda^{\overline{\mathrm{ML}}}_{j,b}$, the search tree is prepruned by forcing the decoder to exclude all nodes (and the corresponding subtrees) from the search for which $x_{j,b} = x^{\mathrm{ML}}_{j,b}$. This prepruning procedure is illustrated in Fig. 1. Initializing the SESD with $r = \infty$ in each of the $Q M_T$ runs required to obtain $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ will lead to
high computational complexity. It is therefore important to realize that, without compromising max-log optimality, we can initialize the search radius $r_{j,b}$ by setting it equal to the minimum value of $\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|$ over all $\mathbf{s} \in \mathcal{X}_{j,b}^{(\bar{x}^{\mathrm{ML}}_{j,b})}$ found during preceding tree traversals.

The main advantage of the RTS strategy lies in the fact that each traversal of the tree can be performed using a hard-decision SESD with minimal modifications to account for the search being carried out on a prepruned tree. The main disadvantage is the repeated traversal of large parts of the tree. As noted in [10], this problem can be mitigated somewhat by changing the detection order in each run. Unfortunately, the resulting need for multiple QR-decompositions typically leads to prohibitive overall computational complexity.

Fig. 1. Example of the prepruning procedure in the RTS approach. Counter-hypotheses to the ML solution are found by forcing the algorithm through the dashed branches.

B. Single Tree Search

The key to a more efficient (compared to RTS) tree-search strategy is to ensure that every node in the tree is visited at most once. This can be accomplished by searching for the ML solution and all counter-hypotheses concurrently. The basic idea behind such a single tree search (STS) approach has been outlined in [4]. In the following, we shall elaborate on the idea presented in [4] and describe some minor refinements. Specifically, we formulate update rules and a pruning criterion based on a list containing the metrics $\lambda^{\mathrm{ML}}$ and $\lambda^{\overline{\mathrm{ML}}}_{j,b}$.

The main concept is to have a list containing the metric $\lambda^{\mathrm{ML}}$ along with the corresponding bit sequence $\mathbf{x}^{\mathrm{ML}}$ and the metrics $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ of all counter-hypotheses and to search the subtree originating from a given node only if the result can lead to an update of either $\lambda^{\mathrm{ML}}$ or one of the $\lambda^{\overline{\mathrm{ML}}}_{j,b}$.

List administration: The algorithm is initialized with $\lambda^{\mathrm{ML}} = \lambda^{\overline{\mathrm{ML}}}_{j,b} = \infty$ ($\forall\, j, b$). Whenever a leaf with corresponding binary label $\mathbf{x}$ has been reached, the decoder distinguishes between two cases:
1) If a new ML hypothesis is found, i.e., $d(\mathbf{x}) < \lambda^{\mathrm{ML}}$, all $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ for which $x_{j,b} = \bar{x}^{\mathrm{ML}}_{j,b}$ are set to $\lambda^{\mathrm{ML}}$ followed by the updates $\lambda^{\mathrm{ML}} \leftarrow d(\mathbf{x})$ and $\mathbf{x}^{\mathrm{ML}} \leftarrow \mathbf{x}$. In other words, for each bit in the ML hypothesis that is changed in the process of the update, the metric of the former ML hypothesis becomes the metric of the new counter-hypothesis, followed by an update of the ML hypothesis. This procedure ensures that all $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ always contain the metric associated with a valid counter-hypothesis to the current ML hypothesis.
2) In the case where $d(\mathbf{x}) \geq \lambda^{\mathrm{ML}}$, only the counter-hypotheses have to be checked. For all $j$ and $b$ for which $d(\mathbf{x}) < \lambda^{\overline{\mathrm{ML}}}_{j,b}$ and $x_{j,b} = \bar{x}^{\mathrm{ML}}_{j,b}$, the decoder updates $\lambda^{\overline{\mathrm{ML}}}_{j,b} \leftarrow d(\mathbf{x})$.

Pruning criterion: The key aspect of this algorithm is the following pruning criterion. A given node $\mathbf{s}^{(i)}$ on level $i$ and the subtree originating from that node have the partial binary label $\mathbf{x}^{(i)}$ consisting of the bits $x_{j,b}$ ($b = 1, 2, \ldots, Q$ and $j = i, i+1, \ldots, M_T$). The remaining bits $x_{j,b}$ ($j = 1, 2, \ldots, i-1$) corresponding to the subtree are unknown at this point. The pruning criterion for $\mathbf{s}^{(i)}$ along with its subtree is compiled from two conditions. First, the bits in the partial binary label $\mathbf{x}^{(i)}$ are compared with the corresponding bits in the binary label of the current ML hypothesis. In this comparison, for all $j, b$ with $x_{j,b} = \bar{x}^{\mathrm{ML}}_{j,b}$, the corresponding counter-hypotheses $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ might be affected when further searching the node's subtree. Second, all counter-hypotheses corresponding to the subtree of $\mathbf{s}^{(i)}$ with the associated metrics $\lambda^{\overline{\mathrm{ML}}}_{j,b}$ ($j = 1, 2, \ldots, i-1$) may also be updated since the corresponding bits are not yet known. In summary, the metrics which may be affected during further search in the subtree emanating from a node $\mathbf{s}^{(i)}$ are given by the set

$$\mathcal{A} = \{a_l\} = \bigl\{ \lambda^{\overline{\mathrm{ML}}}_{j,b} \,\bigm|\, x_{j,b} = \bar{x}^{\mathrm{ML}}_{j,b},\ j \geq i \bigr\} \cup \bigl\{ \lambda^{\overline{\mathrm{ML}}}_{j,b} \,\bigm|\, j < i \bigr\}.$$

The node $\mathbf{s}^{(i)}$ along with its subtree is pruned if its PED $d(\mathbf{s}^{(i)})$ satisfies

$$d\bigl(\mathbf{s}^{(i)}\bigr) > \max_{a_l \in \mathcal{A}} a_l. \tag{10}$$

This pruning criterion (illustrated in Fig. 2) ensures that the subtree of a given node is explored only if it can lead to an update of either the ML hypothesis or of at least one of the counter-hypotheses. Note that $\lambda^{\mathrm{ML}}$ does not appear in (10) as $\lambda^{\mathrm{ML}} \leq \lambda^{\overline{\mathrm{ML}}}_{j,b}$ ($\forall\, j, b$).

IV. METHODS FOR COMPLEXITY REDUCTION

So far we have discussed tree-search strategies which solve (2) exactly and hence do not compromise the performance of the max-log APP decoder. The goal of this section is to describe methods that allow decoder complexity to be traded off against (error rate) performance.

A. LLR Clipping

The dynamic range of LLRs is typically not bounded. However, practical systems need to constrain the maximum LLR value to enable fixed-point implementations. Evidently this will lead to a performance degradation. A straightforward
way of ensuring that LLR values are bounded is to clip them after the detection stage so that

$$|L(x_{j,b})| \leq L_{\max} \quad \forall\, j, b. \tag{11}$$

It has been noted in [5] that the constraint (11) can be built into the tree-search algorithm such that it leads to a reduction in search complexity. In the following, we briefly describe the application of the idea proposed in [5] to the RTS and the STS tree-traversal strategies.

a) LLR Clipping for RTS: Whenever the RTS algorithm starts to search for a counter-hypothesis, with the search radius $r_{j,b}$ initialized as described in Section III-A, we first update

$$r_{j,b} \leftarrow \min\bigl\{ r_{j,b},\ \lambda^{\mathrm{ML}} + L_{\max} \bigr\} \tag{12}$$

which ensures that (11) is satisfied. Metrics associated with counter-hypotheses for which no valid lattice point can be found are set to $\lambda^{\mathrm{ML}} + L_{\max}$.

b) LLR Clipping for STS: Whenever a leaf has been reached and a new ML hypothesis has been found after carrying out the steps in Case 1 in Section III-B, the counter-hypotheses have to be updated according to

$$\lambda^{\overline{\mathrm{ML}}}_{j,b} \leftarrow \min\bigl\{ \lambda^{\overline{\mathrm{ML}}}_{j,b},\ \lambda^{\mathrm{ML}} + L_{\max} \bigr\} \quad \forall\, j, b. \tag{13}$$

For $L_{\max} = \infty$, we obviously get the exact max-log solution, whereas for $L_{\max} \to 0$, the decoder performance approaches that of a hard-output ML detector. On the other hand, a smaller $L_{\max}$ leads to a reduction in complexity, as more aggressive pruning is performed. The parameter $L_{\max}$ can therefore be used to adjust the complexity/performance trade-off (cf. Section V).

Fig. 2. Example of the STS pruning criterion ($M_T = 5$ and two bits per symbol): The partial binary label $\mathbf{x}^{(i)}$ determines which counter-hypotheses may be affected during the search of the subtree emanating from the current node.

B. Ordering and Regularization

Ordering: A common approach to reduce complexity in sphere decoding without compromising the decoder's performance is to adapt the detection ordering of the spatial streams to the geometry of the instantaneous channel realization by performing a QR-decomposition on $\mathbf{H}\mathbf{P}$ (rather than $\mathbf{H}$), where $\mathbf{P}$ is a suitably chosen permutation matrix. More efficient pruning of the search tree closer to the root is obtained if "stronger streams" correspond to the levels closer to the root, i.e., $\mathbf{P}$ is chosen such that the main diagonal entries of $\mathbf{R}$ in $\mathbf{H}\mathbf{P} = \mathbf{Q}\mathbf{R}$ are sorted in ascending order. In the following, this approach is termed sorted QR-decomposition (SQRD) [11].

Regularization: Poorly conditioned channel realizations $\mathbf{H}$ lead to significant search complexity due to the low effective SNR on one or more of the effective spatial streams. An efficient way to counter this problem is to perform the tree-search on a regularized channel matrix by computing

$$\begin{bmatrix} \mathbf{H} \\ \alpha \mathbf{I} \end{bmatrix} \mathbf{P} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix} \mathbf{R}$$

where $\mathbf{I}$ is the $M_T \times M_T$ identity matrix and $\alpha > 0$ is a suitably chosen regularization parameter. LLRs are then computed according to

$$L(x_{j,b}) = \min_{\tilde{\mathbf{s}} \in \mathcal{X}_{j,b}^{(0)}} \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \min_{\tilde{\mathbf{s}} \in \mathcal{X}_{j,b}^{(1)}} \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 \tag{14}$$

where $\tilde{\mathbf{y}} = \mathbf{Q}_1^H \mathbf{y}$ and $\tilde{\mathbf{s}} = \mathbf{P}\mathbf{s}$. Note that the LLRs in (14) need to be reordered at the end of the decoding process to account for the permutation induced by $\mathbf{P}$. Operating on a regularized version of the channel matrix clearly entails an (error rate) performance loss. However, we shall see in Section V that choosing $\alpha$ according to the minimum mean squared error (MMSE) criterion (resulting in MMSE-SQRD) as outlined in [12] degrades the performance only slightly while leading to considerable savings in terms of search complexity.

C. Run-Time Constraints

A disadvantage of all SDs is that the computational complexity required to find the ML solution (and the LLR values) depends on the realization of the channel matrix and the noise; the worst-case complexity corresponds to an exhaustive search. On the other hand, in order to meet the practically important requirement of a fixed throughput, the algorithm run-time must be constrained, which leads to a constraint on the maximum detection effort. This, in turn, generally prevents the detector from achieving ML or max-log APP performance.

A straightforward way of enforcing a run-time constraint is to terminate the search, on a symbol vector by symbol vector basis, after a maximum number of visited nodes. The detector then returns the best solution found so far, i.e., the current ML and counter-hypotheses. A better solution is to impose an aggregate run-time constraint of $N D_{\mathrm{avg}}$ visited nodes for an entire block of $N$ vector symbols (in an OFDM-based MIMO system, $N$ would, for example, be the number of OFDM tones). The maximum complexity allocated to the detection of the $k$th vector symbol can, for example, be chosen according to the maximum-first (MF) scheduling strategy [13] as

$$D_{\max}(k) = N D_{\mathrm{avg}} - \sum_{i=1}^{k-1} D(i) - (N - k) M_T \tag{15}$$

where $D(i)$ denotes the actual number of visited nodes for the $i$th vector symbol. The concept behind (15) is that a vector
symbol is allowed to use up all of the remaining run-time within the block up to a safety margin of $(N - k) M_T$ visited nodes, which makes it possible to find at least the decision feedback solution for the remaining vector symbols. Setting $D_{\mathrm{avg}} = M_T$ maximizes the throughput but reduces the performance to that of hard-decision SIC.
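
The MF budget rule (15) is simple enough to trace by hand. The sketch below is an illustration only: the function name `mf_schedule` and its `demand` input (the number of nodes each vector symbol would visit if it were not constrained) are hypothetical, and the detector is assumed to spend `min(demand, budget)` nodes on each symbol.

```python
def mf_schedule(demand, d_avg, m_t):
    """Trace maximum-first scheduling over a block of N = len(demand) symbols.

    demand[k-1]: nodes symbol k would visit without a constraint (assumed input).
    Returns (budgets, spent), where budgets[k-1] is D_max(k) from (15) and
    spent[k-1] is the resulting D(k).
    """
    n = len(demand)
    used = 0                      # running sum of D(i) for i < k
    budgets, spent = [], []
    for k in range(1, n + 1):
        d_max = n * d_avg - used - (n - k) * m_t   # budget (15)
        d_k = min(demand[k - 1], d_max)            # search stops at the budget
        budgets.append(d_max)
        spent.append(d_k)
        used += d_k
    return budgets, spent
```

For example, `mf_schedule([30, 5, 4, 10], d_avg=8, m_t=4)` gives the budgets `[20, 4, 4, 4]`: the first symbol may use everything except the safety margin of $(N - 1) M_T = 12$ nodes, after which each remaining symbol is left with exactly $M_T$ nodes. With `d_avg=4` (that is, $D_{\mathrm{avg}} = M_T$) every budget equals $M_T$, which matches the remark that the detector then reduces to hard-decision SIC.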

V. PERFORMANCE/COMPLEXITY TRADE-OFFS

In practice, system engineers are typically faced with the problem of designing a receiver that achieves a given target frame error rate (FER) at a given throughput. The quality of the receiver implementation can then be measured by the minimum SNR required to achieve this target FER. In the following, we assess the complexity/performance trade-offs of the concepts described in Sections III and IV by plotting the average (over independent channel and noise realizations) number of visited nodes as a function of this minimum SNR. Since the number of visited nodes translates directly to the required chip area per throughput [9], the corresponding charts make it possible to associate an SNR penalty with a reduction in hardware complexity.

Fig. 3. Comparison of repeated tree search (RTS), single tree search (STS) and the list sphere decoder (LSD) as proposed in [6]. The numbers next to the curves correspond to $L_{\max}$ for RTS and STS and to the list size in the case of the LSD. (Axes: average number of visited nodes versus minimum SNR for a given FER.)

All simulation results are for a rate 1/2 (generator polynomials $[133_o \; 171_o]$ and constraint length 7) convolutionally encoded 4 × 4 MIMO-OFDM system with 16-QAM constellation (using Gray mapping) and $N = 64$ tones. A soft-in Viterbi decoder [14] is employed. One frame consists of 1024 randomly interleaved (across space and frequency) bits and a TGn type C channel model [15] is used.

A. Comparison of Tree-Search Strategies

Fig. 3 compares the performance of RTS and STS max-log APP decoders, and the list sphere decoder (LSD) [6] for different target FERs, different values of $L_{\max}$ and in the case of the LSD for different list sizes. Changing the list size allows the complexity/performance trade-off to be adjusted.

The STS approach is seen to clearly outperform the RTS strategy in terms of average complexity. We can furthermore see that for this setup max-log APP performance is achieved for $L_{\max} = 0.2$. Increasing the LLR clipping level beyond this value only increases complexity without improving performance.

The implementation of the LSD requires additional memory and logic for the administration of the candidate list, which is not accounted for in this comparison. Fig. 3 shows that even when this additional complexity is ignored, the LSD is still inferior to the STS algorithm.

B. Impact of Preprocessing and Regularization

Fig. 4 compares the impact of SQRD, MMSE-SQRD, and standard (unordered) QRD-based preprocessing on the complexity/performance trade-off of the STS algorithm at a target FER of 0.01. It can be seen that the improvement resulting from SQRD compared to unordered QRD becomes significant in the low (but realistic) complexity region. Further (minor) improvements are obtained from regularization using MMSE-SQRD. In the region where the average complexity is very high, the performance penalty resulting from regularization eventually renders MMSE-SQRD inferior to SQRD.

Fig. 4. Comparison of unordered QRD, SQRD and MMSE-SQRD preprocessing applied to STS at a target FER of 0.01. The numbers next to the curves correspond to $L_{\max}$. For $L_{\max} \to 0$, the performance approaches that of hard-output SESD. (Axes: average number of visited nodes versus minimum SNR for a given FER.)

C. LLR Clipping

Both Fig. 3 and Fig. 4 show that adjusting the LLR clipping level $L_{\max}$ makes it possible to sweep an entire family of sphere decoders ranging from the exact max-log APP SESD (obtained, in our setup, for $L_{\max} \geq 0.2$) to hard-output SESD ($L_{\max} = 0$). The LLR clipping level is therefore an important design parameter which can be used to conveniently adjust the decoder at runtime to a given complexity constraint.

D. Run-time Constraints

In Fig. 5, we finally demonstrate the impact of imposing a maximum run-time constraint of $N D_{\mathrm{avg}}$ visited nodes for a
block of $N = 64$ vector symbols using the strategy described in Section IV-C. The resulting curves essentially consist of two regions:

• If the LLR clipping level is large (corresponding to high search complexity), the run-time constrained detector is not able to compute accurate LLR values, which results in (very) poor performance, unless $D_{\mathrm{avg}}$ is large. For $D_{\mathrm{avg}} = 128$, the performance is very close to that of the unconstrained max-log APP decoder.
• In the region where $L_{\max}$ is small, the performance is dominated by aggressive LLR clipping rather than by the run-time constraint.

In summary, we can conclude that for a given average run-time constraint there exists an optimum LLR clipping level, which minimizes the SNR required to achieve a certain target FER. It is therefore of paramount importance to choose the LLR clipping level in accordance with the average run-time constraint.

Fig. 5. Impact of run-time constraints with MF scheduling on STS SESD with MMSE-SQRD preprocessing at a FER of 0.01. The performance can be optimized by choosing an appropriate LLR clipping level (shown next to the curves) for a given average run-time constraint $D_{\mathrm{avg}}$. (Axes: average number of visited nodes versus minimum SNR for a given FER; curves for $D_{\mathrm{avg}}$ = 8, 16, 32, 64, and 128.)

VI. CONCLUSIONS

The sphere decoder is a suitable tool to implement MIMO detection with flexible complexity/performance trade-offs. In particular, adjusting the LLR clipping level is an efficient way of realizing an entire family of decoders ranging from exact max-log soft-output SD to hard-output SIC detection. The keys to achieving low complexity are the single tree-search strategy in Section III-B, MMSE-SQRD preprocessing, LLR clipping, and imposing run-time constraints with MF scheduling. Our results demonstrate that MIMO detection with near max-log APP performance can be realized with a complexity that is reasonably close to that of a hard-output sphere decoder.

REFERENCES

[1] H. Bölcskei, D. Gesbert, C. Papadias, and A. J. van der Veen, Eds., Space-Time Wireless Systems: From Array Processing to MIMO Communications. Cambridge Univ. Press, 2006.
[2] C. P. Schnorr and M. Euchner, "Lattice basis reduction: Improved practical algorithms and solving subset sum problems," Math. Programming, vol. 66, no. 2, pp. 181–191, Sept. 1994.
[3] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Mathematics of Computation, vol. 44, pp. 463–471, Apr. 1985.
[4] J. Jaldén and B. Ottersten, "Parallel implementation of a soft output sphere decoder," in Proceedings Asilomar Conference on Signals, Systems and Computers, Nov. 2005, pp. 581–585.
[5] M. S. Yee, "Max-Log-Map sphere decoder," in Proc. IEEE ICASSP 2005, vol. 3, Mar. 2005, pp. 1013–1016.
[6] B. M. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, Mar. 2003.
[7] R. Wang and G. Giannakis, "Approaching MIMO channel capacity with reduced-complexity soft sphere decoding," in Proc. of IEEE Wireless Communications and Networking Conf. (WCNC), vol. 3, Mar. 2004, pp. 1620–1625.
[8] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, Aug. 2002.
[9] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoder algorithm," IEEE Journal of Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, July 2005.
[10] P. Marsch, E. Zimmermann, and G. Fettweis, "Smart candidate adding: A new low-complexity approach towards near-capacity MIMO detection," in Proceedings of 13th European Signal Processing Conference (EUSIPCO), Sept. 2005.
[11] D. Wübben, R. Böhnke, J. Rinas, V. Kühn, and K. Kammeyer, "Efficient algorithm for decoding layered space-time codes," IEE Electronics Letters, vol. 37, no. 22, pp. 1348–1350, Oct. 2001.
[12] D. Wübben, R. Böhnke, V. Kühn, and K. Kammeyer, "MMSE extension of V-BLAST based on sorted QR decomposition," in IEEE Proc. Vehicular Technology Conference (Fall), vol. 1, Oct. 2003, pp. 508–512.
[13] A. Burg, M. Borgmann, M. Wenk, C. Studer, and H. Bölcskei, "Advanced receiver algorithms for MIMO wireless communications," in Proceedings of the Design Automation and Test Europe Conf. (DATE), vol. 1, May 2006, pp. 593–598.
[14] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.
[15] V. Erceg et al., TGn channel models, May 2004, IEEE 802.11 document 03/940r4.
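
As a compact illustration of the single tree search of Section III-B combined with the LLR clipping rule (13), the following sketch may be helpful. It is not the authors' implementation and makes several assumptions: the names `sts_llrs` and `brute_force_llrs` are hypothetical, the preprocessing is a plain unordered QR decomposition from NumPy (whose $\mathbf{R}$ need not have a positive real diagonal, which leaves the metrics unchanged), levels are indexed from 0, and $\lambda^{\mathrm{ML}}$ is included in the pruning bound so that the comparison stays well defined when the set $\mathcal{A}$ in (10) is empty. Because $\lambda^{\mathrm{ML}}$ never exceeds any counter-hypothesis metric, this addition does not alter the criterion otherwise.

```python
import itertools
import numpy as np

def brute_force_llrs(H, y, points, labels):
    """Reference max-log LLRs by exhaustive evaluation of (2)."""
    m_t, q = H.shape[1], labels.shape[1]
    best = [np.full((m_t, q), np.inf), np.full((m_t, q), np.inf)]
    for combo in itertools.product(range(len(points)), repeat=m_t):
        idx = np.array(combo)
        d = np.linalg.norm(y - H @ points[idx]) ** 2
        x = labels[idx]                      # (m_t, q) bit label of this candidate
        for v in (0, 1):
            best[v][x == v] = np.minimum(best[v][x == v], d)
    return best[0] - best[1]

def sts_llrs(H, y, points, labels, l_max=np.inf):
    """Single tree search; returns (LLRs of shape (m_t, q), visited node count)."""
    m_t, q = H.shape[1], labels.shape[1]
    Qm, R = np.linalg.qr(H)                  # H = QR with R upper triangular
    y_t = Qm.conj().T @ y
    st = {"lam_ml": np.inf, "x_ml": np.zeros((m_t, q), dtype=int), "visited": 0}
    lam_c = np.full((m_t, q), np.inf)        # counter-hypothesis metrics
    idx = np.zeros(m_t, dtype=int)           # constellation index chosen per level

    def update(x, d):
        if d < st["lam_ml"]:                 # case 1: new ML hypothesis
            lam_c[x != st["x_ml"]] = st["lam_ml"]
            st["lam_ml"], st["x_ml"] = d, x
            np.minimum(lam_c, d + l_max, out=lam_c)   # clipping update (13)
        else:                                # case 2: counter-hypotheses only
            lam_c[(x != st["x_ml"]) & (d < lam_c)] = d

    def search(i, d_parent):
        interf = R[i, i + 1:] @ points[idx[i + 1:]]
        ped = d_parent + np.abs(y_t[i] - interf - R[i, i] * points) ** 2  # (8), (9)
        for c in np.argsort(ped):            # Schnorr-Euchner order
            idx[i] = c
            differs = labels[idx[i:]] != st["x_ml"][i:]
            bound = np.concatenate(([st["lam_ml"]], lam_c[i:][differs],
                                    lam_c[:i].ravel())).max()
            if ped[c] > bound:               # pruning criterion (10)
                continue
            st["visited"] += 1
            if i > 0:
                search(i - 1, ped[c])
            else:
                update(labels[idx], ped[c])

    search(m_t - 1, 0.0)
    sign = np.where(st["x_ml"] == 0, 1.0, -1.0)       # sign convention of (5)
    return sign * (st["lam_ml"] - lam_c), st["visited"]
```

With the default `l_max=np.inf` the sketch is intended to reproduce the exhaustive max-log LLRs of (2), which can be checked against `brute_force_llrs` on a small example (for instance QPSK with $M_T = 3$). A finite `l_max` bounds the magnitude of the returned LLRs as in (11) and tightens the pruning bound. The visited-node count is the complexity measure used throughout the paper.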
