A Performance Study of MIMO Detectors: Christoph Windpassinger, Lutz Lampe, Robert F. H. Fischer, Thorsten Hehn


A Performance Study of MIMO Detectors

Christoph Windpassinger, Lutz Lampe+, Robert F. H. Fischer, Thorsten Hehn

Abstract: Several approaches have recently been proposed for the efficient optimum or approximate solution of the detection problem in multiple-input multiple-output transmission systems. These are, however, difficult to compare. In the present work we briefly summarize the most popular and promising of these approaches and offer a way to visualize the tradeoff between the complexity of the detection and the achievable power efficiency using complexity-power diagrams. We conclude that the so-called sphere decoder algorithm is very attractive in terms of average complexity, while for low and constant processing delay lattice reduction with subsequent simple linear or nonlinear detection is more favorable.

Index Terms: MIMO transmission systems, MIMO detection, lattice reduction, lattice decoding, sphere decoding

Lehrstuhl für Informationsübertragung, Universität Erlangen-Nürnberg, Cauerstraße 7/LIT, 91058 Erlangen, Germany, email: {windpass,fischer,hehn}@LNT.de

+ The University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada, email: [email protected]

The material in this letter was presented in part at the IEEE International Symposium on Information Theory, Chicago, Illinois, USA, June 2004.

WINDPASSINGER ET AL.: A PERFORMANCE STUDY OF MIMO DETECTORS

July 18, 2005 REVISED VERSION

I. INTRODUCTION

Of late, multiple-input multiple-output (MIMO) systems have attracted considerable attention due to the linear growth of the system capacity with the number of parallel subchannels, i.e., dimensions of the vector transmission model. As the number of dimensions increases, the detection problem quickly becomes very complex, since with $M$-ary signaling in each dimension, a $K$-dimensional signaling vector allows for $M^K$ different transmit signals in each time step. This exponential growth of the signaling set calls for low-complexity suboptimum detector structures, several of which have been proposed, e.g., [1], [2].

On the other hand, there has also been considerable progress in the field of low-complexity (almost) maximum-likelihood detection algorithms based on lattice decoding, collectively known as sphere decoder algorithms due to the way the search problem is solved. In particular, in [3] the combination of a low-complexity minimum mean-square-error (MMSE) detector with the sphere decoder algorithm was shown to result in a low-complexity and power-efficient detector.

A very general approach to obtain approximate solutions to the so-called lattice closest vector problem goes back to Babai [4]. The application of this method to MIMO transmission schemes has been called lattice-reduction-aided detection (LRAD) [5], [2], cf. also [6]. The uncoded error rate curves of this detector behave very promisingly: they run parallel to those of maximum-likelihood detection, i.e., they show the full diversity offered by the channel, even when only simple linear zero-forcing equalization is used. The main problem of these detectors is that the boundary region of the signaling constellation cannot be taken into account by the detector, which leads to a non-negligible increase in error rate with respect to non-lattice-reduction-aided detectors at low signal-to-noise ratios.

One way to overcome this problem of LRAD is discussed in [7], where the use of a set of parallel subspace detectors producing a set of candidate solutions was proposed. That scheme will turn out to be rather complex compared to the other methods discussed, and we will also consider a new variant using only one set of parallel detectors instead of $K$ sets of parallel detectors as discussed in [7].

All of these detector implementations are much more efficient than brute-force maximum-likelihood detection. However, since they do not all solve the maximum-likelihood detection problem exactly, a fair and complete comparison of these schemes is still missing. In this work we propose to use complexity-power diagrams, which visualize the tradeoff between the achievable power efficiency and the complexity of the detector. Since the processing delay of the detector is also of interest for a practical implementation, we discuss this parameter as well.

This letter is organized as follows. In Section II, we introduce the MIMO transmission model. The detectors that we compare are briefly described in Section III. The power-complexity diagrams are presented and evaluated in Section IV, while the processing delay issue is discussed in Section V. Some concluding remarks are offered in Section VI.

II. TRANSMISSION MODEL

We consider the often-used flat-fading MIMO transmission model

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

with $N_T$ input and $N_R$ output subchannels (e.g., antennas), where the vectors $\mathbf{x}$, $\mathbf{n}$, $\mathbf{y}$ and the matrix $\mathbf{H}$ are obtained from the equivalent baseband model and hence are complex-valued. $\mathbf{H} = [h_{kl}]$ is the $N_R \times N_T$ channel matrix, $\mathbf{x} = [x_1, \ldots, x_{N_T}]^T$ (see footnote 1) is the vector of transmit symbols, each chosen from some signal set $\mathcal{A}$, and $\mathbf{n} = [n_1, \ldots, n_{N_R}]^T$ is the vector of i.i.d. circularly symmetric complex Gaussian noise. The receiver has access to the vector of complex received values, $\mathbf{y} = [y_1, \ldots, y_{N_R}]^T$, and additionally we assume perfect knowledge of the channel matrix $\mathbf{H}$. For simplicity we will subsequently choose $N_R = N_T = K$, but note that the methods discussed apply equally to any setting with $N_R \ge N_T$.
Footnote 1: In this letter, $(\cdot)^T$, $\|\cdot\|$, and $\det(\cdot)$ denote the transpose, the $L_2$-norm, and the determinant of a matrix. $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$, and $\mathbf{I}$ denotes an identity matrix of appropriate dimension.


When the complex signal constellation used for $\mathbf{x}$ can be described as the Cartesian product of two real-valued constellations, as in the case of the commonly used quadrature-amplitude modulation (QAM) constellations, a real-valued equivalent transmission model turns out to be more convenient, writing

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r \tag{2}$$

with $\mathbf{H}_r \in \mathbb{R}^{2K \times 2K}$, and $\mathbf{x}_r, \mathbf{n}_r, \mathbf{y}_r \in \mathbb{R}^{2K}$ [2], [8]. If $\mathbf{x}_r$ is taken from the set of integers, i.e., $\mathbf{x}_r \in \mathcal{A}_r^{2K} \subset \mathbb{Z}^{2K}$, the real-valued model shows that the noiseless received signal $\mathbf{H}_r \mathbf{x}_r$ can be seen as a point of the lattice described by the matrix $\mathbf{H}_r$. Only a subset of this lattice around the zero vector is actually used, as practical systems have only limited average and peak output power. For QAM constellations we can achieve $\mathbf{x}_r \in \mathbb{Z}^{2K}$ by suitable scaling and translation of the received signal.

Since the noise is circularly symmetric and Gaussian, the maximum-likelihood estimate of $\mathbf{x}_r$ is

$$\hat{\mathbf{x}}_r = \arg\min_{\mathbf{x}_r \in \mathcal{A}_r^{2K}} \|\mathbf{y}_r - \mathbf{H}_r \mathbf{x}_r\|^2 . \tag{3}$$

If the search space $\mathcal{A}_r^{2K}$ is changed to the integer lattice $\mathbb{Z}^{2K}$, the equivalent minimization problem is known as the lattice closest vector problem (LCVP). Even the restricted search space of size $|\mathcal{A}_r^{2K}|$ in general prohibits the use of a brute-force search through all possible transmit vectors. Several so-called sphere decoding algorithms have been proposed to solve this problem efficiently, e.g., [9], [10], [3]. In fact, it turns out that this particular instance of the LCVP has, on average, only polynomial complexity in $K$, whereas the worst-case complexity is exponential, i.e., proportional to the size of the search space [11].
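The real-valued model (2) and the brute-force search behind (3) can be sketched in a few lines. The stacking order used here (real parts first, then imaginary parts) is an assumption made for illustration; the letter defers the exact construction to [2], [8]. The function names `complex_to_real` and `ml_detect` and the small channel matrix are likewise hypothetical.

```python
import itertools

def complex_to_real(H, y):
    """Assumed stacking: H_r = [[Re H, -Im H], [Im H, Re H]], y_r = [Re y; Im y]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r

def ml_detect(H_r, y_r, alphabet):
    """Exhaustive search of eq. (3): tests all |alphabet|^(2K) candidate vectors."""
    n = len(H_r[0])
    best, best_dist = None, float("inf")
    for cand in itertools.product(alphabet, repeat=n):
        dist = sum((y_r[i] - sum(H_r[i][j] * cand[j] for j in range(n))) ** 2
                   for i in range(len(H_r)))
        if dist < best_dist:
            best, best_dist = cand, dist
    return list(best), best_dist

# Illustrative K = 2 channel (values chosen arbitrarily, not taken from the letter)
H = [[1 + 0.5j, 0.3 - 0.2j], [-0.4 + 0.1j, 0.8 + 0.6j]]
x = [1 + 0j, 0 + 1j]                      # real components drawn from {0, 1}
y = [sum(H[i][j] * x[j] for j in range(2)) for i in range(2)]   # noiseless receive vector
H_r, y_r = complex_to_real(H, y)
x_hat, dist = ml_detect(H_r, y_r, alphabet=(0, 1))
print(x_hat, dist)
```

The loop visits every element of the search space, which is exactly the exponential cost that motivates the sphere decoder and the lattice-reduction-aided schemes discussed next.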

III. MIMO DETECTORS

We now briefly describe some of the detection algorithms that we include in our comparison. We assume the reader is familiar with the traditional low-complexity detection strategies such as linear, decision-feedback, and V-BLAST equalization, cf. [1], [12].

A. Sphere Decoder Algorithm

The sphere decoder algorithm of [10] is reproduced in Fig. 1, where we have included modifications to account for finite constellation sizes (see also, e.g., [3]). To use the algorithm, the QR decomposition of $\mathbf{H}_r$ is formed, $\mathbf{H}_r = \mathbf{Q}\mathbf{R}$, with $\mathbf{Q}$ orthogonal and $\mathbf{R}$ upper triangular, and $\hat{\mathbf{x}}_r = \mathrm{decode}(\mathbf{Q}^T \mathbf{y}_r, \mathbf{R}^{-1})$ is obtained. If the $M$-QAM constellation used for $\mathbf{x}_r$ is defined as

$$x_{r,k} \in \left\{ -\frac{\sqrt{M}-1}{2},\; -\frac{\sqrt{M}-1}{2} + 1,\; \ldots,\; \frac{\sqrt{M}-1}{2} \right\}, \quad \forall k,$$

the signal sets can easily be translated to correspond to transmit signals from a range $x_k \in [0, \ldots, Q_{\max} = \sqrt{M} - 1]$, as expected by the algorithm of Fig. 1, by adding

$$\frac{\sqrt{M}-1}{2}\, \mathbf{H}_r [1, \ldots, 1]^T$$

to the received vector $\mathbf{y}_r$, and subtracting

$$\frac{\sqrt{M}-1}{2}\, [1, \ldots, 1]^T$$

from the result of the decode() function.
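As a worked instance of this translation, consider 16-QAM; the numbers below are chosen only for illustration and the primed symbol $\mathbf{y}_r'$ is notation introduced here, not in the letter.

```latex
\begin{align*}
x_{r,k} &\in \left\{-\tfrac{3}{2},\, -\tfrac{1}{2},\, \tfrac{1}{2},\, \tfrac{3}{2}\right\},
\qquad Q_{\max} = \sqrt{16} - 1 = 3, \\
\mathbf{y}_r' &= \mathbf{y}_r + \tfrac{3}{2}\,\mathbf{H}_r [1, \ldots, 1]^T
 = \mathbf{H}_r \left(\mathbf{x}_r + \tfrac{3}{2}\,[1, \ldots, 1]^T\right) + \mathbf{n}_r, \\
\hat{\mathbf{x}}_r &= \mathrm{decode}\!\left(\mathbf{Q}^T \mathbf{y}_r',\, \mathbf{R}^{-1}\right)
 - \tfrac{3}{2}\,[1, \ldots, 1]^T .
\end{align*}
```

The shifted vector $\mathbf{x}_r + \tfrac{3}{2}[1, \ldots, 1]^T$ has components in $\{0, 1, 2, 3\}$, which is the range the decoder searches.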

In [3] it was shown that if the preprocessing of $\mathbf{y}_r$ by $\mathbf{Q}^T$ obtained from the QR decomposition is replaced by a matrix obtained from the V-BLAST algorithm with MMSE criterion and corresponding component reordering, a significant reduction in average complexity of the algorithm is possible. Since the MMSE criterion allows for some tradeoff between signal and noise components, this means the matrix corresponding to $\mathbf{R}$ in the QR decomposition is no longer strictly upper triangular as expected by the sphere decoder algorithm, and thus the resulting detector does not achieve true maximum-likelihood detection.
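To make the search principle concrete, the following is a minimal sketch of a depth-first sphere search over the triangular factor $\mathbf{R}$. It is a simplified recursive variant written for illustration, not the algorithm of Fig. 1 or of [10], and it omits the MMSE V-BLAST frontend of [3]. The names `qr_gram_schmidt` and `sphere_decode` and the example matrix are assumptions.

```python
import math

def qr_gram_schmidt(H):
    """QR factorization of a square full-rank matrix via modified Gram-Schmidt."""
    n = len(H)
    cols = [[H[i][j] for i in range(n)] for j in range(n)]
    Q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(Q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        Q_cols.append([a / R[j][j] for a in v])
    return Q_cols, R                      # Q_cols[j] is the j-th column of Q

def sphere_decode(H, y, q_max):
    """Depth-first search for argmin ||y - H x||^2 with x in {0, ..., q_max}^n."""
    n = len(H)
    Q_cols, R = qr_gram_schmidt(H)
    z = [sum(q[i] * y[i] for i in range(n)) for q in Q_cols]      # z = Q^T y
    best = {"x": None, "dist": float("inf")}
    x = [0] * n

    def search(k, dist):
        if k < 0:                         # all components fixed: new best point
            best["x"], best["dist"] = x[:], dist
            return
        # Unconstrained value of component k given the already fixed components
        c = (z[k] - sum(R[k][j] * x[j] for j in range(k + 1, n))) / R[k][k]
        # Try candidates in order of increasing distance from c
        for cand in sorted(range(q_max + 1), key=lambda v: abs(v - c)):
            new_dist = dist + (R[k][k] * (cand - c)) ** 2
            if new_dist >= best["dist"]:
                break                     # every remaining candidate is farther away
            x[k] = cand
            search(k - 1, new_dist)

    search(n - 1, 0.0)
    return best["x"], best["dist"]

H = [[2.0, 0.5, 0.1], [0.3, 1.5, -0.4], [-0.2, 0.7, 1.8]]   # illustrative values
y = [6.25, 0.07, 3.02]                   # H applied to [3, 0, 2] plus a small perturbation
x_hat, dist = sphere_decode(H, y, q_max=3)
print(x_hat, dist)
```

The number of visited nodes depends on the channel and noise realization, which is the source of the variable complexity discussed in Sections IV and V.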

B. Lattice-Reduction-Aided Detection (LRAD)

As shown in Section II, the noiseless received points in the communication scenario correspond to points of the lattice $\mathbf{H}_r \mathbb{Z}^{2K}$. (For QAM constellations the translation as outlined in the previous Section III-A is performed.) Lattice (basis) reduction [13] changes the generating matrix of the lattice to obtain a "nicer" description of the lattice. Since the lattice points correspond to integer linear combinations of the columns of $\mathbf{H}_r = [\mathbf{h}_{r,1}, \ldots, \mathbf{h}_{r,2K}]$, it obtains another set of column vectors, collected in the reduced matrix $\mathbf{H}_{\mathrm{red}}$, that span the same set of points, $\mathbf{H}_r \mathbb{Z}^{2K} = \mathbf{H}_{\mathrm{red}} \mathbb{Z}^{2K}$, but which are, in general, closer to orthogonal.


The matrices $\mathbf{H}_r$ and $\mathbf{H}_{\mathrm{red}}$ are related by

$$\mathbf{H}_{\mathrm{red}} = \mathbf{H}_r \mathbf{P}, \tag{4}$$

where $\mathbf{P}$ is a matrix with integer entries that has $|\det(\mathbf{P})| = 1$, i.e., $\mathbf{P}^{-1}$ also contains only integer entries. To obtain this decomposition, we assume the application of the popular Lenstra-Lenstra-Lovász (LLL) reduction [13] in the following. Using this reduced matrix in the real-valued model for the transmission, we write

$$\mathbf{y}_r = \mathbf{H}_{\mathrm{red}} \mathbf{P}^{-1} \mathbf{x}_r + \mathbf{n}_r, \tag{5}$$

and since $\mathbf{P}^{-1}$ is an integer matrix, $\mathbf{P}^{-1} \mathbf{x}_r$ is an integer vector, and we can thus interpret the noiseless received signal points as points in the lattice described by $\mathbf{H}_{\mathrm{red}}$. Since the matrix $\mathbf{H}_{\mathrm{red}}$ is in general closer to orthogonal than $\mathbf{H}_r$, low-complexity detection in this lattice works better, since noise enhancement, e.g., due to linear equalization with $\mathbf{H}_{\mathrm{red}}^{-1}$, is decreased. In [2] it was shown that using the V-BLAST decision-feedback detection algorithm [1] instead of linear equalization can improve detector performance significantly.

Having found the estimate described in terms of the lattice basis $\mathbf{H}_{\mathrm{red}}$, we can reverse the basis change, i.e., translate the result by application of $\mathbf{P}$, to obtain $\hat{\mathbf{x}}_r$. This approach corresponds exactly to the nearest lattice point approximation methods of [4], which, however, do not consider reordering of the components, i.e., $P = I$.

Since the quantization to the integer lattice cannot take the boundary region of the constellation used for $\mathbf{x}_r$ into account, the points obtained in $\hat{\mathbf{x}}_r$ stem from an extended version of the original constellation, and points that happen to lie outside the boundary region of the original constellation have to be assigned to the nearest point within the boundary region.

Using an idea from [14], it is also possible to apply the LLL reduction to a matrix that corresponds to an MMSE equalization of the channel $\mathbf{H}_r$. While the fundamental switch in the asymptotic behaviour of the error rate curve occurs already with simple linear (zero-forcing) equalization of $\mathbf{H}_{\mathrm{red}}$, this approach results in an additional performance improvement over both linear and V-BLAST zero-forcing equalization, cf. also [15]. Lattice reduction is applied to the $4K \times 2K$ matrix,

$$\mathbf{H}_{\mathrm{red}} = \begin{bmatrix} \mathbf{H}_r \\ \zeta \mathbf{I} \end{bmatrix} \mathbf{P}, \tag{6}$$

with $\zeta^2 = \sigma_n^2 / \sigma_x^2$, $\sigma_x^2$ being the variance of one component of $\mathbf{x}_r$ and $\sigma_n^2$ that of one component of $\mathbf{n}_r$, and the V-BLAST algorithm is applied to this matrix as described in [14], cf. also [12].
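The zero-forcing variant of LRAD built on (4) and (5) can be sketched as follows. This is a textbook LLL reduction written for readability (the Gram-Schmidt data are recomputed at every step), combined with linear equalization in the reduced basis. The helper names `lll_reduce`, `solve`, `lrad_zf`, the parameter `delta = 0.75`, the per-component clipping to `[0, q_max]`, and the 2 x 2 example are all assumptions; the MMSE extension (6) and V-BLAST ordering are not included.

```python
def lll_reduce(H, delta=0.75):
    """LLL-reduce the columns of H (given as a list of rows); returns (H_red, P), H_red = H P."""
    m, n = len(H), len(H[0])
    B = [[H[i][j] for i in range(m)] for j in range(n)]       # B[j]: j-th basis column
    P = [[int(i == j) for i in range(n)] for j in range(n)]   # P[j]: j-th column of P

    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    def gram_schmidt():
        Bs, mu = [], [[0.0] * n for _ in range(n)]
        for j in range(n):
            v = B[j][:]
            for i in range(j):
                mu[j][i] = dot(B[j], Bs[i]) / dot(Bs[i], Bs[i])
                v = [a - mu[j][i] * b for a, b in zip(v, Bs[i])]
            Bs.append(v)
        return Bs, mu

    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):                        # size reduction of column k
            _, mu = gram_schmidt()
            r = round(mu[k][j])
            if r:
                B[k] = [a - r * b for a, b in zip(B[k], B[j])]
                P[k] = [a - r * b for a, b in zip(P[k], P[j])]
        Bs, mu = gram_schmidt()
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1                                            # Lovasz condition satisfied
        else:
            B[k], B[k - 1] = B[k - 1], B[k]                   # swap columns and step back
            P[k], P[k - 1] = P[k - 1], P[k]
            k = max(k - 1, 1)
    H_red = [[B[j][i] for j in range(n)] for i in range(m)]
    P_mat = [[P[j][i] for j in range(n)] for i in range(n)]
    return H_red, P_mat

def solve(A, b):
    """Solve A z = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def lrad_zf(H, y, q_max):
    """Equalize with H_red, quantize to integers, map back with P, clip to the constellation."""
    H_red, P = lll_reduce(H)
    z_hat = [round(v) for v in solve(H_red, y)]               # estimate of P^{-1} x
    x_hat = [sum(P[i][j] * z_hat[j] for j in range(len(z_hat))) for i in range(len(P))]
    return [min(max(v, 0), q_max) for v in x_hat]

H = [[1.0, 0.9], [0.0, 0.1]]              # nearly collinear columns, illustrative only
print(lll_reduce(H))
print(lrad_zf(H, [3.92, 0.09], q_max=3))  # H applied to [3, 1] plus a small perturbation
```

All channel-dependent work (`lll_reduce`) could be done once per channel realization; only the equalization, rounding, and multiplication by `P` are needed per received vector, which is the accounting used in Section IV.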

C. Subspace LRAD

To overcome the boundary control problem of the LRAD scheme, in [7] we proposed to find a list of candidate points $\{\hat{\mathbf{x}}^{(i)}\}$, i.e., points close to the received signal vector $\mathbf{y}_r$, using LRAD, and proceed to select the most likely point from this set, giving preference to points that lie within the constellation $\mathcal{A}_r^{2K}$.

The block diagram corresponding to the subspace-based detector is depicted in [7, Fig. 4], where the block labeled "LRdec" contains the lattice-reduction-aided detector for $2K - 1$ dimensions. The computational structure is straightforward and consists of $\sqrt{M} \cdot 2K$ parallel detection paths followed by the selection element "arg min". Thus, the complexity of subspace LRAD is roughly $2K\sqrt{M}$ times that of plain LRAD, and hence still polynomial in the number of dimensions $K$. If higher-rate modulation ($M = 16, \ldots$) is used, we can either fix $x_{r,k}$ to all possible values in turn, or use a coset decomposition of the constellation lattice [7]. In the former case complexity grows with $\sqrt{M}$, while in the latter case the complexity grows only with $\log(\sqrt{M})$.

Even though the complexity of this scheme is polynomial in $K$, we will see that it is still quite high. Sacrificing some power efficiency, we can save complexity by implementing the parallel $(2K - 1)$-dimensional lattice-reduction-aided detectors only for the component that experiences the highest noise enhancement due to $\mathbf{H}_r^{-1}$. The complexity of this version of subspace LRAD is only $1/K$ that of the original version of [7]. The results in the following section will allow us to tell how much power efficiency is lost by this complexity reduction.
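The final selection step ("arg min" with preference for in-constellation points) might look like the sketch below. The exact preference rule is not spelled out in this letter; the rule used here, which ignores out-of-constellation candidates whenever at least one candidate lies inside, is an assumption, as are the function name `select_candidate` and the example values. Generation of the candidate list by the parallel $(2K-1)$-dimensional detectors is not shown.

```python
def select_candidate(H, y, candidates, q_max):
    """Return the candidate closest to y, preferring points inside {0, ..., q_max}^n."""
    def dist(x):
        return sum((y[i] - sum(H[i][j] * x[j] for j in range(len(x)))) ** 2
                   for i in range(len(H)))

    def inside(x):
        return all(0 <= v <= q_max for v in x)

    valid = [x for x in candidates if inside(x)]
    pool = valid if valid else candidates     # assumed fallback when no candidate is inside
    return min(pool, key=dist)

H = [[1.0, 0.0], [0.0, 1.0]]                  # identity channel, for illustration only
candidates = [[3, -1], [3, 0], [2, 0]]        # hypothetical outputs of the parallel paths
print(select_candidate(H, [2.9, -0.6], candidates, q_max=3))
```

In this example the point `[3, -1]` is closest to the received vector but lies outside the constellation, so the closest valid point is returned instead.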

IV. PERFORMANCE AND COMPLEXITY COMPARISON

To evaluate the detection schemes described above, we performed Monte Carlo simulations of complex-valued $K \times K$ systems, averaging over random $\mathbf{H}$ with i.i.d. unit-variance complex circularly symmetric Gaussian entries. As measure of power efficiency we look at the minimum ratio of average received energy per bit ($E_b$) to one-sided noise power spectral density ($N_0$) required to achieve a given bit error rate. As measure of complexity we use the number of arithmetic operations required by the detector for each received vector. The calculations required only once for each realization of $\mathbf{H}$ are not counted (e.g., the LLL algorithm for LRAD, or the QR decomposition and matrix inversion for the sphere decoder algorithm). This is because we assume a block fading channel model, with $\mathbf{H}$ constant over bursts of lengths much larger than $K$. In this case the computations required for the actual processing of the received signal significantly outnumber those for the preprocessing.

Figs. 2 to 5 give the points in the complexity vs. power efficiency plane (complexity-power diagrams) for the following schemes:
1) MMSE V-BLAST detection [1],
2) the standard sphere decoder algorithm, which implements maximum-likelihood detection (cf. Section III-A, Fig. 1), labeled "SD"; the points are given for the average complexity, and the light gray shaded area shows the range over which complexity varied in the simulations,
3) the sphere decoder algorithm with MMSE V-BLAST frontend (cf. Section III-A, [3]), labeled "mSD", with the dark gray shaded area showing the range of the complexity,
4) MMSE LRAD using the V-BLAST reordering (cf. Section III-B, [15]), labeled "mLRAD",
5) subspace LRAD (cf. Section III-C, [7]), labeled "S-LRAD" ("full"/"coset" for 16-QAM with 4 or 2 parallel detectors per component), where linear equalization is used for the LRAD,
6) subspace LRAD with $\sqrt{M}$ parallel detectors (cf. Section III-C), labeled "S-LRAD (2)"; again, linear equalization is used for the LRAD.
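The power-efficiency measure defined above, the minimum $E_b/N_0$ needed to reach a given bit error rate, can be read off a simulated error rate curve by interpolation. The sketch below interpolates $\log_{10}(\mathrm{BER})$ linearly over $E_b/N_0$ in dB; this interpolation rule, the function name `required_ebn0_db`, and the sample curve are assumptions for illustration and are not results from the letter.

```python
import math

def required_ebn0_db(ebn0_db, ber, target):
    """Eb/N0 [dB] at which an interpolated, decreasing BER curve reaches `target`."""
    points = list(zip(ebn0_db, ber))
    for (x0, b0), (x1, b1) in zip(points, points[1:]):
        if b0 >= target >= b1 and b0 > b1:
            l0, l1, lt = math.log10(b0), math.log10(b1), math.log10(target)
            return x0 + (lt - l0) * (x1 - x0) / (l1 - l0)
    return None                               # target not bracketed by the simulated points

# Hypothetical simulated curve (not data from the letter)
ebn0 = [10.0, 15.0, 20.0]
ber = [1e-1, 1e-2, 1e-3]
print(required_ebn0_db(ebn0, ber, 1e-2))
print(required_ebn0_db(ebn0, ber, 1e-5))
```

Each point of a complexity-power diagram then pairs such a required $E_b/N_0$ value with the per-vector operation count of the corresponding detector.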
The considered settings are ($K = 4$, 4-QAM) in Fig. 2, ($K = 4$, 16-QAM) in Fig. 3, ($K = 8$, 4-QAM) in Fig. 4, and ($K = 8$, 16-QAM) in Fig. 5, respectively. We can observe that for all settings the sphere decoder, and in particular the sphere decoder with MMSE V-BLAST frontend, is on average only slightly more complex than LRAD, while offering the highest power efficiency. MMSE V-BLAST is a particularly low-complexity solution, but the performance gap to the sphere decoder rapidly increases with growing constellation size and decreasing target error rate (cf., e.g., Figs. 2, 3).

In terms of power efficiency, LRAD bridges the performance gap between sphere decoding and MMSE V-BLAST detection. More specifically, mLRAD is a favorable option for error rates below $10^{-3}$. This is especially apparent when considering the maximum complexity of the sphere decoder algorithms, which rises significantly with increasing constellation size (cf., e.g., Figs. 4 and 5) and growing dimension of the signal space (cf., e.g., Figs. 3 and 5).

Finally, we note that the novel subspace LRAD scheme, S-LRAD (2), shows a power efficiency vs. complexity tradeoff very similar to that of mLRAD. Hence, the S-LRAD (2) scheme is an interesting solution especially for large MIMO systems (parameters $M$ and $K$). On the other hand, the original S-LRAD scheme from [7] turns out to be not very competitive, as its complexity is often even larger than the maximum complexity of mSD.

V. PROCESSING DELAY COMPARISON

The complexity measure "number of arithmetic operations" in Figs. 2 to 5 also provides a good indication of the processing delay entailed by the different detection schemes. In the case of S-LRAD/S-LRAD (2), however, the parallel detector structure is particularly well suited for a parallel implementation in custom hardware. Thus, a detector with low processing delay can be implemented, with the processing delay equal to that of the (linear equalization) LRAD scheme plus the "arg min" block (see [7, Fig. 4]). Considering Figs. 2 to 5, the processing delay of a parallel implementation of the S-LRAD (2) scheme would be roughly on the level of mLRAD, since S-LRAD (2) works with linear detection and $(2K - 1)$-dimensional matrices, whereas the mLRAD scheme uses V-BLAST detection.

In contrast to this, a parallel implementation does not seem feasible for the sphere decoder algorithm, and the processing delay is proportional to the complexity of the algorithm. Furthermore, if processing delay is an issue in a receiver implementation, the variable processing delay of the sphere decoder algorithm is troublesome. Here one needs to terminate the sphere decoder algorithm after a fixed amount of computation. All other methods discussed, in particular mLRAD and S-LRAD (2), provide the detection result with constant processing delay. We therefore conclude that, if one considers the implementation of the receiver in custom hardware, the mLRAD and S-LRAD (2) schemes are the most attractive alternatives.

VI. CONCLUSIONS

In this contribution we have compared the complexities of various MIMO detection methods. We have introduced complexity-power diagrams that allow a comparison of the schemes according to computational effort and power efficiency and visualize the tradeoff between these two figures of merit. It turns out that if average complexity is considered, the sphere decoder and particularly the MMSE V-BLAST frontend sphere decoder are both power efficient and of low complexity. However, the highly varying complexity might be a problem in a practical implementation. Lattice-reduction-aided detection (mLRAD) and a modified version of subspace lattice-reduction-aided detection (S-LRAD (2)) prove to be interesting alternatives if low error rates are required and larger constellations are applied. Both schemes are particularly attractive options when a low and constant processing delay is desirable.

REFERENCES

[1] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky. Simplified processing for high spectral efficiency wireless communication employing multi-element arrays, IEEE Journal on Selected Areas in Communications, Vol. 17, pp. 1841-1852, November 1999.
[2] C. Windpassinger, R. F. H. Fischer. Low-complexity near-maximum-likelihood detection and precoding for MIMO systems using lattice reduction, in Proc. IEEE Information Theory Workshop (ITW), Paris, France, March 2003.
[3] O. Damen, H. El Gamal, and G. Caire. On maximum likelihood detection and the search for the closest lattice point, IEEE Transactions on Information Theory, Vol. 49, No. 10, pp. 2389-2402, Oct. 2003.
[4] L. Babai. On Lovász lattice reduction and the nearest lattice point problem. Combinatorica, 6(1):1-13, 1986.
[5] H. Yao, G. W. Wornell. Lattice-reduction-aided detectors for MIMO communication systems, in Proc. IEEE Global Telecommunications Conference (Globecom), Taipei, Taiwan, Nov. 2002.
[6] W. H. Mow. Universal lattice decoding: principle and recent advances, Wireless Communications and Mobile Computing, 3(5):553-569, August 2003.
[7] C. Windpassinger, L. Lampe, and R. F. H. Fischer. From lattice-reduction-aided detection towards maximum-likelihood detection in MIMO systems, in Proc. International Conference on Wireless and Optical Communications, Banff, Canada, July 2003.
[8] R. F. H. Fischer, C. Windpassinger. Real- vs. complex-valued equalisation in V-BLAST systems, Electronics Letters, Vol. 39, No. 5, pp. 470-471, March 2003.
[9] O. Damen, A. Chkeif, and J.-C. Belfiore. Lattice code decoder for space-time codes, IEEE Communications Letters, Vol. 4, No. 5, pp. 161-163, May 2000.
[10] E. Agrell, T. Eriksson, A. Vardy, K. Zeger. Closest point search in lattices, IEEE Transactions on Information Theory, Vol. 48, No. 8, pp. 2201-2214, Aug. 2002.
[11] B. Hassibi, H. Vikalo. On the expected complexity of integer least-squares problems, in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, Florida, USA, May 2002.
[12] C. Windpassinger. Detection and Precoding for Multiple Input Multiple Output Channels. PhD thesis, Universität Erlangen-Nürnberg, Germany, 2004.
[13] H. Cohen. A Course in Computational Algebraic Number Theory. Springer Verlag, Berlin, Germany, 1993.
[14] B. Hassibi. An efficient square-root algorithm for BLAST. Bell Labs Technical Memorandum, available at http://mars.bell-labs.com, 1999.
[15] D. Wübben, R. Böhnke, V. Kühn, and K. D. Kammeyer. MMSE-based lattice-reduction for near-ML detection of MIMO systems, in ITG Workshop on Smart Antennas, Munich, Germany, March 18-19, 2004.

Figures


function decode(w, G)
    bestdist := ∞
    k := 2K
    dist_k := 0
    e_k := G w
    x_k := round(e_kk)
    x_k := max(x_k, 0); x_k := min(x_k, Q_max)                      (*)
    y := (e_kk − x_k) / g_kk
    step_k := sgn(y)
    while (1) {
        newdist := dist_k + y^2
        if (newdist < bestdist ∧ (x_k ≥ 0 ∧ x_k ≤ Q_max)) {        (*) range test
            if (k ≠ 1) {
                for (i = 1, ..., k − 1) { e_{k−1,i} := e_{k,i} − y g_{i,k} }
                k := k − 1
                dist_k := newdist
                x_k := round(e_kk)
                x_k := max(x_k, 0); x_k := min(x_k, Q_max)          (*)
                y := (e_kk − x_k) / g_kk
                step_k := sgn(y)
            }
            else {
                x̂ := x
                bestdist := newdist
                k := k + 1
                x_k := x_k + step_k
                step_k := −step_k − sgn(step_k)
                if (x_k < 0 ∨ x_k > Q_max) {                        (*)
                    x_k := x_k + step_k                             (*)
                    step_k := −step_k − sgn(step_k)                 (*)
                }                                                   (*)
                y := (e_kk − x_k) / g_kk
            }
        }
        else {
            if (k = 2K) { return x̂ }
            else {
                k := k + 1
                x_k := x_k + step_k
                step_k := −step_k − sgn(step_k)
                if (x_k < 0 ∨ x_k > Q_max) {                        (*)
                    x_k := x_k + step_k                             (*)
                    step_k := −step_k − sgn(step_k)                 (*)
                }                                                   (*)
                y := (e_kk − x_k) / g_kk
            }
        }
    }

Fig. 1. Pseudocode for sphere decoder algorithm [10] (modifications for finite constellations, shown shaded in the original, are marked with (*) here).

[Plot: number of operations versus 10 log10(Eb/N0) [dB], with points for BER = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5; schemes shown: S-LRAD, SD, mSD, S-LRAD (2), mLRAD, MMSE V-BLAST, lin. MMSE.]

Fig. 2. Power-complexity diagram for the various detection strategies, K = 4, 4-QAM.

[Plot: number of operations versus 10 log10(Eb/N0) [dB], with points for BER = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5; schemes shown: S-LRAD (full), S-LRAD (coset), SD, S-LRAD (2), mSD, mLRAD, MMSE V-BLAST, lin. MMSE.]

Fig. 3. Power-complexity diagram for the various detection strategies, K = 4, 16-QAM.

[Plot: number of operations versus 10 log10(Eb/N0) [dB], with points for BER = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5; schemes shown: S-LRAD, SD, mSD, S-LRAD (2), mLRAD, MMSE V-BLAST, lin. MMSE.]

Fig. 4. Power-complexity diagram for the various detection strategies, K = 8, 4-QAM.

[Plot: number of operations versus 10 log10(Eb/N0) [dB], with points for BER = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5; schemes shown: S-LRAD (full), S-LRAD (coset), mSD, SD, S-LRAD (2), mLRAD, MMSE V-BLAST, lin. MMSE.]

Fig. 5. Power-complexity diagram for the various detection strategies, K = 8, 16-QAM.
