
Learning to Detect
Neev Samuel, Member, IEEE, Tzvi Diskin, Member, IEEE, and Ami Wiesel, Member, IEEE

Manuscript received May, 2018. N. Samuel, T. Diskin and A. Wiesel are with the School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel. E-mail: [email protected] (see http://www.cs.huji.ac.il/~amiw/). This research was partly supported by the Heron Consortium and by ISF grant 1339/15.

Abstract—In this paper we consider Multiple-Input-Multiple-Output (MIMO) detection using deep neural networks. We introduce two different deep architectures: a standard fully connected multi-layer network, and a Detection Network (DetNet) which is specifically designed for the task. The structure of DetNet is obtained by unfolding the iterations of a projected gradient descent algorithm into a network. We compare the accuracy and runtime complexity of the proposed approaches and achieve state-of-the-art performance while maintaining low computational requirements. Furthermore, we manage to train a single network to detect over an entire distribution of channels. Finally, we consider detection with soft outputs and show that the networks can easily be modified to produce soft decisions.

Index Terms—MIMO Detection, Deep Learning, Neural Networks.

I. INTRODUCTION

MULTIPLE input multiple output (MIMO) systems enable enhanced performance in communication systems by using many dimensions that account for time and frequency resources, multiple users, multiple antennas and other resources. While improving performance, these systems present difficult computational challenges when it comes to detection, since the problem is NP-complete, and there is a growing need for sub-optimal solutions with polynomial complexity. Recent advances in the field of machine learning, specifically the success of deep neural networks in solving problems in almost any field of engineering, suggest that a data driven approach for detection using machine learning may present a computationally efficient way to achieve near optimal detection accuracy.

A. MIMO detection

MIMO detection is a classical problem in simple hypothesis testing [1]. The maximum likelihood (ML) detector involves an exhaustive search and is the optimal detector in the sense of minimum joint probability of error for detecting all the symbols simultaneously. Unfortunately, it has an exponential runtime complexity which makes it impractical in large real time systems.

In order to overcome the computational cost of the maximum likelihood decoder there is considerable interest in the implementation of suboptimal detection algorithms which provide a better and more flexible accuracy versus complexity tradeoff. In the high accuracy regime, sphere decoding algorithms [2], [3], [4] were proposed, based on lattice search, offering better computational complexity with a rather low accuracy degradation relative to the full search. In the other regime, the most common suboptimal detectors are the linear receivers, i.e., the matched filter (MF), the decorrelator or zero forcing (ZF) detector, and the minimum mean squared error (MMSE) detector. More advanced detectors are based on decision feedback equalization (DFE), approximate message passing (AMP) [5] and semidefinite relaxation (SDR) [6], [7]. Currently, both AMP and SDR provide near optimal accuracy under many practical scenarios. AMP is simple and cheap to implement in practice, but is an iterative method that may diverge in challenging settings. SDR is more robust and has polynomial complexity, but is limited in the constellations it addresses and is much slower in practice.

B. Background on Machine Learning

Supervised machine learning is the ability to solve statistical problems using examples of inputs and their desired outputs. Unlike classical hypothesis testing, it is typically used when the underlying distributions are unknown and are characterized via sample examples. In recent years, the field witnessed the deep revolution. The "deep" adjective is associated with the use of complicated and expressive classes of algorithms, also known as architectures. These are typically neural networks with many non-linear operations and layers. Deep architectures are more expressive than shallow ones and can theoretically solve much harder and larger problems [8], but were previously considered impossible to optimize. With the advances in big data, optimization algorithms and stronger computing resources, such networks are currently state of the art in different problems, from speech processing [9], [10] and computer vision [11], [12] to games [13]. Typical solutions involve dozens and even hundreds of layers which are slowly optimized off-line over clusters of computers, to provide accurate and cheap decision rules which can be applied in real-time. In particular, one promising approach to designing deep architectures is by unfolding an existing iterative algorithm [14]. Each iteration is considered a layer and the algorithm is called a network. The learning begins with the existing algorithm as an initial starting point and uses optimization methods to improve the algorithm. For example, this strategy has been shown successful in the context of sparse reconstruction [15], [16]. Leading algorithms such as Iterative Shrinkage and Thresholding and a sparse version of AMP have both been improved by unfolding their iterations into a network and learning their optimal parameters.
Following this revolution, there is a growing body of works on deep learning methods for communication systems. Exciting contributions in the context of error correcting codes include [17]–[21]. In [22] a machine learning approach is considered in order to decode over molecular communication systems, where chemical signals are used for the transfer of information. In these systems an accurate model of the channel is impossible to find. This approach of decoding without CSI (channel state information) is further developed in [23]. Machine learning for channel estimation is considered in [24], [25]. End-to-end detection over continuous signals is addressed in [26]. Joint learning of transmitters and receivers is considered in [27]. Parts of our work on MIMO detection using deep learning have already appeared in [28], see also [29]. Similar ideas were discussed in [30] in the context of robust regression.

C. Main contributions

The main contribution of this paper is the introduction of two deep learning networks for MIMO detection. We show that, under a wide range of scenarios including different channel models and various digital constellations, our networks achieve near optimal detection performance with low computational complexity.

Another important result we show is their ability to easily provide soft outputs as required by modern communication systems. We show that for different constellations the soft outputs of our networks achieve accuracy comparable to that of the M-Best sphere decoder with low computational complexity.

In a more general learning perspective, an important contribution is DetNet's ability to perform on multiple models with a single training. Recently, there were works on learning to invert linear channels and reconstruct signals [15], [16], [31]. To the best of our knowledge, these were developed and trained to address a single fixed channel. In contrast, DetNet is designed for handling multiple channels simultaneously with a single training phase.

The paper is organized as follows. In section II we present the MIMO detection problem and how it is formulated as a learning problem, including the use of one-hot representations. In section III we present two types of neural network based detectors, FullyCon and DetNet. In section IV we consider soft decisions. In section V we compare the accuracy and the runtime of the proposed learning based detectors against traditional detection methods, both in the hard decision and the soft decision cases. Finally, section VI provides concluding remarks.

D. Notation

In this paper, we define the normal distribution with mean µ and variance σ² as N(µ, σ²). The uniform distribution with minimum value a and maximum value b will be U(a, b). Boldface uppercase letters denote matrices. Boldface lowercase letters denote vectors. The superscript (·)^T denotes the transpose. The i'th element of the vector x will be denoted as x_i. Unless stated otherwise, the term independent and identically distributed (i.i.d.) Gaussian matrix refers to a matrix where each of its elements is i.i.d. sampled from the normal distribution N(0, 1). The rectified linear unit is defined as ρ(x) = max{0, x}. When considering a complex matrix or vector, the real and imaginary parts are denoted ℜ(·) and ℑ(·) respectively. An α-Toeplitz matrix M is defined as a matrix such that [M^T M]_{i,j} = α^{|i−j|}.

II. PROBLEM FORMULATION

A. MIMO detection

We consider the standard linear MIMO model:

ȳ = H̄x̄ + w̄,   (1)

where ȳ ∈ C^N is the received vector, H̄ ∈ C^{N×K} is the channel matrix, x̄ ∈ S̄^K is an unknown vector of independent and equal probability symbols from some finite constellation S̄ (e.g. PSK or QAM), and w̄ is a noise vector of size N with independent, zero mean complex normal variables of variance σ².

Our detectors do not assume knowledge of the noise variance σ². Hypothesis testing theory guarantees that it is unnecessary for optimal detection [1]. Indeed, the ML rule does not depend on it. This is in contrast to the MMSE and AMP decoders that exploit this parameter and are therefore less robust in cases where the noise variance is not known exactly.

B. Reparameterization

A main challenge in MIMO detection is the use of complex valued signals and various digital constellations S̄ which are less common in machine learning. In order to use standard tools and provide a unified framework, we re-parameterize the problem using real valued vectors and one-hot mappings as described below.

First, throughout this work, we avoid handling complex valued variables, and use the following convention:

y = Hx + w,   (2)

where

y = [ℜ(ȳ); ℑ(ȳ)],   w = [ℜ(w̄); ℑ(w̄)],   x = [ℜ(x̄); ℑ(x̄)],
H = [ℜ(H̄), −ℑ(H̄); ℑ(H̄), ℜ(H̄)],   (3)

with ";" denoting vertical stacking. Here y ∈ R^{2N} is the received vector, H ∈ R^{2N×2K} is the channel matrix, and x ∈ S^{2K} where S = ℜ{S̄} (which is also equal to ℑ{S̄} in the complex valued constellations we tested).
A second convention concerns the re-parameterization of the discrete constellations S = {s_1, ..., s_|S|} using a one-hot mapping. With each possible s_i we associate a unit vector u_i ∈ R^{|S|}. For example, the 4 dimensional one-hot mapping of the real part of the 16-QAM constellation is defined as

s_1 = −3 ↔ u_1 = [1, 0, 0, 0]
s_2 = −1 ↔ u_2 = [0, 1, 0, 0]
s_3 = 1 ↔ u_3 = [0, 0, 1, 0]
s_4 = 3 ↔ u_4 = [0, 0, 0, 1].   (4)

We denote this mapping via the function s = f_oh(u), so that s_i = f_oh(u_i) for i = 1, ..., |S|. More generally, for approximate inputs which are not unit vectors, the function is defined as

x = f_oh(x_oh) = Σ_{i=1}^{|S|} s_i [x_oh]_i.   (5)

The description above holds for a scalar symbol. The MIMO model involves a vector of 2K symbols which is handled by stacking the one-hot mappings of each of its elements. Altogether, a vector x_oh ∈ {0, 1}^{|S|·2K} is mapped to x ∈ S^{2K}.

C. Learning to detect

We end this section by formulating the MIMO detection problem as a machine learning task. The first step in machine learning is choosing a class of possible detectors, also known as an architecture. A network architecture is a function x̂_oh(H, y; θ) parameterized by θ that detects the unknown x_oh given y and H. Learning is the problem of finding the θ within some feasible set that will lead to strong detectors x̂_oh(H, y; θ). For this purpose, we fix a loss function l(x_oh; x̂_oh(H, y; θ)) that measures the distance between the true vectors and their estimates. Then, we find the network's parameter θ by minimizing the loss function over the MIMO model distribution:

min_θ E{l(x_oh; x̂_oh(H, y; θ))},   (6)

where the expectation is with respect to all the random variables in (2), i.e., x, w, and H. Learning to detect is defined as finding the best parameters θ of the network's architecture that minimize the expected loss l(·; ·) over the distribution in (2).

We always assume perfect channel state information (CSI), which means that the channel H is exactly known during detection time. However, we differentiate between two possible cases:

• Fixed Channel (FC): In the FC scenario, H is deterministic and constant (or a realization of a degenerate distribution which only takes a single value). This means that during the training phase we know over which channel the detector will detect.

• Varying Channel (VC): In the VC scenario, we assume that H is random with a known continuous distribution. It is still completely known but changes in each realization, and a single detection algorithm must be designed for all its possible realizations. When detecting, the channel is randomly chosen, and the network must be able to generalize over the entire distribution of possible channels.

Altogether, our goal is to detect x using a neural network that receives y and H as inputs and provides an estimate x̂. In the next section, we will introduce two competing architectures that trade off accuracy and complexity.

III. DEEP MIMO DETECTORS

A. FullyCon

The fully connected multi-layer network is a well known architecture which is considered to be the basic deep neural network architecture, and from now on will be named simply 'FullyCon'. It is composed of L layers, where the output of each layer is the input of the next layer. Each layer can be described by the following equations:

q_1 = y
q_{k+1} = ρ(W_k q_k + b_k)
x̂_oh = W_L q_L + b_L
x̂ = f_oh(x̂_oh).   (7)

Fig. 1. A flowchart representing a single layer of the fully connected network.

An illustration of a single layer of FullyCon can be seen in Fig. 1. The parameters of the network that are optimized during the learning phase are:

θ = {W_k, b_k}_{k=1}^{L}.   (8)

The loss function used is a simple l2 distance between the estimated signal and the true signal:

l(x_oh; x̂_oh(H, y; θ)) = ‖x_oh − x̂_oh‖².   (9)

FullyCon is simple and general purpose. It has a relatively small number of parameters to optimize. It only uses the input y, and does not exploit the channel H within (7). The dependence on the channel is indirect, via the expectation in (6) which depends on H and leads to parameters that depend on its moments. The result is a simple and straightforward structure which is ideal for detection over the FC model. As will be detailed in the simulations section, it manages to achieve almost optimal accuracy with low complexity. On the other hand, our experiences with FullyCon for the VC model led to disappointing results. We tried to add the channel matrix H, reshaped as a vector, to the input, yet this attempt failed. The network did not manage to capture the dependencies of changing channels. In the next subsection, we propose a more expressive architecture specifically designed for addressing this challenge.
B. DetNet

In this section, we present an architecture designed specifically for MIMO detection that we will call 'DetNet' (abbreviation of 'detection network'). The derivation begins by noting that an efficient MIMO detector should not work with y directly, but use the compressed sufficient statistic:

H^T y = H^T Hx + H^T w.   (10)

This hints that two main ingredients in the architecture should be H^T y and H^T Hx. Second, our construction is based on mimicking a projected gradient descent like solution for the maximum likelihood optimization. Such an algorithm would lead to iterations of the form

x̂_{k+1} = Π[x̂_k − δ_k ∂‖y − Hx‖²/∂x |_{x=x̂_k}]
        = Π[x̂_k − δ_k H^T y + δ_k H^T H x̂_k],   (11)

where x̂_k is the estimate in the k'th iteration, Π[·] is a nonlinear projection operator, and δ_k is a step size. Intuitively, each iteration (executed by a single layer in DetNet) is a linear combination of x̂_k, H^T y, and H^T H x̂_k followed by a non-linear projection. We enrich these iterations by lifting the input to a higher dimension in each iteration and applying standard non-linearities which are common in deep neural networks. In order to further improve the performance, we treat the gradient step sizes δ_k at each step as learned parameters and optimize them during the training phase. This yields the following architecture:

q_k = x̂_{k−1} − δ_{1k} H^T y + δ_{2k} H^T H x̂_{k−1}
z_k = ρ(W_{1k} [q_k; v_{k−1}] + b_{1k})
x̂_{oh,k} = W_{2k} z_k + b_{2k}
x̂_k = f_oh(x̂_{oh,k})
v̂_k = W_{3k} z_k + b_{3k}
x̂_0 = 0, v̂_0 = 0,   (12)

with the trainable parameters

θ = {W_{1k}, b_{1k}, W_{2k}, b_{2k}, W_{3k}, b_{3k}, δ_{1k}, δ_{2k}}_{k=1}^{L}.   (13)

Note the similarity between (11) and the computation of q_k. When computing q_k at each layer, an explicit gradient descent step is computed, with learnable step sizes. To ensure that our networks have the advantages of wide neural networks, the parameters W_{1k} are m × n matrices where m > n, which means that multiplying by W_{1k} increases the dimension of the input. The final estimate is defined as x̂_L. For convenience, the structure of each DetNet layer is illustrated in Fig. 2.

Fig. 2. A flowchart representing a single layer of DetNet. The network is composed of L such layers, where each layer's output is the next layer's input.

Training deep networks is a difficult task due to vanishing gradients, saturation of the activation functions, sensitivity to initialization and more [32]. To address these challenges we adopted a loss function that takes into account the outputs of all of the layers, an idea following the notion of auxiliary classifiers presented in GoogLeNet [12]:

l(x_oh; x̂_oh(H, y; θ)) = Σ_{l=1}^{L} log(l) ‖x_oh − x̂_{oh,l}‖².   (14)

The downside of this loss function is that it forces the output of each layer x̂_{oh,k} to be close to x_oh in order to minimize the loss. This means that the only information passing from one layer to the next would be the estimated x_{oh,k}, so that in practice we would lose the ability of a deep network to compute complicated features using many layers. In order to solve this problem we added the variable v̂_k in (12), which allows the network to pass unconstrained information from one layer to another.

In our final implementation, in order to further enhance the performance of DetNet, we added a residual feature from ResNet [11] where the output of each layer is a weighted average with the output of the previous layer:

x̂_k = α x̂_{k−1} + (1 − α) x̂_k.   (15)
IV. SOFT DECISION OUTPUT

In this section, we consider a more general setting in which the MIMO detector needs to provide soft outputs. High end communication systems typically resort to iterative decoding where the MIMO detector and the error correcting decoder iteratively exchange information on the unknowns until convergence. For this purpose, the MIMO detector must replace its hard estimates with soft posterior distributions Prob(x_j = s_i | y) for each unknown j = 1, ..., 2K and each possible symbol i = 1, ..., |S|. More precisely, it also needs to allow additional soft inputs, but we leave this for future work.

Computation of the posteriors is straightforward based on Bayes law, but its complexity is exponential in the size of the signal and constellation. Similarly to the maximum likelihood algorithm in the hard decision case, this computation yields optimal accuracy yet is intractable. Thus, the goal in this section is to design networks that approximate the posteriors. At first glance, this seems difficult to learn as we have no training set of posteriors and cannot define a loss function. Remarkably, this is not a problem, and the probabilities of arbitrary constellations can be easily recovered using the standard l2 loss function with respect to the one-hot representation x_oh. Indeed, consider a scalar x and a single s ∈ S associated with its one-hot bit x_oh; then it is well known that

arg min_{x̂_oh} E[‖x_oh − x̂_oh‖² | y] = E[x_oh | y]   (16)
                                     = x_est,   (17)

which is a vector of the size of S that satisfies:

x_est,i = Prob(x_oh,i = 1 | y) = Prob(x = s_i | y).

Assuming that our network is sufficiently expressive and globally optimized, the one-hot output x̂_oh will provide the exact posterior probabilities. Therefore, in practice, the one-hot output will approximate the true posterior probabilities.

V. NUMERICAL RESULTS

In this section, we provide numerical results on the accuracy and complexity of the proposed networks in comparison to competing methods.

In the FC case, the results are over the 0.55-Toeplitz channel.

In the VC case and when testing the soft output performance, the results presented are over random channels, where each element is sampled i.i.d. from the normal distribution N(0, 1).

A. Implementation details

We train both networks using a variant of the stochastic gradient descent method [33], [34] for optimizing deep networks, named the Adam optimizer [35]. All networks were implemented using the Python based TensorFlow library [36].

In the FullyCon network, the number of layers was 6 and each hidden layer had 10K neurons.

In the case of the DetNet architecture we used 30 layers in all constellations and all channel sizes presented in this paper. In the hard decision case the sizes of z_k were 4K, 4K, 8K and 12K for the BPSK, QPSK, 16-QAM and 8-PSK constellations respectively, and the sizes of v_k were 2K, 2K, 4K and 4K for the BPSK, QPSK, 16-QAM and 8-PSK constellations respectively. The sizes of z_k and v_k do not depend on k. In the soft decision case the size of z_k was 12K and the size of v_k was 4K for all of the constellations tested.

We trained FullyCon for 1,000,000 iterations with a batch size of 1,000 samples and DetNet for approximately 100,000 iterations with a batch size of 2,000 samples (the number of iterations varied slightly depending on the constellation). We used a decaying learning rate with a starting rate of 0.0008 and a decay rate of 0.97 every 1000 iterations (the exact values might vary slightly between constellations). To give a rough idea of the computation needed during the learning phase, optimizing the detectors in our numerical results in both architectures took around 3 days on a standard Intel i7-6700 processor. Each sample was independently generated from (2) according to the statistics of x, H (either in the FC or VC model) and w. During training, the noise variance was randomly generated so that the SNR is uniformly distributed on U(SNR_min − 1, SNR_max + 1), where SNR_min and SNR_max are the minimal and maximal SNR values over which we used the network.
B. Competing algorithms

When presenting our network performance we shall use the following naming conventions:

FullyCon: The basic fully-connected deep architecture.
DetNet: The DetNet deep architecture.

In the hard decision scenarios, we tested our deep networks against the following detection algorithms:

ZF: The classical decorrelator, also known as the least squares or zero forcing (ZF) detector [1].
DF: The decision feedback equalization algorithm.
AMP: The approximate message passing algorithm from [5].
SDR: A decoder based on semidefinite relaxation implemented using an efficient interior point solver [6], [7]. For the 8-PSK constellation we implemented the SDR variation suggested in [37].
SD: An implementation of the sphere decoding algorithm as presented in [38].

In the soft output case, we tested our networks against the M-Best sphere decoding algorithm as presented in [3] (originally named K-Best, but changed here to avoid confusion with K, the transmitted signal size):

M-Best SD M=5: The M-Best sphere decoding algorithm, where the number of candidates we keep is 5.
M-Best SD M=7: Same as M-Best SD M=5 with 7 candidates.

C. Accuracy results

1) Fixed Channel (FC): In the case of the FC scenario, where we know during the learning phase over what realization of the channel we need to detect, the performance of both our networks was comparable to most of the competitors, except SD. Both DetNet and FullyCon managed to achieve accuracy results comparable to SDR and AMP. This result emphasizes the notion that when learning to detect over simple scenarios such as FC, a simple network is expressive enough; and since a simple network is easier to optimize and has lower complexity, it is preferable. In Fig. 3 we present the accuracy rates over a range of SNR values in the FC model. This is a rather difficult setting and algorithms such as AMP did not succeed to converge.

Fig. 3. Comparison of the detection algorithms' BER performance in the fixed channel case over a BPSK modulated signal.

2) Varying channel: In the VC case, the accuracy results of FullyCon were poor and the network did not manage to learn how to detect properly. DetNet managed to achieve accuracy rates comparable to those of SDR and AMP, and almost comparable to those of SD, while being computationally cheaper (see the next section regarding computational resources).

In Fig. 4 we compare the accuracy results over a 60 × 30 real valued channel with BPSK signals, and in Fig. 5 we compare the accuracy of a 30 × 20 complex channel with QPSK symbols. In both cases DetNet achieves accuracy rates comparable to SDR and AMP and near SD, and accuracy much better than ZF and DF. Results over larger constellations are presented in Fig. 6 and 7, where we compare the accuracy rates over complex channels of size 25 × 15 for the 16-QAM and 8-PSK constellations respectively. We can see that in those larger constellations DetNet performs better than AMP and SDR. For both constellations we can observe that DetNet reaches accuracy levels topped only by SD.

Fig. 4. Comparison of the detection algorithms' BER performance in the varying channel case over a BPSK modulated signal. All algorithms were tested on channels of size 60x30.

Fig. 5. Comparison of the detection algorithms' BER performance in the varying channel case over a QPSK modulated signal. All algorithms were tested on channels of size 30x20.

Fig. 6. Comparison of the detection algorithms' SER performance in the varying channel case over a 16-QAM modulated signal. All algorithms were tested on channels of size 25x15.

Fig. 7. Comparison of the detection algorithms' SER performance in the varying channel case over an 8-PSK modulated signal. All algorithms were tested on channels of size 25x15.

D. Correlated Channels

In order to evaluate the accuracy of DetNet in more challenging scenarios, we tested DetNet over channels with covariance matrices of a uniform linear array based on the one-ring model [39]. The covariance matrices were generated using the code provided in [40] and available on GitHub¹. In this scenario we generated channels where at each column the values are Gaussian with zero mean and a correlation matrix of a uniform linear array based on the one-ring model with a random value of angular spread. Note that each column has a different correlation matrix, chosen randomly for each channel instance. In Fig. 8 we compare the accuracy of DetNet in this setting over a 15 × 10 complex channel with the QPSK constellation. We noticed that the performance of all detection algorithms besides the sphere decoder was poor. While performing better than the AMP and decision feedback algorithms, DetNet performed worse than the SDR detector. DetNet does not manage to detect with high accuracy rates over channels where the correlation between the column values is high.

¹ https://github.com/emilbjornson/massive-MIMO-hardware-impairments/

Fig. 8. Comparison of the detection algorithms' SER performance in the varying channel case where each column is correlated according to a one-ring model correlation matrix created with a random parameter of the angular spread, over a QPSK modulated signal. All algorithms were tested on channels of size 15x10.

1) Soft Outputs: We also experimented with soft decoding. Implementing a full iterative decoding scheme is outside the scope of this paper, and we only provide initial results on the accuracy of our posterior estimates. For this purpose, we examined smaller models where the exact posteriors can be computed exactly and measured their statistical distance to our estimates.

In order to compare between the estimated distribution and the posterior probability distribution, we shall use the Jensen-Shannon divergence, which measures the similarity between two discrete probability distributions P and Q, defined as:

JSD(P, Q) = (1/2) D_KL(P, M) + (1/2) D_KL(Q, M),
M = (1/2)(P + Q),
D_KL(P, Q) = Σ_i P(i) log(P(i) / Q(i)).   (18)
10

Average Distance From


−2
10

Posterior Probability
SER

−4 −3
10 10
SDR
M−Best M=5
SD
M−Best M=7
−6 DetNet
10 DetNet
19 20 21 22 23 24
SNR (dB) 8 9 10 11 12 13
SNR (dB)
Fig. 7. Comparison of the detection algorithms SER performance in the
varying channel case over a 8-PSK modulated signal. All algorithms were Fig. 9. Comparison of the accuracy of the soft output relative to the posterior
tested on channels of size 25X15. probability in the case of a BPSK signal over a 20 × 10 real valued channel.

0
10

Average Distance From


−1
10

Posterior Probability
−2
10
SER

−2
10
DF
AMP
−3
10 SDR
M−Best M=5
SD M−Best M=7
−4 DetNet DetNet
10 −3
10
8 10 12 14 16 18 8 9 10 11 12 13
SNR (dB) SNR (dB)
Fig. 8. Comparison of the detection algorithms SER performance in the Fig. 10. Comparison of the accuracy of the soft output relative to the posterior
varying channel where each column is correlated according to a one-ring probability for a 16-QAM signal over an 8 × 4 complex valued channel.
model correlation matrix created with a random parameter of the angular
spread over a QPSK modulated signal. All algorithms were tested on channels
of size 15x10.
output where DetNet is comparable to the M-Best algorithms
only in the high SNR region.
the posterior probability distribution we shall use the Jensen-
Shannon divergence that measures the similarity between two E. Computational Resources
discrete probability distributions, P and Q defined as:
1) FullyCon and DetNet run time: In order and estimate
1 1 the computational complexity of the different detectors we
JSD(P, Q) = DKL (P, M ) + DKL (Q, M ) compared their run time. Comparing complexity is non-trivial
2 2
1 due to many complicating factors as implementation details
M = (P + Q)
2   and platforms. To ensure fairness, all the algorithms were
X P (i) tested on the same machine via python 2.7 environment
DKL (P, Q) = P (i)log (18)
Q(i) using the Numpy package. The networks were converted from
i=1
TensorFlow objects to Numpy objects. We note that the run-
In Fig. 9 we present the Jensen-Shannon divergence between time of SD depends on the SNR, and we therefore report a
the estimated prediction of DetNet and the posterior probabil- range of times.
ity distribution in the case of a BPSK signal over a 10x20 real An important factor when considering the run time of the
channel. In this setting we reach smaller divergence than that neural networks is the effect the batch size. Unlike classical
achieved by the M-Best algorithm. As seen in Fig. 9 adding detectors as SDR and SD, neural networks can detect over
additional layers improves the accuracy of the soft output. In entire batches of data which speeds up the detection process.
Fig. 10 we present the results over a 8x4 complex channel with This is true also for the AMP algorithm, where computation
16-QAM constellation. We can see the performance of DetNet can be made on an entire batch of signals at once. However, the
is comparable to the M-Best Sphere decoding algorithm. For improvement introduced by using batches is highly dependent
completeness, in Fig. 11 we added the 8-PSK constellation soft on the platform used (CPU/GPU/FPGA etc). Therefore, for

1053-587X (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2019.2899805, IEEE
Transactions on Signal Processing
8

−1
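The batching effect can be illustrated with a toy timing harness; the "detector" here is a placeholder linear map, not one of the evaluated algorithms:

```python
import numpy as np
import time

def time_detector(detect, ys, batched):
    """Wall-clock a detector over a workload, with and without batching."""
    start = time.perf_counter()
    if batched:
        detect(ys)                 # one call over the whole batch
    else:
        for y in ys:               # one call per received vector
            detect(y[None, :])
    return time.perf_counter() - start

# Toy "detector": a fixed linear layer applied to each received vector.
W = np.random.randn(64, 60)
detect = lambda ys: np.sign(ys @ W.T)
ys = np.random.randn(1000, 60)
print(time_detector(detect, ys, batched=True),
      time_detector(detect, ys, batched=False))
```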
Constellation Batch DetNet SDR AMP SD
10 channel size size
BPSK 1 0.0066 0.024 0.0093 0.008
Average Distance From

60X30 -0.1
Posterior Probability

BPSK 10 0.0011 0.024 0.0016 0.008


60X30 -0.1
BPSK 100 0.0005 0.024 0.00086 0.008
60X30 -0.1
16-QAM 1 0.006 - 0.01 0.01
−2 25X15 -0.4
10 16-QAM 10 0.0014 - 0.002 0.01
25X15 -0.4
M−Best M=5 16-QAM 100 0.0003 - 0.001 0.01
M−Best M=7 25X15 -0.4
8-PSK 1 0.019 0.021 - 0.004
DetNet 25X15 -0.06
19 20 21 22 23 24 25 26 8-PSK 10 0.0029 0.021 - 0.004
SNR (dB) 25X15 -0.06
8-PSK 100 0.0005 0.021 - 0.004
Fig. 11. Comparison of the accuracy of the soft output relative to the posterior 25X15 -0.06
probability for a 8-PSK signal over an 8 × 4 complex valued channel. TABLE II
RUN T IME C OMPARISON IN VC. D ET N ET IS COMPARED WITH THE
SDR,AMP AND S PHERE D ECODING ALGORITHMS
completeness, we present the run time for several batch sizes
including batch size equal to one.
In table I the run times are presented for hard decision Constellation Batch DetNet M-Best M-Best
channel size size (M=5) (M=7)
detection in a FC case. We can see that FullyCon is faster
BPSK 20X10 1
0.0075 0.006 0.008
than all other detection algorithms, even without using batches. BPSK 20X10 10
0.00092 0.006 0.008
DetNet is slightly faster than traditional detection algorithms BPSK 20X10 100
0.00029 0.006 0.008
without using batches, yet when using batches, the run time 16-QAM 8X4 1
0.006 0.008 0.01
improves significantly compared to other detection methods. 16-QAM 8X4 10
0.0008 0.008 0.01
16-QAM 8X4 100
0.0001 0.008 0.01
Channel Batch FullyCon DetNet SDR AMP SD 8-PSK 8X4 1
0.02 0.05 0.07
size size 8-PSK 8X4 10
0.003 0.05 0.07
8-PSK 8X4 100
0.0012 0.05 0.07
0.55-Toeplitz 1 4E-4 5E-3 9E-3 5E-3 0.001 TABLE III
60x30 -0.01 RUN T IME C OMPARISON OF S OFT O UTPUT IN VC. T HE D ET N ET IS
0.55-Toeplitz 10 6.6E-05 7E-4 9E-3 E-3 0.001 COMPARED WITH THE M-B EST S PHERE D ECODING ALGORITHM
60x30 -0.01
0.55-Toeplitz 100 2.4E-05 1.6E-04 9E-3 3E-4 0.001
60x30 -0.01
0.55-Toeplitz 1000 1.6E-05 1.1E-04 9E-3 3E-4 0.001
60x30 -0.01 compare the computational resource required by an algorithm
TABLE I
F IXED C HANNEL RUNTIME C OMPARISON is counting the required floating point operations needed.
In table IV we present the Flops count for the different
algorithms, as a function of the different parameters of the
In table II we present the results for the VC setting. In the algorithms. We shall define those parameters:
BPSK case the relative time difference between the different LDetNet : The number of layers in the DetNet architecture.
detection algorithms is similar to the FC case, with the excep- LSDR : The number of iterations for an average run of SDR
tion of SD being relatively slower. In larger constellations (8- until it converges.
PSK/16-QAM) DetNet’s relative advantage when comparing LAMP : The number of iterations of the AMP algorithm.
against AMP/SDR is smaller than in the BPSK case (and in the Pos: The number of Flops required to calculate the message
16-QAM constellation AMP was slightly faster without using mean for a single element of the signal in the AMP
batches). The reason is that these accurate detection with these algorithm.
constellations requires larger networks. On the other hand, the Nodes: The average number of nodes visited at each run of
relative performance vs SD improved. the sphere decoding algorithm.
In table III we compare the run time of the detection Hid: The size of zk in DetNet.
algorithms in the soft-output case.As we can see, in the BPSK Aux: The size of vk in DetNet.
case without using batches the performance of DetNet is Con: The size of the constellation.
comparable to the performance of the M-Best sphere decoders, The values of Lsdr and N odes which are not chosen in
and using batches improves the performance significantly. In advance were calculated by running the simulation many times
the 16-QAM/8-PSK cases DetNet is slightly faster than the and finding the average value. The number of Flops required
M-Best decoders even without using batches. to calculate non-linear operations was determined by referring
2) FullyCon and DetNet Flops estimation: Since runtime to [41]. In table IV we present the Flops count as a function
of an algorithm might be affected significantly by the specific of the different parameters of the algorithms. In order and
hardware and software implementation, an additional way to keep it clear we omitted the Flops count of operations that

1053-587X (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2019.2899805, IEEE
Transactions on Signal Processing
9

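The closed-form entries of Table IV are easy to evaluate programmatically; in this sketch the parameter values are placeholders, since the true L_SDR and Nodes were measured empirically:

```python
def detnet_flops(K, hid, aux, layers):
    """DetNet Flops per Table IV: (K^2 + (3K + 2*Aux) * Hid) * L_DetNet."""
    return (K**2 + (3 * K + 2 * aux) * hid) * layers

def sdr_flops(K, iters):
    """SDR Flops per Table IV: 6 * K^3 * L_SDR."""
    return 6 * K**3 * iters

def sd_flops(K, nodes):
    """Sphere decoding Flops per Table IV: 2 * K * Nodes."""
    return 2 * K * nodes

# Illustrative parameter values (placeholders, not measured averages).
print(detnet_flops(K=30, hid=120, aux=60, layers=30))
print(sdr_flops(K=30, iters=20))
print(sd_flops(K=30, nodes=500))
```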
In Fig. 12, 13 and 14 we present the number of Flops for the QPSK, 8-PSK and 16-QAM constellations respectively, for different sizes of K. The complexity of the sphere decoding algorithm is very dependent on the SNR value, so we present one graph for low SNR values and a second one for a higher SNR value. Both sphere decoding graphs are asymptotically worse than the competing detection algorithms. The Flops count of DetNet is always better than SDR and worse than the AMP algorithm. In Fig. 15 we present the Flops count for the 16-QAM constellation in the soft decision case. While in the runtime comparison the M-Best sphere decoding algorithm was comparable to or slower than DetNet, when counting Flops the M-Best algorithm is much faster than DetNet. The reason for the difference is that the M-Best algorithm is more extensive in memory accesses, which slows down the algorithm yet does not affect the Flops count.

Fig. 12. Flops count for different algorithms over different K sizes for the QPSK constellation and a K × K channel size.

Fig. 13. Flops count for different algorithms over different K sizes for the 8-PSK constellation and a K × K channel size.

Fig. 14. Flops count for different algorithms over different K sizes for the 16-QAM constellation and a K × K channel size.

Fig. 15. Flops count for different algorithms over different K sizes for the 16-QAM constellation and a K × K channel size in the soft decision scenario.

3) Accuracy-Complexity Trade-Off: An interesting feature of DetNet is that the complexity-accuracy trade-off can be decided during run-time. Each of the network's layers outputs an estimated signal, and our loss optimizes all of them. We usually use the output of the last layer as the result, since it is the most accurate, but it is possible to take the estimated output x̂_k of previous layers to allow faster detection. In Fig. 16 we present the accuracy as a function of the number of the layer used as the output layer, in the case of a 60x30 channel with the BPSK constellation.

Fig. 16. Comparison of the average BER as a function of the layer chosen to be the output layer in the case of a 60x30 channel and BPSK constellation.
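This early-exit behavior can be sketched as follows; the toy layers below only mimic the interface of DetNet layers:

```python
import numpy as np

def detnet_forward_all(layers, x0, v0):
    """Run stacked DetNet-style layers and keep every intermediate estimate.

    Because the loss (14) supervises all layers, any prefix of the
    network is itself a usable detector: stopping after layer k trades
    accuracy for speed at run time.
    """
    estimates, x, v = [], x0, v0
    for layer in layers:
        x, v = layer(x, v)
        estimates.append(x)
    return estimates

# Toy stand-in layers: each nudges the estimate toward a target.
target = np.array([1.0, -1.0, 1.0])
layers = [lambda x, v: (x + 0.3 * (target - x), v) for _ in range(30)]
estimates = detnet_forward_all(layers, np.zeros(3), np.zeros(3))
x_fast, x_best = estimates[9], estimates[-1]   # early exit vs full depth
```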
VI. CONCLUSION

In this paper we investigated the ability of deep neural networks to serve as MIMO detectors. We introduced two deep learning architectures that provide promising accuracy with low and flexible computational complexity. We demonstrated their application to various digital constellations, and their ability to provide accurate soft posterior outputs. An important feature of one of our networks is its ability to detect over multiple channel realizations with a single training.

Using neural networks as a general scheme in MIMO detection still has a long way to go and there are many open questions. These include their hardware complexity, robustness, and integration into full communication systems. Furthermore, the architectures we proposed are not flexible to changes in the constellation used or the number of users (that is, any change in the number of users or constellation used will require a new network). Nonetheless, we believe this approach is promising and has the potential to impact future communication systems. Neural networks can be trained on realistic channel models and tune their performance for specific environments. Their architectures and batch operation are more natural to hardware implementation than algorithms such as SDR and SD. Finally, their multi-layer structure allows a flexible accuracy vs complexity trade-off, as required by many modern applications.

ACKNOWLEDGMENTS

We would like to thank Shai Shalev-Shwartz for many discussions throughout this research. In addition, we thank Amir Globerson and Yoav Wald for their ideas and help with the soft output networks.

REFERENCES

[1] S. Verdu, Multiuser Detection. Cambridge University Press, 1998.
[2] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, 2002.
[3] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 491–503, 2006.
[4] S. Suh and J. R. Barry, “Reduced-complexity MIMO detection via a slicing breadth-first tree search,” IEEE Transactions on Wireless Communications, vol. 16, no. 3, pp. 1782–1790, 2017.
[5] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of large MIMO detection via approximate message passing,” in IEEE International Symposium on Information Theory (ISIT). IEEE, 2015, pp. 1227–1231.
[6] Z. Q. Luo, W. K. Ma, A. M. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 20–34, 2010.
[7] J. Jaldén and B. Ottersten, “The diversity order of the semidefinite relaxation detector,” IEEE Transactions on Information Theory, vol. 54, no. 4, pp. 1406–1422, 2008.
[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[9] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[10] A. Graves and N. Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[12] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[13] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[14] J. R. Hershey, J. L. Roux, and F. Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” arXiv preprint arXiv:1409.2574, 2014.
[15] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 399–406.
[16] M. Borgerding and P. Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016, pp. 227–231.
[17] E. Nachmani, Y. Be'ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in Communication, Control, and Computing (Allerton), 2016 54th Annual Allerton Conference on. IEEE, 2016, pp. 341–346.
[18] E. Nachmani, E. Marciano, D. Burshtein, and Y. Be'ery, “RNN decoding of linear block codes,” arXiv preprint arXiv:1702.07560, 2017.
[19] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be'ery, “Deep learning methods for improved decoding of linear codes,” IEEE Journal of Selected Topics in Signal Processing, 2018.
[20] T. J. O'Shea and J. Hoydis, “An introduction to machine learning communications systems,” arXiv preprint arXiv:1702.00832, 2017.
[21] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in 51st Annual Conference on Information Sciences and Systems (CISS). IEEE, 2017, pp. 1–6.
[22] N. Farsad and A. Goldsmith, “Detection algorithms for communication systems using deep learning,” arXiv preprint arXiv:1705.08044, 2017.
[23] ——, “Neural network detection of data sequences in communication systems,” arXiv preprint arXiv:1802.02046, 2018.
[24] H. Ye, G. Li, and B. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, 2017.
[25] T. O'Shea, K. Karra, and T. Clancy, “Learning approximate neural estimators for wireless channel state information,” arXiv preprint arXiv:1707.06260, 2017.
[26] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 132–143, 2018.
[27] T. O'Shea, T. Erpek, and T. Clancy, “Deep learning based MIMO communications,” arXiv preprint arXiv:1707.07980, 2017.
[28] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” arXiv preprint arXiv:1706.01151, 2017.
[29] T. Wang, C. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, “Deep learning for wireless physical layer: Opportunities and challenges,” China Communications, vol. 14, no. 11, pp. 92–111, 2017.
[30] T. Diskin, G. Draskovic, F. Pascal, and A. Wiesel, “Deep robust regression,” in IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). IEEE, 2017, pp. 1–5.
[31] A. Mousavi and R. G. Baraniuk, “Learning to invert: Signal recovery via deep convolutional networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 2272–2276.
[32] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Aistats, vol. 9, 2010, pp. 249–256.
[33] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.
[34] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT'2010. Springer, 2010, pp. 177–186.
[35] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[36] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
[37] W. Ma, P. Ching, and Z. Ding, “Semidefinite relaxation based multiuser detection for M-ary PSK multiuser systems,” IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 2862–2872, 2004.
[38] A. Ghasemmehdi and E. Agrell, “Faster recursions in sphere decoding,” IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3530–3536, 2011.
[39] D. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multielement antenna systems,” IEEE Transactions on Communications, vol. 48, no. 3, pp. 502–513, 2000.
[40] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Massive MIMO systems with non-ideal hardware: Energy efficiency, estimation, and capacity limits,” IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 7112–7139, 2014.
[41] D. Bailey and J. Barton, “The NAS kernel benchmark program,” 1985.
[42] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. New York, NY, USA: Cambridge University Press, 2014.
[43] S. Rangan, P. Schniter, and A. Fletcher, “On the convergence of approximate message passing with arbitrary matrices,” in IEEE International Symposium on Information Theory (ISIT). IEEE, 2014, pp. 236–240.
