Abstract—In this paper, we present a deep learning based physical layer wireless transceiver. We describe the corresponding artificial neural network architecture and the training process in detail, and report on extensive over-the-air measurement results. We employ the end-to-end training approach with an autoencoder model that includes a channel model in the middle layers, as previously proposed in the literature. In contrast to other state-of-the-art results, our architecture supports learning time synchronization without any manually designed signal processing operations. Moreover, the neural transceiver has been tested over the air with an implementation in software defined radio. Our experimental results for the implemented single antenna system demonstrate a raw bit-rate of 0.5 million bits per second. This exceeds results from comparable systems presented in the literature and suggests the feasibility of high throughput deep learning transceivers.

Index Terms—Deep Learning, Transceiver, Wireless Communication, Synchronization

I. INTRODUCTION

Deep learning techniques have had a tremendous impact on numerous research areas over the last decade. While ground-breaking results in computer vision were already reported at the beginning of the decade, deep learning methods for communication systems have gained traction only recently. Although many works have proposed machine learning solutions to communication system problems, for example, modulation detection in [1], the first end-to-end deep learning communication systems appeared in [2] and [3]. Previously, the design of communication systems relied on information theoretic models, classical optimization techniques and signal processing algorithms that were optimal for certain systems. These approaches, however, are limited to mathematically tractable channel models and can in most cases only be applied to isolated components of the digital communication chain. In contrast, the novel deep learning approach promises to overcome these limitations by applying a global end-to-end optimization to the problem at hand. Learning based approaches do not need a tractable channel model and optimize the system globally.

A deep neural network (DNN) autoencoder was proposed in [2] as a first attempt at end-to-end learning of communication systems. The autoencoder structure, a multilayer neural network, consists of an encoder-decoder pair that models the transmission of bits, with raw data bits as the autoencoder's input and output. A channel model is included in the middle layers, and the modulation and demodulation are implemented by the first and last layers of the autoencoder. After training, the first layers are separately used as the transmitter. The final layers constitute the receiver of the system. The middle layers, which model the effect of channels, are omitted after training. In order to adapt the communication system to arbitrary real-world channels lacking tractable mathematical models, several extensions of autoencoders have been proposed in [4]–[6]. In this paper we model the effect of channel impairments through customized middle layers of the autoencoder. To achieve practical applicability, it is assumed that the autoencoder incorporates a digital complex baseband model, where signal samples generated by the transmitter DNN can directly be fed to a radio frequency (RF) frontend for up-conversion. Down-converted signal samples from the receiver RF frontend can be fed directly into the receiver DNN without any manual pre-processing.

For such a complete end-to-end deep learning signal processing approach, the problem of synchronization arises. The authors of [7] and [3] discuss radio transformer networks for this purpose. Signal parameters are estimated with separate DNNs and the signal is then processed by manually programmed operations. The authors in [8] and [9] propose to learn merely the receiver side of the system and afterwards apply manual signal processing to support synchronization. We present instead a new, fully end-to-end approach where synchronization is completely learned as part of a single DNN structure. This is achieved by pilot symbols with a learned waveform and by feeding longer signals to the decoder.

This paper is structured as follows. In Sec. II we describe the architecture of the autoencoder. The training process is described in detail in Sec. II-B. The details of the software defined radio (SDR) implementation are elaborated in Sec. III. The paper concludes with the presentation and discussion of the test results in Sec. IV.

II. A DEEP NEURAL NETWORK TRANSCEIVER

In this section we describe the structure of the DNN autoencoder; dimensions and parameters are given in Tab. I. The transmitter is represented by the encoder and the receiver by the decoder.
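Since the layer dimensions of Tab. I are not reproduced in this excerpt, the following minimal Keras sketch only illustrates the composition just described: an encoder and a decoder joined by channel layers for end-to-end training, with the encoder and decoder deployed separately afterwards. All layer sizes, activations and the simplified AWGN-only channel are illustrative assumptions rather than the architecture of the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

k, n = 8, 8                     # bits per symbol and complex samples (as in AE-8/8)
M = 2 ** k                      # number of messages

# Encoder: one-hot message -> 2n reals interpreted as n complex I/Q samples.
msg_in = layers.Input(shape=(M,))
x = layers.Dense(2 * M, activation="relu")(msg_in)
tx = layers.Dense(2 * n)(x)
encoder = Model(msg_in, tx, name="encoder")

# Placeholder channel layer (AWGN only); the channel model of Sec. II-B
# additionally applies phase shift, attenuation and a time offset.
channel = layers.GaussianNoise(stddev=0.1)

# Decoder: received samples -> probability vector over the M messages.
rx_in = layers.Input(shape=(2 * n,))
h = layers.Dense(2 * M, activation="relu")(rx_in)
p = layers.Dense(M, activation="softmax")(h)
decoder = Model(rx_in, p, name="decoder")

# End-to-end autoencoder used only during training; afterwards the encoder
# and decoder are deployed separately and the channel layers are dropped.
autoencoder = Model(msg_in, decoder(channel(encoder(msg_in))))
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy")
```

Training such a model would use randomly drawn one-hot messages as both input and target, mirroring the raw-bits-in, raw-bits-out view described above.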
B. Channel model and training

The present autoencoder transceiver cannot be trained without a channel model, since backpropagation spans all layers of the DNN. We use a channel model that incorporates additive white Gaussian noise (AWGN), phase shift, time shift and attenuation. All are modeled as frequency independent, a simplification that is sufficient for narrowband channels. The channel model hence extends the standard memoryless channel to a more challenging setting with asynchronous communication.

In real systems, a phase offset occurs due to asynchronous clocks or phase noise. Since the autoencoder cannot memorize the phase offset from previous transmissions, we model it as an independent, uniformly distributed random variable in each training step. This is reasonable as the phase offset changes only slightly over the duration of one symbol transmission. Hence, all samples in the vector x generated by the encoder are rotated by a uniformly distributed phase

u = e^{−jϕ} · x.    (1)

This operation is implemented as a layer of the channel model, where ϕ is provided as an additional input to the layer. This assumption is harder than a real channel, in which the phase offset is correlated over time.

Attenuation of signals is modeled by multiplying each sample by a uniformly distributed random factor a ∈ [a_min, 1] as

v = a · u.    (2)

In the next step, random noise is added to the symbols: independent noise variables are added to each complex sample,

y = v + r.    (3)
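As an illustration of Eqs. (1)–(3), the sketch below applies the three random impairments to a batch of complex baseband samples in TensorFlow. The function and variable names are ours, it assumes encoder outputs normalized to unit energy per sample, and it omits the time offset, which is treated separately below.

```python
import numpy as np
import tensorflow as tf

def apply_channel(x, esample_n0_db=5.0, a_min=0.01):
    """Apply phase shift (1), attenuation (2) and AWGN (3) to a batch of
    complex baseband samples x of shape (batch, num_samples), complex64.
    Assumes unit average energy per encoder output sample."""
    batch = tf.shape(x)[:1]
    # Eq. (1): one uniformly distributed phase offset per training example
    phi = tf.random.uniform(batch, 0.0, 2.0 * np.pi)[:, tf.newaxis]
    u = x * tf.exp(tf.complex(tf.zeros_like(phi), -phi))
    # Eq. (2): one uniformly distributed attenuation factor per example
    a = tf.random.uniform(batch, a_min, 1.0)[:, tf.newaxis]
    v = u * tf.complex(a, tf.zeros_like(a))
    # Eq. (3): independent complex Gaussian noise on every sample,
    # scaled to the requested per-sample SNR E_sample/N0
    n0 = 10.0 ** (-esample_n0_db / 10.0)
    sigma = float(np.sqrt(n0 / 2.0))
    r = tf.complex(tf.random.normal(tf.shape(x), stddev=sigma),
                   tf.random.normal(tf.shape(x), stddev=sigma))
    return v + r
```

In the actual training graph these operations are wrapped as layers of the channel model, so that gradients can flow from the decoder back to the encoder.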
Fig. 1: Visualizing the time shift for the channel and receiver of autoencoder AE-8/8 for a shift m = 0 samples

A small value of the signal-to-noise ratio (SNR) is chosen for training, since the autoencoder does not generalize well to SNRs below the training SNR. We observed, moreover, that the autoencoder does not converge to a suitable state if the training SNR is too small, whereas it converges during training and generalizes well to moderately larger SNR values. Note that the noise is added to each complex sample. Therefore the SNR is, in this sense, defined by E_sample/N0. We can convert the SNR per sample to the SNR per bit, denoted by E_b/N0, using

E_b/N0 = (n/k) · E_sample/N0.

The SNR per bit is used to evaluate the performance independent of the number of complex samples n.
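For the AE-7/16 configuration evaluated later (k = 7 bits mapped to n = 16 samples), for example, the SNR per bit is thus larger than the SNR per sample by a factor of 16/7, i.e., by about 3.6 dB, whereas for AE-8/8 the two notions coincide.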
Finally, synchronization effects are introduced. Since the decoder has no memory, synchronization is a difficult problem for DNN based transceivers. In practice, synchronization errors accumulate gradually due to slightly asynchronous clocks at the transmitter and the receiver. To overcome this problem, in this work the receiver takes into account a larger window of samples that includes more than one baseband symbol. The purpose is that the receiver always has the full set of samples for one of the data symbols available within each window.

Fig. 2: Autoencoder layout during training
To introduce the time offset in the channel model, multiple baseband symbols are considered. We use pilot symbols to facilitate synchronization. The pilot baseband samples are generated by the same encoder that generates the data baseband symbols. Hence, the encoder expects as input an alternating stream of pilot symbols p and source symbols s_i. Fig. 1 depicts an exemplary sequence of data and pilot channel symbols at training step i, given by

(x^{i−1}_{data}, x_{sync}, x^{i}_{data}, x_{sync}, x^{i+1}_{data}) ∈ C^{5n}.

The pilot symbol p is chosen to be the integer 0 ∈ M. The same number of bits is used for data and pilot symbols.

At each training step, the channel symbols for data and pilots are generated by the encoder successively. We use the TimeDistributed model of Keras to create 5 identical parallel encoders, as depicted in Fig. 2. The weights of the parallel encoders are shared and updated jointly during training. The transmitter output consists of 5n consecutive complex samples during training. The channel impairments are added afterwards. First, a random phase shift ϕ_i and a random attenuation a_i are chosen. They act on each transmitted baseband sample x ∈ C. White Gaussian noise r_i ∈ C^{5n}, which satisfies the SNR E_sample/N0, is added at the end. The channel output at training step i is therefore given by

y^{(i)} = a_i e^{−jϕ_i} (x^{i−1}_{data}, x_{sync}, x^{i}_{data}, x_{sync}, x^{i+1}_{data}) + r_i ∈ C^{5n}.

The receiver uses a window of length W = 3n − 1 samples, so as to include at least one full data symbol regardless of the actual position of the window. Without synchronization errors, this window is placed at the beginning of the first pilot and spans over the data symbol as in Fig. 2. Synchronization errors introduce a shift of the window. At training step i, the window is shifted by an offset m_i drawn from the uniform distribution over {−n + 1, . . . , n}. Therefore, the input vector of complex baseband samples to the receiver in step i is given by

y^{(i)}[m_i] = (y^{(i)}_{m_i}, . . . , y^{(i)}_{m_i+W−1}) ∈ C^{W}.    (4)

This approach provides a sufficient number of samples to the receiver to account for the set of possible time-shifts of the receiver window. Note that, in contrast to over-the-air deployment, we do not receive a continuous sequence of complex samples during training. Each time, only 5n samples are generated, over which the receiver chooses a window of W samples. It should be noted that by introducing the synchronization symbol, the communication rate is halved.
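To make the indexing of Eq. (4) concrete, the following short sketch (function name and the dummy block content are ours) extracts the receiver window from one 5n-sample training block and checks that the complete data symbol x^{i}_{data}, occupying positions 2n, . . . , 3n − 1 of the block, is contained in the window for every admissible offset.

```python
import numpy as np

def receiver_window(y_block, n, m):
    """Slice the receiver input of Eq. (4) out of one 5n-sample training
    block laid out as (data[i-1], pilot, data[i], pilot, data[i+1]).
    m is the time offset in samples, drawn uniformly from {-n+1, ..., n};
    m = 0 places the window at the beginning of the first pilot."""
    W = 3 * n - 1
    start = n + m
    return y_block[start:start + W]

# Check that the window always contains the full data symbol x_data^i,
# i.e. indices 2n ... 3n-1 of the block, for every admissible offset.
n = 8
block = np.arange(5 * n)          # sample indices as stand-ins for IQ samples
for m in range(-n + 1, n + 1):
    window = receiver_window(block, n, m)
    assert len(window) == 3 * n - 1
    assert window[0] <= 2 * n and window[-1] >= 3 * n - 1
```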
C. Decoder

The final step in the transceiver design is the decoder. It receives 2W real inputs derived from W complex samples. At training step i, the decoder shall correctly identify the source symbol s_i from the input of W complex samples. It is trained to cope with synchronization errors and other channel impairments and, furthermore, includes a dedicated entity to support the synchronization task, called the Synchronization Feature Estimator (SFE). The SFE extracts a set of features using convolutional layers (see Table I). The decoder then uses these features together with the 2W received inputs to perform the final decoding in the subsequent layers. The SFE is similar to the correlation filters used for synchronization in conventional transceivers. The final output of the decoder is a probability vector ŝ ∈ [0, 1]^M with components e^{v_j} / Σ_ℓ e^{v_ℓ}, where v_j, j = 1, . . . , M, denotes the real output of the final layer. The symbol with the highest probability is then chosen as the most likely input symbol.
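A hedged sketch of this two-branch decoder structure is given below. Since Table I is not part of this excerpt, the filter count, kernel size, hidden layer width and the assumed interleaved I/Q ordering of the 2W inputs are placeholders, not the values used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

M = 256                  # number of messages (k = 8)
n = 8
W = 3 * n - 1            # receiver window length in complex samples

rx = layers.Input(shape=(2 * W,))                 # W complex samples as 2W reals

# Synchronization Feature Estimator (SFE): convolutional feature extraction.
f = layers.Reshape((W, 2))(rx)                    # treat I and Q as two channels
f = layers.Conv1D(32, kernel_size=2 * n, activation="relu")(f)
f = layers.Flatten()(f)

# The extracted features are used together with the raw 2W received inputs.
h = layers.Concatenate()([rx, f])
h = layers.Dense(512, activation="relu")(h)
v = layers.Dense(M)(h)                            # real outputs v_1, ..., v_M
s_hat = layers.Softmax()(v)                       # e^{v_j} / sum_l e^{v_l}
decoder = Model(rx, s_hat, name="decoder")
```

The decided symbol is then the index of the largest component of ŝ.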
III. SOFTWARE DEFINED RADIO IMPLEMENTATION

The trained model is exported and loaded into the GNU Radio block, which is implemented in C++. This block is able to run trained TF [11] models, i.e., perform the inference, in C++ [12]. GNU Radio provides easy ways to interface with the RF hardware frontends. The autoencoder transceiver operates only in baseband. Hence, for radio transmission over the air, up- and down-conversion to and from the carrier frequency is performed by the RF frontends. We used the 2.4 GHz frequency band for transmission. The C++ implementation of the system leads to higher throughput compared to earlier Python implementations, e.g., in [3].

In this work, the signal-to-noise ratio of the AWGN is set to E_sample/N0 = 5 dB for training. As mentioned above, the autoencoder cannot be trained properly for too small an SNR, e.g., 0 dB.

By blockwise processing with one pilot per data block and a bandwidth of 1 MHz, we achieve a throughput of 0.5 Mbit/s on a machine with an Intel Core i7 940 CPU and an NVIDIA GeForce GTX 1080 Ti GPU (AE-8/8 with one pilot symbol per data symbol carries 8 data bits per 16 complex samples, i.e., 0.5 bit per sample). Two USRP N210 RF-frontends from Ettus were used for the experiments.

In the current setup, the ratio of pilot to data symbols is 1. To improve the throughput, more data symbols per pilot symbol could be transmitted. The optimum ratio of pilot to data symbols will be determined in future experiments.

IV. EXPERIMENTAL RESULTS

The proposed system is evaluated by estimating the symbol error rate of transmissions over simulated and real channels. We compare the results with binary phase shift keying (BPSK) over perfect Gaussian channels. The BPSK transmission differs from our DNN transceiver in at least two aspects. First, no synchronization error is assumed for BPSK, and secondly, each bit is mapped to one complex channel sample. In contrast, our system considers synchronization errors and maps k bits to n complex samples.

In our benchmark test series, different autoencoders are trained for varying parameters k and n, denoted by AE-k/n. k/n is the ratio between the number of source symbol bits and the number of complex baseband samples. For example, an AE-7/16 transmits 7 bits using 16 samples. After each data symbol of length n, a pilot symbol of the same length is inserted. To evaluate the efficiency of the SFE, the autoencoder named AE-8/8-2 is trained and tested without this unit.

A. Over-the-air transmission

To investigate transmission over a real channel, an AE-8/8 was tested over the air at a relative amplitude of 0.61. The error rate is calculated every 200 ms, which corresponds to 200000/8 = 25000 symbols per evaluated time window. The resulting SER is plotted over time in Fig. 3. Most notably, the error rate fluctuates periodically in intervals of about 2.5 s.

Fig. 3: SER of AE-8/8 tested over the air at amplitude 0.6156, windowed over 200 ms for each data point, no overlap, for 1.25 · 10^6 symbols in total

This is attributed to the slightly different clock speeds of the two USRPs. The error rate changes as drifting time offsets make it more difficult for the autoencoder to decode certain received sample windows. The minimum at 5.5 s corresponds to an error rate of 4 · 10^−3, which means a single error during a period of 200 ms.

Next, after training, AE-7/16 was tested over the air in two independent experiments for different relative amplitudes at the transmitter, each with N_test = 3.4 · 10^6 test symbols. The resulting error rates are shown in Fig. 4. For a real radio transmission, the receiver SNR cannot be measured precisely. Hence, the SER is plotted over the transmit amplitude on the x-axis.

Both experiments show a decrease of errors for higher amplitudes, as expected. A higher amplitude provides more energy for the signal and reduces the effect of noise at the receiver. Very low amplitudes correspond to very low values of the attenuation a in the channel model, so that the neural receiver decodes increasingly incorrectly for amplitudes below 0.005. This effect is also observed over the air, reducing performance at low levels of relative amplitude. Both experiments show a bottom floor of the symbol error rate (SER) at around 1%.

Notably, the error rates do not decrease monotonically with increasing amplitude. However, the main trend is indicated by dotted lines. The upper green trend curve converges to an error rate of about 2%; the yellow dotted line refers to error rates smaller by approximately a factor of 3. At higher relative amplitudes, the error rates increase, caused by imperfections of the transmitter. The reason for the two observed, distinctly different error rate curves remains unclear at this point and will be investigated in the future.

The observed error rates are significantly worse than the ones found when testing the trained autoencoder over the corresponding simulated channel; those results are depicted in Fig. 5. This can be attributed to the channel model, which only approximates
the real channel. Possible remedies include more accurate channel models or finding ways to train the autoencoder even over real channels, as has been shown in [13].

Fig. 4: SER of AE-7/16 tested over the air in two experiments for 3.4 · 10^6 symbols each. The dotted lines emphasize patterns observed over the two sets.

B. Comparison of the autoencoders

In this section, the performance of the autoencoders is compared for the simulated channel. The channel attenuation parameter is chosen as a_min = 0.01. The phase shift is generated uniformly at random over [0, 2π) and the time offset is uniformly distributed over {−7, . . . , 8}. 10^6 data symbols are sent for each of the four autoencoders and for a range of different SNRs. We use two notions of SNR, namely the SNR per sample E_sample/N0 and the SNR per bit E_b/N0.

Fig. 5: SER of the four presented autoencoders, tested on 10^6 symbols each over modeled channels, and the theoretical BPSK error rate over AWGN as a baseline. (a) Normalized per bit. (b) Normalized per sample.

The SER is plotted versus the SNR per bit E_b/N0 in Figure 5a. As previously mentioned, the theoretical error rate of uncoded BPSK is plotted for comparison. The performance of all autoencoders improves with increasing SNR. The AE-7/8 shows the slowest improvement over E_b/N0 and reaches only an SER of 10^−4 at 14 dB. While the AE-7/16 shows a similarly slow improvement for low SNRs, it then improves more quickly and reaches an SER below 10^−5. Note that the AE-7/16, however, uses more complex samples than the AE-7/8 to transmit the same amount of data.

At low SNRs, the best performance is achieved by the AE-8/8 and AE-8/8-2. Their improvement of SER with SNR shows an error floor in the high SNR regime. This is in contrast to the AE-7/16 which, at least within the tested range of SNRs, keeps improving and reaches a lower error rate than both the AE-8/8 and the AE-8/8-2. In that regard, the AE-7/16 outperforms the other autoencoders in the high SNR regime, however, at the price of using more samples to transmit the data.

We also plot the results of the same experiments over the SNR per sample E_sample/N0. The corresponding SERs are shown in Figure 5b. As expected, AE-7/16 performs better than the other autoencoders, since it has more samples to transmit the same amount of data and can thus achieve a more robust representation of the encoded bits within the samples. The AE-8/8 needs about 1.7 dB more SNR to achieve the same error rates as AE-7/16. This number is more than 2 dB for AE-8/8-2. AE-7/8 is significantly worse than the other autoencoders. If the SNR per sample is used, the AE-7/16 is able to outperform even uncoded BPSK for SNR values between approximately 5 and 8 dB. This can be attributed to the significantly greater number of complex samples used for transmission.

Uncoded BPSK mostly achieves better error rates than the proposed autoencoders. But as discussed above, this observation should be interpreted with care. The theoretical BPSK error rate is determined for an AWGN channel, disregarding the other effects of the channel model. Furthermore, the autoencoders are evaluated by their symbol error rate; the corresponding bit error rate is at most equal to the symbol error rate and in most cases significantly lower. Unlike the theoretical results for BPSK, the autoencoders of the present paper are designed to combat more channel impairments, as described above.

For the over-the-air experiments, the SER of the four autoencoders is plotted in Fig. 6. We use 3.3 · 10^6 symbols for each
[Fig. 6; legend: AE-7/8, AE-8/8, AE-8/8-2, AE-7/16; y-axis: Ps]

V. CONCLUSION AND OUTLOOK

In this paper, we presented a fully trainable deep learning transceiver, which addresses in particular synchronization issues. Multiple autoencoders with different architectures are trained. Each has three different components, namely the encoder, the channel and the decoder. The channel is in-