

Deep Learning Enabled Semantic Communication Systems

Huiqiang Xie, Graduate Student Member, IEEE, Zhijin Qin, Member, IEEE, Geoffrey Ye Li, Fellow, IEEE, and Biing-Hwang Juang, Life Fellow, IEEE

Abstract—Recently, deep learning enabled end-to-end communication systems have been developed to merge all physical layer blocks in the traditional communication systems, which makes joint transceiver optimization possible. Powered by deep learning, natural language processing has achieved great success in analyzing and understanding a large amount of language texts. Inspired by research results in both areas, we aim to provide a new view on communication systems from the semantic level. Particularly, we propose a deep learning based semantic communication system, named DeepSC, for text transmission. Based on the Transformer, the DeepSC aims at maximizing the system capacity and minimizing the semantic errors by recovering the meaning of sentences, rather than bit- or symbol-errors as in traditional communications. Moreover, transfer learning is used to ensure that the DeepSC is applicable to different communication environments and to accelerate the model training process. To justify the performance of semantic communications accurately, we also initialize a new metric, named sentence similarity. Compared with the traditional communication system without considering semantic information exchange, the proposed DeepSC is more robust to channel variation and is able to achieve better performance, especially in the low signal-to-noise ratio (SNR) regime, as demonstrated by the extensive simulation results.

Index Terms—Deep learning, end-to-end communication, semantic communication, transfer learning, Transformer.

Manuscript received October 6, 2020; revised January 23, 2021 and March 17, 2021; accepted March 31, 2021. Date of publication April 7, 2021; date of current version May 18, 2021. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. A. Liu. Part of this work was presented at the IEEE Global Communications Conference 2020 [1]. (Corresponding author: Zhijin Qin.)

Huiqiang Xie and Zhijin Qin are with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, U.K. (e-mail: [email protected]; [email protected]).

Geoffrey Ye Li is with the School of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: [email protected]).

Biing-Hwang Juang is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSP.2021.3071210

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

Based on Shannon and Weaver [2], communication can be categorized into three levels: i) transmission of symbols; ii) semantic exchange of transmitted symbols; iii) effects of semantic information exchange. The first level of communication mainly concerns the successful transmission of symbols from the transmitter to the receiver, where the transmission accuracy is mainly measured at the level of bits or symbols. The second level of communication deals with the semantic information sent from the transmitter and the meaning interpreted at the receiver, named semantic communication. The third level deals with the effects of communication, which turn into the ability of the receiver to perform certain tasks in the way desired by the transmitter.

In the past decades, communications have primarily focused on how to accurately and effectively transmit symbols (measured by bits) from the transmitter to the receiver. In such systems, the bit-error rate (BER) or symbol-error rate (SER) is usually taken as the performance metric [2]. With the development from the first generation (1G) to the fifth generation (5G), the achieved transmission rate has been improved tens of thousands of times and the system capacity is gradually approaching the Shannon limit. Recently, various new applications have appeared, such as autonomous transportation, consumer robotics, environmental monitoring, and tele-health [3], [4]. The interconnection of these applications will generate a staggering amount of data on the order of zetta-bytes. Besides, these applications need to support massive connectivity over limited spectrum resources while requiring lower latency, which poses critical challenges to traditional source-channel coding. Semantic communications can process data in the semantic domain by extracting the meanings of data and filtering out the useless, irrelevant, and unessential information, which further compresses data while preserving the meanings. Moreover, semantic communication is expected to be robust to harsh channel environments, i.e., the low signal-to-noise ratio (SNR) region, which fits well the applications requiring high reliability. These factors motivate us to develop intelligent communication systems by considering the semantic meaning behind digital bits to enhance the accuracy and efficiency of communications.

Different from conventional communications, semantic communications aim to transmit the information relevant to the transmission goal. For example, for text transmission, the meaning is the essential information, while the exact expression, i.e., the particular choice of words, is unnecessary. By doing so, the data traffic would be reduced significantly. Such a system could be particularly useful when the bandwidth is limited, the SNR is low, or the BER/SER is high in typical communication systems.

Historically, the concept of semantic communication was developed several decades ago. Inspired by Weaver [2], Carnap et al. [5] were the first to introduce the semantic information theory (SIT) based on logical probabilities ranging over the contents. Afterwards, a generic model of semantic communication (GMSC) was proposed as an extension of the SIT, where the concepts of semantic noise and semantic channel were first defined [6].


As pointed out in [7], the analysis and design of a communication system for optimal transmission of intelligence are faced with several challenges. For instance, how to define error in the intelligence transmission? In [8], a lossless semantic data compression theory applying the GMSC was developed, which means that data can be compressed at the semantic level so that the size of the data to be transmitted can be reduced significantly. Recently, an end-to-end (E2E) semantic communication framework has been developed to integrate the semantic inference and physical layer communication problems, where the transceiver is optimized to reach the Nash equilibrium while minimizing the average semantic errors [9]. However, the semantic error in [9] measures the meaning of each word rather than the whole sentence. These aforementioned works provide some insights and remarks for the design of semantic communications, but many issues remain unexplored.

Recent advancements on deep learning (DL) based natural language processing (NLP) and communication systems inspire us to investigate semantic communication to realize the second level of communications as aforementioned [10]–[15]. The considered semantic communication system mainly focuses on the joint semantic-channel coding and decoding, which aims to extract and encode the semantic information of sentences rather than simply a sequence of bits or a word. For the semantic communication system, we face the following questions:

Question 1: How to define the meaning behind the bits?
Question 2: How to measure the semantic error of sentences?
Question 3: How to jointly design the semantic and channel coding?

In this paper, we investigate the semantic communication system by applying machine translation techniques in NLP to physical layer communications. Specifically, we propose a DL enabled semantic communication system (DeepSC) to address the aforementioned challenges. The main contributions of this paper are summarized as follows:
• Based on the Transformer [16], a novel framework for the DeepSC is proposed, which can effectively extract the semantic information from texts with robustness to noise. In the proposed DeepSC, a joint semantic-channel coding is designed to cope with channel noise and semantic distortion, which addresses the aforementioned Question 3.
• The transceiver of the DeepSC is composed of a semantic encoder, a channel encoder, a channel decoder, and a semantic decoder. To understand the semantic meaning as well as maximize the system capacity at the same time, the receiver is optimized with two loss functions: cross-entropy and mutual information. Moreover, a new metric is proposed to accurately reflect the performance of the DeepSC at the semantic level. These address the aforementioned Questions 1 and 2.
• To make the DeepSC applicable to various communication scenarios, deep transfer learning is adopted to accelerate the model re-training. With the re-trained model, the DeepSC can recognise various knowledge inputs and recover semantic information from distortion.
• Based on extensive simulation results, the proposed DeepSC outperforms the traditional communication system and improves the system robustness in the low SNR regime.

The rest of this paper is organized as follows. Related work is briefly reviewed in Section II. The framework of a semantic communication system is presented and a corresponding problem is formulated in Section III. Section IV details the proposed DeepSC and extends it to dynamic environments. Numerical results are presented in Section V to show the performance of the DeepSC. Finally, Section VI concludes this paper.

Notation: C^{n×m} and R^{n×m} represent sets of complex and real matrices of size n × m, respectively. Bold-font variables denote matrices or vectors. x ∼ CN(μ, σ^2) means that variable x follows a circularly-symmetric complex Gaussian distribution with mean μ and covariance σ^2. (·)^T and (·)^H denote the transpose and Hermitian, respectively. Re{·} and Im{·} refer to the real and imaginary parts of a complex number. Finally, a ⊗ b indicates the inner product of vectors a and b.

II. RELATED WORK

This section provides a brief review of the related work on E2E physical layer communication systems and the deep neural network (DNN) techniques adopted in NLP.

A. End-to-End Physical Layer Communication Systems

DL techniques have shown great potential in processing various intelligent tasks, i.e., computer vision and NLP. Meanwhile, it is possible to train neural networks and run them on mobile devices due to the increasing hardware computing capability. In the communication area, some pioneering works have been carried out on DL based E2E physical layer communication systems, which merge the blocks in traditional communication systems [17]–[23]. By adopting the structure of an autoencoder in DL and removing the block structure, the transmitter and receiver in the E2E system are optimized jointly as an E2E reconstruction task. It has been demonstrated that such an E2E system outperforms uncoded binary phase shift keying (BPSK) and Hamming coded BPSK in terms of BER [17]. Besides, there are several initial works on dealing with the missing channel gradient during training. A DNN based two-phase training process has been proposed, where the transceiver is trained with a stochastic channel model and the receiver is fine-tuned under real channels [18]. Reinforcement learning has been exploited in [19] to acquire the channel gradient under an unknown channel model, which achieves better performance than differential quadrature phase-shift keying (DQPSK) over real channels. A conditional generative adversarial net (GAN) has been applied in [20] to use a DNN to represent the channel distortion so that the gradients can pass through an unknown channel to the transmitter DNN during the training of the E2E communication system. Meta-learning combined with a limited number of pilots has been developed for training the transceiver and enables fast training of the network with a smaller amount of data [21].

Considering the types of sources, the joint source-channel coding for texts [22] and images [23] aims to recover the source information at the receiver directly rather than the digital bits. Meanwhile, traditional metrics, such as BER, cannot reflect the performance of such systems well. Therefore, the word-error rate and peak signal-to-noise ratio (PSNR) are adopted for measuring the accuracy of source information recovery.

B. Semantic Representation in Natural Language Processing

NLP makes machines understand human languages, with the main goal of understanding the syntax and text. Initially, natural language can be described by a joint probability model according to the context [24]. Thus, language models provide context to distinguish words and phrases that have similar semantic meaning. Although such NLP technologies based on statistical models are developed to describe the probability of a certain word coming after another in a sentence, it is hard for them to deal with long sentences, i.e., the ones with over 15 words, and with the syntax. To understand long sentences, the word2vec model in [25] captures the relationship among words, which makes similar words end up with a closer distance in the vector space. Even if these dense word vectors can capture the relationship among words, they fail to describe syntax information. In order to solve such problems, the underlying meaning of texts is represented by using various DL techniques, which are able to extract the semantic information in long sentences and their syntax. A deep contextualized word representation has been proposed in [26], which models both complex characteristics of word usages, e.g., syntax and semantics, and how these usages vary across linguistic contexts (i.e., to model polysemy). However, the above word representation approaches are designed for specific tasks and may need to be redesigned whenever the task changes. In [27], a general word representation model, named bidirectional encoder representations from transformers (BERT), has been developed to provide word vectors for various NLP tasks without requiring a redesign of word representations.

C. Comparison of State-of-the-Art NLP Techniques

There are three types of neural networks used for NLP tasks, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and fully-connected neural networks (FCNs) [28]. By introducing RNNs, language models can learn whole sentences and capture the syntax information effectively [29]. However, for long sentences, particularly when the distance between subject and predicate is more than 10 words, RNNs cannot find the correct subject and predicate. For example, for the sentence "the person who works in the new post office is walking to the store", RNNs fail to recognise the relationship between "the person" and "is". Besides, because of the linear sequence structure, RNNs lack parallel computing capability, which means that RNNs are time-consuming. CNNs were born with the capability of parallel computing [30]. However, even if CNNs can use deeper networks to extract semantic information in long sentences, their performance is not as good as that of RNNs because the kernel size in CNNs is kept small to guarantee computational efficiency. By combining with the attention mechanism, language models based on FCNs, such as the Transformer [16], pay more attention to the useful semantic information, improving performance on various NLP tasks. It is worth noting that the Transformer has the advantages of both RNNs and CNNs [16]. Particularly, the self-attention mechanism is adopted, which enables the models to understand sentences regardless of their lengths.

Fig. 1. The framework of the proposed DL enabled semantic communication system, DeepSC.

III. SYSTEM MODEL AND PROBLEM FORMULATION

The considered system model consists of two levels: the semantic level and the transmission level, as shown in Fig. 1. The semantic level addresses semantic information processing for encoding and decoding to extract the semantic information. The transmission level guarantees that the semantic information can be exchanged correctly over the transmission medium. Overall, we consider an intelligent E2E communication system with a stochastic physical channel, where the transmitter and the receiver have certain background knowledge, i.e., different training data. The background knowledge could vary for different application scenarios.

Definition 1: Semantic noise is a type of disturbance in the exchange of a message that interferes with the interpretation of the message due to ambiguity in the words, sentences, or symbols used in the message transmission.

Definition 2: Physical channel noise is caused by physical channel impairments, such as additive white Gaussian noise (AWGN), fading, and multipath, which incur signal attenuation and distortion.

A. Problem Description

As in Fig. 1, the transmitter maps a sentence, s, into a complex symbol stream, x, and then passes it through the physical channel with transmission impairments, such as distortion and noise. The received signal, y, is decoded at the receiver to estimate the original sentence, s. We jointly design the transmitter and receiver with DNNs since DL enables us to train a model with variable-length input sentences and different languages.

Particularly, we assume that the input of the DeepSC is a sentence, s = [w_1, w_2, ..., w_L], where w_l represents the l-th word in the sentence. As shown in Fig. 1, the transmitter consists of two parts, named the semantic encoder and the channel encoder, to extract the semantic information from s and guarantee successful transmission of the semantic information over the physical channel. The encoded symbol stream can be represented by

x = C_α(S_β(s)),   (1)

where x ∈ C^{M×1}, S_β(·) is the semantic encoder network with the parameter set β, and C_α(·) is the channel encoder with the parameter set α.

In order to simplify the analysis, we assume the coherence time is M. If x is sent, the signal received at the receiver will be

y = hx + n,   (2)

where y ∈ C^{M×1}, h represents the Rayleigh fading channel coefficient with h ∼ CN(0, 1), and n ∼ CN(0, σ_n^2). For E2E training of the encoder and the decoder, the channel must allow back-propagation. Physical channels can be formulated by neural networks. For example, simple neural networks could be used to model the AWGN channel, the multiplicative Gaussian noise channel, and the erasure channel [22], while for fading channels, more complicated neural networks are required [20]. In this paper, we mainly consider the AWGN channels and Rayleigh fading channels for simplicity while focusing on semantic coding and decoding.

As shown in Fig. 1, the receiver includes a channel decoder and a semantic decoder to recover the transmitted symbols and then the transmitted sentences, respectively. The decoded signal can be represented as

ŝ = S_χ^{-1}(C_δ^{-1}(y)),   (3)

where ŝ is the recovered sentence, C_δ^{-1}(·) is the channel decoder with the parameter set δ, and S_χ^{-1}(·) is the semantic decoder network with the parameter set χ.

The goal of the system is to minimize the semantic errors while reducing the number of symbols to be transmitted. However, we face two challenges in the considered system. The first challenge is how to design the joint semantic-channel coding. The other one is semantic transmission, which has not been considered in the traditional communication system. Even if the existing communication system can achieve a low BER, several bits, distorted by the noise and beyond the error correction capability, could lead to understanding difficulty as part of the semantic information of the whole sentence might be missed. In order to achieve successful recovery at the semantic level, we design the semantic and channel coding jointly so as to keep the meaning between ŝ and s unchanged, which is enabled by a new DNN framework. The cross-entropy (CE) is used as the loss function to measure the difference between s and ŝ, which can be formulated as

L_CE(s, ŝ; α, β, χ, δ) = −Σ_{l=1}^{L} [ q(w_l) log(p(w_l)) + (1 − q(w_l)) log(1 − p(w_l)) ],   (4)

where q(w_l) is the real probability that the l-th word, w_l, appears in the source sentence s, and p(w_l) is the predicted probability that the l-th word, w_l, appears in the estimated sentence ŝ. The CE can measure the difference between two probability distributions. By reducing the loss value of the CE, the network can learn the word distribution, q(w_l), in the source sentence, s, which indicates that the syntax, phrases, and the meaning of words in context can be learnt by the network. Besides, jointly designing and training the semantic-channel coding can make the whole network learn the knowledge for the specific goal. In other words, the channel coding can pay more attention to protecting the semantic information related to the transmission goal while neglecting other irrelevant information. A separate design would make the channel coding treat all information equally.

B. Channel Encoder and Decoder Design

One important goal in designing a communication system is to maximize the capacity or the data transmission rate. Compared with the BER, the mutual information can provide extra information to train a receiver. The mutual information of the transmitted symbols, x, and the received symbols, y, can be computed by

I(x; y) = ∫_{X×Y} p(x, y) log( p(x, y) / (p(x)p(y)) ) dx dy = E_{p(x,y)}[ log( p(x, y) / (p(x)p(y)) ) ],   (5)

where (x, y) is a pair of random variables with values over the space X × Y, where X and Y are the spaces for x and y, p(x) and p(y) are the marginal probabilities of sending x and receiving y, respectively, and p(x, y) is the joint probability of x and y. The mutual information is equivalent to the Kullback-Leibler (KL) divergence between the joint probability and the product of the marginal probabilities, which is given by

I(x; y) = D_KL( p(x, y) ‖ p(x)p(y) ).   (6)

From [31], we have the following theorem.

Theorem 1: The KL divergence admits the following dual representation

D_KL(P ‖ Q) = sup_{T: Ω→R} { E_P[T] − log( E_Q[e^T] ) },   (7)

where the supremum is taken over all functions T such that the two expectations are finite.

According to Theorem 1, the KL divergence can also be represented as

D_KL( p(x, y) ‖ p(x)p(y) ) ≥ E_{p(x,y)}[T] − log( E_{p(x)p(y)}[e^T] ).   (8)

Thus, a lower bound of I(x; y) can be obtained from (6) and (8). In order to find a tight bound on I(x; y), an unsupervised method is used to train the network T, where T can be approximated by a neural network. Meanwhile, the expectation in (8) can be computed by sampling, which converges to the true value as the number of samples increases. Then, we can optimize the encoder by maximizing the mutual information lower bound defined in (8), and the related loss function can be given by

L_MI(x, y; T) = E_{p(x,y)}[f_T] − log( E_{p(x)p(y)}[e^{f_T}] ),   (9)

where f_T is implemented by a neural network, in which the inputs are samples from p(x, y), p(x), and p(y). In our proposed design, x is generated by the functions C_α and S_β; thus the loss function can be represented by L_MI(x, y; T, α, β) with

L_MI(x, y; T, α, β) ≤ I(x; y).   (10)

From (10), the loss function can be used to train the neural networks to get α, β, and T. For example, the mutual information can be estimated by training the network T when the encoder parameters α and β are fixed. Similarly, the encoder can be optimized by training α and β when the mutual information estimate is obtained.

C. Performance Metrics

Performance criteria are important to the system design. In the E2E communication system, the BER is usually taken as the training target by the transmitter and receiver, which sometimes neglects the other goals of communication. For text transmission, the BER cannot reflect the performance well. Apart from human judgement to establish the similarity between sentences, the bilingual evaluation understudy (BLEU) score is usually used to measure the results in machine translation [32], and it will be used as one of the performance metrics in this paper. However, the BLEU score can only compare the difference between the words in two sentences rather than their semantic information. Therefore, we initialize a new metric, named sentence similarity, to describe the similarity level of two sentences in terms of their semantic information, which is introduced in the following. This provides a solution to Question 2.

1) BLEU Score: The BLEU score counts the difference of n-grams between the transmitted and received texts, where an n-gram is a word group of size n. For example, for the sentence "weather is good today," the 1-grams are "weather," "is," "good," and "today," and the 2-grams are "weather is," "is good," and "good today". The same rule applies for the rest.

For the transmitted sentence s with length l_s and the decoded sentence ŝ with length l_ŝ, the BLEU score can be expressed as

log BLEU = min(1 − l_ŝ/l_s, 0) + Σ_{n=1}^{N} u_n log p_n,   (11)

where u_n is the weight of the n-grams and p_n is the n-gram score, which is

p_n = Σ_k min( C_k(ŝ), C_k(s) ) / Σ_k C_k(ŝ),   (12)

where C_k(·) is the frequency count function for the k-th element among the n-grams.

The output of BLEU is a number between 0 and 1, which indicates how similar the decoded text is to the transmitted text, with 1 representing the highest similarity. However, even human translations rarely attain a score of 1 since replacing a word may not change the meaning of a sentence. For instance, the two sentences, "my car was parked there" and "my automobile was parked there," have the same meaning but different BLEU scores since they use different words. To characterize such a feature, we propose a new metric, the sentence similarity, at the sentence level in addition to the BLEU score.

2) Sentence Similarity: A word can take different meanings in different contexts. For instance, the meanings of mouse in biology and in machines are different. Traditional methods, such as word2vec [25], cannot recognise such polysemy; the problem is how to use a numerical vector to express a word such that the vector varies in different contexts. According to the semantic similarity, we propose to calculate the sentence similarity between the original sentence, s, and the recovered sentence, ŝ, as

match(ŝ, s) = ( B_Φ(s) · B_Φ(ŝ)^T ) / ( ‖B_Φ(s)‖ ‖B_Φ(ŝ)‖ ),   (13)

where B_Φ, representing BERT [27], is a huge pre-trained model including billions of parameters used for extracting the semantic information. The sentence similarity defined in (13) is a number between 0 and 1, which indicates how similar the decoded sentence is to the transmitted sentence, with 1 representing the highest similarity and 0 representing no similarity between s and ŝ.

Compared with the BLEU score, BERT has been fed with billions of sentences. Therefore, it has already learnt the semantic information from these sentences and can generate different semantic vectors in different contexts effectively. With BERT, the semantic information behind a transmitted sentence, s, can be expressed as c. Meanwhile, the semantic information conveyed by the estimated sentence is expressed as ĉ. For c and ĉ, we can compute the sentence similarity by match(c, ĉ).

Fig. 2. The proposed neural network structure for the semantic communication system.

Fig. 3. An example of the self-attention mechanism following long-distance dependency in the Transformer encoder.

IV. PROPOSED DEEP SEMANTIC COMMUNICATION SYSTEMS

In this section, we propose a DNN for the considered semantic communication system, named DeepSC, in which the Transformer is adopted for text understanding. Then, transfer learning is adopted to make the DeepSC applicable to different background knowledge and dynamic communication environments. This provides the solutions to Questions 1 and 3.

A. Basic Model

The proposed DeepSC is shown in Fig. 2. Particularly, the transmitter consists of a semantic encoder to extract the semantic features from the texts to be transmitted and a channel encoder to generate symbols to facilitate the subsequent transmission. The semantic encoder includes multiple Transformer encoder layers and the channel encoder uses dense layers with different numbers of units. The AWGN channel is interpreted as one layer in the model. Accordingly, the DeepSC receiver is composed of a channel decoder for symbol detection and a semantic decoder for text estimation; the channel decoder includes dense layers with different numbers of units and the semantic decoder includes multiple Transformer decoder layers. The loss function can be expressed as

L_total = L_CE(s, ŝ; α, β, χ, δ) − λ L_MI(x, y; T, α, β),   (14)

where the first term is the loss function considering the sentence similarity, which aims to minimize the semantic difference between s and ŝ by training the whole system. The second term is the loss function for the mutual information, which maximizes the achieved data rate during the transmitter training. The parameter λ (0 ≤ λ ≤ 1) is the weight for the second term.

The core of the Transformer is the multi-head self-attention mechanism, which enables the Transformer to view the previously predicted words in the sequence, thereby better predicting the next word. Fig. 3 gives an example of the self-attention mechanism for the word "it". From Fig. 3, the attention attends to a distant dependency of the pronoun "it", completing the pronoun reference to "the animal", which demonstrates that the self-attention mechanism can learn the semantics and therefore solve the aforementioned Question 1.

As shown in Algorithm 1, the training process of the DeepSC consists of two phases due to the different loss functions. After initializing the weights, W, and bias, b, and using embedding vectors to represent the input words, the first phase is to train the mutual information model by unsupervised learning to estimate the achieved data rate for the second phase. The second phase is to train the whole system with (14) as the loss function. Each phase aims to minimize the loss by gradient descent with mini-batches until the stop criterion is met, i.e., the maximum number of iterations is reached or none of the terms in the loss function decreases any more. Different from performing semantic coding and channel coding separately, where the channel encoder/decoder deals with the digital bits rather than the semantic information, the joint semantic-channel coding can preserve the semantic information when compressing data, which provides the detailed solution for the aforementioned Question 3. The two training phases are described in the following.

Algorithm 1: DeepSC Network Training Algorithm.
Initialization: Initialize the weights W and bias b.
1: Input: The background knowledge set K.
2: Create the index-to-word and word-to-index maps, and then embed the words.
3: while Stop criterion is not met do
4:   Train the mutual information estimation model.
5:   Train the whole network.
6: end while
7: Output: The whole network S_β(·), C_α(·), C_δ^{-1}(·), S_χ^{-1}(·).

Fig. 4. The training framework of the DeepSC: phase 1 trains the mutual information estimation model; phase 2 trains the whole network based on the cross-entropy and mutual information.

1) Training of Mutual Information Estimation Model: The mutual information estimation model training process is illustrated in Fig. 4 and the pseudocode is given in Algorithm 2. First, the knowledge set K generates a minibatch of sentences S ∈ R^{B×L×1}, where B is the batch size and L is the length of the sentences. Through the embedding layer, the sentences can be represented as a dense word vector E ∈ R^{B×L×E}, where E is the dimension of the word vector. Then, E passes through the semantic encoder layer to obtain M ∈ R^{B×L×V}, the semantic information conveyed by S, where V is the dimension of the Transformer encoder's output. Then, M is encoded into symbols X to cope with the effects of the physical channel, where X ∈ R^{B×NL×2}. After passing through the channel, the receiver obtains the signal Y distorted by the channel noise. Based on (9), the loss, L_MI(X, Y; T, α, β), can be computed from the transmitted symbols, X, and the received symbols, Y, under the AWGN channels. Finally, according to the computed L_MI, stochastic gradient descent (SGD) is exploited to optimize the weights and bias of f_T(·).

2) Whole Network Training: The whole network training process is illustrated in Algorithm 3. First, a minibatch S from the knowledge set K is encoded into M at the semantic level, then M is encoded into symbols X for transmission over the physical channel. At the receiver, the distorted symbols Y are received and then decoded by the channel decoder layer, where M̂ ∈ R^{B×L×V} is the recovered semantic information of the sources.
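The two-phase procedure of Algorithm 1 can be summarized by the following training-loop skeleton. It is a sketch under several assumptions: the transmitter, receiver, and f_T are reduced to tiny stand-in modules, the standard categorical cross-entropy replaces (4), and the AWGN and size settings are arbitrary; only the alternation between the two phases and the combined loss in (14) mirror the algorithm.

```python
import torch
import torch.nn as nn

# Stand-in modules: in the full DeepSC these would be the Transformer-based
# semantic encoder/decoder, the dense channel encoder/decoder, and f_T.
vocab, d, n_sym = 1000, 32, 16
transmitter = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, n_sym))        # S_beta then C_alpha
receiver = nn.Sequential(nn.Linear(n_sym, d), nn.ReLU(), nn.Linear(d, vocab))   # C_delta^-1 then S_chi^-1
mi_net = nn.Sequential(nn.Linear(2 * n_sym, 256), nn.ReLU(), nn.Linear(256, 1)) # f_T

def awgn(x, snr_db=6.0):
    sigma = (10 ** (-snr_db / 10.0)) ** 0.5
    return x + sigma * torch.randn_like(x)

def mi_bound(x, y):
    """Lower bound (9); x and y are per-word symbol vectors stacked as rows."""
    joint = mi_net(torch.cat([x, y], dim=-1)).mean()
    marg = mi_net(torch.cat([x, y[torch.randperm(y.size(0))]], dim=-1))
    return joint - torch.log(torch.exp(marg).mean())

opt_mi = torch.optim.Adam(mi_net.parameters(), lr=1e-4)
opt_net = torch.optim.Adam(list(transmitter.parameters()) + list(receiver.parameters()), lr=1e-3)
ce, lam = nn.CrossEntropyLoss(), 0.05                   # lambda in (14)

for epoch in range(3):                                  # stop criterion reduced to a fixed count
    s = torch.randint(0, vocab, (64, 20))               # minibatch S of (already indexed) sentences

    # Phase 1: train the mutual information estimation model with the transceiver frozen.
    with torch.no_grad():
        x = transmitter(s)                               # (64, 20, n_sym)
        y = awgn(x)
    loss_mi = -mi_bound(x.reshape(-1, n_sym), y.reshape(-1, n_sym))
    opt_mi.zero_grad(); loss_mi.backward(); opt_mi.step()

    # Phase 2: train the whole network with L_total = L_CE - lambda * L_MI, eq. (14).
    x = transmitter(s)
    y = awgn(x)
    logits = receiver(y)                                 # (64, 20, vocab)
    l_ce = ce(logits.reshape(-1, vocab), s.reshape(-1))
    l_mi = mi_bound(x.reshape(-1, n_sym), y.reshape(-1, n_sym))
    loss_total = l_ce - lam * l_mi
    opt_net.zero_grad(); loss_total.backward(); opt_net.step()
```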

Afterwards, the transmitted sentences are estimated by the semantic decoder layer. Finally, the whole network is optimized by the SGD, where the loss is computed by (14).

Algorithm 2: Train the Mutual Information Estimation Model.
1: Input: The knowledge set K.
2: Transmitter:
3:   BatchSource(K) → S.
4:   S_β(S) → M.
5:   C_α(M) → X.
6:   Transmit X over the channel.
7: Receiver:
8:   Receive Y.
9:   Compute the loss L_MI by (9).
10:  Train T → Gradient descent (T, L_MI).
11: Output: The mutual information estimation model f_T(·).

Algorithm 3: Train the Whole Network.
1: Input: The knowledge set K.
2: Transmitter:
3:   BatchSource(K) → S.
4:   S_β(S) → M.
5:   C_α(M) → X.
6:   Transmit X over the channel.
7: Receiver:
8:   Receive Y.
9:   C_δ^{-1}(Y) → M̂.
10:  S_χ^{-1}(M̂) → Ŝ.
11:  Compute the loss function L_total by (14).
12:  Train β, α, δ, χ → Gradient descent (β, α, δ, χ, L_total).
13: Output: The whole network S_β(·), C_α(·), C_δ^{-1}(·), S_χ^{-1}(·).

B. Transfer Learning for Dynamic Environments

In practice, different communication scenarios result in different channels and training data. However, re-training the transmitter and receiver to meet the requirements of dynamic scenarios introduces extra costs. To address this, a deep transfer learning approach is adopted, which focuses on storing the knowledge gained while solving a problem and applying it to a different but related problem.

The training process adopting transfer learning is illustrated in Fig. 5 and the pseudocode is given in Algorithm 4, where the training modules, i.e., the mutual information estimation model training and the whole network training, are the same as Algorithm 2 and Algorithm 3. First, load the pre-trained transmitter and receiver based on knowledge K_0 and channel N_0. For applications with different background knowledge, we only need to redesign and train part of the semantic encoder and decoder layers and freeze the channel encoder and decoder layers. For different communication environments, we redesign and train part of the channel encoder and decoder layers and freeze the semantic encoder and decoder layers. If the knowledge and channel are totally different, the pre-trained transceiver can still reduce the time consumption because the weights of some layers in the pre-trained model can be reused in the new model even if most layers need to be redesigned. After the other modules are trained, we unfreeze them and train the whole network for a few epochs to converge to the global optimum.

Algorithm 4: Transfer Learning Based Training for Dynamic Environments.
Initialization: Load the pre-trained model S_β(·), C_α(·), C_δ^{-1}(·), S_χ^{-1}(·).
Function: Training for different background knowledge
1: Input: The different background knowledge set K_1.
2: Freeze C_α(·) and C_δ^{-1}(·).
3: Redesign and train part of S_β(·) and S_χ^{-1}(·).
4: while Stop criterion is not met do
5:   Train the mutual information estimation model.
6:   Train the whole network.
7: end while
8: Output: The adapted whole network.
Function: Training for different channel conditions
9: Input: The background knowledge set K with the different channel parameters.
10: Freeze S_β(·) and S_χ^{-1}(·).
11: Redesign and re-train part of C_α(·) and C_δ^{-1}(·).
12: while Stop criterion is not met do
13:   Train the mutual information estimation model.
14:   Train the whole network.
15: end while
16: Output: The re-trained network.

Fig. 5. Transfer learning based training framework: (a) re-train the channel encoder and decoder for different channels; (b) re-train the semantic encoder and decoder for different background knowledge.

V. NUMERICAL RESULTS

In this section, we compare the proposed DeepSC with other DNN algorithms and the traditional source coding and channel coding approaches under the AWGN channels and Rayleigh fading channels, where we assume perfect CSI for all schemes. The transfer learning aided DeepSC is also verified under the erasure channel and fading channel as well as different background knowledge.

TABLE I
THE SETTING OF THE DEVELOPED SEMANTIC NETWORK

A. Simulation Settings

The adopted dataset is the proceedings of the European Parliament [33], which consists of around 2.0 million sentences and 53 million words. The dataset is pre-processed into sentences of 4 to 30 words and is split into training data and testing data.

In the experiment, we set three Transformer encoder and decoder layers with 8 heads each, and the channel encoder and decoder are set as dense layers with 16 units and 128 units, respectively. For the mutual information estimation model, we set two dense layers with 256 units and one dense layer with 1 unit to mimic the function T in (7), where the 256 units can extract full information and the 1 unit can integrate the information. These settings can be found in Table I. For the baselines, we adopt joint source-channel coding based on a neural network and typical methods for separate source and channel coding:
• DNN based joint source-channel coding [22]: The network consists of Bi-directional Long Short-Term Memory (BLSTM) layers. We label it as JSCC [22] in the simulation figures.
• Traditional methods: To perform the source and channel coding separately, we use the following technologies, respectively:
– Source coding: Huffman coding, fixed-length coding (5-bit), and Brotli coding, where Brotli coding uses a 2nd-order context model to compress the context information and every 128 sentences are compressed together in the simulation.
– Channel coding: Turbo coding [34] and Reed-Solomon (RS) coding [35]. The adopted turbo decoding method is the log-MAP algorithm with 5 iterations.

The BLEU score and sentence similarity are used to measure the performance. The simulation is performed on a computer with an Intel Core i7-9700 CPU@3.00 GHz and an NVIDIA GeForce GTX 2060.

B. Basic Model

Fig. 6 shows the relationship between the BLEU score and the SNR under the same number of transmitted symbols over AWGN and Rayleigh fading channels, where the traditional approaches use 8-QAM, 64-QAM, and 128-QAM for the modulation. Among the traditional baselines in Fig. 6(a), Brotli coding outperforms the Huffman and fixed-length encoding over AWGN channels when turbo coding is adopted for channel coding. The traditional approaches perform better than the DNN based method when the SNR is above 12 dB since the distortion from the channel is decreased, where Brotli with turbo coding performs better than the DeepSC. We observe that all DL enabled approaches are more competitive in the low SNR regime.

In Fig. 6(b), the DL enabled approaches outperform all traditional approaches over the Rayleigh fading channels, where RS coding is better than turbo coding in terms of 2-grams to 4-grams. This is because RS coding is a linear block coding with long block-length and can correct long series of bits, whereas turbo coding is a type of convolutional coding with short block-length, so that adjacent words have a higher error rate. DeepSC is not only suitable for short block-lengths but also performs better in decoding adjacent words, i.e., 4-grams. Note that the BLEU score of the method with Brotli coding and turbo coding is always 0 over Rayleigh fading channels. This is because 128 sentences are compressed together, while Brotli decoding requires error-free codes after channel decoding for the codes corresponding to the 128 sentences. However, it is almost impossible to guarantee error-free transmission over Rayleigh fading channels. Therefore, we fail to restore any of the 128 sentences compressed together in Brotli coding, as shown in Fig. 6(b). Besides, the lower BLEU score of the DL enabled approaches may not be caused by word errors. For example, it may be due to substitutions of words using synonyms or rephrasing, which does not change the meaning of the word. Fig. 6 also demonstrates that the joint semantic-channel coding design outperforms the traditional methods, which provides the solution to Questions 1 and 3.

Fig. 7 shows the proposed performance metric, the sentence similarity, with respect to the SNR under the same total number of symbols, where the traditional approaches use 8-QAM, 64-QAM, and 128-QAM. In Fig. 7(a), the proposed metric shows the same tendency as the BLEU scores. Note that for part of the traditional methods, i.e., Huffman with turbo coding, even if it can achieve about 20% word accuracy in BLEU score (1-gram) from Fig. 6(a) when SNR = 9 dB, people are usually unable to understand the meaning of texts full of errors. Thus, the sentence similarity in Fig. 7(a) almost converges to 0. For the DeepSC, it achieves more than 90% word accuracy in BLEU score (1-gram) when the SNR is higher than 6 dB in Fig. 6(a), which means people can understand the texts well. Therefore, the sentence similarity tends to 1. Fig. 6(b) and Fig. 7(b) show the same tendency. The benchmark, including the DNN based JSCC method in [22] under Rayleigh fading channels, also gets a much higher score than the traditional approaches in terms of the sentence similarity since it can capture the features of the syntax and the relationships of the words, as well as present texts that are easier for people to understand. A few representative results are shown in Table II.

Fig. 6. BLEU score versus SNR for the same total number of transmitted symbols, with Huffman coding with RS (30,42) in 64-QAM, 5-bit coding with RS
(42, 54) in 64-QAM, Huffman coding with Turbo coding in 64-QAM, 5-bit coding with Turbo coding in 128-QAM, Brotli coding with Turbo coding in 8-QAM;
the DNN based JSCC [22] trained over the AWGN channels and Rayleigh fading channels, our proposed DeepSC trained over the AWGN channels and Rayleigh
fading channels.

Fig. 7. Sentence similarity versus SNR for the same total number of transmitted symbols, with Huffman coding with RS (30,42) in 64-QAM; 5-bit coding with RS
(42, 54) in 64-QAM; Huffman coding with Turbo coding in 64-QAM; 5-bit coding with Turbo coding in 128-QAM; Brotli coding with Turbo coding in 8-QAM; an
E2E trained over the AWGN channels and Rayleigh fading channels [22]; our proposed DeepSC trained over the AWGN channels and Rayleigh fading channels.

TABLE II
THE SAMPLE SENTENCES BETWEEN DIFFERENT METHODS OVER RAYLEIGH FADING CHANNELS WHEN SNR IS 18 dB

Fig. 8. BLEU score (1-gram) versus the average number of symbols used for one word in the DeepSC, SNR = 12 dB.

Fig. 9. SNR versus mutual information for different trained encoders, with 8 symbols per word.

In brief, we can conclude that the tendency of the sentence similarity is closer to human judgment and that the DeepSC achieves the best performance in terms of both the BLEU score and the sentence similarity. Compared to the simulation results with the BLEU score as the metric, the sentence similarity score can better measure the semantic error, which solves Question 2.

Fig. 8 illustrates the impact of the number of symbols per word on the 1-gram BLEU score when the SNR is 12 dB. As the number of symbols per word grows, the BLEU scores increase significantly due to the gradually increasing distance between constellation points. Generally, people can understand the basic meaning of transmitted sentences with over 85% word accuracy in BLEU score (1-gram). For short sentences consisting of 5 to 13 words, our proposed DeepSC can achieve 85% accuracy with 4 symbols per word, which means that we can use fewer symbols to represent one word in environments that mainly transmit short sentences. Therefore, it can achieve a higher transmission rate. For longer sentences consisting of 21 to 30 words, the proposed DeepSC faces more difficulties in understanding the complex structure of the sentences in the transmitted texts. Hence, the performance is degraded with longer sentences. One way to improve the BLEU score is to increase the average number of symbols used for each word.

C. Mutual Information

Fig. 9 demonstrates the relationship between the SNR and the mutual information after training. As we can imagine, the mutual information increases with the SNR. From the figure, the performance of the transceiver trained with the mutual information estimation model outperforms that without such a model. From Fig. 9, with the proposed mutual information estimation model, the obtained mutual information at SNR = 4 dB is approximately the same as that without the training model at SNR = 9 dB. From another point of view, the mutual information estimation model leads to better learning results, i.e., data distribution, at the encoder to achieve a higher data rate. In addition, this shows that introducing (9) in the loss function can improve the mutual information of the system.

Fig. 10. The impact of different learning rates with training SNR = 12 dB.

Fig. 10 draws the relationship between the loss value in (14) and the mutual information with increasing epochs. Fig. 11 indicates the relationship between the BLEU score and the SNR. The two figures are based on models with the same structure but different training parameters, i.e., learning rates. In Fig. 10, the obtained mutual information is different, i.e., the mutual information of the model with learning rate 0.001 increases along with the decreasing loss value while that of the model with learning rate 0.002 stays at zero, although the loss values of the two models gradually converge to a stable state.

Fig. 11. BLEU score (1-gram) versus SNR for different learning rates, with training SNR = 12 dB.

From Fig. 11, the BLEU score with learning rate 0.001 outperforms that with learning rate 0.002, which means that even if the neural network converges to a stable state, it is possible that the gradient descent reaches a local minimum instead of the global minimum. During the training process, the mutual information can be used as a tool to decide whether the model converges effectively.

D. Transfer Learning for Dynamic Environments

In this experiment, we present the performance of the transfer learning aided DeepSC for two tasks: transmitter and receiver re-training over different channels and over different background knowledge.

Fig. 12. Transfer learning (TL) aided DeepSC with different background knowledge: (a) loss values versus the number of training epochs, (b) BLEU score (1-gram) versus the SNR.

Fig. 12 shows the training efficiency and the performance for different background knowledge, where the model is trained and then re-trained with new background knowledge under the same channel (AWGN). The models have the same structure and are re-trained with the same parameters in each scenario. From Fig. 12(a), the number of epochs is reduced from 30 to 5 to reach convergence. In Fig. 12(b), the pre-trained model can provide additional knowledge so that the corresponding model training outperforms re-training the whole system. This demonstrates that the transfer learning aided DeepSC can help the transceiver to accommodate new requirements of the communication environment.

Fig. 13. Transfer learning aided DeepSC with different channels: (a) loss values versus epochs under the erasure channel; (b) loss values versus epochs under the Rician fading channel; (c) BLEU score (1-gram) versus the dropout rate; (d) BLEU score (1-gram) versus the SNR.

Fig. 13 shows the training efficiency and the performance for different channels, where the DeepSC transceiver is pre-trained under the AWGN channel and then re-trained under the erasure channel and the Rician fading channel, respectively, with the same background knowledge. The models have the same structure and are re-trained with the same parameters in each scenario. From Fig. 13(a) and Fig. 13(b), the adoption of the pre-trained model can speed up the training process for both the erasure channel and the Rician fading channel. In Fig. 13(c) and Fig. 13(d), the performance of the DeepSC with the pre-trained model is similar to that without the pre-trained model, while the required complexity is reduced significantly as fewer epochs are required during the re-training process. It is further noted that the BLEU score achieved by the DeepSC is slightly degraded under the fading channel, especially in the lower SNR region, compared to that under the erasure channel.

E. Complexity Analysis

TABLE III
THE AVERAGE SENTENCE PROCESSING RUNTIME VERSUS VARIOUS SCHEMES

The computational complexities of the proposed DeepSC, the JSCC in [22], RS coding, and turbo coding are compared in Table III in terms of the average processing runtime per sentence.¹ All the DL enabled approaches have a lower runtime than the traditional approaches, where turbo coding costs a much longer runtime in the log-MAP iterations and the JSCC [22] requires the lowest average time due to its simple network architecture; however, it comes with a poorer semantic processing capability. In comparison, the runtime of our proposed DeepSC significantly outperforms the traditional schemes and is slightly higher than that of the JSCC [22] but with significant performance improvement.

¹ The runtime of source coding and decoding is omitted in the comparison.

VI. CONCLUSION

In this paper, we have proposed a semantic communication system, named DeepSC, which jointly performs the semantic-channel coding for text transmission. With the DeepSC, the lengths of the input texts and output symbols are variable, and the mutual information is considered as a part of the loss function to achieve a higher data rate. Besides, deep transfer learning has been adopted to meet different transmission conditions and speed up the training of new networks by exploiting the knowledge from the pre-trained model. Moreover, we initialized the sentence similarity as a new performance metric for the semantic error, which is a measure closer to human judgement. The simulation results have demonstrated that the DeepSC outperforms various benchmarks, especially in the low SNR regime.

The proposed transfer learning aided DeepSC has shown its ability to adapt to different channels and knowledge with a fast convergence speed. Therefore, our proposed DeepSC is a good candidate for text transmission, especially in the low SNR regime, which could be very useful for cases with a massive number of devices to be connected over the limited spectrum resources.

We summarize the differences between semantic communication systems and conventional communication systems as follows:
1) Different data processing domains. The former processes data in the semantic domain while the latter compresses data in the entropy domain.
2) Different communication targets. The conventional communication systems focus on the exact data recovery while the semantic communication systems serve the decisions or targets of the transmission.
3) Different system designs. The conventional systems only design and optimize the information transmission modules, which are contained in the traditional transceiver; the semantic systems, however, jointly design the whole information processing chain from the source information to the final targets of the applications.

Following the concept of semantic communications proposed in this paper, we have developed L-DeepSC [36] and DeepSC-S [37] for text and speech transmission.

REFERENCES

[1] H. Xie, Z. Qin, G. Y. Li, and B. H. Juang, "Deep learning based semantic communications: An initial investigation," in Proc. IEEE Global Commun. Conf., Taipei, Taiwan, Dec. 7-11, 2020, pp. 1–6.
[2] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. The University of Illinois Press, 1949.
[3] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[4] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey," Comput. Netw., vol. 54, no. 15, pp. 2787–2805, May 2010.
[5] F. Jameel, Z. Chang, J. Huang, and T. Ristaniemi, "Internet of autonomous vehicles: Architecture, features, and socio-technological challenges," IEEE Wireless Commun., vol. 26, no. 4, pp. 21–29, Aug. 2019.
[6] R. Carnap et al., "An outline of a theory of semantic information," Res. Lab. Electronics, Massachusetts Inst. Technol., Cambridge, MA, RLE Tech. Rep. 247, Oct. 1952.
[7] J. Bao et al., "Towards a theory of semantic communication," in Proc. IEEE Netw. Sci. Workshop, West Point, NY, USA, Jun. 2011, pp. 110–117.
[8] B. H. Juang, "Quantification and transmission of information and intelligence-history and outlook [DSP history]," IEEE Signal Process. Mag., vol. 28, no. 4, pp. 90–101, Jul. 2011.
[9] P. Basu, J. Bao, M. Dean, and J. Hendler, "Preserving quality of information by using semantic relationships," Pervasive Mobile Comput., vol. 11, pp. 188–202, Apr. 2014.
[10] B. Guler, A. Yener, and A. Swami, "The semantic communication game," IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 4, pp. 787–802, Dec. 2018.
[11] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. Int. Conf. Learn. Representations (ICLR), San Diego, CA, USA, May 2015.
[12] E. Cambria and B. White, "Jumping NLP curves: A review of natural language processing research [review article]," IEEE Comput. Intell. Mag., vol. 9, no. 2, pp. 48–57, May 2014.
[13] N. Farsad and A. Goldsmith, "Neural network detection of data sequences in communication systems," IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5663–5678, Nov. 2018.
[14] H. He, C. Wen, S. Jin, and G. Y. Li, "Model-driven deep learning for MIMO detection," IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, 2020.
[15] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to optimize: Training deep neural networks for interference management," IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
[16] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, "Deep learning in physical layer communications," IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, Apr. 2019.
[17] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, Dec. 2017, pp. 5998–6008.
[18] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[19] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, "Deep learning based communication over the air," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[20] F. A. Aoudia and J. Hoydis, "Model-free training of end-to-end communication systems," IEEE J. Sel. Areas Commun., vol. 37, no. 11, pp. 2503–2516, Nov. 2019.
[21] H. Ye, L. Liang, G. Y. Li, and B. Juang, "Deep learning based end-to-end wireless communication systems with conditional GAN as unknown channel," IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020.
[22] S. Park, O. Simeone, and J. Kang, "End-to-end fast training of communication links without a channel model via online meta-learning," Mar. 2020, arXiv:2003.01479.
[23] N. Farsad, M. Rao, and A. Goldsmith, "Deep learning for joint source-channel coding of text," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Calgary, AB, Canada, Apr. 2018, pp. 2326–2330.
[24] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, "Deep joint source-channel coding for wireless image transmission," IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, Sep. 2019.
[25] R. Kneser and H. Ney, "Improved backing-off for m-gram language modeling," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Detroit, MI, USA, 1995, pp. 181–184.
[26] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. Int. Conf. Learn. Representations, Scottsdale, AZ, USA, May 2013.
[27] M. E. Peters et al., "Deep contextualized word representations," in Proc. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., New Orleans, LA, USA, Jun. 2018, pp. 2227–2237.
[28] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., Minneapolis, MN, USA, Jun. 2019, pp. 4171–4186.
[29] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[30] A. Graves, "Generating sequences with recurrent neural networks," Aug. 2013, arXiv:1308.0850.
[31] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, "A convolutional neural network for modelling sentences," in Proc. Annu. Meeting Assoc. Comput. Linguistics, Baltimore, MD, USA, Jun. 2014, pp. 655–665.
[32] M. I. Belghazi et al., "Mutual information neural estimation," in Proc. Int. Conf. Mach. Learn., Stockholm, Sweden, Jul. 2018, pp. 531–540.
[33] K. Papineni, S. Roukos, T. Ward, and W. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. Annu. Meeting Assoc. Comput. Linguistics, Philadelphia, PA, USA, Jul. 2002, pp. 311–318.
[34] P. Koehn, "Europarl: A parallel corpus for statistical machine translation," in Proc. MT Summit, vol. 5, Sep. 2005, pp. 79–86.
[35] C. Heegard and S. B. Wicker, Turbo Coding. Boston, MA, USA: Springer Science & Business Media, 2013, vol. 476.
[36] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," J. Soc. Ind. Appl. Math., vol. 8, no. 2, pp. 300–304, Jan. 1960.
[37] H. Xie and Z. Qin, "A lite distributed semantic communication system for Internet of Things," IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 142–153, Jan. 2021.
[38] Z. Weng, Z. Qin, and G. Y. Li, "Semantic communications for speech signals," in Proc. IEEE Int. Conf. Commun. (ICC), Montreal, Canada, Jun. 2021.

Huiqiang Xie (Graduate Student Member, IEEE) received the B.S. degree from Northwestern Polytechnical University, Xi'an, China, and the M.S. degree from Chongqing University, Chongqing, China. He is currently working toward the Ph.D. degree with the Queen Mary University of London, London, U.K. His research interests include semantic communication, massive MIMO, and machine learning.

Zhijin Qin (Member, IEEE) received the B.S. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2012 and the Ph.D. degree from the Queen Mary University of London (QMUL), London, U.K., in 2016. Since 2018, she has been a Lecturer with QMUL. Her research interests include semantic communications, federated learning for wireless resource allocation, and compressive sensing. She is an Associate Editor for the IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE COMMUNICATIONS LETTERS, and IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING. She was the recipient of the 2017 IEEE GLOBECOM Best Paper Award and the 2018 IEEE Signal Processing Society Young Author Best Paper Award.

Geoffrey Ye Li (Fellow, IEEE) has been a Chair Professor with Imperial College London, London, U.K., since 2020. Before moving to Imperial College London, he was a Professor with the Georgia Institute of Technology, Atlanta, GA, USA, for 20 years and a Principal Technical Staff Member with AT&T Labs - Research, New Jersey, USA, for five years. His publications have been cited over 45000 times, and he has been recognized as a Highly Cited Researcher by Thomson Reuters almost every year. He has authored or coauthored more than 500 journal and conference papers in addition to more than 40 granted patents. His research interests include statistical signal processing and machine learning for wireless communications. He has been involved in editorial activities for more than 20 technical journals, including as the Founding Editor-in-Chief of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Special Series on ML in Communications and Networking. He has organized and chaired many international conferences, including as the Technical Program Vice-Chair of the IEEE ICC'03 and the General Co-Chair of the IEEE GlobalSIP'14, the IEEE VTC'19 (Fall), and the IEEE SPAWC'20. He was elected an IEEE Fellow for his contributions to signal processing for wireless communications in 2005. He was the recipient of several prestigious awards from the IEEE Signal Processing Society (Donald G. Fink Overview Paper Award in 2017), the IEEE Vehicular Technology Society (James Evans Avant Garde Award in 2013 and Jack Neubauer Memorial Award in 2014), and the IEEE Communications Society (Stephen O. Rice Prize Paper Award in 2013, Award for Advances in Communication in 2017, and Edwin Howard Armstrong Achievement Award in 2019), as well as the 2015 Distinguished ECE Faculty Achievement Award from Georgia Tech.

Biing-Hwang Juang (Life Fellow, IEEE) received the Ph.D. degree from the University of California, Santa Barbara, Santa Barbara, CA, USA. He was with Speech Communications Research Laboratory and Signal Technology, Inc., on a number of Government-sponsored research projects. Notable accomplishments during that period include the development of vector quantization for voice applications, voice coders at extremely low bit rates (800 bps and around 300 bps), and robust vocoders for use in satellite communications. He was also a Co-Principal Investigator for a project on co-channel separation of speech signals. He subsequently joined the Acoustics Research Department of Bell Laboratories, working in the areas of speech enhancement, coding, and recognition. In 1996, he became the Director of Acoustics and Speech Research at Bell Labs, and in 2006, the Director of Multimedia Technologies Research at Avaya Labs (a spin-off of Bell Labs). His group continued the long heritage of Bell Labs in speech communication research, including, most notably, the invention of the electret microphone, the network echo canceller, a series of speech CODECs, and key algorithms for signal modeling and automatic speech recognition. He and his group developed a speech server for applications such as AT&T's advanced 800 calls and the Moviefone, the Perceptual Audio Coder for digital audio broadcasting in North America (in both terrestrial and satellite systems), and a world-first real-time full-duplex hands-free stereo teleconferencing system. In 2002, he became the Motorola Foundation Chair Professor with the School of Electrical and Computer Engineering, Georgia Institute of Technology, and was also named a Georgia Research Alliance Eminent Scholar. He has authored or coauthored extensively, including the book Fundamentals of Speech Recognition, coauthored with L. R. Rabiner, and holds about twenty patents. He has held a number of positions in the IEEE Signal Processing Society, including the Editor-in-Chief of the IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING. He was the recipient of a number of technical awards, which include several Best Paper awards in the area of speech communications and processing, the Technical Achievement Award from the IEEE Signal Processing Society, the IEEE Third Millennium Medal, and the IEEE James L. Flanagan Speech and Audio Processing Award (Field Award). He is a Fellow of Bell Laboratories, a Member of the U.S. National Academy of Engineering, a Charter Fellow of the U.S. National Academy of Inventors, and an Academician of Academia Sinica.