Streaming Transmission for Random Arrivals of Data Packets
Won S. Yoon
Cambridge, MA 02142 USA
Email: [email protected]
Abstract
We study the potential benefits of using stream codes for transmitting random arrivals of
data packets. Stream codes operate in a continuous manner by allowing data to be encoded and
decoded at any time, without the need to wait for older packets to finish decoding. We develop
a random-coding formulation for streaming that generalizes the traditional block-coding model,
and obtain exponentially tight expressions for optimal decoding error as a function of decoding
delay. For a single-user channel with Poisson arrivals of fixed-length packets, we obtain bounds
on the end-to-end delay of rate-adaptive stream codes and show that simple greedy rate control
policies achieve nearly optimal performance.
1 Introduction
In modern communication networks, data is typically transmitted as independent blocks, or packets.
Achieving high throughput and low error rates requires encoding a large amount of data into
long codeword blocks, thus reaping the well-known information-theoretic benefits of joint-coding
efficiency. If the data traffic is bursty and random, however, this block-coding strategy results in
large queueing delays as the encoder must wait to collect data.
A more general transmission strategy is to relax the block-coding constraint and allow data to
be encoded and decoded at any time in a continuous stream. We refer to this general strategy
as stream coding. Streaming achieves the benefits of joint-coding efficiency without suffering extra
encoding (queueing) delay. The tradeoff, however, is potentially larger decoding (transmission)
delay due to jointly decoding with both the past and the future data in the stream.
The goal of our study is twofold: first, to analyze this fundamental tradeoff of encoding (queueing)
delay versus decoding (transmission) delay in general stream codes (a class that subsumes block
codes); and second, to design streaming rate control policies that minimize the overall
end-to-end delay, subject to constraints on average transmission power and average probability
of decoding error. We first develop an information-theoretic random-coding framework to model
the tradeoff between reliability and decoding delay for stream codes. We then consider two different
cases of data traffic, deterministic arrivals and Poisson arrivals, and determine the resulting
end-to-end delay at the network layer using queueing analysis.
Streaming transmission has long been a common technique in the realm of convolutional codes,
which encode data continuously in a stream and decode data using either a block-based (ML)
decoder or a symbol-based (MAP) decoder. The error-versus-delay performance of streaming con-
volutional codes was first investigated in [1] and more recently in [2]. Our approach is from a
fundamentally cross-layer perspective, in which we analyze the network-layer delay performance of
physical-layer stream codes.
Figure 1. The General System Model.
2 Formulation of the Problem
The canonical model of rate-adaptive streaming transmission is shown in Figure 1. Assume a time-slotted system. In slot t, $A_t$ packets of length m bits arrive randomly into the transmitter's queue, and the rate control removes $M_t$ packets from the queue to create a message $U_t$ of dimension $2^{M_t m}$. The encoder takes this message, stores it in memory, and creates a new channel symbol from the entire history of messages, $X_t = f(U_1, \ldots, U_t)$. The receiver observes the resulting channel output $Y_t$ and attempts to decode all the messages $\{U_1, \ldots, U_t\}$ using parallel per-message MAP decoding. As soon as a message accumulates sufficient reliability, it is decoded and forwarded.
It is evident that this is a generalization of the traditional block-coding model, which has the additional constraint that data may be encoded ($M_t > 0$) only when all previous data have been reliably decoded. This implies that the resulting performance of stream codes can be no worse than that of block codes, and in some scenarios may be strictly better. The cost of this improvement is increased code memory and processing, without requiring any extra channel resources such as power or bandwidth.
2.1 The Encoder
The stream encoder is a rate-adaptive convolutional shift-register-based encoder. The dynamics of the encoder can be represented by a code tree (Figure 2), which is defined by two parameters: the branch length, which is the minimum delay between branching points (minimum delay between encodings), and the branch dimension, which is the maximum number of outgoing edges from a branching point (maximum number of bits per encoding). Traditional convolutional codes assume a fixed branch length N and a fixed branch dimension M, so that the tree grows at a constant rate of $\log M / N$ bits per slot. Recently, punctured codes have been designed to adapt the branch length N by removing some symbols from the encoder's output $\{X_t\}$, while keeping the branch dimension M fixed (i.e., constant input rate to the encoder).
In our study, we adapt both the length and dimension of branches by controlling the code input rate $M_t$. This results in a tree that grows at a time-varying rate. More input packets result in more branches, while fewer input packets result in fewer branches (and zero input packets result in no branching, i.e., parallel branches). When all the packets in the encoder have achieved sufficient reliability, the encoder purges its memory and resets the tree to a fixed initial state.

For simplicity, we assume a binary control of either one packet ($M_t = 1$) or zero packets ($M_t = 0$) encoded per slot. This binary assumption suffers little loss of optimality when the channel is power-limited, in which case only a few bits of reliability are transmitted per channel use anyway.
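As a small illustration of this time-varying growth, the sketch below (Python; the function name and inputs are ours, not the paper's) counts the number of distinct codeword paths, i.e., leaves of the code tree, after each slot:

```python
def tree_leaves(packets_per_slot, m):
    """Leaves of the adaptive code tree after each slot, given the number of
    packets M_t encoded in each slot and the packet length m (bits).
    A slot with M_t = 0 adds no branching (parallel branches)."""
    leaves, history = 1, []
    for M_t in packets_per_slot:
        leaves *= 2 ** (M_t * m)   # each encoding multiplies the branches by 2^(M_t m)
        history.append(leaves)
    return history

# Hypothetical example with m = 2 bits per packet:
# tree_leaves([1, 0, 1, 1, 0], m=2) -> [4, 4, 16, 64, 64]
```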
2.2 The Decoder
The decoder has its own version of the code tree, on which it employs per-packet MAP decoding.
For the packet encoded at time t, the decoder compares the aggregate likelihoods of groups of paths,
each group containing all subtrees that share the same time-t branch. Figure 2 shows a sample path of a tree in which two groups of subtrees (corresponding to $U_T = 0$ and $U_T = 1$) each contain two subtrees. The decoder compares the aggregate likelihood of the two subtrees with $U_T = 1$ versus the aggregate likelihood of the two subtrees with $U_T = 0$.

Figure 2. Sample path of the adaptive-rate encoder with encoded bits $U_t$, and subtrees corresponding to the time-T packet, $U_T$.
At stage t + d, the decoder makes a decision on a packet that was transmitted back at stage
t by choosing the branch whose corresponding group of subtrees has maximum likelihood. The
probability of error for this single branch decision, as a function of decoding delay d, is derived in
the next section.
At any given time, there may be multiple packets in transmission (awaiting decisions). As soon
as a packet accumulates enough reliability, it is decoded and released. The per-packet nature of
MAP decoding allows packets to depart out of order. In contrast, sequential and ML algorithms
decode the best overall path through the entire tree, and therefore data must be decoded in a serial
first-in-first-out manner. For our assumption of binary encoding, it turns out that MAP decoding
is also serial and therefore all three algorithms have the same error-versus-delay performance.
However, for general non-binary variable-rate transmission, MAP decoding achieves smaller delay
than sequential and ML decoding by allowing a later low-rate stage to be decoded before an earlier
high-rate stage.
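The grouping of subtrees can be made concrete with a brute-force sketch (Python; a toy illustration under the binary-encoding assumption, not the decoder used in the paper): given a function returning the log-likelihood of the received sequence for each full path, the decision for the packet at stage tau compares the two aggregate subtree likelihoods.

```python
import itertools
import math

def map_decide_stage(path_loglike, depth, tau):
    """Toy per-packet MAP decision on a binary code tree of the given depth.

    path_loglike(bits) -> log Pr(Y | path 'bits') for a full length-'depth' path.
    All 2**depth paths are enumerated and grouped by their bit at stage 'tau';
    the bit whose group (subtrees) has the larger aggregate likelihood wins.
    """
    totals = [0.0, 0.0]
    for bits in itertools.product((0, 1), repeat=depth):
        totals[bits[tau]] += math.exp(path_loglike(bits))
    return 0 if totals[0] >= totals[1] else 1
```

Because the groups for different stages overlap, decisions on different packets can mature at different times, which is what permits the out-of-order departures described above.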
2.3 Idle Periods and Resetting the Trellis
If there is a long enough lull in the arrivals during which time all packets have been reliably
transmitted, then both the encoder and decoder erase their memory and re-synchronize their trees.
As soon as a new packet arrives, the encoding process re-starts from a known initial state. We refer
to time 0 as the most recent re-synchronization point, which is equal to the first arrival time
after an idle period (i.e., the start of a new busy period). Therefore, the encoding process can be
viewed as alternating cycles of busy and idle periods.
If the queueing system is stable (arrival rate is strictly less than the channel capacity), then it is
known that the busy periods are finite with probability one. Since the number of packets in memory
at any time is bounded by the number of packets transmitted in a busy period, we are guaranteed
that the required memory and computational time are finite with probability one, although they
may be arbitrarily large finite values. In practice, finite code memory can be handled by switching
to block-coding mode and holding new data in queue until old data is reliably decoded.
Figure 3. Sample path of undecoded packets, N(t), in the encoder and the corresponding dynamics of the code tree.
3 Error Performance for Stream Codes
A fundamental issue for stream codes is the behavior of error probability as a function of decoding
delay. For maximum-likelihood decoding, [1] showed that the error probability as a function of
decoding delay satisfies the block random coding exponent. We show an analogous result for the
case of time-varying code inputs with per-packet MAP decoding.
3.1 Error-versus-Delay for Real-Time Codes
Before analyzing the error performance of stream codes, we address the subtle but important
concept of error versus delay. Traditional random coding analysis assumes a fixed code length
and proves the existence of a good code for that code length. However, this says nothing about the
error performance when the same code is used at different code lengths (different decoding delays).
This is an important constraint because in practice, it is impractical to switch codes in the middle
of a transmission. Therefore, we need to show the existence of codes that are uniformly good for
all code lengths, not just for one particular code length. Similar results have been independently
derived in [1] and [2].
Define a real-time optimal code $(x_1^T, \ldots, x_M^T)$ as one for which every truncation $(x_1^t, \ldots, x_M^t)$, $t = 1, \ldots, T$, has a probability of error $P_{e,m}^t$ that is upper-bounded by the block random coding error exponent.
Proposition 3.1. There exists a code $(x_1^T, \ldots, x_M^T)$ with truncations $(x_1^t, \ldots, x_M^t)$, $t = 1, 2, \ldots, T$, such that $P_{e,m}^t$ is uniformly bounded for every block length $t \le T$ and every message $m = 1, \ldots, M$:
$$P_{e,m}^t \;\le\; \exp\{-t\, E_r(m/t)\} \qquad \text{for } t \ge (\log M)/C,$$
where $m = \log M$, $E_r(R) = \max_{\rho \in [0,1]} \{E_0(\rho) - \rho R\}$ is Gallager's reliability function for block codes, and $C = \lim_{\rho \to 0} E_0(\rho)/\rho$ is the channel capacity.
The proof uses the Markov inequality and the union bound to show that, for a random ensemble of codes, the probability that a code drawn from the ensemble has some truncation whose error probability exceeds the ensemble average by a specified factor is strictly less than one.
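A hedged sketch of that step, written out (the slack factor $a$ below is our notation, not the paper's):
$$\Pr\Big[\exists\, t \le T,\ m \le M:\ P_{e,m}^t > a\,\overline{P}_{e,m}^t\Big] \;\le\; \sum_{t=1}^{T}\sum_{m=1}^{M} \Pr\big[P_{e,m}^t > a\,\overline{P}_{e,m}^t\big] \;\le\; \frac{TM}{a},$$
by the union bound and Markov's inequality, where $\overline{P}_{e,m}^t \le e^{-tE_r(m/t)}$ is the ensemble-average block random coding bound. Choosing $a > TM$ makes this probability strictly less than one, so some code in the ensemble satisfies $P_{e,m}^t \le a\, e^{-tE_r(m/t)}$ simultaneously for all $t$ and $m$; the polynomial factor $a$ is negligible on the exponential scale for large $t$.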
3.2 Error Exponents for Real-Time MAP Decoding
We use the previous result to analyze the error-versus-delay performance of stream codes. Assume infinite code memory (infinite constraint length) for ease of analysis.

At time t, we decode stage $\tau < t$ using the observations $y_1^t = (y_1, \ldots, y_t)$. Define the maximum a posteriori probability of stage $\tau$ as $\pi_\tau(y_1^t) = \max_{x} \Pr(X_\tau = x \mid Y_1^t = y_1^t)$. It can be shown that this is proportional to the sum of likelihoods over all paths that share a common stage-$\tau$ branch,
$$\pi_\tau(y_1^t) \;\propto\; \max_{x_\tau} \sum_{x_1^t \setminus x_\tau} \Pr(Y_1^t = y_1^t \mid X_1^t = x_1^t).$$
In other words, we find the $X_\tau$ that maximizes the aggregate likelihood of all subtrees that emanate from it.
Consider making a decision about time t just before reaching time t + d (a decoding delay of d slots). Let $M_t$ denote the number of different inputs possible at time t (i.e., the total number of branches in stage t), which is equal to $M_t = 2^{U_t m}$, where $U_t$ is again the number of packets encoded at time t. Assume that the correct path corresponds to an information sequence of all zeros. Although the comparison at time t is only among the $M_t - 1$ other branches of that stage, there are actually many more paths that are potential adversaries. An adversarial path is one that diverged from the correct path at or before time t.
1. Stage t: there are $M_t - 1$ branches that diverge from the correct path, each with $\prod_{i=1}^{d-1} M_{t+i}$ subsequent tails of length $d - 1$. The total number of length-d paths that diverged from the correct path at time t is therefore $W_t = (M_t - 1)\prod_{i=1}^{d-1} M_{t+i}$, which is upper-bounded by $W_t \le \prod_{i=0}^{d-1} M_{t+i}$.

2. Stage $(t-1)$: there are $M_{t-1} - 1$ branches that diverge from the correct path, each with $W_t$ subsequent tails of length d that are incorrect in stage t. The total number of diverging paths of length $d+1$ that are incorrect in stage t is therefore $W_{t-1} = (M_{t-1} - 1)\, W_t$, which is upper-bounded by $W_{t-1} \le \prod_{i=-1}^{d-1} M_{t+i}$.

3. In general, in stage $(t-k)$: the total number of diverging paths of length $d+k$ that are incorrect in stage t is upper-bounded by $W_{t-k} \le \prod_{i=-k}^{d-1} M_{t+i}$.
The error contribution of each adversarial group can be upper-bounded by the random coding bound for block codes. Using Proposition 3.1, we know that there exists a single code for which the probability of error is upper-bounded by the random coding bound for each decoding delay. Taking a union bound over all such adversarial paths, the overall probability of error for decoding stage t as a function of decoding delay d is upper-bounded by
$$P_e \;\le\; \prod_{i=0}^{d-1} M_{t+i}\, e^{-dnE_0} \;+\; \sum_{k=1}^{\infty} \prod_{i=-k}^{d-1} M_{t+i}\, e^{-(d+k)nE_0}$$
$$=\; e^{\sum_{i=0}^{d-1} m_{t+i}}\, e^{-dnE_0} \;+\; \sum_{k=1}^{\infty} \exp\Big\{\textstyle\sum_{i=-k}^{d-1} m_{t+i}\Big\}\, e^{-(d+k)nE_0}$$
$$=\; e^{-dnE_0 + \sum_{i=0}^{d-1} m_{t+i}} \Big(1 + \sum_{k=1}^{\infty} e^{-knE_0 + \sum_{i=-k}^{-1} m_{t+i}}\Big),$$
where $m_{t+i} \triangleq \log M_{t+i}$.
Simplifying the notation yields a general expression for the probability of error for decoding stage t after a delay of d slots,
$$P_{e,t,d} \;\le\; \Big(1 + \sum_{j=1}^{N_t} e^{-d_j E_0(\rho) + jm}\Big)\, e^{-d E_0(\rho) + (1+N_d)m}. \qquad (1)$$
A lower bound can be derived by considering only the worst-case (highest-rate) past adversary. The difference between the upper and lower bounds appears as different multiplicative coefficients in front of the exponential term. In the asymptotic limit of large decoding delay, the difference in this coefficient becomes negligible.
The random variables $N_t$ and $N_d$ denote the total number of packets encoded in the past $[0, t)$ and in the future $(t, t+d]$, respectively. This is illustrated in Figure 4 for a sample path of the system. The random variable $d_j$ is the backwards delay to the j-th previous packet, which is the sum of j inter-arrival times, $d_j = \sum_{i=1}^{j} \tau_i$.
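For a concrete sense of how the bound behaves, the following sketch (Python; the function and parameter names are ours, and $E_0(\rho)$ is passed in as a single number) evaluates the right-hand side of Equation 1 for a given encoding pattern:

```python
import math

def error_bound(d, E0, m, N_d, past_gaps):
    """Upper bound (1) on the error probability of the packet encoded at time t,
    decoded after d slots.

    E0        : error exponent E_0(rho) per slot (nats)
    m         : packet length (nats)
    N_d       : number of packets encoded in the future (t, t+d]
    past_gaps : [d_1, d_2, ...], backward delays to previously encoded packets
    """
    past = sum(math.exp(-dj * E0 + (j + 1) * m) for j, dj in enumerate(past_gaps))
    return (1.0 + past) * math.exp(-d * E0 + (1 + N_d) * m)
```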
Figure 4. For a stream code observed at time t + d, a packet that was encoded back at time t must be jointly coded with all $N_t$ packets in the past, as well as all $N_d$ packets in the future.
In order to guarantee an average probability of error no greater than $\epsilon$, the transmission (decoding) delay d must satisfy the following reliability condition:
$$d E_0 \;\ge\; \log(1/\epsilon) \;+\; m \;+\; N_d\, m \;+\; \log \sum_{j=0}^{N_t} e^{-d_j E_0 + jm}.$$
Define
$$\beta \triangleq \frac{\log(1/\epsilon)}{E_0}, \qquad \mu \triangleq \frac{E_0}{m}, \qquad \sigma(d_1, \ldots, d_{N_t}) \triangleq \frac{1}{E_0} \log \sum_{j=0}^{N_t} e^{-d_j E_0 + jm}.$$
These definitions have a useful interpretation: $\beta$ represents a setup cost for each new transmission, $1/\mu$ represents the payload of a single packet (thus $\mu$ is the maximum throughput of the channel in packets per slot), and $\sigma$ represents the workload due to past encodings in the stream. Solving for the delay required to achieve the reliability condition,
$$d \;\ge\; \Big[\beta + \frac{1}{\mu}\Big] + \frac{N_d}{\mu} + \sigma(d_1, \ldots, d_{N_t}). \qquad (2)$$
Therefore, reliable transmission of packet t is achieved when the decoding delay d exceeds a threshold. Examining the terms on the right-hand side of Equation 2 more carefully: the bracketed term represents the workload of a single packet in isolation, the term $N_d/\mu$ represents the workload due to packets encoded in the future, and $\sigma$ represents the workload due to packets encoded in the past.
Note that if there is no data in either the past or the future ($N_t = N_d = 0$), then the workload on the right-hand side of Equation 2 is that of a single packet; this is not surprising, as there is only one branching point at the root of the tree and the stream decoder is comparing a set of completely disjoint paths, which are essentially codewords in a block code. As more packets are encoded in the past and future of the stream, the workload (and decoding delay) for packet t increases.
3.3 A Queueing Model
Using the reliability condition of Equation 2, a packet transmission can be described in queueing-theoretic terms:

1. A new packet enters service with an initial workload of $\beta + 1/\mu + \sigma$, where $1/\mu$ is the packet payload and $\sigma$ is the additional work due to existing packets already in service.

2. The new packet causes all existing packets in service to incur an additional workload of $1/\mu$, equal to the new packet's payload.

3. Every packet in service receives one unit of service per slot. When a packet accumulates sufficient service to satisfy Equation 2, it departs the system.
This model is somewhat reminiscent of the queueing-theoretic notion of processor sharing, but there are two key differences here: (1) the joint workload of packets is less than the sum of their individual workloads, thanks to joint-coding efficiency, and (2) packets encoded in the future impose more workload than packets encoded in the past.
This queueing model highlights a fundamental delay tradeoff in stream codes: encoding more packets per slot reduces the queueing delay for future packets, but at the expense of increasing the transmission delay for older packets. The goal of rate control is to find the optimal balance in this tradeoff.
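The dynamics above are easy to simulate. The following sketch (Python) implements the three rules literally for a greedy rate control with at most one encoding per slot; the Bernoulli per-slot arrivals, the parameter names, and the use of Equation 2's past-workload sum for the initial workload are our own simplifying assumptions, not the paper's numerical setup.

```python
import math
import random

def simulate_greedy_stream(lam, m, E0, eps, T=100_000, seed=0):
    """Average decoding delay of the Section 3.3 queueing model under greedy
    rate control (encode each packet in the slot it arrives).

    lam : arrival probability per slot (must satisfy lam < E0/m for stability)
    m   : packet length (nats); E0 : per-slot error exponent (nats)
    eps : target error probability
    """
    rng = random.Random(seed)
    beta = math.log(1.0 / eps) / E0        # setup cost, in slots
    payload = m / E0                       # 1/mu: one packet's payload, in slots

    work, enc_time = [], []                # remaining workload / encoding time per packet
    past_times, delays = [], []            # encodings in current busy period; realized delays

    for t in range(T):
        if rng.random() < lam:             # rule 1: a new packet enters service
            s = sum(math.exp(-(t - tj) * E0 + (j + 1) * m)
                    for j, tj in enumerate(reversed(past_times)))
            sigma = math.log1p(s) / E0     # past workload from Equation 2
            work = [w + payload for w in work]   # rule 2: others absorb 1/mu
            work.append(beta + payload + sigma)
            enc_time.append(t)
            past_times.append(t)

        nxt_w, nxt_t = [], []              # rule 3: one unit of service per slot
        for w, t0 in zip(work, enc_time):
            if w <= 1.0:
                delays.append(t + 1 - t0)  # reliably decoded; packet departs
            else:
                nxt_w.append(w - 1.0)
                nxt_t.append(t0)
        work, enc_time = nxt_w, nxt_t

        if not work:                       # idle with empty memory: reset the tree
            past_times = []

    return sum(delays) / max(len(delays), 1)
```

For arrival rates well below $\mu = E_0/m$, the resulting average delay can be compared against the bounds derived in Section 5.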
4 Deterministic Arrivals
We begin with the simplified scenario in which one packet of length m bits arrives into the system at a fixed interval of $1/\lambda$ slots.
4.1 Block Codes
Define $d_k$ as the delay required to reliably transmit k packets in a single transmission: $d_k = \beta + k/\mu$. If the packet arrival rate is small enough (inter-arrival durations are large enough), then each packet can be encoded and transmitted before the next arrival. More precisely, if $\lambda < 1/d_1$, then the optimal per-packet delay is simply its transmission delay: $D_{\mathrm{block}} = d_1$.
For larger values of the arrival rate $\lambda$, the next packet arrives too quickly and must be jointly coded with the previous packet. Due to the block-coding constraint, this implies that previous packet(s) must be held in queue until a sufficient number of packets arrive to achieve the necessary joint-coding efficiency. More precisely, for fixed $\lambda$, stability requires that the number of packets k per codeword satisfy $k/\lambda \ge d_k$. Letting $k^*$ denote the smallest such number of packets per codeword, the resulting queueing delay is equal to half the transmission delay, so that
$$D_{\mathrm{block}} = \frac{3}{2}\, d_{k^*} = \frac{3}{2} \cdot \frac{\mu\beta}{\mu - \lambda}.$$
4.2 Stream Codes
As in the block-coding scenario, small values of throughput $\lambda < 1/d_1$ imply that each packet can be reliably transmitted before the next packet arrives. In this case, stream codes essentially behave like block codes by transmitting each packet independently with delay $D_{\mathrm{stream}} = d_1$.

For larger values of throughput $\lambda$, it can be shown that the optimal stream code transmits each packet immediately upon arrival. We call such a policy greedy because it transmits each packet as quickly as possible, with minimal queueing. More precisely, for $\lambda \ge 1/d_1$, the reliability condition in Equation 2 can be evaluated with $N_d = \lambda d$ and $d_j = j/\lambda$ to yield
$$D_{\mathrm{stream}} = \frac{\mu\beta + 1}{\mu - \lambda} + \frac{1}{(\mu - \lambda)m} \log\frac{1}{1 - \delta},$$
where $\delta = e^{-E_0/\lambda + m}$ is the single-stage error probability. The last log term represents the extra delay due to joint decoding with the past (and turns out to be negligible).
For comparison, the delay expression derived above for optimal stream codes is compared with that of optimal block codes in Figure 5, for a sample value of channel SNR (A = 1) and packet length (m = 500). This numerical analysis is representative of a wide range of scenarios, and provides evidence that an optimal stream code typically achieves 30% smaller delay than the corresponding optimal block code. Given that the queueing delay is, on average, half of the transmission delay, this suggests that an optimal stream code nearly achieves the optimal joint-coding delay given by the information-theoretic limits of channel coding.
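A quick way to reproduce this kind of comparison is to evaluate the closed-form expressions above directly. The sketch below (Python) does so; the function and parameter names are ours, all quantities are assumed to be in consistent (natural-log) units, and the formulas are the expressions of Sections 4.1 and 4.2 as reconstructed here, not the exact curves behind Figure 5.

```python
import math

def deterministic_delays(lam, m, E0, eps):
    """Per-packet delay of the optimal block code and the greedy stream code
    for deterministic arrivals at rate lam (packets/slot), per Section 4."""
    beta = math.log(1.0 / eps) / E0          # setup cost (slots)
    mu = E0 / m                              # maximum throughput (packets/slot)
    assert lam < mu, "unstable: arrival rate must be below mu = E0/m"

    d1 = beta + 1.0 / mu                     # delay of one packet in isolation
    if lam < 1.0 / d1:                       # light traffic: both send packets alone
        return d1, d1

    d_block = 1.5 * mu * beta / (mu - lam)
    delta = math.exp(-E0 / lam + m)          # single-stage error probability
    d_stream = (mu * beta + 1.0) / (mu - lam) \
               + math.log(1.0 / (1.0 - delta)) / ((mu - lam) * m)
    return d_block, d_stream
```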
Figure 5. For deterministic arrivals, the optimal block code (dashed) is compared with the optimal stream code (solid) for an AWGN channel with SNR A = 1 and packet length m = 500.
One implication of these results is that, for both block codes and stream codes, there is a minimum number of packets required per encoding in order to maintain stability of the system. This minimum number of packets is identical for both block codes and stream codes, which is not surprising considering that any channel code must ultimately be limited by the fundamental information-theoretic constraints of the channel coding theorem. Namely, any channel encoder must ensure that $k/\lambda > d_k$ in order to reap sufficient benefits of joint coding, and thus guarantee that the throughput is below the capacity of the channel. Otherwise, it is impossible to achieve the desired error constraints without the number of undecoded packets in the system growing unbounded.

In the high-throughput regime, $\lambda > 1/d_1$, stream codes are continuously transmitting data with no idle periods. This implies that the memory and processing of the system must grow unbounded. This is an artifact of systems that are perfectly deterministic. For random arrivals, a stable system has idle periods that recur infinitely often. Therefore, the memory and computation are guaranteed not to grow unbounded as long as the system is stable.
5 Poisson Arrivals
Now assume that packets arrive randomly according to a Poisson process. We derive upper and
lower bounds on the average packet delay, and show that greedy policies achieve nearly optimal
performance. Furthermore, the results indicate that stream codes achieve strictly smaller delay
than block codes, and that the difference grows larger in the power-limited (wideband) regime.
The analysis here closely mirrors that of [3] for block codes.
5.1 Lower Bound on Delay
A simple lower bound is obtained by ignoring the queueing delay and lower-bounding the streaming transmission delay in Equation 2. Using the randomization technique from [3], we assume that there exists a steady-state average number of packets encoded per slot, $\Pr(\text{packet encoded in a slot}) = \alpha$ (where $\alpha > \lambda$ for stability of the queue itself). This results in a random decoding delay in Equation 2, of which we can take the expected value with respect to the encoding distribution. On the right-hand side, we recognize that $N_d = \alpha d > \lambda d$ and ignore the log term to obtain the lower bound
$$E[D_{\mathrm{stream}}] \;\ge\; \frac{\mu\beta + 1}{\mu - \lambda}. \qquad (3)$$
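Spelling out the algebra of that step (a sketch under the randomization assumption above):
$$E[d] \;\ge\; \beta + \frac{1}{\mu} + \frac{E[N_d]}{\mu} \;\ge\; \beta + \frac{1}{\mu} + \frac{\lambda\,E[d]}{\mu} \quad\Longrightarrow\quad \Big(1 - \frac{\lambda}{\mu}\Big) E[d] \;\ge\; \beta + \frac{1}{\mu} \quad\Longrightarrow\quad E[d] \;\ge\; \frac{\mu\beta + 1}{\mu - \lambda}.$$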
5.2 Upper Bound on Delay
An upper bound is obtained by analyzing a particular policy. Consider a greedy policy that encodes each packet immediately upon arrival, with no queueing. The transmission delay in Equation 2 is analyzed by substituting $N_d = \lambda d$, which gives
$$E[D_{\mathrm{stream}}] \;\le\; \frac{\mu\beta + 1}{\mu - \lambda} + \frac{1}{(\mu - \lambda)m}\, E\Big[\log \sum_{j=0}^{N_t} e^{-d_j E_0 + jm}\Big]. \qquad (4)$$
The expected value is taken with respect to the distribution of the inter-arrival times $\tau_i$, $i = 1, \ldots, j$, subject to the constraint $\sum_{i=1}^{j} \tau_i = d_j < t$ and the constraint that the interval $[0, t)$ must be a busy period. As in the case of deterministic arrivals, this log term turns out to be negligible.
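One way to check that the term is negligible is to estimate it numerically. The sketch below (Python) is our own rough Monte Carlo estimate: it draws the backward gaps as i.i.d. exponentials and truncates at a fixed horizon as a crude stand-in for the busy-period constraint.

```python
import math
import random

def mean_log_past_term(lam, m, E0, horizon, trials=10_000, seed=1):
    """Estimate E[log sum_{j>=0} exp(-d_j E0 + j m)] from Equation 4, with
    backward inter-arrival gaps tau_i ~ Exponential(lam) and d_j < horizon."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s, d, j = 1.0, 0.0, 0          # the j = 0 term contributes exp(0) = 1
        while True:
            d += rng.expovariate(lam)  # next backward gap
            j += 1
            if d >= horizon:
                break
            s += math.exp(-d * E0 + j * m)
        total += math.log(s)
    return total / trials
```

Dividing the estimate by $(\mu - \lambda)m$ gives the contribution of this term to the bound (4).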
The results of these expressions are plotted in Figure 6 for an AWGN channel with different values of SNR. It can be seen that the upper and lower bounds are very close, implying that greedy policies are nearly optimal for Poisson arrivals. This is similar to the result in [3], which showed that greedy policies are nearly optimal for rate-adaptive block codes. For comparison, we have independently obtained delay bounds for block codes,
$$\frac{3\mu\beta}{2(\mu - \lambda)} \;\le\; E[D_{\mathrm{block}}] \;\le\; \frac{3(\mu\beta + 1)}{2(\mu - \lambda)}. \qquad (5)$$
Our results appear to indicate that stream codes achieve strictly smaller delay than block codes, and that the gap increases for smaller values of SNR.
Figure 6. For Poisson arrivals, greedy block codes (dashed) are compared with greedy stream codes (solid) for an AWGN channel with SNR A = 1, A = 0.1, and A = 0.01 (from right to left). The lower bound for stream codes (finely dotted) is also plotted to show the near-optimality of greedy policies.
6 Conclusions and Discussion
We have analyzed the use of streaming channel codes for transmitting bursty data. We developed
a general random coding framework that encompasses both block codes and stream codes, and
derived the error exponent as a function of delay for real-time MAP decoding. The fundamental
quantity that governs the real-time error exponent is the block coding reliability function. This
adds to the analogous results shown by other authors for ML decoding and sequential decoding,
and suggests that Gallager's block coding error exponent determines the fundamental behavior of
error versus delay for coding systems under a wide variety of decoding algorithms.
Under the assumptions that the packet arrival times have either a deterministic or Poisson
distribution, we have:
- Obtained bounds on the optimal delay of stream codes, and showed that they appear to be very tight.
- Analyzed the difference in delay between optimal stream codes and optimal block codes, and showed that the difference is more significant in the power-limited regime.
- Showed that greedy rate control policies achieve nearly optimal delay for stream codes.
In the result for greedy policies, Equation 4, the fact that the log term (representing the delay
due to jointly decoding with the past) was negligible is likely due to the rather uniform nature of
deterministic and Poisson arrivals. For more bursty arrivals with high probability of short inter-
arrivals, we suspect that this extra delay term will be more significant and degrade the performance
of greedy policies.
Finally, note that the common denominator appearing in Equations 3 through 5 is proportional to $E_0(\rho) - \lambda m$, which is Gallager's reliability function. This implies a fundamental relationship between physical-layer reliability and network-layer delay: for a fixed channel coding strategy, the resulting error exponent determines the gap between the arrival rate and the system capacity, $\mu - \lambda$, which is a fundamental measure of performance in queueing theory. This is analogous to the well-known information-theoretic fact that the error exponent represents the gap between code rate and channel capacity.
References
[1] G. D. Forney, Jr., "Convolutional codes II: maximum-likelihood decoding," Information and Control, 25:222-266, 1974.
[2] A. Sahai, "Why block length and delay are not the same thing," submitted to IEEE Trans. Inform. Theory, 2006; preprint arXiv:cs.IT/0610138.
[3] S. Musy and E. Telatar, "On the transmission of bursty sources," Proceedings of the 2006 IEEE International Symposium on Information Theory, Seattle, WA, July 2006.