VLSI Architectures For The MAP Algorithm
Transactions Papers
Abstract—This paper presents several techniques for the very large-scale integration (VLSI) implementation of the maximum a posteriori (MAP) algorithm. In general, knowledge about the implementation of the Viterbi algorithm can be applied to the MAP algorithm. Bounds are derived for the dynamic range of the state metrics which enable the designer to optimize the word length. The computational kernel of the algorithm is the Add-MAX* operation, which is the Add-Compare-Select operation of the Viterbi algorithm with an added offset. We show that the critical path of the algorithm can be reduced if the Add-MAX* operation is reordered into an Offset-Add-Compare-Select operation by adjusting the location of registers. A general scheduling for the MAP algorithm is presented which gives the tradeoffs between computational complexity, latency, and memory size. Some of these architectures eliminate the need for RAM blocks with unusual form factors or can replace the RAM with registers. These architectures are suited to VLSI implementation of turbo decoders.

Index Terms—Forward–backward algorithm, MAP estimation, turbo codes, very large-scale integration (VLSI), Viterbi decoding.

I. INTRODUCTION

… information were not evident. In this paper, we describe techniques for implementing the MAP algorithm that are suitable for very large-scale integration (VLSI) implementation.

The main idea in this paper can be summarized as extending well-known techniques used in implementing the Viterbi algorithm to the MAP algorithm. The MAP algorithm can be thought of as two Viterbi-like algorithms running in opposite directions over the data, albeit with a slightly different computational kernel.

This paper is structured in the following way. Section II is a brief description of the MAP algorithm in the logarithmic domain. Section III studies the problem of internal representation of the state metrics for a fixed-point implementation. Section IV focuses on efficient architectures to realize a forward (or backward) recursion. The log-likelihood ratio (LLR) calculation is also briefly described. Section V proposes several schedules for the forward and backward recursions. As the computations of the forward and the backward recursions are symmetrical in time (i.e., identical in terms of hardware computation), only the forward recursion is described in Sections III and IV.
II. THE MAP ALGORITHM IN THE LOGARITHMIC DOMAIN

… If the MAP algorithm is transposed into the logarithmic domain like the Viterbi algorithm, then the multiplications become additions and the exponentials disappear. Addition is transformed according to the rule described in [8]. Following [9], the additions are replaced using the Jacobi logarithm

   MAX*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|))   (2)

which is called the MAX* operation, to denote that it is essentially a maximum operator adjusted by a correction factor. The second term, a function of the single variable |a - b|, can be precalculated and stored in a small lookup table (LUT) [9]. The computational kernel of the MAP algorithm is the Add-MAX* operation, which is analogous, in terms of computation, to the Add-Compare-Select (ACS) operation in the Viterbi algorithm adjusted by an offset known as a correction factor. In what follows, we will refer to this kernel as ACSO (Add-Compare-Select-Offset).

The algorithm is based on the same trellis as the Viterbi algorithm. The algorithm is performed on a block of received symbols which corresponds to a trellis with a finite number of stages N. We will choose the transmitted bit from the set {-1, +1}. Upon receiving the symbol y_t from the additive white Gaussian noise (AWGN) channel with noise variance σ², we calculate the branch metrics of the transition from state s' to state s as

   γ_t(s', s) = -(1/(2σ²)) Σ_{i=1..n} (y_t^i - x_t^i(s', s))²   (3)

where x_t^i(s', s) is the expected symbol along the branch from state s' to state s. The multiplication by 1/σ² can be done with either a multiplier or an LUT. Note that in the case of a turbo decoder which uses several iterations of the MAP algorithm, the multiplication by 1/σ² need only be done at the input to the first MAP algorithm [6].

The algorithm consists of three steps.

• Forward Recursion. The forward state metrics α_t are recursively calculated and stored as

   α_{t+1}(s) = MAX*_{s'} (α_t(s') + γ_t(s', s))   (4)

The recursion is initialized by forcing the starting state to state 0 and setting α_0(0) = 0.

• Backward Recursion. The backward state metrics β_t are calculated with the same recursion run in reverse order over the block. The trellis termination condition requires the entire block to be received before the backward recursion can begin.

• Soft-Output Calculation. The soft output, which is called the LLR, for each symbol at time t is calculated as

   Λ_t = MAX*_{(s', s): +1} (α_t(s') + γ_t(s', s) + β_{t+1}(s)) − MAX*_{(s', s): −1} (α_t(s') + γ_t(s', s) + β_{t+1}(s))   (8)

where the first term is over all branches with input label +1, and the second term is over all branches with input label −1.

The MAP algorithm, as described, requires the entire message to be stored before decoding can start. If the blocks of data are large, or the received stream continuous, this restriction can be too stringent; "on-the-fly" decoding using a sliding-window technique has to be used. Similar to the Viterbi algorithm, we can start the backward recursion from the "all-zero vector" (i.e., all the components of β are equal to zero) L stages past the symbol of interest. L iterations of the backward recursion allow us to reach a very good approximation of the true backward metrics, up to a positive additive factor [10], [11]. This additive coefficient does not affect the value of the LLR. In the following, we will consider that after L cycles of backward recursion, the resulting state metric vector is the correct one. This property can be used in a hardware realization to start the effective decoding of the bits before the end of the message. The parameter L is called the convergence length. For on-the-fly decoding of nonsystematic convolutional codes as discussed in [10] and [11], a convergence length of five to ten times the constraint length was found to lead only to marginal signal-to-noise ratio (SNR) losses. For turbo decoders, due to the iterative structure of the computation, an increased value of L might be required to avoid an error floor. A value of L is reported in [12] for a recursive systematic code with a constraint length of five. In practice, the final value of L has to be determined via system simulation and analysis of the particular decoding system at hand.

B. Upper Bounds for the Dynamic Range

All the following upper bounds are derived from the definition of MAX* in (2):

   max(a, b) ≤ MAX*(a, b) ≤ max(a, b) + ln 2   (9)
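The MAX* kernel in (2) is easy to prototype in software before committing to hardware. The following Python sketch shows the exact Jacobi correction, a quantized correction LUT (the 8-entry table and 0.5 quantization step are illustrative assumptions, not values from the paper), and an ACSO step that reduces several add results with MAX*:

```python
import math

def max_star_exact(a, b):
    """Jacobi logarithm of (2): ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical correction LUT: 8 entries, indexed by |a - b| in steps of 0.5.
STEP = 0.5
LUT = [math.log1p(math.exp(-i * STEP)) for i in range(8)]

def max_star_lut(a, b):
    """MAX* with the correction read from the small LUT (0 beyond its range)."""
    i = int(abs(a - b) / STEP)
    return max(a, b) + (LUT[i] if i < len(LUT) else 0.0)

def acso(state_metrics, branch_metrics):
    """Add-Compare-Select-Offset: MAX* reduction of (alpha + gamma) terms."""
    terms = [a + g for a, g in zip(state_metrics, branch_metrics)]
    acc = terms[0]
    for t in terms[1:]:
        acc = max_star_exact(acc, t)
    return acc
```

For equal inputs, max_star_exact(0.0, 0.0) returns ln 2 ≈ 0.693 where a plain maximum would return 0, which is exactly the offset the ACSO unit adds.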
… LUT is three bits for our example. An LUT is the most straightforward way to perform this operation [9], [15]. In the general case, there is a positive value f(|a − b|) such that

   MAX*(a, b) = max(a, b) + f(|a − b|)   (12)

Thus

   MAX*(a + c, b + c) = MAX*(a, b) + c   (13)

According to (13), the MAX* operator is linear. Thus, a global shift of all α values (or β values) would not change the value of Λ, since the contribution of the shift, when put outside the two MAX* operators, is cancelled. Thus, it is the differences between the state metrics and not their absolute values that are important. Rescaling of the state metrics can be performed (15), and the branch metrics are normalized so that (16) holds. In a real system, the y_t^i are bounded (by the analog–digital conversion) and the standard deviation of the noise is a nonzero value; thus, according to (15) and (16), we have the relation (17) for all t.
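The shift-invariance argument above can be checked numerically. The short Python sketch below (with arbitrary illustrative metric values) verifies the linearity property of (13) and shows that rescaling the state metrics by their maximum leaves an LLR-style difference of two MAX* terms unchanged:

```python
import math

def max_star(a, b):
    """MAX*(a, b) = max(a, b) + ln(1 + e^-|a-b|), the operator of (2)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def llr(metrics):
    """Difference of two MAX* terms, as in the LLR of (8).
    Hypothetical split: the first two metrics belong to +1 branches,
    the last two to -1 branches."""
    return max_star(metrics[0], metrics[1]) - max_star(metrics[2], metrics[3])

def rescale(metrics):
    """Subtract the largest metric from all of them (state-metric rescaling)."""
    m = max(metrics)
    return [v - m for v in metrics]
```

Linearity means max_star(a + c, b + c) equals max_star(a, b) + c for any shift c, so the common shift introduced by rescale() cancels in llr().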
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on July 29,2024 at 08:44:03 UTC from IEEE Xplore. Restrictions apply.
178 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 51, NO. 2, FEBRUARY 2003
Let us first assume that the all-zero path is sent in the channel and that all the received symbols have the highest possible reliability. The forward recursion is performed on the received symbols. Let us study, in this case, the ratio of state probabilities between the state with the highest probability and the state with the lowest probability when the forward recursion is performed. Note that this ratio, in the log domain, is associated with the maximum difference between state metrics, i.e., the dynamic range of the state metric.

The initial state vector A_0 is the uniformly distributed vector of length 2^ν, where 2^ν is the number of states of the trellis. Since by hypothesis all the branch metrics are independent of time, we can express the forward recursion in an algebraic form using the transition matrix M

   A_{t+1} = M A_t   (18)

By recursion, we have

   A_t = M^t A_0   (19)

By construction, M is a positive irreducible matrix (the coefficients are positive, and M only performs a modification of the probability distribution of the state metric vector). Thus, according to the Perron–Frobenius theorem [25], M can be expressed, in the basis of eigenvectors (V_1, …, V_{2^ν}), by a diagonal matrix with the two properties:

1) V_1, the Perron eigenvector of M, is the only eigenvector of M that has all of its components positive;
2) λ_1, the Perron eigenvalue associated with V_1, is positive and λ_1 ≥ |λ_i| for all i.

Since, in the trellis, all the states at time t are connected to the states at time t + ν, we deduce that all the coefficients of M^ν are strictly positive. Using the Perron–Frobenius theorem for M^ν gives an extra property: the Perron eigenvalue of M^ν is strictly greater than its other eigenvalues. From this property, we deduce that this property is also true for M, i.e., λ_1 > |λ_i| for all i ≠ 1.

Let (x_1, …, x_{2^ν}) be the decomposition of A_0 in the basis (V_1, …, V_{2^ν}). The vector A_t can be expressed as

   A_t = Σ_{i=1..2^ν} x_i (λ_i)^t V_i   (20)

with x_i real, for i = 1, …, 2^ν.

Let us call a_max(t) (respectively, a_min(t)) the maximum (minimum) coordinate value of vector A_t, and R_t the ratio a_max(t)/a_min(t).

Conjecture: For all t, R_{t+1} ≥ R_t.

Proof: First, R_0 = 1, since A_0 is the uniform vector. Second, using (20), we have (21) and thus (22).

Finally, we justify the monotonic increase of R_t (which achieves the proof) by an intuitive argument. R_t is the likelihood ratio between the state that has the highest probability (state 0, by construction) and the state with the lowest probability. Since every new incoming branch metric confirms state 0, R_t is an increasing function of t.

Using the same type of argument, if one, or more, of the first received signals do not have the highest reliability, the resulting ratio will be smaller than R_t.

Since the code is linear, the result obtained for the all-zero sequence is true for all sequences of bits. Thus, the logarithm of the ratio R_t gives the maximum difference of the state metrics.

2) Exact Bound in Finite Precision: The exact maximum difference obtained with a fixed-precision architecture is obtained from (19) starting from the all-zero vector until the system reaches stationarity, i.e., if all state metrics increase by the same constant value at each iteration, the bound is then equal to the dynamic range reached at stationarity. Note that this algorithm is a generalization of the algorithm proposed in [22] for the case of the Viterbi decoder.

3) Simplification of the Computation of the Branch Metrics for a Convolutional Decoder: For a rate 1/n convolutional code, x_t is an n-dimensional vector with elements in {−1, +1} (or {−1, 0, +1} in the case of a punctured code where 0 is used for a punctured bit). Using (13), the computation of the branch metrics can be expanded and simplified

   γ_t(s', s) = −(1/(2σ²)) Σ_{i=1..n} ((y_t^i)² + 1) + (1/σ²) Σ_{i=1..n} y_t^i x_t^i(s', s)   (23)

The first terms are common to all branch metrics, thus, they can be dropped. The last terms can be decomposed on the dimensions of the x_t vector. Thus, the modified branch metrics are

   γ'_t(s', s) = Σ_{i=1..n} x_t^i(s', s) y_t^i   (24)

where x_t^i takes the value of zero for a punctured code symbol. This expression can be used to find the exact bound of the dynamic range.

4) Example: As an example, let us consider a recursive systematic encoder with generator polynomials (7, 5). Moreover, let us assume that the modified branch metrics are coded using 128 levels, from −15.75 up to 15.75 (the inputs are coded between −7.875 up to 7.875, with a step size of 0.125). We assume that the all-zero path is received with the maximum reliability. The resulting state transition diagram (with values of modified branch metrics) is given in Fig. 1. Table I shows the evolution of the state metrics for the first eight iterations of the forward recursion.

As shown in Table I, the value of the dynamic range does not increase after seven iterations. This limit is the maximum value of the state metric dynamic range obtained for our example. The approximate bound of (14) gives, for this example, a looser value. The bound obtained by the above method is much more precise and can lead to more efficient hardware realizations, since the precision of the state metrics is reduced.
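The finite-precision procedure above can be reproduced in a few lines of Python. The sketch below rebuilds the four-state trellis of the (7, 5) recursive systematic code, applies the modified branch metrics of the example for the all-zero path received at maximum reliability (y_k^1 = y_k^2 = −7.875), and iterates the rescaled forward recursion until the dynamic range stops growing. It uses the plain max approximation of MAX*, so the stationary value it finds (47.25 = 3 × 15.75) illustrates the method rather than reproducing Table I exactly:

```python
Y = (-7.875, -7.875)  # all-zero path (bits 0 -> symbols -1) at max reliability

def branches():
    """Branches of the (7,5) RSC trellis: state = (a[k-1], a[k-2]),
    feedback a = u ^ a1 ^ a2 (octal 7), parity p = a ^ a2 (octal 5)."""
    for a1 in (0, 1):
        for a2 in (0, 1):
            for u in (0, 1):
                a = u ^ a1 ^ a2
                p = a ^ a2
                x = (2 * u - 1, 2 * p - 1)          # bits mapped to {-1, +1}
                gamma = x[0] * Y[0] + x[1] * Y[1]   # modified branch metric (24)
                yield 2 * a1 + a2, 2 * a + a1, gamma

def dynamic_ranges(steps):
    """Dynamic range of the forward state metrics after each stage."""
    alpha = [0.0] * 4                # start from the all-zero vector
    deltas = []
    for _ in range(steps):
        nxt = [float("-inf")] * 4
        for s_from, s_to, g in branches():
            nxt[s_to] = max(nxt[s_to], alpha[s_from] + g)  # max ~ MAX*
        deltas.append(max(nxt) - min(nxt))
        m = max(nxt)
        alpha = [v - m for v in nxt]  # rescale so the metrics stay bounded
    return deltas
```

With these metrics the dynamic range climbs in steps of 15.75 and saturates at 47.25, after which every state metric grows by the same constant per stage, which is the stationarity condition described above.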
BOUTILLON et al.: VLSI ARCHITECTURES FOR THE MAP ALGORITHM 179
Fig. 1. State transition of a systematic recursive encoder with polynomials (7, 5) and modified branch metrics when, for all k, (y_k^1, y_k^2) = (−7.875, −7.875).
TABLE I
VARIATION OF STATE METRICS AFTER t STAGES ON THE ALL-ZERO PATH
This section is divided into two parts. The first part is a review
of the architecture usually used to compute the forward state
metrics [9]. The second part is an analysis of the position of the
register for the recursion loop in order to increase the speed of
the architecture.
Three different positions of the recursion loop register are shown. The first position is the classical one. It leads to an ACSO unit. The second position leads to a compare-select-offset-add (CSOA) unit, while the third position leads to an offset-add-compare-select (OACS) unit. The last one, the OACS unit shown in Fig. 4, has a smaller critical path compared with the ACSO unit. Briefly, in the case of an ACSO unit, the critical path is composed of the propagation of the carry (t_c) in the first adder, the propagation of one full adder (t_FA) for the comparison (as soon as a result of the sum is available, it can be used for the comparison), the time of the LUT access (t_LUT) and the multiplexer (t_MUX), and then, once more, the time of the propagation of the carry in the offset addition. For the OACS unit, the critical path is only composed of the propagation of the carry in the first adder (the addition of the offset), the propagation of one full adder for the addition of the branch metric, another propagation of one full adder for the comparison, and then, the maximum of the LUT access and the multiplexer. Thus, the critical path is decreased from

   T_ACSO = 2 t_c + t_FA + t_LUT + t_MUX   (25)

to

   T_OACS = t_c + 2 t_FA + max(t_LUT, t_MUX)   (26)

The decrease of the critical path is paid for by an additional register needed to store the offset value between two iterations. The area-speed tradeoff is determined by the specification of the application. As mentioned by one of the paper's reviewers, a Carry–Save–Adder (CSA) architecture can also be efficiently used in this case [23].

The last step of the MAP algorithm is the computation of the LLR value of the decoded bit. Parallel architectures for the LLR computation can be derived directly from (8). The first stage is composed of the branch adders. The second stage is composed of two 2^ν-operand MAX* operators. Finally, the last operation is the subtraction. A classical tree architecture can be used for the hardware realization of the 2^ν-operand MAX* operators.

V. GENERAL ARCHITECTURE

Each element of the MAP architecture has now been described. The last part of our survey on VLSI architectures for the MAP algorithm is the overall organization of the computation. Briefly speaking, the generation of the LLR values requires both α and β values, which are generated in chronologically reverse order. The first implication is that, somehow, memory is needed to store a given type of vector (say, α), until the corresponding vector (β) is generated. Each state metric vector is composed of 2^ν state metrics (the size of the trellis), each one b bits wide. The total number of bits for each vector is large (2^ν · b) and thus, the reduction of the number of state metrics is an important issue for minimizing the implementation area.

The first part of this section describes the architecture of a high-speed VLSI circuit for the forward algorithm. Then, through different steps, we propose several organizations of the computation that reduce the number of vectors that need to be stored by up to a factor of eight. Note that several authors have separately achieved similar results. This point will be discussed in the last section.

A. Classical Solutions Architecture

The first real-time MAP VLSI architectures in the literature are described in [11], [13], and [24]. The architecture of [11] and [13] is based on three recursion units (RUs), two used for the backward recursion and one forward unit. Each RU contains operators working in parallel so that one recursion can be performed in one clock cycle. The two backward RUs play a role similar to the two trace-back units in the Viterbi decoder of [26].

Let us use the same graphical representation as in [11], [27], and [28] to explain the organization of the computation. In Fig. 5, the horizontal axis represents time, with units of a symbol period. The vertical axis represents the received symbol. Thus, the curve shows that, at time t, the symbol y_t becomes available. Let us describe how the symbols are decoded (segment I of Fig. 5).

During the first phase, the first backward RU performs L recursions (segment II of Fig. 5). This process is initialized with the all-zero state vector, but after L iterations, as noted in [11], the convergence is reached and the correct backward state vector is then obtained. During those same cycles, the forward RU generates the α vectors (segment III of Fig. 5). The α vectors are stored in the state vector memory (SVM) until they are needed for the LLR computation (grey area of Fig. 5). Then, during the next phase, the second backward RU starts from the converged state to compute the β vectors (segment IV of Fig. 5). At each cycle, the α vector corresponding to the computed β is extracted from the memory in order to compute Λ. Finally, in the last phase, the data are …
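The schedules above all rely on the convergence-length property of Section II: a backward recursion started from the all-zero vector forgets its initialization after roughly L stages, up to a common additive constant that cancels in the LLR. The toy Python experiment below illustrates this with a randomly generated fully connected 4-state trellis and the plain max approximation of MAX*; the block length, window length, and metrics are all illustrative assumptions:

```python
import random

random.seed(1)
S, T, L, K = 4, 64, 48, 8  # states, block length, window length, position

# Random branch metrics gamma[t][s_next][s_prev] for a fully connected trellis.
gamma = [[[random.uniform(-1.0, 1.0) for _ in range(S)] for _ in range(S)]
         for _ in range(T)]

def backward(t_hi, t_lo, beta):
    """Backward recursion from stage t_hi down to t_lo (max approximates MAX*)."""
    beta = list(beta)
    for t in range(t_hi, t_lo - 1, -1):
        beta = [max(beta[sn] + gamma[t][sn][sp] for sn in range(S))
                for sp in range(S)]
    return beta

full = backward(T - 1, K, [0.0] * S)          # reference: run from the block end
windowed = backward(K + L - 1, K, [0.0] * S)  # sliding window: only L stages

# After the recursions coalesce, the two vectors differ by a common constant.
diff = [f - w for f, w in zip(full, windowed)]
spread = max(diff) - min(diff)
```

Once the two recursions have coalesced, spread is essentially zero, which is why segment II of Fig. 5 can run the first backward RU for L stages before its output is trusted.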
Fig. 5. Graphical representation of a real-time MAP architecture.

Fig. 6. Graphical representation of the (n_α = 1, n_β = 2, M) architecture.

Fig. 7. Graphical representation of the (n_α = 1, n_β = 3, M) architecture.

Fig. 8. Graphical representation of the (n_α = 2, n_β = 2, M) architecture.

Fig. 10. Graphical representation of the (n_α = 1, n_β = 3, M, Pt) architecture.

Fig. 12. Graphical representation of the (p = 2, n_α = 1, n_β = 1, M, Pt) architecture.
TABLE II
PERFORMANCE OF THE DIFFERENT ARCHITECTURES
… to …; the second, in reverse order, from data … down to …. Moreover, Worm et al. [34] extend the architecture of Sections V-A and V-B for a massively parallel architecture where several processes are done in parallel. With this massive parallelism, very high throughput (up to 4 Gbit/s) can be achieved.

The pointer idea described in Section V-E has been proposed independently by Dingninou et al. in the case of a turbo decoder in [35] and [36]. In this "sliding window next iteration initialization" method, the pointer generated by the backward recursion at one iteration is used to initialize the backward recursion at the next iteration. As a result, no further backward convergence process is needed and area and memory are saved at the cost of a slight degradation of the decoder performance. Note that Dielissen et al. have improved this method by an efficient encoding of the pointer [37].

Finally, an example of an architecture using a ratio of two between clock frequency and symbol frequency (see Section V-F) is partially used in [38].

VI. CONCLUSION

We have presented a survey of techniques for VLSI implementation of the MAP algorithm. As a general conclusion, the well-known results from the Viterbi algorithm literature can be applied to the MAP algorithm. The computational kernel of the MAP algorithm is very similar to that of the ACS of the Viterbi algorithm with an added offset. The analysis shows that it is better to add the offset first and then do the ACS operation in order to reduce the critical path of the circuit (OACS). A general architecture for the MAP algorithm was developed which exposes some interesting tradeoffs for VLSI implementation. Most importantly, we have presented architectures which eliminate the need for RAMs with a narrow aspect ratio and possibly allow the RAM to be replaced with registers. An architecture which shares a memory bank between two MAP decoders enables efficient implementation of turbo decoders.

ACKNOWLEDGMENT

The authors would like to thank F. Kschischang and O. Pourquier for their help on the Perron–Frobenius theorem.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. IEEE Int. Conf. Communications (ICC'93), May 1993, pp. 1064–1070.
[2] R. W. Chang and J. C. Hancock, "On receiver structures for channels having memory," IEEE Trans. Inform. Theory, vol. IT-12, pp. 463–468, Oct. 1966.
[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[4] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, pp. 260–269, Apr. 1967.
[5] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268–278, Mar. 1973.
[6] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429–445, Mar. 1996.
[7] A. J. Viterbi, "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes," IEEE J. Select. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.
[8] N. G. Kingsbury and P. J. W. Rayner, "Digital filtering using logarithmic arithmetic," Electron. Lett., vol. 7, no. 2, pp. 56–58, Jan. 1971.
[9] J. A. Erfanian and S. Pasupathy, "Low-complexity parallel-structure symbol-by-symbol detection for ISI channels," in Proc. IEEE Pacific Rim Conf. Communications, Computers and Signal Processing, June 1–2, 1989, pp. 350–353.
[10] H. Dawid, Algorithms and VLSI Architecture for Soft Output Maximum a Posteriori Convolutional Decoding (in German). Aachen, Germany: Shaker, 1996, p. 72.
[11] H. Dawid and H. Meyr, "Real-time algorithms and VLSI architectures for soft output MAP convolutional decoding," in Proc. Personal, Indoor and Mobile Radio Communications, PIMRC'95, vol. 1, 1995, pp. 193–197.
[12] S. S. Pietrobon, "Efficient implementation of continuous MAP decoders and a new synchronization technique for turbo decoders," in Proc. Int. Symp. Information Theory and Its Applications, Victoria, BC, Canada, Sept. 1996, pp. 586–589.
[13] S. S. Pietrobon and S. A. Barbulescu, "A simplification of the modified Bahl algorithm for systematic convolutional codes," in Proc. Int. Symp. Information Theory and Its Applications, Sydney, Australia, Nov. 1994, pp. 1073–1077.
[14] S. S. Pietrobon, "Implementation and performance of a turbo/MAP decoder," Int. J. Satellite Commun., vol. 16, pp. 23–46, Jan.–Feb. 1998.
[15] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE Int. Conf. Communications (ICC '95), 1995, pp. 1009–1013.
[16] C. B. Shung, P. H. Siegel, G. Ungerboeck, and H. K. Thapar, "VLSI architectures for metric normalization in the Viterbi algorithm," in Proc. IEEE Int. Conf. Communications (ICC '90), vol. 4, Atlanta, GA, Apr. 16–19, 1990, pp. 1723–1728.
[17] P. Tortelier and D. Duponteil, "Dynamique des métriques dans l'algorithme de Viterbi," Annales des Télécommun., vol. 45, no. 7–8, pp. 377–383, 1990.
[18] G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, "VLSI architectures for turbo codes," IEEE Trans. VLSI Syst., vol. 7, pp. 369–379, Sept. 1999.
[19] A. Worm, H. Michel, F. Gilbert, G. Kreiselmaier, M. Thul, and N. Wehn, "Advanced implementation issues of turbo decoders," in Proc. 2nd Int. Symp. on Turbo Codes, Brest, France, Sept. 2000, pp. 351–354.
[20] G. Montorsi and S. Benedetto, "Design of fixed-point iterative decoders for concatenated codes with interleavers," IEEE J. Select. Areas Commun., vol. 19, pp. 871–882, May 2001.
[21] A. P. Hekstra, "An alternative to metric rescaling in Viterbi decoders," IEEE Trans. Commun., vol. 37, pp. 1220–1222, Nov. 1989.
[22] P. H. Siegel, C. B. Shung, T. D. Howell, and H. K. Thapar, "Exact bounds for Viterbi detector path metric differences," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, 1991, pp. 1093–1096.
[23] G. Fettweis and H. Meyr, "Parallel Viterbi algorithm implementation: Breaking the ACS bottleneck," IEEE Trans. Commun., vol. 37, pp. 785–790, Aug. 1989.
[24] H. Dawid, G. Gehnen, and H. Meyr, "MAP channel decoding: Algorithm and VLSI architecture," VLSI Signal Processing VI, pp. 141–149, 1993.
[25] F. R. Gantmacher, Matrix Theory. New York: Chelsea, 1960, vol. II.
[26] 20 Mbps convolutional encoder Viterbi decoder STEL-2020: Stanford Telecom, 1989.
[27] E. Boutillon and N. Demassieux, "A generalized precompiling scheme for surviving path memory management in Viterbi decoders," in Proc. ISCAS'93, vol. 3, New Orleans, LA, May 1993, pp. 1579–1582.
[28] E. Boutillon, "Architecture et implantation VLSI de techniques de modulations codées performantes adaptées au canal de Rayleigh," Ph.D. dissertation, ENST, Paris, France, 1995.
[29] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications," in Proc. IEEE Globecom Conf., Nov. 1989, pp. 1680–1686.
[30] C. Douillard, M. Jézéquel, C. Berrou, N. Bengarth, J. Tousch, and N. Pham, "The turbo code standard for DVB-RCS," in Proc. 2nd Int. Symp. on Turbo Codes, Brest, France, Sept. 2000, pp. 535–538.
[31] C. Berrou and M. Jézéquel, "Nonbinary convolutional codes for turbo coding," Electron. Lett., vol. 35, no. 1, pp. 39–40, Jan. 1999.
[32] C. Schurgers, F. Catthoor, and M. Engels, "Energy efficient data transfer and storage organization for a MAP turbo decoder module," in Proc. 1999 Int. Symp. Low Power Electronics and Design, San Diego, CA, Aug. 1999, pp. 76–81.
[33] C. Schurgers, F. Catthoor, and M. Engels, "Memory optimization of MAP turbo decoder algorithms," IEEE Trans. VLSI Syst., vol. 9, pp. 305–312, Apr. 2001.
[34] A. Worm, H. Lamm, and N. Wehn, "VLSI architectures for high-speed MAP decoders," in Proc. 14th Int. Conf. VLSI Design, 2001, pp. 446–453.
[35] A. Dingninou, "Implémentation de turbo code pour trame courtes," Ph.D. dissertation, Univ. de Bretagne Occidentale, Bretagne, France, 2001.
[36] A. Dingninou, F. Rafaoui, and C. Berrou, "Organization de la mémoire dans un turbo décodeur utilisant l'algorithme SUB-MAP," in Proc. Gretsi, Gretsi, France, Sept. 1999, pp. 71–74.
[37] J. Dielissen and J. Huisken, "State vector reduction for initialization of sliding windows MAP," in Proc. 2nd Int. Symp. Turbo Codes, Brest, France, Sept. 2000, pp. 387–390.
[38] A. Raghupathy and K. J. R. Liu, "VLSI implementation considerations for turbo decoding using a low-latency log-MAP," in Proc. IEEE Int. Conf. Consumer Electronics, ICCE, June 1999, pp. 182–183.

Warren J. Gross (S'92) was born in Montreal, QC, Canada, in 1972. He received the B.A.Sc. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1996 and the M.A.Sc. degree in 1999 from the University of Toronto, Toronto, ON, Canada, where he is currently working toward the Ph.D. degree.

From 1993 to 1996, he worked in the area of space-based machine vision at Neptec Design Group, Ottawa, ON, Canada. His research interests are in the areas of VLSI architectures for digital communications algorithms and digital signal processing, coding theory, and computer architecture.

Mr. Gross received the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship, the Walter Sumner fellowship, and the Government of Ontario/Ricoh Canada Graduate Scholarship in Science and Technology.

P. Glenn Gulak (S'82–M'83–SM'96) received the Ph.D. degree from the University of Manitoba, Winnipeg, MB, Canada.

From 1985 to 1988, he was a Research Associate with the Information Systems Laboratory and the Computer Systems Laboratory, Stanford University, Stanford, CA. Currently, he is a Professor with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, and holds the L. Lau Chair in Electrical and Computer Engineering. His research interests are in the areas of memory design, circuits, algorithms, and VLSI architectures for digital communications.

Dr. Gulak received a Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship and several teaching awards for undergraduate courses taught in both the Department of Computer Science and the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He served as the Technical Program Chair for ISSCC 2001. He is a registered professional engineer in the province of Ontario.