
FPGA IMPLEMENTATION OF DIGITAL PREDISTORTION LINEARIZERS

FOR WIDEBAND POWER AMPLIFIERS

Navid Lashkarian, Signal Processing Division, Xilinx Inc., San Jose, USA,
[email protected],

Chris Dick, Signal Processing Division, Xilinx Inc., San Jose, USA,
[email protected]

ABSTRACT

This paper reports on the FPGA implementation of a Volterra series PA pre-distorter. The implementation of the pre-distorter and the indirect learning architecture for initializing the system are described. We supply insight into the implementation of the adaptive process itself and how the pre-distorter can exploit new generation heterogeneous FPGAs that provide a massively parallel compute fabric for demanding real-time tasks and an embedded processor for processes that have softer schedules. A recent generation visual programming design flow has been used for the implementation. The paper comments on the design productivity and efficiency aspects of the final FPGA implementation using this development environment.

1. INTRODUCTION

Bandwidth efficiency and transmission power efficiency are often conflicting criteria in digital communication systems. One usually has to be traded off against the other according to the system requirements. In wireless applications, the cost of bandwidth accounts for a considerable portion of overall cost, and it is therefore important to accommodate as many users as possible in the system within the link frequency budget. This requirement imposes a heavy constraint on the power efficiency of the amplifier, contributing to nonlinear behavior in this part of the transmitter [1]. Nonlinear radio frequency (RF) power amplifiers (PA) generate intermodulation (IM) distortion as adjacent channel interference for many modulation formats. Therefore, the design of linearizers has become a key technology in wideband mobile communication transceivers.

One solution is the linearization of the amplifier by means of a predistorter as shown in Figure 1. The digital predistortion (DPD) linearizer creates a version of the desired modulation making use of the feedback measurements of the actual amplifier output. The resulting signal, when passed through the nonlinear power amplifier, creates a signal in which the power spectral density has significantly lower spectral leakage compared to an uncompensated transmit signal.

Figure 1: Adaptive digital predistortion. (Blocks: MODEM, Pre-Distorter, Power Amplifier, Demodulator.)

Traditionally, digital predistortion was implemented using a lookup table (LUT) approach. The LUT employed in this approach is representative of the inverse of the characteristic of the amplifier [2]. While this approach has widespread application in narrowband power amplifier (memoryless nonlinear systems) linearization, its effectiveness is hampered by the memory effects in wideband power amplifiers, such as those used in multi-carrier Universal Mobile Telecommunications System (UMTS) and CDMA2000 systems.

In this paper, we address the design of a linearizer based on an adaptive truncated Volterra series (TVS) approach. TVS systems have become a very popular tool in adaptive nonlinear signal processing [3]. However, their real-time implementation has been restricted by the computational complexity associated with the filtering and adaptive mechanisms. Field programmable gate arrays (FPGAs) are an attractive option for realization of these highly complex signal processing functions for reasons of performance, power consumption and configurability. We propose an efficient and robust architecture for the linearizer based on truncated Volterra filters and provide a simulation model of the system within the Xilinx System Generator for DSP™ [4] design flow. The implementation achieves up to 50 dB spectral suppression in neighboring frequency bands.

Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
Figure 2: Baseband equivalent model of the DPD circuit. G represents the gain of the power amplifier. (Blocks: Predistorter, Power Amplifier, gain 1/G, Predistorter Training; signals x(n), z(n), y(n), ẑ(n), e(n).)

2. PREDISTORTER ARCHITECTURE

Our approach to nonlinear predistortion is based on the method proposed by Eun & Powers [5]. In this approach, two identical truncated Volterra systems are used for training and predistortion. Figure 2 depicts the block diagram of the equivalent baseband model of the digital predistortion network. The objective of the linearizer is to find a transformation of the signal (z̃(n) = V(x(n))) that in combination with the nonlinear amplifier (responsible for the distortion) will result in an identity system that produces the signal of interest without distortion at the output of the power amplifier (y(n) ≈ x(n)). The main challenge of this approach is to track and identify the time-varying characteristics of the amplifier. To address this task a stochastic gradient adaptation mechanism is employed. The adaptation of the truncated Volterra system is a two-stage process. During initialization, the input and output signals of the power amplifier are probed and the Volterra filter coefficients are adapted off-line using recursive least squares (RLS) or Kalman filtering estimation. This process is also known as initialization through indirect learning. Once the adaptive filter is initialized at an optimum stationary point, a stochastic adaptive mechanism is used to track the time-varying characteristics of the nonlinear amplifier.

2.1. Memory Polynomial Predistorter

We use the memory polynomial model (Eq. 1) for the predistorter block as described in [6]

$$z(n) = \sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{K} \sum_{q=0}^{Q} a_{kq}\, x(n-q)\, |x(n-q)|^{k-1} \tag{1}$$

where K is the nonlinearity order and Q represents the memory length of the power amplifier. In order to reduce the implementation complexity of the predistorter while maintaining acceptable performance, only the odd-order terms in the nonlinearity are included in the model. This compromise reduces the complexity of the predistorter by approximately 40% at the expense of 3 to 5 dB spectral regrowth. A detailed investigation of the benefits of even-order terms in the baseband model is presented in [7].

2.2. Indirect Learning

Initialization of the DPD linearizer is performed using optimum filtering, which is done as an off-line computation in our DPD implementation. Adaptive filter coefficient estimation can be considered a linear optimization task. Any of the common estimation methods - Least Square Estimation [8], minimum mean squared error (MMSE) [8] or Wiener Filtering, Kalman [8] or recursive least squares (RLS) filtering [8] - can be used.

We note that while all of the above methods try to solve one optimization problem, namely linear parameter estimation, the stationary points obtained from these methods might be quite different. This is mainly because the error criteria of the approaches differ, which gives each a different error surface profile.

2.3. Tracking and Direct Learning

The inverse of the nonlinear amplifier is adaptively tracked using a stochastic gradient method. Least mean squares (LMS) adaptive filters are known to have a slow convergence rate. However, since the power amplifier characteristics vary slowly as a function of time, the LMS approach is a reasonable choice for performing parameter tracking.

At each iteration of the stochastic gradient algorithm, an update for the unknown vector is obtained from

$$W_{n+1} = W_n + \mu\, e_n\, X_n^{*} \tag{2}$$

where the error is defined as

$$e_n = z(n) - W_n X_n \tag{3}$$

X_n is the vector containing all the necessary nonlinear products of the input sample and can be expressed as

$$X_n \triangleq \begin{bmatrix} y(n) \\ y(n)\,|y(n)|^{2} \\ y(n)\,|y(n)|^{4} \\ y(n-1) \\ y(n-1)\,|y(n-1)|^{2} \\ y(n-1)\,|y(n-1)|^{4} \\ y(n-2) \\ y(n-2)\,|y(n-2)|^{2} \\ y(n-2)\,|y(n-2)|^{4} \end{bmatrix} \tag{4}$$

A nonlinear combiner is used to form all the necessary powers of the training signal as needed by the memory polynomial filter.

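The tracking loop of Eqs. 2 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' System Generator design: the helper names `basis_vector` and `lms_track`, the step size `mu = 0.2`, the random test signal, and the target coefficients `w_true` are all hypothetical and chosen so that the noise-free update stays stable.

```python
import numpy as np

def basis_vector(y, n, orders=(1, 3, 5), memory=2):
    """Nonlinear combiner of Eq. 4: y(n-q)|y(n-q)|^(k-1), q = 0..memory, k odd."""
    terms = []
    for q in range(memory + 1):
        s = y[n - q] if n - q >= 0 else 0.0  # zero history before the first sample
        for k in orders:
            terms.append(s * abs(s) ** (k - 1))
    return np.array(terms, dtype=complex)

def lms_track(y, z, w0, mu):
    """Complex LMS: e_n = z(n) - W_n X_n (Eq. 3), W_{n+1} = W_n + mu e_n X_n^* (Eq. 2)."""
    w = np.array(w0, dtype=complex)
    err = np.empty(len(y), dtype=complex)
    for n in range(len(y)):
        x = basis_vector(y, n)
        err[n] = z[n] - w @ x
        w = w + mu * err[n] * np.conj(x)
    return w, err

# Hypothetical demo: recover known coefficients from a perturbed starting point.
rng = np.random.default_rng(0)
N = 5000
# Training input with |y| <= 1 so that mu * ||X_n||^2 <= 1.8 < 2 at every step.
y = rng.uniform(0.0, 1.0, N) * np.exp(2j * np.pi * rng.uniform(0.0, 1.0, N))
w_true = np.array([1.0 + 0.1j, 0.2 - 0.05j, -0.1 + 0.02j,
                   0.05j, 0.02, -0.01j,
                   -0.03, 0.01 + 0.01j, 0.005], dtype=complex)  # arbitrary values
z = np.array([w_true @ basis_vector(y, n) for n in range(N)])

w_init = w_true.copy()
w_init[0] *= 3.0  # deliberately perturb one coefficient
w_final, err = lms_track(y, z, w_init, mu=0.2)

print("initial coefficient error:", np.linalg.norm(w_init - w_true))
print("final coefficient error:  ", np.linalg.norm(w_final - w_true))
```

The coefficient vector is treated as a row vector and X_n as a column vector, so `w @ x` is the inner product W_n X_n. The higher-order basis terms are strongly correlated with the linear term, which is one reason LMS converges slowly on the weaker modes.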
3. DPD FPGA IMPLEMENTATION

The DPD linearizer implemented in this case study is defined in Eq. 5. This is a slight modification to the procedure in Eq. 1 and includes signal phase information.

$$z(n) = \sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{K} \sum_{q=0}^{Q} a_{kq}\, x^{k}(n-q) \tag{5}$$

In the DPD System Generator reference design the nonlinearity order was selected as K = 5 with a PA memory duration Q = 2. Fully expanded, the linearized signal z(n) is expressed as

$$\begin{aligned}
z(n) &= \sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{5} \sum_{q=0}^{2} a_{kq}\, x^{k}(n-q) && (6) \\
&= a_{10}\,x(n) + a_{11}\,x(n-1) + a_{12}\,x(n-2) && (7) \\
&\quad + a_{30}\,x^{3}(n) + a_{31}\,x^{3}(n-1) + a_{32}\,x^{3}(n-2) && (8) \\
&\quad + a_{50}\,x^{5}(n) + a_{51}\,x^{5}(n-1) + a_{52}\,x^{5}(n-2) && (9)
\end{aligned}$$

Only odd terms are included in the model. Eq. 7 is recognized as a standard inner product, with Eqs. 8 and 9 contributing a weighted combination of third- and fifth-order nonlinearities respectively.

The truncated Volterra adaptive filter is implemented using the minimum-multiplier direct-form realization shown in Figure 3.

Figure 3: Direct form realization of the truncated Volterra series linearizer. This implementation provides minimum multiplicative complexity. (Diagram: unit delays z^-1 on x(n), coefficient taps a10 to a52, multipliers M1 to M3, output z(n).)

The simulation setup for the DPD system is shown in Figure 4. A pseudo-random data sequence is first pulse-shaped and then processed by the predistorter before being presented to the power amplifier model. The coefficients in the DPD linearizer are updated using an LMS-based adaptive process. The adaptation rate is a function of the system performance requirements. These requirements would typically include considerations that account for variations in the PA characteristics that are functions of time and temperature, in addition to electro-thermal effects that influence the PA effective memory. In our first implementation the Volterra filter coefficients are updated at each simulation time-step. Given the relatively long time-constant associated with changes in the PA characteristics, this high rate of adaptation is probably too rapid. However, the flexible nature of the FPGA provides for any amount of hardware resource sharing to achieve a target performance/cost objective.

Figure 4: DPD simulation model comprising data generation, pulse shaping filter, the DPD circuit itself, PA model, and adaptive learning sub-system. The predistorter training is based on the LMS algorithm. (Blocks: Pseudo Random Sequence Data Generator, Pulse Shaping Filter, Predistorter, Power Amplifier, Predistorter Training; signals d(n), x(n), z(n), y(n), ẑ(n), e(n).)

To produce a DPD simulation, a simulation model for the PA is required. The Wiener model [3] shown in Figure 5 will be employed in the simulation. This system consists of a linear time invariant (LTI) subsystem, H(z), in cascade with a memoryless nonlinearity F(ν). Ding [6] provides expressions for

$$H(z) = \frac{1 + 0.5\,z^{-2}}{1 - 0.2\,z^{-1}} \tag{10}$$

and

$$y(n) = \sum_{\substack{k=1 \\ k\ \mathrm{odd}}}^{K} b_k\, v(n)\, |v(n)|^{k-1} \tag{11}$$

where v(n) and y(n) are the input and output of the memoryless nonlinearity F(ν). Ding [6] provides values for the coefficients b_k based on measurements from a class AB PA as

$$b_1 = 1.0108 + j0.0858 \tag{12}$$
$$b_3 = 0.0879 - j0.1583 \tag{13}$$
$$b_5 = -1.0992 - j0.8891 \tag{14}$$

Figure 5: Wiener system PA model employed in the DPD simulation. (Blocks: x(n), H(z), v(n), F(v), y(n).)

Figure 6 shows the effectiveness of the baseband predistortion in suppressing spectral regrowth. As shown in the figure, DPD can effectively reduce spectral regrowth by 40 dB.

Figure 6: Effectiveness of DPD in suppressing spectral regrowth. The figure shows an overlay of the baseband signal spectrum, the PA output without linearization and with linearization. (Plot: power spectrum in dB versus normalized frequency; traces: complex baseband signal, amplifier output without DPD, amplifier output with DPD.)

3.1. Simulation Model

When the System Generator simulation is opened in the Simulink environment a pre-load function is called that computes an initial estimate of the system coefficients using RLS estimation. The optimum coefficients resulting from the estimate are

 0.0003 - j0.0066
 0.0005 + j0.0120
-0.0036 + j0.0005
 1.1632 - j0.0936
 0.0890 + j0.3610
-0.0554 + j0.0254
-0.6712 + j0.0543
-0.0525 - j0.2041
 0.0295 - j0.0144

In order to demonstrate adaptive tracking one of the coefficients is deliberately perturbed - the fourth coefficient (1.1632 − j0.0936) is scaled by a factor of 3. The modified coefficient vector is used as the initial condition for the adaptive processor.

The least mean squares processor in the system adaptively updates the coefficients, iteratively forcing the perturbed coefficient back to its optimum value. Figure 7 shows the trajectory of the real component of the modified element (fourth entry of the vector) as a function of the LMS update iteration number. The figure provides an overlay of the Matlab double-precision floating-point simulation and the fixed-point arithmetic FPGA implementation. The floating-point and fixed-point simulations are in close agreement. The residual mean-squared error (MSE) of the FPGA-based LMS filtering is plotted in Figure 8.

The System Generator predistorter reference implementation employs a fully parallel adaptive processor for the adaptive learning sub-system. This means that the 9 complex coefficients in Eq. 6 are all updated at the output sample rate. This is a very high-frequency update rate and may be too rapid for many applications. It is straightforward to modify the adaptive processor to employ a decimated update. In this scenario the coefficients would be updated at a lower frequency than the Volterra filter processing rate. Using a decimated update permits functional unit folding in the adaptive processor so that the FPGA footprint can be reduced, i.e., both the number of logic slices and the number of embedded multipliers can be minimized.

When the simulation completes, a post-simulation stop function is executed that plots the linearizer input function overlaid with the predistorter output, generating a plot similar to Figure 6.
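The Wiener PA model of Eqs. 10 to 14 is small enough to sketch directly. The Python below is a floating-point illustration only: the function name `wiener_pa` is hypothetical, the filter state is assumed to start at zero, and the b_k values are those quoted from Ding [6] above.

```python
import numpy as np

# Odd-order coefficients b_k of the memoryless nonlinearity, Eqs. 12 to 14.
B = {1: 1.0108 + 0.0858j, 3: 0.0879 - 0.1583j, 5: -1.0992 - 0.8891j}

def wiener_pa(x):
    """Return (v, y): the LTI stage output v(n) and the PA output y(n)."""
    x = np.asarray(x, dtype=complex)
    v = np.zeros_like(x)
    # Eq. 10 as a difference equation: v(n) = x(n) + 0.5 x(n-2) + 0.2 v(n-1)
    for n in range(len(x)):
        v[n] = x[n]
        if n >= 2:
            v[n] += 0.5 * x[n - 2]
        if n >= 1:
            v[n] += 0.2 * v[n - 1]
    # Eq. 11: y(n) = sum over odd k of b_k v(n) |v(n)|^(k-1)
    y = sum(b * v * np.abs(v) ** (k - 1) for k, b in B.items())
    return v, y

# Impulse response of the linear stage: 1, 0.2, 0.54, 0.108, ...
v_imp, _ = wiener_pa([1.0, 0.0, 0.0, 0.0])
print(v_imp.real)
```

For very small inputs the output reduces to roughly b_1 v(n), so b_1 acts as the small-signal gain of the nonlinear stage; the third- and fifth-order terms only matter as the envelope grows.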

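For reference, the expanded predistorter of Eqs. 6 to 9 is a sum of nine weighted, delayed powers of the input. A minimal Python sketch follows, assuming a hypothetical function `predistort` and a dictionary of coefficients keyed by (k, q); it makes no attempt to reproduce the multiplier sharing of the direct-form hardware realization.

```python
import numpy as np

def predistort(x, coeffs, orders=(1, 3, 5), memory=2):
    """Evaluate z(n) = sum_{k odd} sum_{q} a_kq x^k(n-q), as in Eq. 6.

    coeffs maps (k, q) to the complex coefficient a_kq; missing keys count as 0.
    Samples before the start of x are taken as zero.
    """
    x = np.asarray(x, dtype=complex)
    z = np.zeros_like(x)
    for k in orders:
        xk = x ** k  # k-th power of the complex input, keeps phase information
        for q in range(memory + 1):
            a = coeffs.get((k, q), 0.0)
            if a != 0.0:
                z[q:] += a * xk[:len(x) - q]  # delayed by q samples
    return z

# With only a_10 = 1 the predistorter is transparent.
x_demo = np.array([1.0, 2.0j, 0.0])
print(predistort(x_demo, {(1, 0): 1.0}))
```

Because each power x^k is formed once and then reused across the three delays, the structure mirrors the idea behind the minimum-multiplier arrangement, though the actual FPGA datapath is fixed-point and pipelined.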
Figure 7: Volterra Kernel tracking based on LMS indirect learning. The figure shows the evolution of the fourth coefficient in the model. The floating-point and FPGA fixed-point simulation results are overlaid in the figure. (Plot title: Trajectory of the Perturbed Volterra Kernel Based on LMS Direct Learning; perturbed coefficient versus iteration number, 0 to 4 × 10^4; traces: FPGA Fixed Point, MATLAB Floating Point.)

Figure 8: Residual MSE of the LMS coefficient update. (Plot title: Residual MSE Error of LMS Tracking (FPGA Implementation); MSE in dB versus iteration number, 0 to 4 × 10^4.)

Table 1 provides the FPGA resource utilization for the complete design employing a fully parallel DPD coefficient update operating at the output sample rate. In applications where the adaptive learning is not needed (due to the time invariant characteristics of the amplifier), the DPD can be implemented using only 2032 FPGA slices and 48 embedded multipliers.

Table 1: FPGA Resource Utilization for DPD and LMS Adaptive Learning. The Volterra filter coefficients are updated at the full output data rate using a fully parallel LMS processor. The design is easily modified to accommodate a decimated update using a reduced number of embedded multipliers.

                 Volterra Filter   LMS    Total
Slices           2032              3483   5515
Block Memory     0                 0      0
Multipliers      48                106    154

The predistorter sub-system in the design operates at a clock frequency of 212 MHz in a Virtex-II Pro XC2VP50FF1152-7 FPGA (System Generator v 6.2, ISE 6.3.03i, Speedfile, Production 1.86 2004-05-01). There are adequate resources available in this device to support the predistorter along with other system functions such as up-conversion and crest factor reduction.

The computation rate of the predistorter alone is 212e6 × 48 = 10.176e9 operations per second (48 embedded multipliers, each clocked at 212 MHz). This 10 Giga-op processing rate exceeds the compute capacity of other programmable DSP technologies. The FPGA implementation easily supports the processing requirements, while providing the system architect with a flexible solution that can be easily modified based on evolving specifications or future system requirements.

4. ADAPTIVE COEFFICIENT UPDATE USING EMBEDDED PROCESSING

In many typical applications the PA characteristics do not change rapidly with time. The PA characteristics vary as a function of temperature drift and component aging, parameters that have long time-constants.

The previous section described a predistorter design that employed a dedicated customized datapath, constructed using the logic fabric and embedded multipliers, to implement the DPD coefficient update. Depending on system requirements, and in particular the required rate of coefficient adaptation, an FPGA embedded processor could be employed to realize the update. In this approach a buffer of the samples y(n) and z(n) in Figure 2 is prepared and processed offline. This lowers the overall implementation requirements of the system. State-of-the-art FPGAs like Virtex-II Pro [9] and Virtex-4 [10] include embedded Power PC 405 (PPC405) processing cores. The adaptive algorithm can be coded in C and executed on the PPC405. When a new coefficient vector is available the PPC405 can transfer this data to the coefficient memory in the Volterra filter. The PPC could also be used for other tasks in the system, in addition to periodically servicing the DPD processor.

The Xilinx Microblaze™ soft processor core [11] could also be used for implementing the adaptive update. Microblaze is supported by the Virtex-II Pro and Virtex-4 platforms, in addition to architectures like Virtex-II [12] and the low-cost Spartan-3 [13] family that do not include embedded PPC405 processors.

5. CONCLUSION

In this paper we have provided an architecture study for the FPGA implementation of a wideband digital baseband predistortion processor. As communication infrastructure providers operating in the UMTS, CDMA2000 and military radio application spaces continue to increase transmission bandwidth and support multi-carrier systems, traditional look-up table approaches to power amplifier linearization are no longer appropriate, and alternative methods that support wideband signals are required. Linearization techniques based on non-linear signal processing have been studied for some time, but their practical deployment has been restricted by the limited processing capabilities of traditional configurable signal processors. While an application specific integrated circuit (ASIC) approach could meet the processing requirements, non-recurring engineering (NRE) costs, high mask-set costs, lengthy development schedules and lack of flexibility have limited the ASIC implementation of sophisticated PA linearizers.

The highly parallel nature of Xilinx FPGAs easily supports the processing requirements of complex non-linear signal processing algorithms. The System Generator design described in this paper implements a baseband linearizer that includes a 5th order nonlinearity and a 2nd order term that accounts for PA memory. These design parameters are easily modified to reflect the characteristics of any given power amplifier.

The LMS coefficient update procedure used in the implementation is a fully parallel design that updates all of the linearizer coefficients at the output sample rate. Depending on the system requirements, the adaptive processor could be modified to include functional unit time-sharing that would reduce the FPGA footprint in return for a decimated coefficient update rate. The coefficient update procedure could entirely, or partially, be relocated to embedded software running on either a Microblaze soft processor core or embedded PPC405 hard core in the Virtex-II Pro or Virtex-4 FPGA families.

References

[1] C. Liang et al., “Nonlinear amplifier effects in communications systems,” IEEE Transactions on Microwave Theory and Techniques, Vol. 47, Issue 8, pp. 1461-1466, Aug. 1999.

[2] J. K. Cavers, “Amplifier linearization using a digital predistorter with fast adaptation and low memory requirements,” IEEE Transactions on Vehicular Technology, Vol. 39, Issue 4, pp. 374-382, Nov. 1990.

[3] V. J. Mathews, G. Sicuranza, Polynomial Signal Processing, John Wiley & Sons, 2000.

[4] System Generator for DSP, Xilinx Inc., http://www.xilinx.com/xlnx/xebiz/designResources/ip_product_details.jsp?key=dr_dt_system_generator

[5] C. Eun, E. Powers, “A new Volterra Predistorter Based on the Indirect Learning Architecture,” IEEE Trans. on Signal Processing, Vol. 45, No. 1, January 1997.

[6] L. Ding et al., “A Robust Digital Baseband Predistorter Constructed Using Memory Polynomials,” IEEE Trans. on Comm., Vol. 52, No. 1, January 2004.

[7] L. Ding et al., “Effects of Even-Order Nonlinear Terms on Power Amplifier Modeling and Predistortion Linearization,” IEEE Transactions on Vehicular Technology, Vol. 53, Issue 1, pp. 156-162, Jan. 2004.

[8] S. Haykin, Adaptive Filter Theory, Prentice Hall, New Jersey, 1996.

[9] Virtex-II Pro Datasheet, Xilinx Inc., http://www.xilinx.com/xlnx/xweb/xil_publications_display.jsp?category=Publications/FPGA+Device+Families/Virtex-II+Pro&iLanguageID=1

[10] Xilinx Virtex-4 Revolutionizes Platform FPGAs, Xilinx Inc., http://www.xilinx.com/company/press/kits/v4_arch/v4_finalwhitepaper4.pdf

[11] Microblaze Soft Processor Core, Xilinx Inc., http://www.xilinx.com/xlnx/xebiz/designResources/ip_product_details.jsp?sSecondaryNavPick=Design+Tools&key=micro_blaze

[12] Virtex-II Datasheet, Xilinx Inc., http://www.xilinx.com/xlnx/xweb/xil_publications_display.jsp?category=/Data+Sheets/FPGA+Device+Families/Virtex-II&iLanguageID=1

[13] Spartan-3 Datasheet, Xilinx Inc., http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Spartan-3

