Digital Signal Processors for a Signal Processing Laboratory
IEEE TRANSACTIONS ON EDUCATION, VOL. 42, NO. 3, AUGUST 1999
Abstract: An undergraduate-level laboratory course on digital signal processors and their use in real-time digital signal processing (DSP) systems is described. Special emphasis is put on some practical issues such as execution time optimization, finite wordlength effects, and a pipelined strategy as a suitable way to improve the performance of a real-time DSP system. The primary purpose of the laboratory is a better understanding of the theoretical concepts and a more active participation of the students in the course. We think that this paper will be of interest to those teaching related courses such as Introduction to DSP, Digital Signal Processing, Real-time DSP, Microprocessors, and Hardware/Software Interaction. The well-known Texas Instruments’ TMS320C2x and TMS320C5x fixed-point processors are used as the fundamental tool in the lab experiments and the TMS320C3x is suggested as a low-cost floating-point alternative. Enough detail is provided in the paper to duplicate certain parts of the experiments.

Index Terms: DSP, educational, FFT, fixed-point, real-time, signals.

I. INTRODUCTION
As most readers will know from experience, the gap between learning the fundamentals in any subject and actually applying them is quite wide. In this respect, digital signal processing (DSP) is not an exception. Therefore, we have decided to write this paper to provide undergraduates with the tools to undertake practical DSP assignments and projects.
The computer science studies at the University of La Laguna, Spain, include three one-semester courses devoted to DSP. The courses, consisting of a theoretical part and a practical part, are organized as follows.
1) In the first one, the basics of discrete-time signals and systems are discussed, incorporating the following topics: properties of discrete sequences, LTI systems, frequency domain representation, the sampling theorem, the z-transform, the discrete Fourier transform (DFT), and different fast Fourier transform (FFT) algorithms (the basic Cooley–Tukey FFT and more efficient algorithms such as the Split-Radix FFT, which is explained below). The laboratory that supplements the lecture notes comprises the implementation of these FFT algorithms in assembly language on a general-purpose processor such as the Intel 80x86 or Pentium in collaboration with the math coprocessor 80x87.
2) The second semester is focused on real-time DSP and the use of DSP’s to improve the performance of a real-time DSP system. The theoretical part covers A/D and D/A signal conversion, finite impulse response (FIR) and infinite impulse response (IIR) filter design, structure and instruction set of fixed-point and floating-point DSP’s, and the finite wordlength effects in the implementation of DSP algorithms. The practical part of this course includes, besides the usual FIR and IIR filter implementation (which has been extensively covered in many publications [1], [2]), other lab experiments which are in fact the subject of this paper. The fundamental tool in these experiments is the DSP, which is programmed in assembly language in order to benefit from its specialized structure and instruction set. In addition, the students have to deal with the practical issues associated with real-time processing and the DSP environment. In our laboratory, we use the well-known Texas Instruments’ TMS320C2x and TMS320C5x processors, which are available at low cost in education kits. Although we refer throughout to the TMS320 family of DSP’s, it is important to remark that the proposed experiments can be realized with other families of processors as well.
3) The third semester deals with more advanced topics in DSP such as multirate digital signal processing, filter banks, time-frequency representations, and adaptive systems.
Finally, from our experience we are convinced that these experiments help the students to improve their understanding of those difficult concepts presented in a theoretical course.

II. DESCRIPTION OF THE SRFFT ALGORITHM
Several papers have been published on algorithms to calculate a length-N DFT more efficiently than a Cooley–Tukey FFT of any radix. One of these algorithms is the split-radix FFT (SRFFT) [3], which basically consists of the application of a radix-2 index map to the even indexed terms and a radix-4 map to the odd indexed terms. Starting from the DFT definition

Manuscript received June 23, 1997; revised April 13, 1999. This work was
supported in part by the Comisión Interministerial de Ciencia y Tecnologı́a
(CICYT) under Project TIC95-597.
The authors are with the Centro Superior de Informática, University of La
Laguna, Tenerife, Spain.
Publisher Item Identifier S 0018-9359(99)06318-9.
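As an illustration of the split-radix idea described in Section II, the following Python sketch applies a radix-2 split to the even-indexed samples (one DFT of length N/2) and a radix-4 split to the odd-indexed samples (two DFTs of length N/4). It is only a sketch under stated assumptions: it uses the recursive decimation-in-time form, which may differ from the form derived in [3]; the function names `srfft` and `dft` are illustrative; and Python is chosen for readability, whereas the laboratory implementations are written in assembly language.

```python
import cmath


def srfft(x):
    """Split-radix FFT (decimation in time); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]

    # Radix-2 split for the even-indexed samples: one DFT of length n/2.
    u = srfft(x[0::2])
    # Radix-4 split for the odd-indexed samples: two DFTs of length n/4.
    z1 = srfft(x[1::4])
    z3 = srfft(x[3::4])

    quarter = n // 4
    out = [0j] * n
    for k in range(quarter):
        w1 = cmath.exp(-2j * cmath.pi * k / n)      # twiddle factor W_n^k
        w3 = cmath.exp(-2j * cmath.pi * 3 * k / n)  # twiddle factor W_n^(3k)
        s = w1 * z1[k] + w3 * z3[k]
        d = -1j * (w1 * z1[k] - w3 * z3[k])
        out[k] = u[k] + s
        out[k + 2 * quarter] = u[k] - s
        out[k + quarter] = u[k + quarter] + d
        out[k + 3 * quarter] = u[k + quarter] - d
    return out


def dft(x):
    """Direct O(n^2) evaluation of the DFT definition, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]
```

Comparing `srfft(x)` with `dft(x)` on a short test sequence is a simple way to check an implementation before porting it to assembly language.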
TABLE I
SRFFT PERFORMANCE FOR AN 80x86/80x87 FLOATING-POINT IMPLEMENTATION
TABLE II
SRFFT PERFORMANCE FOR A TMS320C25 FIXED-POINT IMPLEMENTATION
TABLE III
SRFFT PERFORMANCE FOR A TMS320C25 FLOATING-POINT IMPLEMENTATION
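Tables II and III contrast a fixed-point and a floating-point implementation on the TMS320C25. As a rough illustration of why the two can give different results, the Python sketch below rounds the input samples and the twiddle factors to a 16-bit Q15 format before evaluating a scaled DFT and compares the outcome with a double-precision evaluation. Several things here are assumptions made for illustration: the Q15 format, the 1/N scaling, the test tone, and the helper names `to_q15`, `dft_scaled`, and `wordlength_error`. Only operand quantization is modeled; a real fixed-point program also rounds products and accumulations, so this is not a reproduction of the measurements in the tables.

```python
import cmath
import math

Q15_SCALE = 1 << 15  # 2**15 steps per unit: 1 sign bit and 15 fractional bits


def to_q15(v):
    """Round a real value to the nearest Q15 step, saturating to [-1, 1 - 2**-15]."""
    code = round(v * Q15_SCALE)
    code = max(-Q15_SCALE, min(Q15_SCALE - 1, code))
    return code / Q15_SCALE


def dft_scaled(x, quantize):
    """DFT scaled by 1/N; optionally quantize inputs and twiddle factors to Q15."""
    n = len(x)
    q = to_q15 if quantize else (lambda v: v)
    out = []
    for k in range(n):
        acc = 0j
        for i in range(n):
            w = cmath.exp(-2j * math.pi * i * k / n)
            acc += q(x[i]) * complex(q(w.real), q(w.imag))
        out.append(acc / n)
    return out


def wordlength_error(n=64, tone_bin=5):
    """Largest difference between the quantized and the unquantized scaled DFT."""
    x = [0.5 * math.cos(2 * math.pi * tone_bin * i / n) for i in range(n)]
    reference = dft_scaled(x, quantize=False)
    quantized = dft_scaled(x, quantize=True)
    return max(abs(a - b) for a, b in zip(reference, quantized))
```

Calling `wordlength_error()` with different frame lengths gives a feel for how the error behaves, which is the kind of observation the finite wordlength experiments aim at.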
Fig. 5. Block diagram of the real-time digital signal processing system under consideration.
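The system of Fig. 5 is organized as a pipeline of three stages (acquisition, processing, and display) that work on different frames at the same time. The Python sketch below is a minimal model of that overlap and of the condition for keeping up in real time. It rests on assumptions that are not taken from the paper: each stage is given exactly one frame period, the sampling rate is per channel, a frame is `frame_len` samples of one channel, and the function and parameter names are hypothetical.

```python
def pipeline_slots(num_frames):
    """Per time slot, the frame handled by (capture, process, display); None means idle."""
    def frame(index):
        return index if 0 <= index < num_frames else None

    return [(frame(t), frame(t - 1), frame(t - 2)) for t in range(num_frames + 2)]


def capture_time(frame_len, sample_rate_hz):
    """Seconds needed to acquire one frame (assumed: per-channel sampling rate)."""
    return frame_len / sample_rate_hz


def keeps_up(frame_len, sample_rate_hz, t_process, t_display):
    """True if processing and display each finish within one frame's capture time."""
    t_capture = capture_time(frame_len, sample_rate_hz)
    return t_process <= t_capture and t_display <= t_capture


def idle_times(frame_len, sample_rate_hz, t_process, t_display):
    """Idle time per frame for each stage; a negative value means the stage falls behind."""
    t_capture = capture_time(frame_len, sample_rate_hz)
    return {"process": t_capture - t_process, "display": t_capture - t_display}
```

For example, `pipeline_slots(3)` shows the pipeline filling and draining, and raising `sample_rate_hz` in `keeps_up` until it returns `False` mirrors the experiment of increasing the sampling frequency until the system no longer works in real time.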
1) Acquisition Stage: For this stage, the students have at their disposal in the laboratory a device of our own design that simply combines the signals generated by a common function generator (sinusoidal, sawtooth, and square) to give up to 16 analog signals. These analog signals must be converted to digital form before they can be processed. For this purpose, the students use a plug-in acquisition board (Data Translation DT2801 or Axiom AX5611C) with up to 16 input channels and 12-bit resolution. These boards support programmed input–output (PIO), interrupts, and direct memory access (DMA), which are the three primary data transfer mechanisms for computer-based data acquisition. Once the lab equipment for the acquisition stage has been described, the students have to choose the fastest transfer mechanism, which is the DMA transfer. Although the different acquisition boards normally come with several driver utilities that minimize the application programming, we think that these utilities may defeat the purpose of teaching the concepts. That is why the students are asked to develop their own routines to program the acquisition board and the DMA controller.
2) Processing Stage: For this stage, the students can take advantage of the fact that they have already implemented the SRFFT algorithm in the TMS320C25 as described in the previous sections. What they have to do now is try to optimize this implementation to reduce the processing time. Here are some possible suggestions.
First of all, as this is a real-time application, the students should choose the fixed-point implementation of the SRFFT as long as the precision obtained is acceptable.
Second, a number of improvements in the SRFFT algorithm are possible.
   1) A reduction of additions as well as multiplications is achieved by removing unnecessary complex multiplications. In a program, this reduction is achieved by having special butterflies for these cases. In fact, the computation times in Tables I and II correspond to a two-butterfly version, i.e., one that removes the unnecessary multiplications by unity.
   2) As the students work with real signals, the imaginary parts of the complex entries are simply set to zero. This results in a certain amount of redundancy. To utilize the bandwidth of the FFT algorithm more effectively, two N-point real FFT’s can be computed simultaneously with a single N-point complex FFT [6].
Last but not least, unlike non-real-time FFT applications, real-time FFT applications demand careful consideration of system memory utilization. Optimizing memory use is one of the most difficult areas for the designer because there are many possibilities. We will focus on two of these possibilities: internal versus external memory and looped versus straight-line code.
   1) To decrease execution time, a large N-point FFT can be divided into smaller 256-point complex FFT’s and executed 256 complex points at a time utilizing the on-chip RAM (this is applicable to the TMS320C25; if any other processor is used, memory considerations will probably be different). This scheme takes advantage of the fact that off-chip data accesses take at least two cycles each while on-chip data accesses take one cycle each. To speed execution, off-chip data blocks can be efficiently moved into on-chip data memory via the BLKD instruction, which executes in a single cycle when used in the repeat mode.
   2) Another way to get higher execution speed is to use straight-line code instead of looped code. The tradeoff for this optimization is the larger program memory requirement of the straight-line code.
3) Display Stage: Once the DFT of the signals is performed, the next stage is to display the transformed signals on the computer monitor. For this purpose, the students are
asked to develop a program in C or a similar language. The program running on the PC should give, at least, the possibility of displaying the real and imaginary parts as well as the magnitude and phase of the DFT of one or more signals at the same time. Many improvements are possible; for instance, temporal as well as frequency information (spectral power) could be displayed, but it is obvious that all these improvements would complicate the problem and would probably exceed the time assigned to our laboratories.
4) Interfacing Considerations: The students have hitherto separately considered the different stages of the pipeline:
   1) acquisition stage handled by the DMA coprocessor;
   2) processing stage carried out by the DSP coprocessor;
   3) display stage accomplished by the 80x86 or Pentium host processor.
The next step is to interface one stage with the other, trying to optimize the performance of the whole system in order to satisfy the real-time requirements.
In real-time applications, input–output data buffering is generally required. A circular buffer for DMA transfers is used with a buffer size of 64 Kbytes. The DMA coprocessor is supposed to be constantly transferring data without the need to be reprogrammed. When a complete frame of input data is collected, it is passed to the DSP coprocessor to be transformed. Once the SRFFT is computed, the results are displayed on the screen.
In Fig. 6, a timing diagram of the pipeline is shown. From the figure, it may be deduced that at any time frame k is being displayed, frame k+1 is being processed, and frame k+2 is being captured.
In order to cope with this system and help students gain more insight into the experiment, we suggest they follow these steps.
   1) First, we propose they pass a single signal through the system without performing any kind of processing on it, just to verify the interfacing between the different stages. The signal displayed should reflect, in real time, any changes of frequency or amplitude made to the acquired signal.
   2) Once this has been done, the students are asked to monitor the behavior of the system to check whether it works as expected (as described above). Before starting, they have to set certain important parameters such as the number of input channels (1–16), the sampling frequency (avoiding aliasing), and the length of the FFT. The times required for the three stages of the pipeline to perform their individual evaluations depend on these parameters, which is why we encourage the students to try different values for them and see how this affects the performance of the system. There will probably be idle times which, in order to achieve a good throughput rate, should be minimized as much as possible.
   3) Taking advantage of the fact that the SRFFT program which runs on the PC is available, the students can compare the efficiency of the former system with that of the system without the DSP coprocessor, in which the host processor also has to compute the SRFFT as well as display the results. One way to do this is by varying the values of the parameters, for example, increasing the sampling frequency, until the system no longer works correctly in real time. It is expected that the pipelined system with the DSP coprocessor will allow operation at a higher frequency, but the students should confirm this point, making sure that no data are lost.
Finally, it is worth mentioning that a real-world application, which represents an improvement on this design, has been developed for everyday clinical practice in neurophysiology [7]. Our goal is to provide the neurophysiologist with a tool that makes possible the real-time acquisition, processing, and display of the EEG and EP (electroencephalogram and evoked potentials) brain mapping representation.
Due to the greater computational requirements, several changes have been made to the system shown in Fig. 5. First, 16 or more electrodes situated on the patient’s scalp are used for acquiring the EEG/EP signals. The acquisition and processing subsystems are integrated into a Data Translation Fulcrum board [8] based on the TMS320C40 floating-point DSP. Moreover, in order to provide the neurophysiologist with exhaustive information about the signal features, different data representations are displayed on the computer monitor. The information contained in these signals is represented in different ways: brain mapping in the time and frequency domains (several spectral bands) and the usual one-dimensional representation showing the behavior of the signals over time.
From the experience acquired with such an implementation, we think that it would be interesting for the students to work with floating-point DSP’s, which are less sensitive to finite precision errors. Since the hardware and software for the C40 are usually too expensive for a students’ laboratory, we suggest using the new TMS320C3x DSK by Texas Instruments, which uses a floating-point DSP and is more adequate for teaching purposes.

IV. CONCLUSIONS
A laboratory course about DSP’s and their use in real-time DSP systems is described. The Split-Radix FFT algorithm is implemented in the TMS320C25 processor both off-line and in real time, putting special emphasis on some key aspects such as:
   1) optimization in execution time, taking advantage of the facilities offered by the TMS320C25 structure and instruction set;
   2) finite wordlength effects, very important in fixed-point processors and hardly appreciable in floating-point processors;
   3) pipelined strategy as a suitable way of improving the performance of a real-time DSP system.
Our experience allows us to state that these experiments result in a better understanding of the theoretical concepts and a more active participation of the students in the course.

REFERENCES
[1] D. W. Horning and R. Chassaing, “IIR filter scaling for real-time signal processing,” IEEE Trans. Educ., vol. 34, pp. 108–112, Feb. 1991.
[2] T. Bose, “A digital signal processing laboratory for undergraduates,” IEEE Trans. Educ., vol. 37, pp. 243–246, Aug. 1994.
[3] J. S. Lim and A. V. Oppenheim, Advanced Topics in Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1988, pp. 227–232.
[4] TMS320C2x User’s Guide, Texas Instruments, 1993.
[5] Digital Signal Processing Applications with the TMS320 Family, vol. 1. Texas Instruments, 1989, ch. 7.
[6] E. O. Brigham, The Fast Fourier Transform and its Applications. Englewood Cliffs, NJ: Prentice-Hall, 1988, pp. 188–191.
[7] J. Merino, J. Sigut, A. Brito, L. Moreno, J. Estévez, J. D. Piñeiro, and J. L. Sánchez, “A low-cost solution for real-time signal processing in brain mapping,” in Proc. ICSPAT’97, vol. 1, Sept. 1997, pp. 328–332.
[8] Spox/DT3801 User Manual. Data Translation, 1992.

Lorenzo Moreno (M’91) received the M.S. and Ph.D. degrees from the Universidad Complutense de Madrid, Spain, in 1973 and 1977, respectively. He is presently Professor of the Department of Applied Physics at the Universidad de La Laguna, Tenerife, Spain. His areas of interest include control and signal processing.

Jose F. Sigut received the M.S. degree from the Universidad de La Laguna, Tenerife, Spain, in 1993. He is presently an Associate Professor in the Department of Applied Physics at the same university. His areas of interest include signal processing and artificial intelligence.

Juan J. Merino received the M.S. degree from the Universidad de Sevilla, Spain, in 1975. Since 1991, he has been an Associate Professor in the Department of Applied Physics at the Universidad de La Laguna, Tenerife, Spain. His areas of interest include biomedical signal processing and real-time systems.

Jose I. Estevez received the degree in astrophysics in 1992 and the M.S. degree in applied physics in 1994 from the Universidad de La Laguna, Tenerife, Spain. He is working toward the Ph.D. degree in informatics. He is an Associate Professor in the Universidad de La Laguna.

Jose Luis Sanchez received the M.S. and Ph.D. degrees from the Universidad de La Laguna, Tenerife, Spain, in 1987 and 1993, respectively. He has been Assistant Professor (“profesor titular”) in the Department of Applied Physics at the same university since 1996. His areas of interest include digital signal processing as well as DSP’s and their application to biomedical signals.

Ana Brito received the degree in informatics in 1996 from the Universidad de La Laguna, Tenerife, Spain. She was a research assistant in the Department of Applied Physics from 1995 to 1997. She is currently working in the field of geographical information systems for a Spanish company.