Swartzlander 1984

This paper describes the development of a semicustom delay commutator circuit to support the implementation of high speed fast Fourier transform processors based on the McClellan and Purdy radix 4 pipeline FFT algorithm. The delay commutator is a 108000 transistor circuit comprising 12288 shift register stages and approximately 2000 gates of random logic realized with 2.5 micrometer design rule CMOS standard cell technology. It operates at a 10 MHz clock rate, which corresponds to a 40 MHz data rate.


702 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. SC-19, NO. 5, OCTOBER 1984


A Radix 4 Delay Commutator for Fast Fourier Transform Processor Implementation

EARL E. SWARTZLANDER, JR., SENIOR MEMBER, IEEE, WENDELL K. W. YOUNG, MEMBER, IEEE, AND SAUL J. JOSEPH, MEMBER, IEEE

Abstract: This paper describes the development of a semicustom delay commutator circuit to support the implementation of high speed fast Fourier transform processors based on the McClellan and Purdy radix 4 pipeline FFT algorithm. The delay commutator is a 108000 transistor circuit comprising 12288 shift register stages and approximately 2000 gates of random logic realized with 2.5 micrometer design rule CMOS standard cell technology. It operates at a 10 MHz clock rate, which corresponds to a 40 MHz data rate. The delay commutator is suitable for implementing processors that compute transforms of 16, 64, 256, 1024, and 4096 (complex) points. It is implemented as a 4 bit wide data slice, so that several chips can be concatenated to accommodate common data word sizes and each chip fits in a standard 48 pin dual-in-line package.

Manuscript received April 11, 1984; revised June 14, 1984.
E. E. Swartzlander, Jr. and W. K. W. Young are with TRW Defense Systems Group, Redondo Beach, CA 90278.
S. J. Joseph is with AT&T Bell Laboratories, Allentown, PA 18103.
0018-9200/84/1000-0702$01.00 © 1984 IEEE

I. INTRODUCTION

Although the Cooley-Tukey FFT algorithm [1], developed nearly two decades ago, has made it possible to apply digital signal analysis techniques to many applications, many others (e.g., radar and sonar beam forming, adaptive filtering, and communications spectrum analysis) require both flexibility and speed that exceed the present state of the art. Currently, there are three approaches to signal processing: software implemented on general purpose computers, software implemented on a general purpose computer augmented with a Programmable Signal Processor (PSP), and custom hardware development.

Software-only and software-PSP implementations are adequate when the spectral bandwidth is under 10 MHz. Custom processors achieve analysis bandwidths of 10-50 MHz, but most are optimized for a specific application and would require extensive (and expensive) redesign to suit other applications. Thus general purpose computers, with or without PSP augmentation, are too slow, while custom processors lack the required flexibility.

Current signal processing systems require many diverse functions: transform computation, time and frequency domain vector processing, and general purpose computing. We are developing a growing family of building block modules to facilitate the development and implementation of such systems on a semicustom basis. The result is the ability to quickly develop high performance signal processing systems for a wide variety of algorithms. The use of predesigned and precharacterized modules reduces cost, development time, and, most importantly, risk. The initial set of modules was described in 1983 [2]. The modules defined include a data acquisition module, building block elements that are replicated to realize pipeline FFT and inverse FFT modules, a frequency domain filter module, a power spectral density computational module, and an output interface module.

The modules all have separate data and control interfaces. The separation of data and control is analogous to the Harvard mainframe computer architecture, which uses separate data and instruction memories to eliminate


the "von Neumann bottleneck." In signal processing, the separation of data and control allows the simple data interfaces to operate at high speed while the more flexible and complex control interfaces operate at a slower rate. All data interfaces satisfy a common interface protocol, so that modules can be connected together to form architectures that match the data flow of each specific system.

Fig. 1. Pipeline FFT architecture. [Figure: the data input passes through a chain of computational elements (CE) and delay commutators (DC(X)) to the transform output.]

II. FOURIER TRANSFORM ALGORITHM SELECTION

Due to its importance in signal processing, the Fourier transform module was selected for initial development. The choice of algorithm for Fourier transform computation depends on many factors, including the system requirements, component technology, and computational environment. There are three main classes of algorithms: the discrete Fourier transform (DFT), the fast Fourier transform (FFT) [1], and the Winograd Fourier transform (WFT) [3]. Each is optimum in certain situations. For example, if only a relatively small number of spectral components is required (as may occur in some speech and beam forming applications), the DFT may be optimum because of its simple control and negligible memory demands, even though it may involve more arithmetic than the other approaches [4]. If the complete spectrum of long data sequences is required, either the FFT or the WFT is generally best. The WFT minimizes multiplication operations but involves rather complex data manipulation and control sequencing, so it is generally used for minimum hardware mini- or microcomputer based implementations. The FFT is attractive for VLSI implementation because of its modularity: processors can be realized by repeated use of a single butterfly computation with simple control and data manipulation.

Recently, descriptions of two "single chip" transform processors have been published [4], [5]. Both designs are tailored to perform transforms of specific lengths (32 or 256 points). Concatenating the short transforms computed by these implementations to compute the transform of long sequences (e.g., 1K, 4K, or 16K points) often requires complex logic, thereby offsetting the advantages of the VLSI realizations. These chips also lack the flexibility to efficiently transform sequences of varying lengths, as may be required for many applications.

III. THE PIPELINE FFT ALGORITHM

Pipeline FFT algorithms are a small subset of the many Fourier transform algorithms that have been developed over the last two decades [6]. The pipeline algorithms [7], [8] are well suited for signal processing applications, where high data throughput is the dominant requirement. They are also well suited to hardware implementation because of their inherent modularity. A $K^n$ length FFT is implemented by sequentially linking $n$ modules, where each module performs a radix $K$ butterfly. Since $K$ data paths are used, the pipeline processor achieves a data rate of $K$ times the intermodule clock rate. The clock rate is independent of the transform length.

Our FFT processor uses the radix 4 pipeline algorithm developed a decade ago by McClellan and Purdy [7]. It is an extension to radix 4 of the radix 2 pipeline concepts developed by Groginsky and Works [8]. With the radix 4 pipeline algorithm, data pass through a pipeline network composed of computational elements and delay commutators, as shown in Fig. 1. An important feature of this algorithm and architecture is that only two types of elements are used: computational elements and delay commutators. Only minor changes are required to implement forward and inverse transforms of lengths that are powers of 4 from 16 to 4096 points. The changes involve varying the number of stages connected in series, changing the counter step size on the computational elements, and changing the length of the delays on the delay commutator.

The computational element performs a four point discrete Fourier transform in 22 bit floating point arithmetic with single chip adders, subtracters, and multipliers [9].
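A minimal Python sketch of the two relations just stated: the four point DFT (radix 4 butterfly) that a computational element evaluates, and the stage count and data rate of a pipeline built from such elements. It uses ordinary complex floating point rather than the 22 bit format of [9], and it leaves out the twiddle-factor multiplications that a pipeline FFT also needs between stages; the function names are illustrative and do not come from the paper.

```python
import cmath
import math


def radix4_butterfly(a, b, c, d):
    """Four point DFT of (a, b, c, d).

    Only additions, subtractions, and multiplications by +/-j are needed;
    in hardware a multiplication by +/-j is a swap of the real and
    imaginary parts with one sign change, so no multiplier is required.
    """
    s0, s1 = a + c, b + d          # sums of the two input pairs
    d0, d1 = a - c, b - d          # differences of the two input pairs
    return (s0 + s1,               # X0
            d0 - 1j * d1,          # X1
            s0 - s1,               # X2
            d0 + 1j * d1)          # X3


def naive_dft(x):
    """Direct O(N^2) DFT, used here only to check the butterfly."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def pipeline_parameters(points, radix=4, clock_hz=10e6):
    """Return (number of butterfly stages n, data rate in samples/s) for a
    radix**n point pipeline FFT that uses `radix` parallel data paths."""
    n = round(math.log(points, radix))
    if radix ** n != points:
        raise ValueError("transform length must be a power of the radix")
    return n, radix * clock_hz


x = [1 + 0j, 2 + 1j, -1 + 0j, 0.5 - 2j]
assert all(abs(f - s) < 1e-9 for f, s in zip(radix4_butterfly(*x), naive_dft(x)))
print(pipeline_parameters(4096))   # (6, 40000000.0)
```

With the 10 MHz clock of this design, a 4096 point transform needs six radix 4 stages and still runs at 40 MHz; only the number of stages, not the clock rate, grows with the transform length.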
The delay commutator reorders the data between computational stages as required for the FFT algorithm. The interstage data reordering required at stage $i$ in the implementation of a $4^n$ point transform is a base 4 digit reversal [10] of the data elements in a $4 \times 4^{n-i}$ matrix. When data enter the delay commutator, they pass through four parallel delay lines that skew the data. The first data path receives no delay, the second receives a delay of $4^{n-i-1}$, the third a delay of $2 \times 4^{n-i-1}$, and the fourth a delay of $3 \times 4^{n-i-1}$. Writing $X = 4^{n-i-1}$, the four paths are therefore delayed by $0$, $X$, $2X$, and $3X$; for the first stage of a 64 point transform ($n = 3$, $i = 1$), $X = 4$. The data then pass through the commutator switch, where they are switched to selected data paths. Lastly, the data are deskewed through a second set of delay lines.

Fig. 2. Data patterns through two delay commutators for a 64 point FFT (after [10, p. 611]). [Figure: the four input streams (data 0-15, 16-31, 32-47, and 48-63) pass through input delays, commutator reordering, and output delays, first with X = 4 and then with X = 1; the final streams carry 0, 4, ..., 60; 1, 5, ..., 61; 2, 6, ..., 62; and 3, 7, ..., 63.]

The routing of data that occurs in processing a 64 point transform with the radix 4 pipeline FFT algorithm is charted in Fig. 2. Input data numbered 0-63 are shown on four parallel streams at the top of the figure. The action of the first delay commutator (set for X = 4) is shown. It transforms four streams whose simultaneous samples are separated by 16 points into four streams where the data are separated by four points.
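The skew, switch, and deskew sequence can be checked against the 64 point example of Fig. 2 with a small behavioral model. This is a sketch, not the chip's logic: it assumes each delay line is an ideal shift register of whole clock periods, and that the commutator is driven by a 2 bit counter m = floor(t/X) mod 4 that routes input path j to output path (m - j) mod 4, a switching rule chosen here because it reproduces Fig. 2. The function names are hypothetical. The radix 4 butterfly between the two commutators is left out because it does not change the data order.

```python
from typing import List, Optional

Sample = Optional[int]            # None marks a slot that holds no valid data


def base_delay(n: int, i: int) -> int:
    """X = 4**(n - i - 1): base delay of the commutator after stage i of a 4**n point FFT."""
    return 4 ** (n - i - 1)


def delay_line(stream: List[Sample], d: int) -> List[Sample]:
    """Ideal d stage shift register: the output at time t is the input at time t - d."""
    return [None] * d + list(stream)


def delay_commutator(inputs: List[List[Sample]], x: int) -> List[List[Sample]]:
    """Behavioral model of one radix 4 delay commutator with base delay x."""
    n_samples = len(inputs[0])
    length = n_samples + 3 * x
    # 1) Skew: input path j is delayed by j*x, i.e. 0, X, 2X, 3X.
    skewed = [delay_line(s, j * x) for j, s in enumerate(inputs)]
    skewed = [s + [None] * (length - len(s)) for s in skewed]
    # 2) Switch: counter m advances every x clocks; input path j drives
    #    output path (m - j) mod 4 (assumed switching rule).
    switched: List[List[Sample]] = [[None] * length for _ in range(4)]
    for t in range(length):
        m = (t // x) % 4
        for j in range(4):
            switched[(m - j) % 4][t] = skewed[j][t]
    # 3) Deskew: output path k is delayed by (3 - k)*x, i.e. 3X, 2X, X, 0.
    deskewed = [delay_line(s, (3 - k) * x) for k, s in enumerate(switched)]
    # The first 3X output clocks hold no valid data; drop them to align the streams.
    return [s[3 * x:3 * x + n_samples] for s in deskewed]


# 64 point example (n = 3): input stream j carries samples 16j .. 16j + 15.
streams = [[16 * j + t for t in range(16)] for j in range(4)]
stage1 = delay_commutator(streams, base_delay(3, 1))   # X = 4
stage2 = delay_commutator(stage1, base_delay(3, 2))    # X = 1
print(stage1[0])   # [0, 1, 2, 3, 16, 17, 18, 19, 32, 33, 34, 35, 48, 49, 50, 51]
print(stage2[0])   # [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60]
```

Running the two stages in sequence reproduces the patterns of Fig. 2: after X = 4 each stream holds runs of four adjacent samples, and after X = 1 stream k carries k, k + 4, ..., k + 60. For a 4096 point transform (n = 6), `base_delay(6, i)` for i = 1, ..., 5 gives X = 256, 64, 16, 4, and 1, the delay settings supported by the chip.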
The data leaving the first delay commutator are then operated upon by a radix 4 butterfly, which does not change the data order. A second delay commutator (set for X = 1) reorders the data to produce streams of adjacent data. This process is derived and explained in greater detail in [10].

IV. THE DELAY COMMUTATOR CIRCUIT

Careful examination of our initial (off-the-shelf technology) FFT module design revealed that much of the complexity was due to the delay commutator element. Initial complexity estimates were 80 commercial integrated circuits for the computational element and 180 circuits for the delay commutator. The disparity arises from the difficulty of realizing shift registers that can be set to the variety of lengths required for the various delays. The most efficient off-the-shelf approach simulated each delay line with a RAM whose write and read addresses are displaced by a constant (i.e., the length of the simulated delay line). In view of the high complexity of the delay commutator, development of a semicustom implementation was undertaken. The resulting design of the delay commutator is a 4 bit wide slice that uses programmable length shift registers and a 4 × 4 switch, as shown in Fig. 3. Data enter through shift registers with taps and multiplexers that set the delay at 1, 4, 16, 64, or 256 (= X) in the uppermost input register and at 2X and 3X in the middle and lower registers, respectively. Four 4:1 multiplexers implement the commutator function under the control of the programmable rate counter. The final 2 bit counter/decoder that controls the multiplexer settings can be reset and held in state 0 to disable the commutator switch function. In this mode the chip provides fixed length registers with delays of 256, 2 × 768, and 1280, which are used to expand the delay commutator for 16384 point transforms. Data from the 4:1 multiplexers are output through programmable length shift registers that are similar to the input registers.

Fig. 3. Delay commutator block diagram. [Figure: the data input feeds tapped 768 word shift registers with 5:1 multiplexers; four 4:1 multiplexers form the commutator and drive the output shift registers and the data output. Control blocks: counter with taps at even stages, 8:1 multiplexer, counter reset, counter preset, preset enable, output enable, and delay length inputs.]

Gate array, standard cell, and custom technologies were considered for implementation of the delay commutator. An optimum balance between high circuit performance and low implementation cost was achieved using the AT&T Bell Laboratories polycell (standard cell) CMOS technology. This technology was selected because it is well suited to the development of VLSI with high density shift registers and random logic. It is a twin tub 2.5 micrometer CMOS technology with chain-stops for device isolation and an epitaxial layer for latch-up protection [11].

The delay commutator circuit contains 12288 shift register stages and about 2000 gates of random logic, for a total complexity of 108000 transistors. At a clock rate of 10 MHz, the power dissipation is under 1/2 W. The chip size is 340 × 376 mil. The very high speed integrated circuit (VHSIC) program uses functional throughput rate (FTR) as a measure of circuit performance [12]. FTR is defined as the number of gates times the chip clock rate, divided by the chip area. By this measure (assuming a conversion rate of 3 transistors to 1 gate), the delay commutator FTR is $4.3 \times 10^{11}$ gate·Hz/cm². Although the chip does not satisfy the important VHSIC environmental requirements, this is impressive performance for a semicustom circuit. The chip is shown in Fig. 4.
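The FTR figure quoted above follows directly from the stated transistor count, clock rate, and die size, as the short calculation below shows (1 mil = 0.001 in = 2.54 × 10^-3 cm). The sketch also checks the 12288 stage count under an assumption that the text does not state: that each bit slice's input and output register chains are sized for the largest base delay X = 256, i.e., X + 2X + 3X = 1536 stages per side. That sizing is consistent with the 768 word registers of Fig. 3 and with the fixed-length mode delays of 256 + 2 × 768 + 1280 = 3072 per slice.

```python
CM_PER_MIL = 2.54e-3                 # 1 mil = 0.001 inch = 0.00254 cm

# Functional throughput rate: gates x clock rate / chip area.
transistors = 108_000
gates = transistors / 3              # conversion assumed in the text: 3 transistors per gate
clock_hz = 10e6
area_cm2 = (340 * CM_PER_MIL) * (376 * CM_PER_MIL)   # 340 x 376 mil die
ftr = gates * clock_hz / area_cm2    # gate*Hz/cm^2
print(f"area = {area_cm2:.3f} cm^2, FTR = {ftr:.2e} gate*Hz/cm^2")

# Shift register stages, assuming every programmable chain is sized for X = 256.
X_MAX = 256
per_side = (1 + 2 + 3) * X_MAX       # delays X, 2X, 3X on three of the four paths
per_slice = 2 * per_side             # input registers plus output registers
fixed_mode = 256 + 2 * 768 + 1280    # fixed-length delays when the commutator is disabled
assert per_slice == fixed_mode == 3072
print(4 * per_slice)                 # four bit slices -> 12288 stages
```

The computed value, about 4.36 × 10^11 gate·Hz/cm² on a 0.825 cm² die, agrees with the quoted 4.3 × 10^11 to the precision given.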
Each of the four bit slices is constructed with input registers in a column, switching logic in two standard cell "random logic" columns, and output registers in a column. The four nearly identical slices are about four times as tall as they are wide, producing a roughly square chip when they are stacked. There is a minor variation in the random logic of each bit slice to account for sharing of the counters, decoders, clock drivers, etc. The circuit is packaged in a 48 pin ceramic dual-in-line package to accommodate the 32 data, 11 control, and 5 power and ground connections.

Fig. 4. Delay commutator photograph.

The shift register cells consist of eight transistors each. Each cell measures 3.8 square mil and uses the same two-phase nonoverlapping master-slave clocks as the random logic. The circuit diagram of a single shift register cell is shown in Fig. 5. The upper transistor is connected in a diode configuration to reduce the effective "1" level within the cell. This eliminates the threshold voltage drop across the clocked transmission gate, since the clock maintains a full swing from $V_{SS}$ to $V_{DD}$. This approach eliminates static dc power consumption while using a simple two-phase nonoverlapping clock, since there is no p-type transmission gate in parallel with the n-type transmission gate. A photomicrograph of the cell is shown in Fig. 6.

Fig. 5. Shift register cell. [Circuit diagram; labels: VDD, DATA IN.]

Fig. 6. Photograph of shift register cell.

The 2000 gates of random logic were implemented using a library of standard cells. The geometrical description of each cell serves as a common database, from which descriptions of the cell are automatically generated for the automated layout program and for the simulation programs used in fault, logic, and timing analysis.

The complete design cycle comprised two phases. In the initial phase, the functional block diagram was translated into a logic design whose logic elements are drawn from the polycell library. In the second phase, the logic design is translated into a silicon layout. While translating the functional design into the logic design, the designer uses a schematic capture routine that permits automatic polycell implementation of high level functions and automatically generates a machine readable circuit description, which is used to detect design flaws and to give a measure of the testability of the logic. While translating the logic design into the silicon layout, a layout routine automatically produces an optimum layout with respect to size and parasitics. The parasitics are used during timing simulation to verify the performance of the circuit. Masks are then generated, and wafers are processed, tested, and packaged.

The output waveform for one of the channels of the delay commutator is shown in Fig. 7. The 10 MHz clock waveform is shown on the upper channel, and a 5 MHz output data waveform on the lower channel. Note that the rise and fall times are in the 10-15 ns range.

Fig. 7. Clock and data output waveforms (10 MHz clock rate). [Upper trace: 10 MHz clock; lower trace: delay commutator output; scale markers 5 V and 20 ns.]

A logic analyzer test pattern for one bit of the 4 bit delay commutator slice is shown in Fig. 8. Here input channels 1, 2, 3, and 4 are commutated to output channels 4, 3, 2, and 1, respectively. The varying delays between the input channels and the outputs are due to the shift registers (set here for X = 1). Tracing through the delay commutator circuit in Fig. 3 indicates that the delays should be 6, 4, 2, and 0 clock periods for output channels 1, 2, 3, and 4, respectively.

Fig. 8. Logic analyzer display of delay commutator operation showing mapping of input channels 1, 2, 3, and 4 onto output channels 4, 3, 2, and 1, respectively (with delay X = 1).

The point of semicustom chip development is to reduce system complexity. This circuit succeeds admirably, as shown in Table I. Development of the delay commutator chip reduces the complexity of a 40 MHz 4096 point FFT from 1375 commercial integrated circuits to 546 circuits (of which 66 are delay commutator chips) [13]. This is a 60 percent complexity reduction achieved through use of a single semicustom chip. For larger transforms, the complexity of a 16384 point FFT processor is reduced from 1634 integrated circuits to 670 circuits with the VLSI delay commutator circuit. Such a reduction greatly improves system reliability, since connections between circuits represent the dominant failure mechanism in modern systems [14]. With 60 percent fewer circuits (and a corresponding reduction in the number of interconnections), the reliability is greatly improved.

TABLE I
COMPLEXITY REDUCTION ACHIEVED WITH THE DELAY COMMUTATOR CIRCUIT
System complexity (integrated circuit count)

                          Without delay commutator circuit    With delay commutator circuit
4096 point FFT
  Computational element   6 cards at 80 ckts/card = 480       6 cards at 91 ckts/card = 546
  Delay commutator        5 cards at 179 ckts/card = 895      (included on CE cards)
  Total                   11 cards, 1375 ckts                 6 cards, 546 ckts
16384 point FFT
  Computational element   7 cards at 80 ckts/card = 560       7 cards at 91 ckts/card = 637
  Delay commutator        6 cards at 179 ckts/card = 1074     1 extended DC (1024), 33 ckts
  Total                   13 cards, 1634 ckts                 8 cards, 670 ckts

V. CONCLUSIONS

A high performance semicustom circuit has been developed to implement the interstage reordering required for radix 4 pipeline FFT computation. The chip is realized

with twin tub 2.5 µm CMOS technology. It achieves an FTR of approximately $5 \times 10^{11}$ gate·Hz/cm². Use of this circuit reduces the complexity of high speed FFT processors by 60 percent relative to an implementation built from commercial integrated circuits. Since it is organized as a 4 bit wide data slice, it is applicable to the realization of FFT processors for common data word sizes. It is used directly to implement processors for transform lengths of 16, 64, 256, 1024, and 4096 (complex) points, and it can be expanded to accommodate 16384 point transforms.

REFERENCES

[1] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[2] E. E. Swartzlander, Jr., L. S. Lome, and G. Hallnor, "Digital signal processing with VLSI technology," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1983, pp. 951-954.
[3] S. Winograd, "On computing the discrete Fourier transform," Math. Comput., vol. 32, pp. 175-199, 1978.
[4] J. L. van Meerbergen and F. J. van Wyk, "A 256-point discrete Fourier transform processor fabricated in a 2 µm NMOS technology," IEEE J. Solid-State Circuits, vol. SC-18, pp. 604-609, 1983.
[5] G. D. Covert, "A 32 point monolithic FFT processor chip," in
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1982, pp. 1081-1083.
[6] H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms. New York: Springer-Verlag, 1982.
[7] J. H. McClellan and R. J. Purdy, "Applications of digital signal processing to radar," in Applications of Digital Signal Processing, A. V. Oppenheim, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1978, ch. 5.
[8] H. L. Groginsky and G. A. Works, "A pipeline fast Fourier transform," IEEE Trans. Comput., vol. C-19, pp. 1015-1019, 1970.
[9] J. A. Eldon and C. Robertson, "A floating point format for signal processing," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1982, pp. 717-720.
[10] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975, ch. 10.
[11] L. C. Parrillo et al., "Twin-tub CMOS II: An advanced VLSI technology," in Proc. Int. Electron Devices Meeting, 1982, pp. 706-709.
[12] L. W. Sumney, "VHSIC: A status report," IEEE Spectrum, vol. 19, pp. 34-39, Dec. 1982.
[13] E. E. Swartzlander, Jr., and G. Hallnor, "Fast transform processor implementation," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1984, pp. 25A.5.1-25A.5.4.
[14] G. W. Preston, "The very large scale integrated circuit," Amer. Scientist, vol. 71, pp. 466-472, 1983.

Earl E. Swartzlander, Jr. (S'64-M'72-SM'79) received the B.S.E.E. degree from Purdue University, Lafayette, IN, in 1967, the M.S.E.E. degree from the University of Colorado, Boulder, in 1969, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1972. He obtained his doctorate in computer design with the support of a Howard Hughes Doctoral Fellowship.
He is currently the Manager of the Advanced Development Office in the Systems Engineering Operations of the TRW Defense Systems Group. This involves the conceptual definition and development of advanced signal processing systems. His current activity focuses on the related issues of algorithms, architecture, and implementation with off-the-shelf as well as custom VLSI. He has directed the development of a variety of advanced systems, and has developed the architectural and functional design of VLSI components which are in varying stages of design, development, and production. He has published over 50 papers in the fields of computer architecture, VLSI implementation, and computer arithmetic, and is the Editor of the books Computer Design Development (Rochelle Park, NJ: Hayden Book Co., 1976) and Computer Arithmetic (Stroudsburg, PA: Dowden, Hutchinson & Ross, 1980).
Dr. Swartzlander is an Editor of the IEEE TRANSACTIONS ON COMPUTERS. He is a member of the Association for Computing Machinery and belongs to Eta Kappa Nu, Sigma Tau, and Omicron Delta Kappa honorary fraternities, and is a registered Professional Engineer in Alabama, California, and Colorado.

Wendell K. W. Young (M'83) received the B.S. degree in information computer sciences from the University of Hawaii, Honolulu, in 1978.
In 1979 he joined TRW, Redondo Beach, CA, as a Member of the Technical Staff conducting analysis work on the post-flight navigation data system for the Space Shuttle program. In 1980, he joined Magnavox Government Systems, where he was engaged in the development of integrated navigation systems. He is currently a Systems Engineer in the TRW Defense Systems Group involved in the development of advanced signal processing circuits and systems.

Saul J. Joseph (S'77-M'80) received the B.S.E.E. degree from Rutgers University, New Brunswick, NJ, in 1978 and the M.S.E.E. degree from Lehigh University, Bethlehem, PA, in 1979.
In 1978 he joined AT&T Bell Laboratories. He is currently a Member of the Technical Staff in the VLSI Design Laboratory in Allentown, PA. He is responsible for the design of semicustom CMOS circuits using the Polycell family of standard cells.
Mr. Joseph is a member of Tau Beta Pi, Eta Kappa Nu, and the National Society of Professional Engineers.
