DSP Processor Architecture: TMS 320C67XX Blackfin Processor On Chip Resources and Programming Considerations
Summary
The TMS320C67x is a family of 32-bit floating-point DSP processors.
Its architecture is based on the same VLIW architecture as the
fixed-point TMS320C62x and TMS320C64x processors.
The TMS320C67x extends the TMS320C62x instruction set to
support floating-point arithmetic.
Hence, the C67x is upward compatible with the C62x (C62x code runs on the C67x) but not with the C64x.
The C67x offers high precision and a large dynamic range, which suit
applications such as RADAR, SONAR, 3-D graphics, wireless
base stations, and medical imaging.
The C67x processor can execute up to eight instructions per cycle.
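To make "eight instructions per cycle" concrete: the C67x, like the C62x, has eight functional units (.L1, .S1, .M1, .D1 on side A and .L2, .S2, .M2, .D2 on side B), and each unit can start at most one instruction per cycle. The sketch below is a hypothetical toy model (the names `resource_bound` and `max_instructions_per_cycle` are illustrative only) that computes the lower bound on cycles for a block of independent instructions. It assumes each instruction has already been assigned to one unit type, and it ignores data dependences and cross-path limits.

```c
/* The four functional-unit types; the C67x has one of each per side. */
enum unit_type { UNIT_L, UNIT_S, UNIT_M, UNIT_D, UNIT_TYPES };

/* Lower bound on cycles for mutually independent instructions: each
   unit type accepts at most two instructions per cycle (one per side),
   so the busiest unit type sets the bound. Dependences, cross paths,
   and instructions that could run on several unit types are ignored. */
unsigned resource_bound(const unsigned count[UNIT_TYPES])
{
    unsigned bound = 0;
    for (int t = 0; t < UNIT_TYPES; ++t) {
        unsigned cycles = (count[t] + 1) / 2;   /* ceil(count / 2) */
        if (cycles > bound)
            bound = cycles;
    }
    return bound;
}

/* Peak issue rate: one instruction per functional unit per cycle. */
unsigned max_instructions_per_cycle(void)
{
    return 2u * UNIT_TYPES;   /* 2 sides x 4 unit types = 8 */
}
```

For example, a loop body with two .L, two .M, and two .D instructions gives a bound of one cycle, while five .M instructions force at least three cycles regardless of how idle the other units are.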
On-Chip Resources
On-chip resources include memory, peripherals, and external
memory interfaces.
The table below shows the on-chip memory of several TMS320C67x processors.

Processor   On-chip data memory    On-chip program memory
C6701       16K x 32               16K x 32
C6711       32K bits L1 cache      32K bits L1 cache
C6712       32K bits L1 cache      32K bits L1 cache
C6713       4K bytes L1 cache      4K bytes L1 cache
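The table mixes units. A quick conversion, taking K = 1024 (an assumption about the notation), shows that the C6711/C6712 entries (32K bits) and the C6713 entry (4K bytes) describe the same L1 cache size, while each C6701 memory (16K words of 32 bits) is sixteen times larger:

```latex
\begin{aligned}
16\text{K} \times 32\ \text{bits} &= 16384 \times 4\ \text{bytes} = 65536\ \text{bytes} = 64\ \text{KB} \\
32\text{K bits} &= 32768 / 8\ \text{bytes} = 4096\ \text{bytes} = 4\ \text{KB}
\end{aligned}
```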
Programming Considerations
Writing correct and efficient assembly code for the C67x processor
can be a very challenging task because of its complex architecture
and deep pipeline.
Therefore, programming in C is highly recommended for the C67x.
Alternatively, the user may write code in linear assembly (files with
the .sa extension), which is assembly code in which registers have not yet been allocated.
The assembly optimizer assigns registers, inserts NOP instructions
automatically, and applies loop optimizations before passing the
code to the assembler and linker.
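To illustrate why the assembly optimizer has to insert NOPs, here is a hypothetical toy model (`struct insn`, `example_chain`, and `nops_for_dependent_chain` are illustrative names, not TI tools). It counts the NOP cycles a strictly dependent chain needs when nothing independent is available to fill the delay slots. The delay-slot counts (LDW 4, MPYSP 3, ADDSP 3) are the C67x values as we understand them; check TI's CPU and instruction set reference before relying on them.

```c
#include <stddef.h>

/* Hypothetical instruction record: a mnemonic and the number of delay
   slots before its result can be consumed. */
struct insn {
    const char *mnemonic;
    unsigned delay_slots;
};

/* Example chain: load, multiply, add, store, where each instruction
   uses the previous result. Delay-slot counts are the C67x values
   assumed for this sketch. */
const struct insn example_chain[] = {
    {"LDW", 4}, {"MPYSP", 3}, {"ADDSP", 3}, {"STW", 0}
};

/* NOP cycles needed when every instruction depends on the one before
   it and no independent work can fill the gaps: each delay slot of
   each producer must be covered by a NOP. The last instruction's
   delay slots do not matter, since nothing in the chain consumes its
   result. */
unsigned nops_for_dependent_chain(const struct insn *chain, size_t n)
{
    unsigned nops = 0;
    for (size_t i = 0; i + 1 < n; ++i)
        nops += chain[i].delay_slots;
    return nops;
}
```

For `example_chain` the function returns 10: four instructions that do useful work, padded by ten NOP cycles. Filling those slots with independent instructions, for example from other loop iterations, is what delay-slot filling and loop unrolling aim at.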
In addition, using intrinsics in C code can further enhance performance.
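A minimal sketch of the intrinsic-with-fallback pattern, assuming the TI C6000 compiler (which predefines `__TI_COMPILER_VERSION__`) exposes the C67x approximate-reciprocal instruction as `_rcpsp`; verify the name against your compiler's user guide. On any other compiler, an exact division stands in so the code builds and runs on a host. The hypothetical function `recip_nr` refines the seed with two Newton-Raphson steps, a common way to turn a low-precision reciprocal estimate into a usable 1/d.

```c
/* Seed for 1/d: the C67x intrinsic on the TI compiler (assumed name),
   an exact division elsewhere so the sketch is portable. */
static float recip_seed(float d)
{
#ifdef __TI_COMPILER_VERSION__
    return _rcpsp(d);
#else
    return 1.0f / d;
#endif
}

/* Newton-Raphson refinement x <- x * (2 - d * x); each step roughly
   doubles the number of correct bits in the seed. */
float recip_nr(float d)
{
    float x = recip_seed(d);
    x = x * (2.0f - d * x);
    x = x * (2.0f - d * x);
    return x;
}
```

The point of the intrinsic is that a single instruction replaces a library division call, and the C code around it stays readable and portable.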
The C67x uses the same optimization methods as the C62x, such as
parallel optimization, filling delay slots, loop unrolling, and
SIMD optimization.
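Loop unrolling, for example, can be written directly in C. The sketch below (the name `dot_unrolled2` is hypothetical) unrolls a dot product by two and keeps two independent accumulators, so a VLIW compiler is free to schedule both multiply-add streams in parallel, for instance on the two .M and two .L units. It assumes `n` is even. `restrict` tells the compiler that the arrays do not overlap. Splitting the sum reassociates floating-point additions, so results may differ in the last bits from a straight loop.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product unrolled by two with independent accumulators, which
   removes the serial dependence on a single running sum. */
float dot_unrolled2(const float *restrict a, const float *restrict b, size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    assert(n % 2 == 0);   /* this sketch handles even lengths only */
    for (size_t i = 0; i < n; i += 2) {
        sum0 += a[i] * b[i];           /* even elements */
        sum1 += a[i + 1] * b[i + 1];   /* odd elements */
    }
    return sum0 + sum1;
}
```

With `a = {1, 2, 3, 4}` and `b = {5, 6, 7, 8}`, the two partial sums are 26 and 44, and the function returns 70.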
SIMD optimization is further enhanced on the C67x by its wide
(64-bit) data path.
For example, the C67x can execute LDDW (load double word), which reads
64 bits of data into a register pair.
It can read two words or four short (16-bit) words, and can thus access two
single-precision floating-point values at a time.
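The LDDW access pattern can be expressed in portable C by grouping two floats into one 8-byte-aligned unit. This is a sketch under stated assumptions: `float_pair` and `dot_pairs` are hypothetical names, `_Alignas` requires C11, and whether a given compiler actually emits one double-word load per pair is up to that compiler. The idea is to change the data layout so that each 64-bit fetch delivers two single-precision operands.

```c
#include <stddef.h>

/* Two single-precision values aligned to 8 bytes, so the whole pair
   can be fetched with one 64-bit load. */
typedef struct {
    _Alignas(8) float lo;
    float hi;
} float_pair;

/* Element-wise products summed over n_pairs pairs (2 * n_pairs floats).
   Each iteration reads one pair from each array, then uses both
   halves, mirroring what LDDW plus two multiplies would do. */
float dot_pairs(const float_pair *a, const float_pair *b, size_t n_pairs)
{
    float sum_lo = 0.0f, sum_hi = 0.0f;
    for (size_t i = 0; i < n_pairs; ++i) {
        float_pair x = a[i];   /* candidate for a single 64-bit load */
        float_pair y = b[i];
        sum_lo += x.lo * y.lo;
        sum_hi += x.hi * y.hi;
    }
    return sum_lo + sum_hi;
}
```

Here the loop body touches two floats from each array per iteration, so only half as many loads are issued as in an element-by-element loop using 32-bit loads.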
C67x instructions can also perform two 32 x 32-bit or four 16 x 16-bit
Conclusion
In this presentation, we gave an overview of the TMS320C67x DSP
processor.
The on-chip resources and memory of various C67x processors
were compared.
Programming considerations for the TMS320C67x were highlighted.
We also saw how SIMD instructions can double performance compared
with the C62x processor.
Questions?
Thank You