DSP Cours V2 PDF

Uploaded by

Hamza Smahri


2016/2017

Program: Engineering degree in Electrical Engineering

Semester: S4

Higher School of Technical Education

Electrical Engineering Department
Rabat, Morocco

Module element:
Signal processing and implementation on DSP
Digital Signal Processing and Applications with the
TMS320C6713 DSK

Prof A.JBARI

Pr. A.JBARI DSP implementation 1

Objectives

Become familiar with

DSP basics
TMS320C6713 floating point DSP architecture
TMS320C6713 DSP starter kit (DSK)
Code Composer Studio integrated development environment (IDE)
Matlab design and analysis tools
Learn how to program the C6713
Writing and compiling code
Fixing errors
Downloading code to the target and executing
Debugging
Write and run useful programs on the C6713 DSK
Learn about DSP applications
Learn where to find help


Bibliography

Books:
– Digital Signal Processing Using MATLAB®, Third Edition, Vinay K. Ingle and John G.
Proakis, (Cengage Learning 200 First Stamford Place, Suite 400, Stamford, CT 06902,
USA)
– Digital Signal Processing Fundamentals and Applications, Li Tan, DeVry University
Decatur, Georgia, Copyright 2008, Elsevier Inc
– Digital Signal Processing and Applications with the C6713 and C6416 DSK, Rulph
Chassaing,Worcester Polytechnic Institute, Copyright © 2005 by John Wiley & Sons, Inc.
– DSP Applications Using C and the TMS320C6x DSK, Rulph Chassaing, Copyright © 2002
by John Wiley & Sons, Inc.
Web documents and links:
• http://www.analog.com/media/en/technical-documentation/data-sheets/ADSP-2101_2103_2105_2115.pdf?doc=AD7475_7495.pdf
• ftp://ftp.analog.com/pub/cftl/ADI%20Classics/Mixed%20Signal%20and%20DSP%20Design%20Techniques,%202000/Section_7_DSP_Hardware.pdf
• dspworkshop_part1_2007.pdf , dspworkshop_part2_2007.pdf
• …


Outline
1. Data representation
2. Digital Signal Processing
a) Digital systems and fundamentals
b) Fast Fourier Transform (FFT)
c) Digital filter architectures
3. DSP architectures
4. Programming the TMS320C6713 DSP
5. Applications


1. Data representation

Integer representation

Unsigned Integers: can represent zero and positive integers.

Signed Integers: can represent zero, positive and negative integers.
The most-significant bit (msb) is called the sign bit. The sign bit is used to represent
the sign of the integer - with 0 for positive integers and 1 for negative integers.

Representation                              Positive       Negative

Sign-Magnitude representation               [(MSB=0) N]    [(MSB=1) magnitude(N)]

1's Complement representation               [(MSB=0) N]    [(MSB=1) 1's complement(N)]

2's Complement representation               [(MSB=0) N]    [(MSB=1) 2's complement(N)]

Excess (or bias) representation, bias B     N + B          -N + B
Integer representation

Example:
27 = 00011011 (8 bits)

Representation                              Positive                   Negative

Sign-Magnitude representation               27 = 00011011              -27 = 10011011

1's Complement representation               27 = 00011011              -27 = 11100100

2's Complement representation               27 = 00011011              -27 = 11100101

Excess (or bias) representation, bias 63    27 + 63 = 90 = 01011010    -27 + 63 = 36 = 00100100
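The four encodings in the table can be checked with a short sketch. The function name `encode` and its parameters are illustrative choices, not part of the course material, and the sketch does not check that the value fits in the chosen width.

```python
def encode(value, bits=8, scheme="twos", bias=63):
    """Return the bit string of `value` in one of four signed representations.

    Illustrative helper: no range checking is performed.
    """
    mask = (1 << bits) - 1
    if scheme == "sign_magnitude":
        # Magnitude in the low bits, sign in the most significant bit.
        code = abs(value) | ((1 << (bits - 1)) if value < 0 else 0)
    elif scheme == "ones":
        # Negative numbers: invert every bit of the magnitude.
        code = value if value >= 0 else (~(-value)) & mask
    elif scheme == "twos":
        # Negative numbers: value modulo 2**bits.
        code = value & mask
    elif scheme == "excess":
        # Stored code is the value shifted by the bias.
        code = value + bias
    else:
        raise ValueError("unknown scheme: " + scheme)
    return format(code, "0{}b".format(bits))


for scheme in ("sign_magnitude", "ones", "twos", "excess"):
    print(scheme, encode(27, scheme=scheme), encode(-27, scheme=scheme))
```

Running the loop reproduces the bit patterns listed for 27 and -27.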

Integer representation

Types: computers can manipulate integers of various lengths (formats). For signed two's complement types:

Format      Type                                               Range

8 bits      byte (Java); signed char (C, C++)                  [-128, 127]

16 bits     short (Java, C, C++)                               [-32768, 32767]

32 bits     int (Java; typical in C and C++)                   [-2^31, 2^31 - 1]

64 bits     long (Java); long long (C, C++)                    [-2^63, 2^63 - 1]

128 bits    no standard type (compiler-specific extensions)    [-2^127, 2^127 - 1]

Sign extension

Sign extension (widening conversion):

– 8 bit 2s compl. repr. for 7 is: 00000111

– 16 bit 2s compl. repr. for 7 is: 0000000000000111

– 8 bit 2s compl. repr. for -7 is: 11111001

– 16 bit 2s compl. repr. for -7 is: 1111111111111001
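As a minimal sketch of the widening conversion above, the sign bit is copied into every new high-order bit. The helper name `sign_extend` is hypothetical.

```python
def sign_extend(code, from_bits, to_bits):
    """Widen a two's complement bit pattern by replicating its sign bit."""
    sign = (code >> (from_bits - 1)) & 1
    if sign:
        # Set all bits between the old width and the new width to 1.
        code |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return code


print(format(sign_extend(0b00000111, 8, 16), "016b"))  # pattern for 7
print(format(sign_extend(0b11111001, 8, 16), "016b"))  # pattern for -7
```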


Fixed-point representation

To define a fixed-point type conceptually, all we need are two parameters: the width of the number representation and the position of the binary point within the number.

Real number = integer part . fractional part

Notation: fixed<w,b>, where w denotes the total number of bits (the width of the number) and b denotes the position of the binary point, counted from the least significant bit (counting from 0). Equivalently, b is the number of fractional bits.
• Examples:
N = 15.75
Format fixed<8,3>: 01111110 (read as 01111.110)
Format fixed<32,16>: 00000000000011111100000000000000 (read as 0000000000001111.1100000000000000)
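The fixed<w,b> examples can be reproduced with the sketch below. It assumes a signed two's complement container and rounding to the nearest code, neither of which the slide states explicitly; `to_fixed` and `from_fixed` are illustrative names.

```python
def to_fixed(x, w, b):
    """Encode x as a w-bit two's complement word with b fractional bits."""
    code = round(x * (1 << b))          # scale by 2**b, round to nearest integer
    lo, hi = -(1 << (w - 1)), (1 << (w - 1)) - 1
    if not lo <= code <= hi:
        raise OverflowError("value does not fit in fixed<%d,%d>" % (w, b))
    return format(code & ((1 << w) - 1), "0{}b".format(w))


def from_fixed(bits, b):
    """Decode a two's complement bit string with b fractional bits."""
    w = len(bits)
    code = int(bits, 2)
    if code >= 1 << (w - 1):            # negative number
        code -= 1 << w
    return code / (1 << b)


print(to_fixed(15.75, 8, 3))
print(to_fixed(15.75, 32, 16))
print(from_fixed("01111110", 3))
```

The resolution of fixed<w,b> is 2^(-b), so a value such as 15.8 would be rounded to the nearest multiple of 0.125 in fixed<8,3>.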

Floating-point representation
A floating-point number is typically expressed in scientific notation, with a fraction F (mantissa) and an exponent E of a certain radix r, in the form

N = F × r^E

Decimal numbers use a radix of 10 (F × 10^E), while binary numbers use a radix of 2 (F × 2^E).
• Examples: N = 48
r = 10: N = 4.8 × 10^1 : F = 4.8 ; E = 1
r = 2: N = 1.5 × 2^5 : F = 1.5 ; E = 5

Floating-point representation
IEEE-754 32-bit Single-Precision Floating-Point Numbers
The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
The following 8 bits represent the exponent (E), stored in excess-127 form (bias of 127).
The remaining 23 bits represent the fraction (F).

N = -48 = -1.5 × 2^5, so S = 1, E = 5 + 127 = 132 and 1.F = 1.1 (binary)
= 1 10000100 10000000000000000000000
= C2400000h

The value (N) is calculated as follows:

Normalized form: for 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127).
Denormalized form: for E = 0, N = (-1)^S × 0.F × 2^(-126).
For E = 255, N represents special values: ±INF (infinity) when F = 0, NaN (not a number) otherwise.
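The -48 example can be verified with Python's standard `struct` module, which packs a float into its IEEE-754 single-precision bytes. The helper names `float32_fields` and `decode_float32` are illustrative; the decoding rules follow the three cases listed above.

```python
import struct


def float32_fields(x):
    """Return (raw 32-bit word, S, E, F) of x encoded as IEEE-754 single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw, raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def decode_float32(s, e, f):
    """Rebuild the value from sign, biased exponent and 23-bit fraction."""
    sign = -1.0 if s else 1.0
    if e == 255:                                   # special values
        return float("nan") if f else sign * float("inf")
    if e == 0:                                     # denormalized: 0.F x 2^-126
        return sign * (f / 2**23) * 2.0**-126
    return sign * (1 + f / 2**23) * 2.0**(e - 127)  # normalized: 1.F x 2^(E-127)


raw, s, e, f = float32_fields(-48.0)
print(format(raw, "08X"), s, format(e, "08b"), format(f, "023b"))
print(decode_float32(s, e, f))
```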
Floating-point representation
IEEE-754 64-bit Double-Precision Floating-Point Numbers
The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
The following 11 bits represent the exponent (E), stored in excess-1023 form (bias of 1023).
The remaining 52 bits represent the fraction (F).

The value (N) is calculated as follows:

Normalized form: for 1 ≤ E ≤ 2046, N = (-1)^S × 1.F × 2^(E-1023).
Denormalized form: for E = 0, N = (-1)^S × 0.F × 2^(-1022).
For E = 2047, N represents special values: ±INF (infinity) when F = 0, NaN (not a number) otherwise.
Big Endian vs. Little Endian

Modern computers store one byte of data in each memory address or location, i.e., memory is byte addressable. A 32-bit integer is therefore stored in 4 memory addresses.
The term "endian" refers to the order in which bytes are stored in computer memory. In the "big endian" scheme, the most significant byte is stored first, in the lowest memory address ("big end first"), while "little endian" stores the least significant byte in the lowest memory address.
Examples:
The 32-bit integer 12345678H (305419896 in decimal) is stored as:
– in big endian: 12H 34H 56H 78H
– in little endian: 78H 56H 34H 12H.
The 16-bit byte sequence 00H 01H is interpreted as 0001H in big endian and as 0100H in little endian.
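Both byte orders can be observed with Python's built-in `int.to_bytes` and `int.from_bytes`. The wrapper names below are illustrative only.

```python
def stored_bytes(value, length, order):
    """Bytes of `value` in increasing memory address, as hex strings."""
    return " ".join("{:02X}".format(b) for b in value.to_bytes(length, order))


def interpret(byte_values, order):
    """Integer obtained when the given byte sequence is read with `order`."""
    return int.from_bytes(bytes(byte_values), order)


print(stored_bytes(0x12345678, 4, "big"))      # most significant byte first
print(stored_bytes(0x12345678, 4, "little"))   # least significant byte first
print(hex(interpret([0x00, 0x01], "big")), hex(interpret([0x00, 0x01], "little")))
```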

2. Digital Signal Processing

Analog & digital signals

Analog signal: continuous function V of a continuous variable t (time, space, etc.): V(t).

Digital (sampled) signal: discrete function Vk of the discrete sampling variable tk, with k an integer: Vk = V(tk).

Sampling operation: uniform (periodic) sampling.

Sampling frequency: fS = 1/tS

Digital vs analog processing
Digital Signal Processing (DSPing)
Advantages:
More flexible.
Often easier system upgrade.
Data easily stored in memory.
Better control over accuracy requirements.
Reproducibility.
Linear phase.
No drift with time and temperature.
Limitations:
A/D and signal processor speed: wide-band signals are still difficult to treat (real-time systems).
Finite word-length effect.
A digital signal processing scheme

Analog filter: to limit the frequency range of analog signals prior to the sampling process
and attenuate aliasing distortion (Antialias filter).
Analog-to-digital conversion (ADC) unit: to sample and convert band-limited signal into
the digital signal, which is discrete both in time and in amplitude.
Digital signal (DS) processor: processes the digital data according to DSP rules such as
lowpass, highpass, and bandpass digital filtering, or other algorithms for different
applications.
Digital-to-analog conversion (DAC) unit: Converts the processed digital signal to an
analog output signal which is continuous in time and discrete in amplitude.
Reconstruction (anti-image) filter: to smooth the DAC output voltage levels back to the
analog signal for real-world applications.


Digital signal processing

Digital filtering: the DSP block operates as a simple digital lowpass filter.

Digital signal processing
Signal spectral (frequency) analysis (spectrum)

Digital signal processing

Interference Cancellation in Electrocardiography:


Typical DSP applications

communication systems: modulation/demodulation, channel equalization, echo cancellation
consumer electronics: perceptual coding of audio and video on DVDs, speech synthesis, speech recognition
music: synthetic instruments, audio effects, noise reduction
medical diagnostics: magnetic-resonance and ultrasonic imaging, computer tomography, ECG, EEG, MEG, AED, audiology
geophysics: seismology, oil exploration
astronomy: VLBI, speckle interferometry
experimental physics: sensor-data evaluation
aviation: radar, radio navigation
security: steganography, digital watermarking, biometric identification, surveillance systems, signals intelligence, electronic warfare
engineering: control systems, feature extraction for pattern recognition
Signal Sampling
Sampling process: sample-and-hold circuit architectures:

– Open-loop architecture

– Closed-loop architecture with follower output

Signal Sampling
The simplified sampling process multiplies the continuous signal by a pulse train to obtain the sampled signal. The pulse train:

$$p(t) = \sum_{n=0}^{+\infty} \delta(t - n t_S)$$

Spectral analysis:

$$Y_s(f) = X(f) * P(f)$$

$$Y_s(f) = \frac{1}{t_S} \sum_{n=-\infty}^{+\infty} X(f - n f_S)$$

The sampled signal spectrum is the sum of the scaled original spectrum and copies of its shifted versions, called replicas.
Signal Sampling

Original baseband spectrum, with highest frequency fmax.

Sampled signal spectrum:

fS > 2 fmax: the baseband spectrum and its replicas are separated

fS = 2 fmax: the baseband spectrum and its replicas are just connected

fS < 2 fmax: the baseband spectrum and its replicas overlap

Signal Sampling

To obtain exact reconstruction of the original signal spectrum by applying a lowpass reconstruction filter, the following condition must be satisfied:
fS ≥ 2 fmax
where fmax is the highest-frequency component of the analog signal.
Shannon sampling theorem:
For a uniformly sampled DSP system, an analog signal can be perfectly recovered as long as the sampling rate is at least twice as large as the highest-frequency component of the analog signal to be sampled.

Half of the sampling frequency, fS/2, is usually called the Nyquist frequency (Nyquist limit), or folding frequency.
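A small numerical sketch of what happens when the sampling condition is violated: a cosine above the folding frequency produces exactly the same samples as a lower-frequency cosine. The frequencies used (8 kHz sampling, 5 kHz tone) and the function names are arbitrary illustration values, not taken from the slides.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency, in [0, fs/2], of a real sinusoid at f sampled at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


def sample_cosine(f, fs, count):
    """First `count` samples of cos(2*pi*f*t) taken at rate fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(count)]


fs = 8000.0
print(alias_frequency(1000.0, fs))   # below fs/2: unchanged
print(alias_frequency(5000.0, fs))   # above fs/2: folded back

# The 5 kHz and 3 kHz tones are indistinguishable after sampling at 8 kHz.
a = sample_cosine(5000.0, fs, 16)
b = sample_cosine(3000.0, fs, 16)
print(max(abs(p - q) for p, q in zip(a, b)))
```

This is why the analog antialias filter must remove components above fS/2 before the ADC.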

Signal Reconstruction
Recover analog signal from its sampled signal version.

Condition: the spectrum of the sampled signal ys(t) contains the same spectral
content as the original spectrum X( f ).

(Figure: recovered signal spectrum)

Digital Systems

A discrete-time system is a device or algorithm that operates on an input sequence according to some computational procedure.
It may be:
– A general purpose computer
– A microprocessor
– Dedicated hardware
– A combination of all these

Digital filters

Digital filter: numerical procedure or algorithm that transforms a given sequence of numbers into a second sequence that has some more desirable properties.

x(n) → [digital filter] → y(n)

The linear time-invariant digital filter can then be described by a linear difference equation.

Finite Impulse Response (FIR) filters:

$$y(n) = \sum_{i=0}^{N} b_i\, x(n-i), \qquad H(z) = \sum_{i=0}^{N} b_i\, z^{-i}$$

Infinite Impulse Response (IIR) filters:

$$y(n) = \sum_{i=0}^{N} b_i\, x(n-i) - \sum_{i=1}^{M} a_i\, y(n-i), \qquad H(z) = \frac{\sum_{i=0}^{N} b_i\, z^{-i}}{1 + \sum_{i=1}^{M} a_i\, z^{-i}}$$
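The difference equations can be evaluated sample by sample as in the sketch below. It assumes zero initial conditions and the normalization a[0] = 1; the function name `lti_filter` and the coefficient values in the usage lines are illustrative, not from the course.

```python
def lti_filter(b, a, x):
    """Evaluate y(n) = sum_i b[i] x(n-i) - sum_{i>=1} a[i] y(n-i).

    b: feedforward coefficients b[0..N]
    a: feedback coefficients with a[0] == 1 (use a = [1.0] for an FIR filter)
    x: input samples; samples before n = 0 are taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):          # FIR part: multiply-accumulate on inputs
            if n - i >= 0:
                acc += bi * x[n - i]
        for i in range(1, len(a)):          # IIR part: feedback of past outputs
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
print(lti_filter([0.5, 0.5], [1.0], impulse))        # FIR: response ends after 2 samples
print(lti_filter([1.0], [1.0, -0.5], impulse))       # IIR: response decays but never ends
```

The inner loops are exactly the multiply-accumulate (MAC) pattern that DSP hardware is built to execute in a single cycle per tap.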

Digital Hardware implementation
TRANSVERSAL IMPLEMENTATION OF AN FIR FILTER (Tapped Delay Line)

Requirements: N memory locations for storing previous input.

Complexity: N+1 multiplications and N additions.


Digital Hardware implementation
TRANSPOSED FIR FILTER IMPLEMENTATION


Digital Hardware implementation

IIR FILTER DIRECT FORM 1


Digital Hardware implementation
IIR FILTER DIRECT FORM 2


Digital Hardware implementation
Parallel multiplier/accumulator cell FIR filter implementation

Discrete Fourier Transform (DFT)

• The DFT provides uniformly spaced samples of the Discrete-Time Fourier Transform (DTFT).
• DFT definition:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi nk}{N}}, \qquad x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j\frac{2\pi nk}{N}}$$

• Requires:
– Complex multiplications: N^2
– Complex additions: N(N-1)
– Real multiplications: 4N^2
– Real additions: 2N(2N-1)
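A direct evaluation of the DFT and its inverse, written as a minimal sketch with the standard `cmath` module. The nested loops make the N^2 complex multiplications visible; the function names are illustrative.

```python
import cmath


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] exp(-j 2 pi n k / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: x[n] = (1/N) sum_k X[k] exp(+j 2 pi n k / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]


x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)
print([round(abs(v), 6) for v in X])          # magnitude spectrum
print([round(v.real, 6) for v in idft(X)])    # round trip recovers x
```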

Discrete Fourier Transform (DFT)

• Total computational complexity (complex operations):

T = N^2 + N(N-1) = 2N^2 - N, i.e. O(N^2)

Example: if each operation requires 1 μs:

N = 1000: T ≈ 2 000 000 operations ≈ 2 s
N = 5000: T ≈ 50 000 000 operations ≈ 50 s

Although the DFT gives the frequency content of a sequence directly, it requires a large number of complex additions and multiplications.

Faster DFT computation?

Take advantage of the symmetry and periodicity of the complex exponential (let $W_N = e^{-j2\pi/N}$):

– Symmetry: $W_N^{k(N-n)} = W_N^{-kn} = (W_N^{kn})^*$

– Periodicity: $W_N^{kn} = W_N^{k(n+N)} = W_N^{(k+N)n}$
– Recursion property: $W_N^2 = W_{N/2}$
Note that two length-N/2 DFTs take less computation than one length-N DFT: $2(N/2)^2 < N^2$.
Algorithms that exploit these computational savings are collectively called Fast Fourier Transforms.

Decimation-in-Time Algorithm

FFT: Fast Fourier Transform (Cooley and Tukey, 1965)

Consider expressing the DFT with even and odd input samples:

$$
\begin{aligned}
X[k] &= \sum_{n=0}^{N-1} x[n]\, W_N^{nk} \\
&= \sum_{n\ \text{even}} x[n]\, W_N^{nk} + \sum_{n\ \text{odd}} x[n]\, W_N^{nk} \\
&= \sum_{r=0}^{N/2-1} x[2r]\, (W_N^2)^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r+1]\, (W_N^2)^{rk} \\
&= \sum_{r=0}^{N/2-1} x[2r]\, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r+1]\, W_{N/2}^{rk}
\end{aligned}
$$

FFT Algorithm
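The result is the sum of two length-N/2 DFTs:

$$X[k] = G[k] + W_N^k \cdot H[k]$$

where G[k] is the N/2-point DFT of the even samples and H[k] is the N/2-point DFT of the odd samples.

Then repeat the decomposition of the N/2-point DFTs into N/4-point DFTs, etc.

The cross feed of G[k] and H[k] in the flow diagram is called a "butterfly", due to its shape. Its two branches are weighted by $W_N^r$ and $W_N^{r+N/2}$; since $W_N^{r+N/2} = -W_N^r$, the butterfly simplifies to a single multiplication by $W_N^r$ followed by branch gains of 1 and -1.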

FFT Algorithm

For N=8 :


Detail of “Butterfly”

The cross feed of G[k] and H[k] in the flow diagram is called a "butterfly", due to its shape:

– general form: branch weights $W_N^r$ and $W_N^{r+N/2}$
– simplified form: one multiplication by $W_N^r$, then branch gains of 1 and -1, since $W_N^{r+N/2} = -W_N^r$

FFT Algorithm

Repeat the same process: divide each N/2-point DFT into:

- two N/4-point DFTs
- a stage that combines their outputs

FFT Algorithm

• After two steps of decimation in time:

FFT Algorithm
Flow graph for 8-point decimation in time:

The flow-graph consists of 3 stages

First stage computes the four 2-point DFTs
Second stage computes the two 4-point DFTs
Last stage computes the desired 8-point DFT


FFT Algorithm
How much computation?
For N = 2^M points:
Total number of stages: M = log2 N
Number of butterflies per stage: N/2
Each butterfly: 1 complex multiplication and 2 complex additions.
Total computational complexity (complex operations):
T = (3N/2) log2 N, i.e. O(N log2 N)

Algorithm    Complex multiplications    Complex additions

DFT          O(N^2)                     O(N(N-1))
FFT          O((N/2) log2 N)            O(N log2 N)
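A recursive radix-2 decimation-in-time FFT can be sketched as follows. It is written for readability rather than speed (production DSP code uses an in-place, iterative form), it assumes the input length is a power of two, and the names `fft_dit` and `complex_op_counts` are illustrative.

```python
import cmath


def fft_dit(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    G = fft_dit(x[0::2])                 # N/2-point DFT of even samples
    H = fft_dit(x[1::2])                 # N/2-point DFT of odd samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # one multiplication by W_N^k
        X[k] = G[k] + t                                # butterfly: sum ...
        X[k + N // 2] = G[k] - t                       # ... and difference
    return X


def complex_op_counts(n):
    """(direct DFT, radix-2 FFT) complex operation counts for n = 2**m points."""
    m = n.bit_length() - 1
    return n * n + n * (n - 1), 3 * (n // 2) * m


print([round(abs(v), 6) for v in fft_dit([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0])])
print(complex_op_counts(1024))
```

Each pass through the loop body is one butterfly: one complex multiplication and two complex additions, which is where the (3N/2) log2 N total comes from.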

FFT Algorithm


Computation on DSP

Input and Output data

– Real data in X memory
– Imaginary data in Y memory
Coefficients (“twiddle” factors)
– cos (real) values in X memory
– sin (imag) values in Y memory
Inverse computed with exponent sign change
and 1/N scaling


Kernels for Digital Signal Processing

Filtering, convolution: MAC (multiplication-accumulation)

Adaptation: MAD (multiplication-addition)

Complex multiplication, FFT:

Viterbi decoding: ACS (add-compare-select)

Motion estimation: SAD (sum of absolute differences)

3. DSP architectures and
features


General Architectures
Accumulator architecture
Load-store architecture
Memory-register architecture

(Figure labels: register file, on-chip memory)

Harvard architecture
VON NEUMANN architecture:

• unified external memory for program and data
• all operands in registers

HARVARD architecture (IBM in 1944 at Harvard University):

• separate program and data memories
• operands also in memory
• concurrent access to
  • instruction word
  • one or several data words

Example: MPYF3 (AR0)++, (AR1)++, R0

• MPYF3: instruction from program memory
• (AR0)++: data from memory (address in address register AR0)
• (AR1)++: data from memory (address in address register AR1)
• R0: store result in data register R0
Classic DSP characteristics
Explicit parallelism
– Harvard architecture for concurrent data access
– concurrent operations on data and addresses
Optimized control flow and background processing
– zero-overhead loops
– DMA controllers
Special addressing modes
– distinction of address, data and modifier registers
– versatile address computation for indirect addressing
Specialized instructions
– single-cycle hardware multiplier
– multiply accumulate instruction (MAC)


Specialized addressing modes

Many DSPs distinguish address registers from data registers

Additional ALUs for address computations
– useful for indirect addressing (register points to operand in memory)
ADDF3 *(AR0)++, R1, R1
– operations on address registers in parallel with operations on data
registers, no extra cycles
– behavior depends on instruction and contents of special purpose
registers (modifier registers)
Typical address update functions
– increment/decrement by 1 (AR0++, AR0--)
– increment/decrement by constant specified in modifier register
(AR0 += MR0, AR0 -= MR5)
– circular addressing (AR0 += 1 if AR0 < upper limit, else AR0 = base address),
– bit-reverse addressing, …


Circular addressing
Goal: implementation of ring buffers in linear address space
– implementation variants
copy data with data access, or
use circular addressing (don’t copy data, wrap pointers)
– supported by addressing modes
data access and move operations
increment operators that wrap around at buffer boundaries
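In software, the wrap-around increment that circular addressing provides in hardware corresponds to a modulo update of the write index. The class below is a minimal sketch with hypothetical names; on a DSP the modulo step costs no extra instruction.

```python
class CircularBuffer:
    """Ring buffer in a linear array: the index wraps instead of data being copied."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.index = 0                      # plays the role of the address register

    def push(self, sample):
        self.data[self.index] = sample
        # Increment that wraps around at the buffer boundary.
        self.index = (self.index + 1) % len(self.data)

    def recent(self, k):
        """Sample written k steps ago (k = 0 is the newest)."""
        return self.data[(self.index - 1 - k) % len(self.data)]


buf = CircularBuffer(4)
for sample in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    buf.push(sample)
print([buf.recent(k) for k in range(4)])    # newest first
```

This is the usual way to hold the delay line of an FIR filter: the oldest sample is overwritten in place and no samples are shifted.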


Bit-reverse addressing
Goal: accelerate the FFT operation
– very important DSP operation
– transforms signals between time and frequency representations
– basic operation in many DSP algorithms

Methods to compute the addresses:
– mirror the address bits (bit reverse)
– add N/2 with reverse-carry arithmetic
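Both ways of generating bit-reversed addresses can be sketched in a few lines. The function names are illustrative; the second function emulates in software the "add N/2 with reverse carry" update, where the carry propagates from the most significant bit towards the least significant bit.

```python
def bit_reverse(index, bits):
    """Mirror the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def next_bit_reversed(addr, n):
    """Add n/2 to addr with the carry propagating towards the low bits."""
    bit = n >> 1
    while bit and (addr & bit):     # bit already set: clear it and carry downwards
        addr ^= bit
        bit >>= 1
    return addr | bit


N = 8
print([bit_reverse(i, 3) for i in range(N)])

addr, order = 0, []
for _ in range(N):
    order.append(addr)
    addr = next_bit_reversed(addr, N)
print(order)
```

Both lines print the same sequence, which is the input ordering required by an 8-point decimation-in-time FFT.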

Zero-overhead loops

Goal
– reduce overhead for executing loops
– general purpose processors
  initialize loop counter
  execute loop body
  check loop exit condition
  branch to loop start or exit loop
– digital signal processors
  initialize loop counter
  execute loop body
  (checking the loop exit condition and branching back to the loop start are handled by hardware, with no instruction overhead)

Example: add the first 100 values in array a and store the result in R1 (TMS320C3x-like assembler)

  LDI @a, AR0
  LDI 0.0, R1
  RPTS 99
  ADDF3 *(AR0)++, R1, R1
  …

RPTS N repeats the next instruction N+1 times

DSP Architecture
The DSP can fetch the program instruction and the data in parallel, at the same time.

The multiplier and accumulator (MAC) is used for the digital filtering operation.

The shift unit is used for the scaling operation in fixed-point implementations when the processor performs digital filtering.
ARITHMETIC LOGIC UNIT (ALU) FEATURES

■ Add, Subtract, Negate, Increment, Decrement, Absolute Value, AND, OR, Exclusive
OR, NOT
■ Bitwise Operators, Constant Operators
■ Multi-Precision Math Capabilities
■ Divide Primitives
■ Saturation Mode for Overflow Support
■ Background Registers for Single-Cycle Context Switch
■ Example Instructions:
◆ IF EQ AR = AX0 + AY0;
◆ AF = MR1 XOR AY1;
◆ AR = TGLBIT 7 OF AX1;


MULTIPLY-ACCUMULATOR (MAC) FEATURES

■ Single-Cycle Multiply, Multiply-Add, Multiply-Subtract

■ 40-Bit Accumulator for Overflow Protection (219x Adds Second 40-Bit Accumulator)

■ Saturation Instruction Performs Single Cycle Overflow Cleanup

■ Background Registers for Single-Cycle Context Switch

■ Example MAC Instructions:

◆ MR = MX0 * MY0(US);

◆ IF MV SAT MR;

◆ MR = MR - AR * MY1(SS);

◆ MR = MR + MX1 * MY0(RND);

◆ IF LT MR = MX0 * MX0(UU);
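The role of the wide accumulator and of the saturation step can be illustrated with a generic model. This is a sketch of the idea only, not a cycle-accurate or bit-exact model of the ADSP-21xx MR register; the 40-bit and 32-bit widths are taken from the feature list above, and all function names are hypothetical.

```python
ACC_BITS = 40        # accumulator width, as in the feature list above


def wrap(value, bits):
    """Interpret value modulo 2**bits as a signed two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, x, y):
    """Multiply-accumulate: acc + x * y, held in an ACC_BITS-wide register."""
    return wrap(acc + x * y, ACC_BITS)


def saturate(acc, bits=32):
    """Clamp acc to the largest or smallest value representable in `bits` bits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, acc))


acc = 0
for _ in range(4):
    acc = mac(acc, 32767, 32767)     # four full-scale 16-bit products
print(acc)                           # still exact: the guard bits absorb the growth
print(saturate(acc))                 # clamped to the 32-bit maximum
```

The sum of the four products exceeds the 32-bit range but fits in 40 bits, so the overflow can be detected and cleaned up once at the end instead of after every product.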


Hardware Multiply/Accumulate (MAC) Unit

Specialized data-path for DSP

MAC Instructions
[IF cond] MR|MF = xop * yop ;         Multiply
[IF cond] MR|MF = MR + xop * yop ;    Multiply/Accumulate
[IF cond] MR|MF = MR - xop * yop ;    Multiply/Subtract
[IF cond] MR|MF = MR ;                Transfer MR
[IF cond] MR|MF = 0 ;                 Clear
IF MV SAT MR ;                        Conditional MR Saturation
Hardware Multiply/Accumulate (MAC) Unit
Two types of parallel MAC:

– component-wise accumulation
– across-component accumulation

ADSP-2100 Family DSP Microcomputers
16-Bit Fixed-Point DSP Microprocessors with On-Chip Memory
Enhanced Harvard Architecture for Three-Bus Performance: Instruction Bus & Dual Data Buses
Independent Computation Units: ALU, Multiplier/Accumulator, and Shifter
Single-Cycle Instruction Execution & Multifunction Instructions
On-Chip Program Memory RAM or ROM & Data Memory RAM
Integrated I/O Peripherals: Serial Ports, Timer, Host Interface Port (ADSP-2111 Only)
25 MIPS, 40 ns Maximum Instruction Rate
Separate On-Chip Buses for Program and Data Memory
Program Memory Stores Both Instructions and Data
Dual Data Address Generators with Modulo and Bit-Reverse Addressing
Efficient Program Sequencing with Zero-Overhead Looping: Single-Cycle Loop Setup
ADSP-2100 Family DSP Microcomputers

Basic architecture of TMS320C54x family.

The fixed-point TMS320C54x family, which supports 16-bit data, has on-chip program memory and data memory in various sizes and configurations.

(Figure: the typical TMS320C54x fixed-point DSP architecture.)

The typical TMS320C3x floating-point DSP.

Block diagram of TMS320C67x floating-point DSP.

Registers of the TMS320C67x floating-point DSP.

DSP range of applications


Data representation for different DSP

DSP device                         Word length (bits)    Representation format

Texas Instruments TMS320C30        32                    Floating point
Texas Instruments TMS320C54x       16                    Fixed point
Texas Instruments TMS320C62xxx     16                    Fixed point
Texas Instruments TMS320C67xxx     32                    Floating point
Analog Devices DSP-2110            16                    Fixed point
Analog Devices SHARC-21061         32                    Floating point
Motorola DSP56001/2/9              24                    Fixed point
Motorola DSP96000                  32                    Floating point
Lucent Technologies DSP1600        16                    Fixed point
Lucent Technologies DSP16000       32                    Fixed point

DSP 320C6713


TMS320C6713

Highest-Performance Floating-Point Digital Signal Processor (DSP): TMS320C6713
– Eight 32-Bit Instructions/Cycle
– 32/64-Bit Data Word
– 225-MHz (GDP), 150-MHz (PYP) Clock Rates
– 4.4-, 6.7-ns Instruction Cycle Time
– 1800 MIPS/1350 MFLOPS, 1200 MIPS/900 MFLOPS
– Rich Peripheral Set, Optimized for Audio
– Highly Optimized C/C++ Compiler

VelociTI Advanced Very Long Instruction Word (VLIW) TMS320C67x DSP Core
– Eight Independent Functional Units:
  – Two ALUs (Fixed-Point)
  – Four ALUs (Floating- and Fixed-Point)
  – Two Multipliers (Floating- and Fixed-Point)
– Load-Store Architecture With 32 32-Bit General-Purpose Registers
– Instruction Packing Reduces Code Size
– All Instructions Conditional

TMS320C6713
L1/L2 Memory Architecture
– 4K-Byte L1P Program Cache (Direct-Mapped)
– 4K-Byte L1D Data Cache (2-Way)
– 256K-Byte L2 Memory Total: 64K-Byte L2 Unified Cache/Mapped RAM, and 192K-Byte Additional L2 Mapped RAM
Device Configuration
– Boot Mode: HPI, 8-, 16-, 32-Bit ROM Boot
– Endianness: Little Endian, Big Endian
16-Bit Host-Port Interface (HPI)
Two Multichannel Audio Serial Ports (McASPs)
Two Inter-Integrated Circuit Bus (I2C Bus) Multi-Master and Slave Interfaces
Two Multichannel Buffered Serial Ports
Two 32-Bit General-Purpose Timers
Dedicated GPIO Module With 16 Pins (External Interrupt Capable)
Flexible Phase-Locked-Loop (PLL) Based Clock Generator Module
IEEE-1149.1 (JTAG) Boundary-Scan-Compatible
Package Options:
– 208-Pin PowerPAD Plastic (Low-Profile) Quad Flatpack (PYP)
– 272-Ball, Ball Grid Array Package (GDP)
0.13-μm/6-Level Copper Metal Process
– CMOS Technology
3.3-V I/Os, 1.2-V Internal (PYP)
3.3-V I/Os, 1.26-V Internal (GDP)

TMS320C67x Block Diagram
Functional block and CPU (DSP core) diagram

TMS320C67x Block Diagram
One instruction is 32 bits. The program bus is 256 bits wide.
- Can execute up to 8 instructions per clock cycle (225 MHz -> 4.4 ns clock cycle).
8 independent functional units:
- 2 multipliers
- 6 ALUs
Code is efficient if all 8 functional units are always busy.
Each of the two register files has 16 general-purpose registers, each 32 bits wide (A0-A15, B0-B15).
Data paths are each 64 bits wide.

C6713 Functional Units

Two data paths (A & B)

Data path A
Multiply operations (.M1)
Logical and arithmetic operations (.L1)
Branch, bit manipulation, and arithmetic operations (.S1)
Loading/storing and arithmetic operations (.D1)
Data path B
Multiply operations (.M2)
Logical and arithmetic operations (.L2)
Branch, bit manipulation, and arithmetic operations (.S2)
Loading/storing and arithmetic operations (.D2)
All data (not program) transfers go through .D1 and .D2


Fetch & Execute Packets

C6713 fetches 8 instructions at a time (256 bits)

Definition: “Fetch packet” is a group of 8 instructions fetched at once.
Coincidentally, C6713 has 8 functional units.
Ideally, all 8 instructions would be executed in parallel.
Often this isn’t possible, e.g.:
3 multiplies (only two .M functional units)
Results of instruction 3 needed by instruction 4 (must wait for 3 to
complete)


Execute Packets
Definition: an “execute packet” is a group of (8 or fewer) consecutive instructions in one fetch packet that can be executed in parallel.

The C compiler provides a flag to indicate which instructions should be run in parallel.
In assembly, you have to do this manually using “||”.

C6713 Instruction Pipeline Overview
All instructions flow through the following steps (each step = 1 clock cycle):
1. Fetch
a) PG: Program address Generate
b) PS: Program address Send
c) PW: Program address ready Wait
d) PR: Program fetch packet Receive
2. Decode
a) DP: Instruction DisPatch
b) DC: Instruction DeCode
3. Execute
a) 10 phases labeled E1-E10
b) Fixed-point processors have only 5 phases (E1-E5)

Pipelining: Ideal Operation

Remarks:
• At clock cycle 11, the pipeline is “full”
• There are no holes (“bubbles”) in the pipeline in this example


Pipelining: “Actual” Operation

Remarks:
• Fetch packet n has 3 execute packets
• All subsequent fetch packets have 1 execute packet
• Notice the holes/bubbles in the pipeline caused by lack of parallelization
Execute Stage of C6713 Pipeline

C67x has 10 execute phases (floating point)

C62x/C64x have 5 execute phases (fixed point)

Different types of instructions require different numbers of these phases to
complete their execution
Anywhere between 1 and all 10 phases
Most instructions tie up their functional unit for only one phase (E1)

Execution Stage Examples (1)

Results available after E1 (zero delay slots)

Functional unit free after E1 (1 functional unit latency)

Execution Stage Examples (2)

Results available after E4 (3 delay slots)
Functional unit free after E1 (1 functional unit latency)

Execution Stage Examples (3)

Results available after E10 (9 delay slots)
Functional unit free after E4 (4 functional unit latency)
Functional Latency & Delay Slots

Functional Latency: How long must we wait for the functional unit to be free?
Delay Slots: How long must we wait for the result?
General remarks:
Functional unit latency <= Delay slots
Strange results will occur in ASM code if you don’t pay attention to delay
slots and functional unit latency
All problems can be resolved by “waiting” with NOPs
Efficient ASM code tries to keep functional units busy all of the time.
Efficient code is hard to write (and follow).


DSP TMS 320C6713
Floating-Point Digital Signal Processor Device nomenclature


C6713 DSP Starter Kit (DSK)

The TMS320C6713 DSP Starter Kit (DSK), developed jointly with Spectrum Digital, is a low-cost development platform designed to speed the development of high-precision applications based on TI's TMS320C6000 floating-point DSP generation.
C6713 DSP Starter Kit (DSK)
Hardware Features:
Texas Instruments TMS320C6713 DSP operating at 225 MHz
Embedded USB JTAG controller with plug and play drivers, USB cable included
TLV320AIC23 codec
2M x 32 on board SDRAM
512K bytes of on board Flash ROM
3 expansion connectors (Memory Interface, Peripheral Interface, and Host Port Interface)
On board IEEE 1149.1 JTAG connection for optional emulator debug
Four 3.5 mm. audio jacks (microphone, line-in, speaker, and line out)
4 user definable LEDs
4 position dip switch, user definable
+5 Volt operation only, power supply included
Size: 8.25" x 4.5" (210 x 115 mm), 0.062" thick, 6 layers
Compatible with Spectrum Digital's DSK Wire Wrap Prototype Card
Software Features:
TMS320C6713 DSK specific Code Composer Studio from Texas Instruments
Test/sample code provided to reduce coding time
Compatible with National Instruments LabView Embedded 2.0
Compatible with JTAG emulators from Spectrum Digital
Compatible with Win 2000/XP


C6713 DSP Starter Kit (DSK)


Is my DSK working?

DSK Power On Self Test

Power up the DSK and watch the LEDs
The Power On Self Test (POST) program stored in Flash memory executes automatically
POST takes 10-15 seconds to complete
All DSK subsystems are automatically tested
During POST, a 1 kHz sinusoid is output from the AIC23 codec for 1 second
Listen with headphones or watch on an oscilloscope
If POST is successful, all four LEDs blink 3 times and then remain on.
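
To give an idea of how a 1 kHz test tone like the one heard during POST can be produced, the sketch below uses a small lookup table. It assumes a sampling rate of 8 kHz (not stated on the slide), so one period of the tone is exactly 8 samples. The names sine_table and sine_sample and the amplitude of 1000 are illustrative choices, and this is not the actual POST code.

```c
/* Samples per period: assumed 8 kHz sampling rate / 1 kHz tone = 8. */
#define TABLE_LEN 8

/* One period of a sine wave with amplitude 1000, sampled at 8 points
 * (0, 45, 90, ... degrees); 707 approximates 1000 * sin(45 degrees). */
static const short sine_table[TABLE_LEN] = {
    0, 707, 1000, 707, 0, -707, -1000, -707
};

/* Returns the tone value for sample number n.
 * On the DSK, each value would be sent to the codec once per
 * sampling period; here the function only performs the table lookup. */
short sine_sample(unsigned int n)
{
    return sine_table[n % TABLE_LEN];
}
```

Calling sine_sample(0), sine_sample(1), ... in order walks through the table and wraps around after 8 samples, which repeats the waveform 1000 times per second at the assumed rate.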

Interfacing with the Real World

TMS320C6713 DSK:
digital inputs = 4 DIP switches
digital outputs = 4 LEDs
ADC and DAC = AIC23 codec
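
A first exercise with the digital I/O is to mirror the four DIP switches onto the four LEDs. The sketch below shows only the mapping logic, with switch and LED states held as bit masks so that it runs on any host. The helper names are hypothetical. On the board itself, the reads and writes would go through the DSK Board Support Library (for example DSK6713_DIP_get and DSK6713_LED_on / DSK6713_LED_off), whose exact usage should be checked in the DSK documentation.

```c
#define NUM_SWITCHES 4  /* 4 DIP switches and 4 LEDs on the DSK */

/* Hypothetical helper: converts the DIP switch state (bit i = switch i,
 * 1 = switch active) into an LED pattern (bit i = LED i, 1 = LED on).
 * Bits above the lowest four are ignored. */
unsigned int led_pattern_from_dips(unsigned int dip_state)
{
    return dip_state & ((1u << NUM_SWITCHES) - 1u);
}

/* Hypothetical helper: returns 1 if the given LED (0..3) is on
 * in the pattern, 0 otherwise. */
int led_is_on(unsigned int led_pattern, unsigned int led)
{
    return (int)((led_pattern >> led) & 1u);
}
```

For example, a switch state of 0x5 (switches 0 and 2 active) gives an LED pattern in which LEDs 0 and 2 are on and LEDs 1 and 3 are off.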

