

ARCHITECTURE OF
DIGITAL SIGNAL
PROCESSORS

Author: Grzegorz Szwoch


Gdańsk University of Technology, Department of Multimedia Systems
Digital signal processor

Our definition:
digital signal processor (DSP)
is a processor specialized in efficient processing of digital signal
samples, performing repeated operations on each sample.

In this lecture we will answer the question:


which features of the digital signal processor architecture allow it
to process digital signals more efficiently than a general-purpose CPU?
Main features of DSP architecture

The most important features of the DSP architecture that distinguish
DSPs from general-purpose processors are:
▪ Harvard architecture,
▪ pipelining,
▪ circular addressing,
▪ special instructions (MAC, vectorization, etc.).
DSP components

The most important components of a DSP:


▪ ALU – arithmetic-logic unit, operations: + – AND OR NOT XOR
▪ multiplier (*)
▪ FPU – floating point processing unit
▪ registers – memory cells holding data on which the processor
operates,
▪ accumulator – a special register which holds intermediate
results of the operations,
▪ address generator
▪ buses – lines for data exchange between registers
and memory.
Performing operations

Example calculations in a program:


y = 0.5 * a + 0.3 * b + 0.2 * c
A typical sequence of operations on a processor:
▪ read data from memory (a, b, c, constants), write them
to registers,
▪ execute operations (* * + * +), save intermediate results
in the accumulator,
▪ copy the result to the memory (y).
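The same computation as a minimal plain-C sketch (our own variable values); the comments map the statements onto the steps listed above:

#include <stdio.h>

int main(void)
{
    float a = 1.0f, b = 2.0f, c = 3.0f;  /* data read from memory into registers */
    float y;

    y  = 0.5f * a;     /* multiply; the intermediate result stays in a register */
    y += 0.3f * b;     /* multiply and add to the intermediate result */
    y += 0.2f * c;     /* multiply and add again (on a DSP: the accumulator) */

    printf("y = %f\n", y);   /* copy the result back to memory / output */
    return 0;
}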
Accumulator

▪ The accumulator is a special register which holds the results
of most arithmetic and logic operations.
▪ On a 16-bit processor, multiplication of two 16-bit numbers
yields a 32-bit result, so the accumulator must be at least
32 bits long.
▪ When the intermediate results are accumulated, the number
can exceed 32 bits.
▪ Therefore, the accumulator has additional “guard bits”.
▪ On 16-bit processors, the accumulator can be 40-bit long.
▪ A programmer can write to and read from registers,
including the accumulator.
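A small plain-C illustration of why guard bits are needed (our own example values; the 64-bit variable plays the role of a wide accumulator):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int16_t x[4] = { 32767, 32767, 32767, 32767 };
    int16_t h[4] = { 32767, 32767, 32767, 32767 };
    int64_t acc = 0;   /* wider than 32 bits, like an accumulator with guard bits */
    int i;

    for (i = 0; i < 4; i++)
        acc += (int32_t)x[i] * h[i];   /* each 16 x 16 bit product needs 32 bits */

    /* the sum, 4294705156, no longer fits in a 32-bit accumulator */
    printf("acc = %lld\n", (long long)acc);
    return 0;
}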
Processor architectures

▪ von Neumann architecture


– common memory for program and data,
– used in general-purpose CPUs, e.g. in PC.
▪ Harvard architecture
– separate memory for program and data,
– independent access to each memory,
– used e.g. in DSPs.
▪ Harvard architecture extensions in DSPs:
– dual memory access (dual data buses),
– instruction cache,
– input/output controller.
Processor architectures
Von Neumann architecture (single memory) – diagram: the CPU and a single
memory holding both data and instructions, connected by one memory
address bus and one data bus.

Harvard architecture (dual memory) – diagram: the CPU with a separate
Program Memory (instructions only) and Data Memory (data only), each
with its own address bus (PM/DM address bus) and data bus (PM/DM data bus).

Super Harvard Architecture – diagram: dual memory plus an instruction
cache and an I/O controller.
Block diagram of a DSP

Diagram: the CPU (data address generators for PM and DM, program
sequencer, instruction cache, data registers, multiplier, ALU, shifter)
is connected through the PM address and data buses to the Program Memory
(instructions and secondary data) and through the DM address and data
buses to the Data Memory (data only). An I/O controller (DMA) links the
data bus to high speed I/O interfaces (serial, parallel, ADC, DAC, etc.).
Processor cycles

▪ The processor is controlled by a clock whose frequency is set
by a phase-locked loop (PLL) circuit.
▪ Each clock pulse starts a processor cycle.
▪ Each program instruction requires one or more cycles.
▪ Clock frequency determines the number of cycles available
per second for program execution. For example, a 100 MHz
clock means 100 million cycles per second.
▪ For an audio signal sampled at 48 kHz, we then have about 2083 cycles
to process one sample.
Instruction execution

Execution of a single instruction may be divided into stages:


▪ F (fetch) – get the instruction from memory or cache,
▪ D (decode) – decode the instruction,
▪ E (execute) – run the instruction,
▪ A (access) – access the memory,
▪ S (store) – write the result to memory.
Often, only F, D, E stages are considered.
Sequential processing

In sequential processing, a new instruction can be started only


after the previous one has completed.

Non-pipelined (sequential) processing:

Clock cycle   1    2    3    4    5    6    7    8    9    10
Instr. 1      F1   D1   E1   A1   S1
Instr. 2                               F2   D2   E2   A2   S2


Pipelining

Pipelining is performed as follows.


▪ First stage (F) of the first instruction is performed.
▪ When the processor proceeds to the second stage (D),
the first stage (F) of another instruction is started.
▪ Instruction stages are performed with overlapping,
which speeds up the program execution.
▪ Pipelining is used on digital signal processors.
▪ Conflict cases (hazards), such as jump to another instruction
in the code, break the pipeline, result in reverting partially
performed instructions and restarting the pipeline.
Pipelining

Pipelined processing:

Clock cycle   1    2    3    4    5    6    7    8    9    10
Instr. 1      F1   D1   E1   A1   S1
Instr. 2           F2   D2   E2   A2   S2
Instr. 3                F3   D3   E3   A3   S3
Instr. 4                     F4   D4   E4   A4   S4
Instr. 5                          F5   D5   E5   A5   S5
Instr. 6                               F6   D6   E6   A6   S6


Linear buffer

A common case in digital signal processing (e.g. a FIR filter):


▪ N latest samples are processed.
▪ Samples are stored in a buffer in memory.
▪ A new sample arrives:
– the oldest sample is removed,
– the remaining samples are shifted by one position,
– a new sample is written at the end of the buffer.
▪ This is a linear buffer.
▪ Processor cycles are wasted on moving samples in memory (see the sketch below).
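A minimal sketch of the linear buffer in plain C (buffer length and names are ours); every insertion moves N − 1 samples:

#include <string.h>

#define N 256                      /* buffer length (example value) */

static float buffer[N];            /* the N latest samples */

void linear_insert(float new_sample)
{
    /* remove the oldest sample and shift the rest by one position */
    memmove(&buffer[0], &buffer[1], (N - 1) * sizeof(float));
    buffer[N - 1] = new_sample;    /* write the new sample at the end */
}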
Circular buffer

▪ A circular buffer may be visualized as a ring.


▪ Pointer (index) indicates the current write position
(the oldest sample).
▪ A new sample is written at the pointer.
▪ The pointer is advanced to the next position.
▪ No data are moved; the other samples remain in place.
Illustration: a circular buffer of M positions (b0, b1, …) drawn as a ring;
incoming samples x0, x1, x2, … are written at the pointer position and the
pointer advances around the ring, while the other samples stay in place.
Circular addressing

▪ In practice, the circular buffer is implemented as a normal


linear buffer and a pointer (index).
▪ The pointer is moved; it indicates the order in which the
samples are processed.
▪ Once the pointer reaches the buffer end, it wraps
to the beginning.
▪ A linear buffer in memory uses circular addressing.
▪ Digital signal processors have circular addressing
implemented in hardware.
Circular and linear buffer – an illustration

Newest sample   Circular buffer   Linear buffer
     14         13 14 10 11 12    10 11 12 13 14
     15         13 14 15 11 12    11 12 13 14 15
     16         13 14 15 16 12    12 13 14 15 16
     17         13 14 15 16 17    13 14 15 16 17
     18         18 14 15 16 17    14 15 16 17 18
     19         18 19 15 16 17    15 16 17 18 19
Circular addressing

On a standard CPU, we have to wrap the pointer manually.

buffer[index] = new_sample; // write


// … do the processing
index = index + 1; // advance the index
if (index == N) // end of buffer
index = 0; // wrap index

On DSP, we use circular addressing


▪ in Assembler – by turning on the circular mode,
▪ in C – by using a special instruction (intrinsic):
buffer[index] = new_sample; // write
// … do the processing
index = _circ_incr(index, 1, N);  // advance the index with wrapping
MAC

▪ A common operation in signal processing is to multiply numbers
and accumulate the results:
y ← y + a * x
▪ MAC = multiply and accumulate.
▪ On a standard CPU, we must do multiplication and addition
separately.
▪ Signal processors have MAC implemented in hardware,
as a single processor instruction.
▪ This reduces the number of used cycles and speeds up
program execution.
▪ Many modern DSPs can do two MAC operations at the same
time (dual MAC).
MAC in practice

Standard CPU:

for (i = 0; i < N; i++) {


result += buffer[index] * coeff[index];
index = index + 1;
if (index == N) index = 0;
}

DSP with MAC


for (i = 0; i < N; i++) {
result = _smac(result, buffer[index], coeff[index]);
index = _circ_incr(index, 1, N);
}
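A plain-C sketch of the dual-MAC idea (no intrinsics; names are ours): unrolling the loop by two with two partial sums lets a dual-MAC DSP perform two multiply–accumulates per iteration. Whether the compiler maps it that way depends on the target and on the hints discussed later; the sketch assumes N is even and a plain linear buffer:

float dot_dual(const float *x, const float *h, int N)
{
    float acc0 = 0.0f, acc1 = 0.0f;    /* two independent accumulators */
    int i;

    for (i = 0; i < N; i += 2) {
        acc0 += x[i]     * h[i];       /* first MAC of the iteration */
        acc1 += x[i + 1] * h[i + 1];   /* second MAC of the iteration */
    }
    return acc0 + acc1;                /* combine the partial sums */
}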
SIMD (vectorization)

▪ Another common case: multiplication of two vectors.


▪ Requires N multiplications (N = vector length).
▪ Floating point numbers may be written with single (4 bytes)
or double (8 bytes) precision.
▪ A processor can multiply two 4B or two 8B numbers.
▪ Two 4B numbers can be packed into one 8B number.
▪ One 8B multiplication instead of two 4B multiplications.
▪ The number of multiplications is reduced to N/2.
▪ This is vectorization, or SIMD (single instruction, multiple
data) – the same operations on different data.
▪ CPUs also allow vectorization (SSE extensions).
Vectorization – an example

Without vectorization:

for (i = 0; i < N; i++) {


y[i] = a[i] * b[i];
}

With vectorization – more code, fewer operations:


for (i = 0; i < N; i+=2) {
_amem8_f2(&y[i]) =
_dmpysp(_amem8_f2(&a[i]), _amem8_f2(&b[i]));
}
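For comparison, a sketch of the same loop vectorized on a general-purpose CPU with SSE intrinsics (four single-precision multiplications per instruction); it assumes n is a multiple of 4:

#include <xmmintrin.h>   /* SSE intrinsics */

void vecmul_sse(float *y, const float *a, const float *b, int n)
{
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);           /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(&b[i]);           /* load 4 floats from b */
        _mm_storeu_ps(&y[i], _mm_mul_ps(va, vb));  /* multiply and store */
    }
}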
Memory organization

DSP memory is divided physically and logically into several levels.


Each successive level has slower access time.
▪ L1 – cache memory, on the DSP chip
– for internal use of the processor.
▪ L2 – DSP internal memory (on chip)
– can be used by a programmer (program and data),
– usually small (e.g. 1 MB).
▪ L3 – external memory (off chip)
– a separate memory module, e.g. DDR3
– much slower than L2 but can be bigger (GBs).
In addition, ROM memory (often flash) holds the program and constants.
SARAM and DARAM

▪ SARAM (single access random access memory)


– a standard memory, one memory read or write at a time.
▪ DARAM (dual access RAM) – dual data bus, two operations
at a time (two writes, two reads or read + write).
▪ On a DSP: the whole memory may be DARAM, or part
of memory may be DARAM and the rest is SARAM.
▪ Memory is divided into banks (pages) – concurrent access
to different banks is possible.
▪ A programmer must consider which data to put into DARAM
and which may be in SARAM.
Memory map

▪ A memory map is specific to a DSP model; it is determined
by the manufacturer.
▪ Each memory area is assigned to a range of addresses.
▪ Address is a number which determines the position
in memory.
▪ A memory map assigns the logical addresses to physical
memory sections.
▪ It is required in each DSP program – the compiler and linker
need it to build the program.
Memory map

An example of a memory map can be found in the C5535 documentation (figure not reproduced here).


Memory map

An example of memory map definition (C5535):

MEMORY
{
PAGE 0: /* ---- Unified Program/Data Address Space ---- */

MMR (RWIX): origin = 0x000000, length = 0x0000c0 /* MMRs */


DARAM0 (RWIX): origin = 0x0000c0, length = 0x00ff40 /* 64KB - MMRs */
SARAM0 (RWIX): origin = 0x010000, length = 0x010000 /* 64KB */
SARAM1 (RWIX): origin = 0x020000, length = 0x020000 /* 128KB */
SARAM2 (RWIX): origin = 0x040000, length = 0x00FE00 /* 64KB */
VECS (RWIX): origin = 0x04FE00, length = 0x000200 /* 512B */
PDROM (RIX): origin = 0xff8000, length = 0x008000 /* 32KB */

PAGE 2: /* -------- 64K-word I/O Address Space -------- */

IOPORT (RWI) : origin = 0x000000, length = 0x020000


}
Memory sections

Logical memory sections are assigned to addresses.


The main sections of a compiled program are:
▪ .text – program code
▪ .stack – stack (locally declared variables)
▪ .data – initialized variables
▪ .bss – global and static variables
▪ .const – constants
▪ .sysmem – heap (dynamically allocated memory)
A programmer may create custom sections.
Memory sections

An example of memory sections definition for a compiler (C5535):


SECTIONS
{
.text >> SARAM1|SARAM2|SARAM0 /* Code */
.stack > DARAM0 /* Primary system stack */
.sysstack > DARAM0 /* Secondary system stack */
.data >> DARAM0|SARAM0|SARAM1 /* Initialized vars */
.bss >> DARAM0|SARAM0|SARAM1 /* Global & static vars */
.const >> DARAM0|SARAM0|SARAM1 /* Constant data */
.sysmem > DARAM0|SARAM0|SARAM1 /* Dynamic memory (malloc) */
.switch > SARAM2 /* Switch statement tables */
.cinit > SARAM2 /* Auto-initialization tables */
.pinit > SARAM2 /* Initialization fn tables */
.cio > SARAM2 /* C I/O buffers */
.args > SARAM2 /* Arguments to main() */
vectors > VECS /* Interrupt vectors */
.ioport > IOPORT PAGE 2 /* Global & static ioport vars */
.fftcode > SARAM0 /* Custom sections */
.input > DARAM0, align(4)
}
Using sections in C code

This is how we can create a buffer in DARAM or SARAM,


in a default .bss section:
int buffer[8192];

If we define an external DDR3 memory range in the memory map
and assign a custom section to it:

.ddr > DDR3

we can create a buffer in DDR memory using a compiler pragma


(example for a TI processor):
#pragma DATA_SECTION(buffer, ".ddr");
int buffer[8192];
Internal and external memory

Which data in internal L2 memory (DARAM/SARAM)?


▪ program code, stack, heap
▪ most variables
▪ buffers that are often accessed
Which data in external L3 memory?
▪ large buffers that don’t fit in L2
▪ rarely accessed data
▪ archived processing results
Memory sections - remarks

In C programs:
▪ Globally declared variables (in main code, outside functions)
and static variables – in .bss.
▪ Local variables, declared inside functions (including main)
– in .stack.
▪ Dynamically created variables (with malloc) – in .sysmem.
▪ Constants (e.g. filter coefficients) – in .const.
Practical implications:
▪ do not declare large buffers inside functions
– the stack is small, and it may overflow
▪ constant data, such as filter coefficients, should be declared
as const (see the sketch below).
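A minimal sketch following these rules (plain C, names are ours); the comments indicate the section each object ends up in:

/* constants, e.g. filter coefficients -> .const */
static const float coeff[3] = { 0.5f, 0.3f, 0.2f };

/* a large buffer declared globally -> .bss, not on the small stack */
static float big_buffer[8192];

void process(void)
{
    float tmp[16];          /* small local array -> .stack, acceptable */
    /* float bad[8192];        a large local array could overflow the stack */
    tmp[0] = coeff[0] * big_buffer[0];   /* use the variables */
}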
Stack overflow

Stack overflow occurs when the memory allocated for a variable
on the stack goes outside of the stack space.

Illustration: a large buffer allocated on the stack does not fit in the
free stack space – the allocation overflows past the stack into the
adjacent (used) data memory.


Stack overflow

What happens when a stack overflow occurs?


▪ On a desktop OS (e.g. Windows), the program crashes.
▪ There is no protection on a DSP! The memory is overwritten
without notice. What may happen:
– if the overwritten memory section was unused,
the program continues normally (for now),
– if data were stored in this section, they are lost;
the program can hang, or it may continue, but generate
incorrect results!
▪ It’s very hard to debug stack overflow errors.
▪ Therefore, it’s best to follow the rule: all buffers (arrays) are
declared at file scope (in the main program section), not inside functions!
Direct memory access

▪ Signal samples that are received by the interfaces must be


written to memory.
▪ DMA (direct memory access) – interfaces have direct access
to the memory, without the processor.
▪ Data are transferred to/from memory without the processor
executing any code; the transfers do not use processor cycles.
▪ The program executes much faster – it is not blocked by
the interfaces.
▪ Almost all DSPs use DMA.
Direct memory access

Diagram (the DSP block diagram repeated): the I/O controller (DMA)
transfers data between the high speed I/O interfaces (serial, parallel,
ADC, DAC, etc.) and the Data Memory over the DM data bus, without
involving the CPU.
Interrupts

How should a program know that new data are available?


▪ Polling – continuously checking for new data
– uses processor cycles for checking
– introduces delays
▪ Interrupts – a better approach:
– when the DMA controller writes new data to memory,
it generates an interrupt – a signal to the processor,
– a programmer writes an interrupt handling procedure
that is called when new data are available,
– interrupts have higher priority than normal code
– they interrupt program execution,
– lower delays, no processor cycles are wasted (see the sketch below).
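A minimal sketch of the interrupt approach (not a complete driver: the ISR name and the flag are ours, the interrupt keyword follows TI compiler conventions, and registering the ISR in the vector table and configuring the DMA are processor-specific and omitted):

volatile int block_ready = 0;      /* set by the ISR, read by the main loop */

interrupt void dma_isr(void)       /* called by hardware when DMA completes a block */
{
    block_ready = 1;               /* only signal the main code; keep the ISR short */
}

void main_loop(void)
{
    for (;;) {
        if (block_ready) {
            block_ready = 0;
            /* process the new block of samples here */
        }
    }
}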
Remarks on a C compiler (1)

▪ Each variable type takes a defined number of bytes,


e.g. float typically uses 4 bytes.
▪ Variable address is an integer that defines the position
of a variable in memory.
▪ Alignment – requirement that the address is divisible
by the type size (float: by 4).
▪ In some cases (dual operations), alignment requires that
the address is divisible by 2 × type size (float: by 8).
▪ Alignment is often required by the compiler in order
to generate optimized code.
Remarks on a C compiler (1)

Alignment must be forced by using compiler pragmas.


Example for a TI processor – alignment to 8 bytes:

#pragma DATA_ALIGN(buffer, 8);


int buffer[8192];
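For instance, a float buffer aligned this way (to 8 bytes) meets the requirement of the aligned 8-byte accesses (_amem8_f2) used in the earlier vectorization example; without guaranteed alignment, the aligned intrinsics must not be used and slower, unaligned accesses are needed.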
Remarks on a C compiler (2)

▪ We want to generate code for a loop using dual MAC


– two MACs in each loop iteration.
▪ By default, the compiler usually won’t do that, because it doesn’t
know whether all the conditions are fulfilled:
– the loop will be executed an even number of times,
– the loop won’t break at some point,
– the buffers do not overlap in memory.
▪ Therefore, the compiler plays it safe (in line with Murphy’s
law) and generates suboptimal code.
▪ If we write program in Assembler, we have full control over
the code and we can optimize it ourselves.
Remarks on a C compiler (2)

Again, we have to use pragmas and keywords to inform the compiler:

▪ how many times the loop will iterate (MUST_ITERATE),
▪ how to unroll the loop (UNROLL),
▪ that the buffers do not overlap (the restrict keyword).
Example (TI processor):

void vecmul(int* restrict y, int* restrict a,


int* restrict b, int n)
{
int i;
#pragma MUST_ITERATE(2,,2)
#pragma UNROLL(2)
for (i = 0; i < n; i++)
y[i] = a[i] * b[i];
}
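A short usage sketch (our own example values); the call keeps the promises made to the compiler – n is even and the three arrays do not overlap:

int main(void)
{
    static int a[256], b[256], y[256];
    int i;

    for (i = 0; i < 256; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    vecmul(y, a, b, 256);   /* 256 iterations: even, buffers do not overlap */
    return 0;
}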
Remarks on a C compiler (conclusion)

▪ In Assembler, we can generate optimal code, but it’s our duty


to ensure that it works correctly.
▪ The C compiler must ensure that the program always works
correctly, even if it works slower. In case of any “risk”,
optimizations are disabled.
▪ A programmer must use “magic pragmas” to inform
the compiler that the code can be optimized.
▪ However, in many cases, the compiler decides that it knows
better ☺. It doesn’t generate the code we want.
▪ In such cases, we can only write the code in Assembler
(or maybe the compiler is right?).
