ADSP-BF70x Blackfin Programming Reference
Copyright Information
© 2016 Analog Devices, Inc., ALL RIGHTS RESERVED. This document may not be reproduced in any form
without prior, express written consent from Analog Devices, Inc.
Printed in the USA.
Disclaimer
Analog Devices, Inc. reserves the right to change this product without prior notice. Information furnished by Ana-
log Devices is believed to be accurate and reliable. However, no responsibility is assumed by Analog Devices for its
use; nor for any infringement of patents or other rights of third parties which may result from its use. No license is
granted by implication or otherwise under the patent rights of Analog Devices, Inc.
Introduction
  Core Architecture
  Glossary
  Register Names
  Functional Units
  Arithmetic Status Bits
  Fractional Convention
  Saturation
  Rounding and Truncating
  Automatic Circular Addressing
Computational Units
  Using Data Formats
  Binary String
  Unsigned Numbers
Program Sequencer
  Introduction
  Sequencer-Related Registers
Memory
  Memory Architecture
  Overview of On-Chip Level-1 (L1) Memory
  Overview of Other On-Chip (L2) and Off-Chip (L3) Memories
Debug
  Watchpoint Unit
  Instruction Watchpoints
  WPIAx Registers
  WPIACNTx Registers
  WPIACTL Register
  Data Address Watchpoints
  WPDAx Registers
  References
Numeric Formats
  Unsigned or Signed: Two's-complement Format
Thank you for purchasing and developing systems using an ADSP-BF70x Blackfin+ processor from Analog Devices,
Inc.
Intended Audience
The primary audience for this manual is a programmer who is familiar with Analog Devices processors. The manual
assumes the audience has a working knowledge of the appropriate processor architecture and instruction set. Pro-
grammers who are unfamiliar with Analog Devices processors can use this manual, but should supplement it with
other texts, such as programming reference books and data sheets, that describe their target architecture.
Manual Contents
This manual consists of the following chapters:
• Introduction - provides a general description of the processor core architecture, memory architecture, instruc-
tion syntax, and notation conventions.
• Computational Units - describes the arithmetic/logic units (ALUs), multiplier/accumulator units (MACs),
shifter, and the set of video ALUs. The chapter also discusses data formats, data types, and register files.
• Operating Modes and States - describes the operating modes of the processor, as well as the Idle and Reset
states.
• Program Sequencer - describes the operation of the program sequencer, which controls program flow by pro-
viding the address of the next instruction to be executed. The chapter also discusses loops, subroutines, jumps,
interrupts, and exceptions.
• Address Arithmetic Unit - describes the Address Arithmetic Unit (AAU), including Data Address Generators
(DAGs), addressing modes, how to modify DAG and pointer registers, memory address alignment, and DAG
instructions.
Supported Processors
The following is the list of Analog Devices, Inc. processors supported by the CrossCore Embedded Studio® develop-
ment tools suite.
Product Information
Product information can be obtained from the Analog Devices Web site and CrossCore Embedded Studio online
Help system.
EngineerZone
EngineerZone is a technical support forum from Analog Devices. It allows you direct access to ADI technical sup-
port engineers. You can search FAQs and technical information to get quick answers to your embedded processing
and DSP design questions.
Use EngineerZone to connect with other DSP developers who face similar design challenges. You can also use this
open forum to share knowledge and collaborate with the ADI support team and your peers. Visit http://
ez.analog.com to sign up.
Example          Description
File > Close     Titles in reference sections indicate the location of an item within the CrossCore Embedded Studio
                 IDE's menu system (for example, the Close command appears on the File menu).
{this | that}    Alternative required items in syntax descriptions appear within curly brackets and separated by
                 vertical bars; read the example as this or that. One or the other is required.
[this | that]    Optional items in syntax descriptions appear within brackets and separated by vertical bars; read
                 the example as an optional this or that.
[this, …]        Optional item lists in syntax descriptions appear within brackets delimited by commas and
                 terminated with an ellipsis; read the example as an optional comma-separated list of this.
.SECTION         Commands, directives, keywords, and feature names are in text with Letter Gothic font.
NOTE: To ensure upward compatibility with future implementations, write back the value that is read for re-
served bits in a register, unless otherwise specified.
1 Introduction
This Blackfin+ Processor Programming Reference provides details on the assembly language instructions used by Black-
fin+ processors. The Blackfin+ architecture extends the Micro Signal Architecture (MSA) core developed jointly by
Analog Devices, Inc. and Intel Corporation. This manual applies to all ADSP-BF7xx processor derivatives. All devi-
ces provide an identical core architecture and instruction set. Additional architectural features are only supported by
some devices and are identified in the manual as being optional features. A read-only memory-mapped register,
FEATURE0, enables run-time software to query the optional features implemented in a particular derivative. Some
details of the implementation may vary between derivatives. This is generally not visible to software, but system and
test code may depend on very specific aspects of the memory microarchitecture. Differences and commonalities at a
global level are discussed in the Memory chapter. For a full description of the system architecture beyond the Black-
fin+ core, refer to the specific hardware reference manual for your derivative. This section points out some of the
conventions used in this document.
The Blackfin+ processor combines a dual-MAC signal processing engine, an orthogonal RISC-like microprocessor
instruction set, flexible Single Instruction, Multiple Data (SIMD) capabilities, and multimedia features into a single
instruction set architecture.
Core Architecture
The Blackfin+ processor core contains two 16-bit multipliers (MACs), one 32-bit MAC, two 40-bit accumulators,
one 72-bit accumulator, two 40-bit Arithmetic Logic Units (ALUs), four 8-bit video ALUs, and a 40-bit shifter,
shown in the Processor Core Architecture figure. The Blackfin+ processors work on 8-, 16-, or 32-bit data from the
register file.
Figure: Processor Core Architecture (block diagram showing DAG0 and DAG1 with the I0-I3, L0-L3, B0-B3, and M0-M3
registers; pointer registers P0-P5, FP, and SP; the data register file R0-R7 with .H and .L halves; accumulators A0
and A1; the barrel shifter; and the sequencer with its align, decode, loop buffer, and control unit blocks)
The address arithmetic unit provides two addresses for simultaneous dual fetches from memory. It contains a multi-
ported register file consisting of four sets of 32-bit index, modify, length, and base registers (for circular buffering)
and eight additional 32-bit pointer registers (for C-style indexed stack manipulation).
Blackfin+ processors support a modified Harvard architecture in combination with a hierarchical memory structure.
Level 1 (L1) memories typically operate at the full processor speed with little or no latency. At the L1 level, the
instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data
memory can be used to store stack and local variable information.
In addition, multiple L1 memory blocks are provided, which may be configured as a mix of SRAM and cache. The
Memory Management Unit (MMU) provides memory protection for individual tasks that may be operating on the
core.
The architecture provides three modes of operation: User, Supervisor, and Emulation. User mode has restricted ac-
cess to a subset of system resources, thus providing a protected software environment. Supervisor and Emulation
modes have unrestricted access to the system and core resources.
The Blackfin+ processor instruction set is optimized so that 16-bit opcodes represent the most frequently used in-
structions. Complex DSP instructions are encoded into 32-bit opcodes as multi-function instructions, and some in-
structions with very large immediate values are encoded into 64-bit opcodes. Blackfin+ products support a limited
multi-issue capability, where a 32-bit instruction can be issued in parallel with two 16-bit instructions. This allows
the programmer to use many of the core resources in a single instruction cycle.
The Blackfin+ processor assembly language uses an algebraic syntax. The architecture is optimized for use with the
C compiler.
Memory Architecture
The Blackfin+ processor architecture structures memory as a single, unified 4 GB address space using 32-bit address-
es. All resources, including internal memory, external memory, and I/O control registers, occupy separate sections of
this common address space. The memory portions of this address space are arranged in a hierarchical structure to
provide a good cost/performance balance of some very fast, low-latency on-chip memory (as cache or SRAM) and
larger, lower-cost and lower-performance off-chip memory systems.
The memory DMA controller provides high-bandwidth data movement capabilities. It can perform block transfers
of code or data between the internal and external memory spaces.
Internal Memory
The L1 memory system is the primary, highest-performance memory available to the core. At a minimum, each
Blackfin+ processor has two blocks of on-chip memory that provide high-bandwidth access to the core:
• L1 instruction memory, consisting of SRAM and/or an instruction cache. This memory is accessed at the full
core clock rate.
• L1 data memory, consisting of SRAM and/or a data cache. This memory block is also accessed at the full core
clock rate.
On-chip Level 2 (L2) memory forms an on-chip memory hierarchy with L1 memory and provides much more ca-
pacity, but the latency is higher. The on-chip L2 memory may be made cacheable in L1 and is capable of storing
both instructions and data.
External Memory
External (off-chip) memory is accessed via on-chip memory peripherals such as DDR controllers.
Event Handling
The event controller on the Blackfin+ processor handles all asynchronous and synchronous events in the system. It
supports both nesting and prioritization. Nesting allows multiple event service routines to be active simultaneously,
and prioritization ensures that servicing a higher-priority event takes precedence over servicing a lower-priority
event. The controller provides support for five different types of events:
• Emulation - causes the processor to enter Emulation mode, allowing command and control of the processor via
the JTAG interface.
• Reset - resets the processor.
• Non-Maskable Interrupt (NMI) - the software watchdog timer or the NMI input signal to the processor can
generate this event. The NMI event is frequently used as a power-down indicator to initiate an orderly shut-
down of the system.
• Exceptions - synchronous to program flow, an exception is taken before the instruction is allowed to complete.
Conditions such as data alignment violations and undefined instructions cause exceptions.
• Interrupts - asynchronous to program flow. These events can be caused by input pins, timers, other peripherals,
and software.
Each event has an associated register to hold the return address and an associated return-from-event instruction.
When an event is triggered, the state of the processor is saved on the supervisor stack.
The processor event controller consists of two stages, the Core Event Controller (CEC) and the System Event Con-
troller (SEC). The CEC works with the SEC to prioritize and control all system events. Conceptually, interrupts
from the peripherals arrive at the SEC and are routed directly into a general-purpose interrupt of the CEC.
Syntax Conventions
The Blackfin+ processor instruction set supports several syntactical conventions that appear throughout this docu-
ment. These conventions relate to case sensitivity, free format, instruction delimiting, and comments.
Case Sensitivity
The instruction syntax is case insensitive. The assembler treats register names and instruction keywords in a case-
insensitive manner (i.e., R3.l, R3.L, r3.l, and r3.L are all valid, equivalent input to the assembler).
In explanations and descriptions throughout this manual, upper case is used to help the register names and keywords
stand out among normal text.
Free Format
Assembler input is free format and may appear anywhere on the line. One instruction may extend across multiple
lines, or more than one instruction may appear on the same line, and white space (e.g., space, tab, or a new line)
may appear anywhere between tokens. A token must not have embedded spaces. Tokens include numbers, register
names, keywords, user identifiers, and also some multi-character special symbols like "+=", "/*", or "||".
Instruction Delimiting
A semicolon must terminate every complete instruction. Several instructions can be placed together on a single line
at the programmer's discretion, provided each instruction ends with a semicolon.
Sometimes, a complete instruction consists of more than one operation. There are two cases where this occurs.
• Two general operations are combined to be issued across multiple computation units. In this case, a comma
separates the different parts:
a0 = r3.h * r2.l , a1 = r3.l * r2.h ;
• A general instruction is combined with one or two memory accesses as a multi-issue instruction. The latter
portions of instructions like these are separated by the parallel-issue "||" token. For example:
a0 = r3.h * r2.l || r1 = [p3++] || r4 = [i2++] ;
Comments
The assembler supports various kinds of comments, including:
• End of line: A double forward slash token ("//") indicates the beginning of a comment that concludes at the
next new line character.
• General comment: A general comment begins with the "/*" token and ends with the "*/" token. It may con-
tain any characters and extend over multiple lines.
Comments are not recursive; if the assembler sees a "/*" within a general comment, it issues an assembler warning
at build-time.
Notation Conventions
This manual and the assembler use the following conventions:
• Register names are alphabetical, followed by a number in cases where there is more than one register in a logical
group. Thus, examples include ASTAT, FP, M2, and R3.
• Register names are reserved and may not be used as program identifiers.
• Some operations (such as the Move Register instruction) require a register pair. Register pairs are always Data
Registers or Accumulators and are denoted using a colon, for example, R3:2. The larger number must be writ-
ten first. Note that the hardware supports only odd-even pairs, for example, R7:6, R5:4, R3:2, R1:0 and
A1:0.
• Some instructions (such as the --SP (Push Multiple) instruction) require a group of adjacent registers. Adjacent
registers are syntactically denoted with the range enclosed in parentheses and separated by a colon. For example,
the range of data registers comprising R3, R4, R5, R6, and R7 is written in this instruction as (R7:3). Again,
the larger number appears first.
• Portions of a particular register may be individually specified by using the dot (".") syntax following the register
name, followed by a letter denoting the desired portion. For 32-bit registers, ".H" denotes the most-significant
("High") 16 bits, whereas ".L" denotes the least-significant 16 bits. Similar access control is available for the
40-bit accumulator registers, which is discussed later.
• any alignment requirements are designated by an optional "m" suffix followed by a number (e.g.,
pcrel13m2 is a 13-bit integer that must be an even number).
• Loop PC-relative, signed values are designated as "lppcrel" with the following modifiers:
    • the decimal number indicates how many bits are required to represent the value in binary (e.g.,
      lppcrel5 is a 5-bit value).
    • any alignment requirements are designated by an optional "m" suffix followed by a number (e.g.,
      lppcrel11m2 is an 11-bit integer that must be an even number).
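As a rough illustration of the naming convention above, the following Python sketch checks whether an offset matches a designator such as pcrel13m2. The helper name fits_pcrel is hypothetical, and the sketch assumes the field is a signed two's-complement quantity spanning -2^(n-1) through 2^(n-1) - 1. The individual instruction descriptions remain the authority on actual ranges.

```python
def fits_pcrel(value, bits, align=1):
    """Return True if value fits a signed field of `bits` bits and is a multiple of `align`.

    Hypothetical helper for illustration; assumes a two's-complement range.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest and value % align == 0


# pcrel13m2: a 13-bit value that must be an even number
print(fits_pcrel(4094, 13, 2))   # in range and even
print(fits_pcrel(4095, 13, 2))   # in range but odd
```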
Glossary
The following terms appear throughout this document. They are defined here for reference and are not intended as a
full explanation of the Blackfin+ processor.
Register Names
The architecture includes the registers shown in the Registers table.
Register            Description
Loop Count          LC0 and LC1 contain the 32-bit counter for the zero-overhead loop iterations. These registers are
                    initialized during loop setup and decrement when the loop bottom is reached. The loop exits when
                    the count reaches 0.
Loop Bottom         LB0 and LB1 contain the 32-bit address of the instruction at the bottom of a zero-overhead loop.
Index Registers     The set of four 32-bit registers (I0, I1, I2, and I3) that normally contain byte addresses of data
                    structures (abbreviated to Ireg when referenced in this manual).
Modify Registers    The set of four 32-bit registers (M0, M1, M2, and M3) that normally contain offset values that
                    modify (add to or subtract from) one of the Iregs (abbreviated to Mreg when referenced in this
                    manual).
Length Registers    The set of four 32-bit registers (L0, L1, L2, and L3) that normally contain the length (in bytes)
                    of a circular buffer (abbreviated to Lreg when referenced in this manual). When an Lreg is 0,
                    circular addressing for the corresponding Ireg is disabled (e.g., if L3=0, circular addressing
                    for the I3 index is disabled).
Base Registers      The set of four 32-bit registers (B0, B1, B2, and B3) that normally contain the base address of a
                    buffer in memory (abbreviated as Breg when referenced in this manual).
Functional Units
The architecture includes the three processor sections shown in the Units table.
Arithmetic Status Bits
Bit       Description
AV0       Accumulator 0 Overflow
AVS0      Accumulator 0 Sticky Overflow. Set when AV0 is set, but remains set until explicitly cleared by software.
AV1       Accumulator 1 Overflow
AVS1      Accumulator 1 Sticky Overflow. Set when AV1 is set, but remains set until explicitly cleared by software.
AZ        Zero
CC        Control Code bit. Multipurpose bit set, cleared, and tested by specific instructions.
V         Overflow (for data register results)
V_COPY    Copy of Overflow (for data register results)
VS        Sticky Overflow (for data register results). Set when V is set, but remains set until explicitly cleared
          by software.
Fractional Convention
Fractional numbers include subinteger components less than 1. Whereas decimal fractions appear to the right of a
decimal point, binary fractions appear to the right of a binal point.
In instructions that assume placement of a binal point (e.g., in computing sign bits for normalization or alignment
purposes), the binal point convention depends on the size of the register being used, as shown in the Fractional
Notation table and the Conventional Placement of Binal Point figure.
Figure: Conventional Placement of Binal Point
40-bit accumulator: sign bit (S), 8-bit extension, 31-bit fraction
32-bit register: sign bit (S), 31-bit fraction
Saturation
When the result of an arithmetic operation exceeds the range of the destination register, important information can
be lost.
Saturation is a technique used to contain the quantity within the values that the destination register can represent.
When a value is computed that exceeds the capacity of the destination register, then the value written to the register
is the largest value that the register can hold with the same sign as the original value.
• If an operation would otherwise cause a positive value to overflow and become negative, the saturation instead
limits the result to the maximum positive value for the register size being used.
• Conversely, if an operation would otherwise cause a negative value to overflow and become positive, saturation
limits the result to the maximum negative value for the register's size.
The maximum positive value in a 16-bit register is 0x7FFF, whereas the maximum negative value is 0x8000. For
signed two's-complement fractional data in 1.15 format, the range of values that can be represented is -1 through
(1 - 2^-15).
The maximum positive value in a 32-bit register is 0x7FFF_FFFF, whereas the maximum negative value is
0x8000_0000. For signed two's-complement fractional data in 1.31 format, the range of values that can be
represented is -1 through (1 - 2^-31).
The maximum positive value in a 40-bit register is 0x7F_FFFF_FFFF, whereas the maximum negative value is
0x80_0000_0000. For signed two's-complement fractional data in 9.31 format, the range of values that can be
represented is -256 through (256 - 2^-31).
The maximum positive value in a 64-bit register pair is 0x7FFF_FFFF_FFFF_FFFF, whereas the maximum negative
value is 0x8000_0000_0000_0000. For signed two's-complement fractional data in 1.63 format, the range of values
that can be represented is -1 through (1 - 2^-63).
A real value held in the 80-bit accumulator pair, A1:0, only has 72 bits of useful data; therefore, the maximum
positive value in the 80-bit accumulator pair is 0x7F_FFFF_FFFF_FFFF_FFFF, and the maximum negative value is
0x80_0000_0000_0000_0000. For signed two's-complement fractional data in 9.63 format, the range of values that
can be represented is -256 through (256 - 2^-63).
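The ranges listed above all follow one pattern, which the Python sketch below reproduces with exact rational arithmetic. The function name fractional_range is an illustrative assumption, not part of any Analog Devices tool; it treats an i.f format as i integer bits (including the sign bit) followed by f fraction bits.

```python
from fractions import Fraction


def fractional_range(int_bits, frac_bits):
    """Return the exact (most negative, most positive) values of a signed int_bits.frac_bits format."""
    lsb = Fraction(1, 2 ** frac_bits)                   # weight of the least significant bit
    most_negative = -Fraction(2 ** (int_bits - 1))
    most_positive = Fraction(2 ** (int_bits - 1)) - lsb
    return most_negative, most_positive


# The formats named in the text above
for int_bits, frac_bits in [(1, 15), (1, 31), (9, 31), (1, 63), (9, 63)]:
    low, high = fractional_range(int_bits, frac_bits)
    print(f"{int_bits}.{frac_bits}: {low} through {high}")
```

Fraction is used instead of float because 256 - 2^-63 cannot be represented exactly in a double-precision value.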
For example, if a 16-bit register containing 0x1000 (decimal integer +4096) was shifted left 3 places without satura-
tion, it would overflow to 0x8000 (decimal -32,768). With saturation, however, a left shift of 3 or more places
would always produce the largest positive 16-bit number, 0x7FFF (decimal +32,767).
Another common example is copying the lower half of a 32-bit register into a 16-bit register. If the 32-bit register
contains 0xFEED_0ACE and the lower half of this negative number is copied into a 16-bit register without satura-
tion, the result is 0x0ACE, which changes the sign to represent a positive number. With saturation, however, the 16-
bit result maintains its negative sign and becomes 0x8000.
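The two examples above can be reproduced with a small Python model. This is only a sketch of the arithmetic, not of any particular instruction; the helper names saturate and wrap are assumptions made for illustration.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a two's-complement register of `bits` bits."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, value))


def wrap(value, bits):
    """Keep only the low `bits` bits and reinterpret them as a signed value (no saturation)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


# Example 1: 0x1000 shifted left 3 places in a 16-bit register
shifted = 0x1000 << 3
print(hex(wrap(shifted, 16) & 0xFFFF), hex(saturate(shifted, 16)))

# Example 2: moving the negative 32-bit value 0xFEED_0ACE into a 16-bit register
signed32 = wrap(0xFEED0ACE, 32)
print(hex(wrap(signed32, 16)), hex(saturate(signed32, 16) & 0xFFFF))
```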
The Blackfin+ architecture implements 40-bit saturation for all arithmetic operations that write a single accumulator
destination register. The exceptions, noted in the individual instruction descriptions, are instructions with an
optional 32-bit saturation mode that constrains a 40-bit accumulator to the 32-bit register range. The Blackfin+
architecture performs 32-bit saturation for 32-bit destination registers only where noted in the instruction
descriptions.
Overflow is the alternative to saturation. The number is allowed to simply exceed its bounds and lose its most signifi-
cant bit(s), retaining only the lowest (least-significant) portion of the number. Overflow can occur when a 40-bit
value is written to a 32-bit destination or when a 72-bit value is written to a 64-bit or 32-bit destination. If there
was any useful information in the upper eight bits of the 40-bit value, then information is lost in the process. Some
processor instructions report overflow conditions in the Arithmetic Status (ASTAT) register bits, as noted in the in-
struction descriptions.
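To make the loss of information concrete, the sketch below models writing a 40-bit value to a 32-bit destination without saturation. The helper names are hypothetical, and the test for lost information assumes the usual two's-complement rule that nothing is lost when the upper eight bits merely repeat bit 31.

```python
def keep_low_bits(raw, bits):
    """Discard everything above the low `bits` bits, as an overflowing write would."""
    return raw & ((1 << bits) - 1)


def loses_information(raw40):
    """True if bits 39..32 of a 40-bit value are not just a sign extension of bit 31."""
    upper9 = (raw40 >> 31) & 0x1FF      # bits 39..31 taken together
    return upper9 not in (0x000, 0x1FF)


value = 0x01_2345_6789                  # upper eight bits hold useful data
print(hex(keep_low_bits(value, 32)), loses_information(value))
```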
Rounding and Truncating
Some instructions for this processor support biased and unbiased rounding, as governed by the Rounding Mode bit
in the Arithmetic Status register (ASTAT.RND_MOD).
Another common way to reduce the significant bits representing a number is to simply mask off the N-M lower bits.
This process is known as truncation and results in a relatively large bias.
The 8-Bit Number Reduced to 4 Bits of Precision figure shows other examples of rounding and truncation methods.
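A minimal Python sketch of the three reduction methods follows, using the 8-bit to 4-bit case. It assumes the conventional meanings of the terms: biased rounding rounds a tie upward, and unbiased (convergent) rounding rounds a tie to the nearest even result. The function names are illustrative, inputs are taken to be non-negative, and a result that grows past 4 bits is not saturated here.

```python
def truncate(value, drop):
    """Mask off the `drop` lowest bits."""
    return value >> drop


def round_biased(value, drop):
    """Round to nearest; a tie (exactly half) always rounds up."""
    return (value + (1 << (drop - 1))) >> drop


def round_unbiased(value, drop):
    """Round to nearest; a tie rounds to the even result."""
    half = 1 << (drop - 1)
    quotient = value >> drop
    remainder = value & ((1 << drop) - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient


for sample in (0b01001000, 0b01011000, 0b01001001):
    print(f"{sample:08b}", truncate(sample, 4), round_biased(sample, 4), round_unbiased(sample, 4))
```

The first two samples are exact ties, which is the only case where the biased and unbiased methods can differ.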
Automatic Circular Addressing
• Four sets of circular buffer addressing registers, each comprising one register from each of the Ireg, Breg, and
Lreg register groups (specifically, I0/B0/L0, I1/B1/L1, I2/B2/L2, and I3/B3/L3)
To qualify for circular addressing, the indexed address instruction must explicitly modify an index register. Some
indexed address instructions use a modify register (Mreg) to increment the Ireg value. In that case, any Mreg can
be used to increment any Ireg. The Ireg used in the instruction specifies which of the four circular buffer sets to
use.
The circular buffer registers define the length (Lreg) of the data block in bytes and the base (Breg) address to re-
initialize the Ireg when a wrap condition is encountered at the end of the buffer.
Some instructions, such as Add Immediate and Modify–Decrement, modify an index register without using it for ad-
dressing; however, even these instructions are still affected by circular addressing, if enabled.
Disable circular addressing for an Ireg by setting the corresponding Lreg to 0. For example, set L2 = 0 to disable
circular addressing for register I2. Any non-zero value in an Lreg enables circular addressing for its corresponding
DAG buffer registers.
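The behavior described above can be approximated in a few lines of Python. This is a simplified model under stated assumptions, not the hardware algorithm: it assumes the modifier is smaller in magnitude than the buffer length, so at most one wrap is needed, and the function name circular_update is hypothetical.

```python
def circular_update(ireg, mreg, breg, lreg):
    """Return the new Ireg value after adding Mreg, wrapping within [Breg, Breg + Lreg)."""
    new_index = ireg + mreg
    if lreg != 0:                                   # Lreg == 0 disables circular addressing
        if mreg >= 0 and new_index >= breg + lreg:
            new_index -= lreg                       # ran past the end: wrap to the start
        elif mreg < 0 and new_index < breg:
            new_index += lreg                       # ran before the base: wrap to the end
    return new_index & 0xFFFFFFFF                   # registers are 32 bits wide


# 16-byte buffer at 0x1000, stepping by one 32-bit word
print(hex(circular_update(0x100C, 4, 0x1000, 16)))  # wraps back to the base
print(hex(circular_update(0x100C, 4, 0x1000, 0)))   # circular addressing disabled
```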
2 Computational Units
The processor's computational units perform numeric processing for DSP and general control algorithms. The seven
computational units are two arithmetic/logic units (ALUs), three multiplier/accumulator (MAC) units, a shifter, and
a set of video ALUs, all of which get data from registers in the data register file. Computational instructions for these
units provide fixed-point operations, and each computational instruction can execute every cycle.
The computational units handle different types of operations. The ALUs perform arithmetic and logic operations.
The MACs perform multiplication and execute multiply/add and multiply/subtract operations. The shifter executes
logical and arithmetic shifts and performs bit packing and extraction. The video ALUs perform Single Instruction,
Multiple Data (SIMD) logical operations on specific 8-bit data operands.
Data moving into and out of the computational units goes through the data register file, which consists of eight 32-
bit registers. In operations requiring 16-bit operands, the registers are paired, providing sixteen possible 16-bit regis-
ters.
The processor's assembly language provides access to the data register file. The syntax allows programs to move data
to and from these registers while simultaneously specifying a computation's data format.
The Processor Core Architecture figure provides a graphical guide to the other topics in this chapter. An examination
of each computational unit provides details about its operation and is followed by a summary of computational in-
structions. Studying the details of the computational units, register files, and data buses leads to a better understand-
ing of proper data flow for computations. Next, details about the processor's advanced parallelism reveal how to take
advantage of multifunction instructions.
The Processor Core Architecture figure also shows the relationship between the data register file and the computa-
tional units (multipliers, ALUs, and shifter).
Single function MAC, ALU, and shifter instructions have unrestricted access to the data registers in the data register
file. Multifunction operations may have restrictions that are described in the section for that particular operation.
Two additional 40-bit registers, A0 and A1, provide accumulator results. These registers are dedicated to the ALUs
and are used primarily for multiply-and-accumulate functions.
The traditional modes of arithmetic operations, such as fractional and integer, are specified directly in the instruc-
tion. Rounding modes are set from the ASTAT register, which also records status and conditions for the results of the
computational operations.
Figure: Processor Core Architecture (block diagram of the address arithmetic unit, data register file, accumulators,
barrel shifter, and sequencer)
Binary String
The binary string format is the least complex binary notation, within which 16- or 32-bit data is treated as a bit
pattern. Examples of computations using this format are the logical operations NOT, AND, OR, and XOR. These ALU
operations treat their operands as binary strings with no provision for sign bit or binal point placement.
Unsigned Numbers
Unsigned binary numbers may be thought of as positive and having nearly twice the magnitude of a signed number
of the same length. The processor treats the least significant words of multiple precision numbers as unsigned num-
bers.
Hexadecimal    Decimal equivalent (1.15)
0x0001          0.000031
0x7FFF          0.999969
0xFFFF         –0.000031
0x8000         –1.000000
1.15 bit weights (MSB to LSB): –2^0, 2^–1, 2^–2, ..., 2^–15
Hexadecimal    Decimal equivalent (1.31)
0x00000001      0.00000000047
0x7FFFFFFF      0.99999999953
0xFFFFFFFF     –0.00000000047
0x80000000     –1.00000000000
1.31 bit weights (MSB to LSB): –2^0, 2^–1, 2^–2, ..., 2^–31
Figure 2-2: Bit Weighting for 1.15 Numbers and 1.31 Numbers
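The bit weights in Figure 2-2 amount to dividing the two's-complement integer value of the word by 2^15 (for 1.15) or 2^31 (for 1.31). The following C sketch illustrates that interpretation on a host machine; the function names are hypothetical and are not part of any Analog Devices library.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a signed 1.15 fraction: value = integer / 2^15. */
static double q15_to_double(uint16_t raw)
{
    /* Recover the two's-complement value without relying on
       implementation-defined narrowing conversions. */
    int32_t value = (raw & 0x8000u) ? (int32_t)raw - 65536 : (int32_t)raw;
    return (double)value / 32768.0;
}

/* Interpret a 32-bit pattern as a signed 1.31 fraction: value = integer / 2^31. */
static double q31_to_double(uint32_t raw)
{
    double value = (raw & 0x80000000u) ? (double)raw - 4294967296.0 : (double)raw;
    return value / 2147483648.0;
}
```

For instance, q15_to_double(0x7FFF) evaluates to 32767/32768, which is the 0.999969 entry listed above, and q15_to_double(0x8000) evaluates to exactly –1.0.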
Complex Numbers
Complex numbers are represented as two 16-bit fixed-point numbers, in either fractional 1.15 or 16-bit signed inte-
ger format. A complex operand is stored in a 32-bit data register with the real part stored in the least significant bits
(Rx.L) and the imaginary part in the most significant bits (Rx.H).
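The complex operand layout described above can be modeled in C as follows. This is a minimal host-side sketch; complex_pack, complex_real, and complex_imag are hypothetical helper names chosen for illustration.

```c
#include <stdint.h>

/* Convert a 16-bit pattern to its two's-complement value portably. */
static int16_t to_s16(uint16_t u)
{
    return (int16_t)((u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u);
}

/* Pack a complex value into one 32-bit register image:
   real part in the low half (Rx.L), imaginary part in the high half (Rx.H). */
static uint32_t complex_pack(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)im << 16) | (uint32_t)(uint16_t)re;
}

/* Extract the real part from the low half. */
static int16_t complex_real(uint32_t reg)
{
    return to_s16((uint16_t)(reg & 0xFFFFu));
}

/* Extract the imaginary part from the high half. */
static int16_t complex_imag(uint32_t reg)
{
    return to_s16((uint16_t)(reg >> 16));
}
```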
Register Files
The processor's computational units have three defined register groups: a data register file, a pointer register file,
and a set of Data Address Generation (DAG) registers.
• The data register file receives operands from the data buses for the computational units and stores computa-
tional results.
• The pointer register file has pointers for addressing operations.
• The DAG registers are dedicated registers that manage zero-overhead circular buffers for DSP operations.
The AAU Register Files figure provides more information about the pointer and DAG registers.
Data Address Registers    Pointer Registers
I0  L0  B0  M0            P0
I1  L1  B1  M1            P1
I2  L2  B2  M2            P2
I3  L3  B3  M3            P3
                          P4
                          P5
                          User SP
                          Supervisor SP
                          FP
Three separate 32-bit buses (two load, one store) connect the register file to L1 data memory. Transfers between the
data register file and data memory can move up to two 32-bit words of valid data per core clock cycle. Often, these
represent four 16-bit words.
Accumulator Registers
In addition to the data register file, the processor has two dedicated, 40-bit accumulator registers, A0 and A1. Each
can be referred to by its 16-bit low half (An.L) or 16-bit high half (An.H) or its 8-bit extension (An.X). Each can
also be referred to as a 32-bit register (An.W) consisting of the lower 32 bits, or as a complete 40-bit result register
(An). For more information, see the following registers:
• Accumulator 0 Register
• Accumulator 0 Extension Register
• Accumulator 1 Register
• Accumulator 1 Extension Register
These examples illustrate this convention:
A0 = A1; /* 40-bit move */
The accumulator registers may be used together to hold an 80-bit complex result or a 72-bit fixed-point result. The
combined accumulator register is called A1:0. A 72-bit fixed-point result is held in the combined register with the
least significant bits in A0.W, the middle bits in A1.W, and the most significant bits in A1.X.
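Figure: Combined accumulator A1:0. A1 and A0 are each 40 bits wide (39:0) and can be viewed as an 8-bit extension (39:32) plus a 32-bit word (31:0), or as an extension (39:32), a high half (31:16), and a low half (15:0). For complex results, A1 holds the imaginary part and A0 holds the real part.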
• Dreg_byte denotes the least significant byte of the data register file register (R[7:0]).
• Option (X) denotes sign-extended data into the uppermost bits of the destination register.
• Option (Z) denotes zero-extended data into the uppermost bits of the destination register.
• * indicates the status bit may be set or cleared, depending on the result of the instruction.
• ** indicates the status bit is cleared.
• - indicates no effect.
allreg = allreg ; *1                     -   -   -   -   -   -   -
Ax = Ax ;                                -   -   -   -   -   -   -
Ax = Dreg ;                              -   -   -   -   -   -   -
Ax = Dreg (X) ;                          -   -   -   -   -   -   -
Ax = Dreg (Z) ;                          -   -   -   -   -   -   -
A1 = Dreg (X), A0 = Dreg (X) ;           -   -   -   -   -   -   -
A1 = Dreg (Z), A0 = Dreg (Z) ;           -   -   -   -   -   -   -
A1 = Dreg (X), A0 = Dreg (Z) ;           -   -   -   -   -   -   -
A1 = Dreg (Z), A0 = Dreg (X) ;           -   -   -   -   -   -   -
Dreg_even = A0 ;                         *   *   -   -   -   -   *
Dreg_odd = A1 ;                          *   *   -   -   -   -   *
Dreg_even = A0, Dreg_odd = A1 ;          *   *   -   -   -   -   *
Dreg_odd = A1, Dreg_even = A0 ;          *   *   -   -   -   -   *
IF CC DPreg = DPreg ;                    -   -   -   -   -   -   -
IF ! CC DPreg = DPreg ;                  -   -   -   -   -   -   -
Dreg = Dreg_lo (Z) ;                     *   **  **  -   -   -   **/-
Dreg = Dreg_lo (X) ;                     *   *   **  -   -   -   **/-
Ax.X = Dreg_lo ;                         -   -   -   -   -   -   -
Dreg_lo = Ax.X ;                         -   -   -   -   -   -   -
*1 Warning: Not all register combinations are allowed. For details, see the functional description of the Move Register instruction.
Data Types
The Blackfin+ processor supports 32-bit words, 16-bit half-words, and bytes. The 32- and 16-bit words can be inte-
ger or fractional, and bytes are always integers. Integer data types can be signed or unsigned, whereas fractional data
types are always signed. 32-bit words can also be complex numbers comprised of 16-bit real and imaginary parts,
with the real part in the least significant bits and the imaginary part in the most significant bits of the data register.
The Data Types and Representation in Memory/Register table illustrates the formats for data that resides in memory,
the register file, and the accumulators. In the table, the letter d represents one data bit, and the letter s represents
one sign bit.
Some instructions manipulate data in the registers by sign-extending or zero-extending the data to 32 bits:
• Instructions zero-extend unsigned data
• Instructions sign-extend signed 16-bit half-words and 8-bit bytes
Other instructions manipulate data as 32-bit numbers. In addition, two 16-bit half words or four 8-bit bytes can be
manipulated as 32-bit values.
In the table, note the meaning of these symbols:
• s = sign bit(s)
• d = data bit(s)
• "." = binary point in the format column. Bits to the left are the whole part of the data, and bits to the right are
the fractional part of the data. Where applicable, it is also inserted in the representation columns, though the
binary point itself is not part of the data.
Endianness
Both internal and external memory are accessed in little endian byte order. For more information, see the memory
transaction model.
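As a host-side illustration of little endian byte order, the following C sketch stores and reloads a 32-bit word one byte at a time, with the least significant byte at the lowest address. The helper names are hypothetical; on the processor itself, ordinary loads and stores already behave this way.

```c
#include <stdint.h>

/* Store a 32-bit word in little endian order: least significant byte first. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Reassemble a 32-bit word from four bytes held in little endian order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```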
The logic of the carry bits (AC0 and AC1) is based on unsigned magnitude arithmetic. The bit is set if a carry is
generated from bit 16 (the MSB). The carry bits are most useful for the lower word portions of a multiword opera-
tion.
ALU results generate status information. For more information about using ALU status, see Using Computational
Status.
For 32-bit signed fractional operands, the 64-bit product output is format adjusted (sign-extended and shifted one
bit to the left) before being applied to the A1:0 accumulator pair. Bit 63 of the product lines up with bit 64 of
A1:0 (which is bit 0 of A1.X), and bit 0 of the product lines up with bit 1 of A1:0 (which is bit 1 of A0.W). The
Least Significant Bit (LSB) is zero-filled. Note A0.X is not used when the combined register A1:0 holds a 72-bit
accumulation result of 32-bit operands.
For other 32-bit integer and unsigned fractional multiplier operands, the 64-bit product is not shifted before being
applied to A1:0.
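The format adjustment for signed fractional operands can be sketched in C as shown below. This is a behavioral model only, under the assumption that the 72-bit A1:0 pair can be represented by three fields; the acc72_t type and the function name are hypothetical. It covers a single product applied to a cleared accumulator and does not model accumulation or saturation.

```c
#include <stdint.h>

/* Hypothetical container for the 72-bit A1:0 pair (A0.X is not used). */
typedef struct {
    uint8_t  a1x;  /* bits 71:64 of A1:0 */
    uint32_t a1w;  /* bits 63:32 of A1:0 */
    uint32_t a0w;  /* bits 31:0 of A1:0  */
} acc72_t;

/* Place a signed fractional 32 x 32 product into A1:0 after the
   one-bit left shift described above. */
static acc72_t frac32_product_to_acc(int32_t x, int32_t y)
{
    int64_t  product = (int64_t)x * (int64_t)y;   /* full 64-bit signed product   */
    uint64_t shifted = (uint64_t)product << 1;    /* shift left, LSB zero-filled  */
    acc72_t  acc;

    /* Bit 63 of the product lands on bit 64 and is sign-extended upward. */
    acc.a1x = (product < 0) ? 0xFFu : 0x00u;
    acc.a1w = (uint32_t)(shifted >> 32);
    acc.a0w = (uint32_t)shifted;
    return acc;
}
```

With both operands equal to 0x40000000 (0.5 in 1.31 format), the model yields A1.W = 0x20000000 and A0.W = 0, which is 0.25 when A1.W is read as a 1.31 fraction.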
The result of multiplying 32-bit complex operands consisting of a 16-bit imaginary part and a 16-bit real part is a
pair of 40-bit signed fractions or signed integers. The accumulation proceeds as for fractional or integer multiplica-
tion with the real part of the accumulation performed in A0 and the imaginary part in A1.
MAC results generate status information when they update accumulators or when they are transferred to a destina-
tion register in the register file. For more information, see Using Computational Status.
Figure: Multiplier P output mapped into the A0 accumulator (A0.X and A0.W), with the product sign extended into A0.X. In the format-adjusted case, the top bit is shifted out and the LSB is zero-filled.
Shifter results generate status information. For more information about using shifter status, see Using Computation-
al Status.
Unbiased Rounding
The convergent rounding method returns the number closest to the original number. In cases where the original
number lies exactly halfway between two numbers, this method returns the nearest even number (the one containing
an LSB of 0). For example, when rounding the three-bit, two's-complement fraction 0.25 (binary 0.01) to the near-
est two-bit, two's-complement fraction, the result would be 0.0 because that is the even-numbered choice between
0.5 and 0.0. Since it rounds up and down based on the surrounding values, this method is called unbiased rounding.
Unbiased rounding uses the ALU's capability of rounding 72-bit results at the boundary between bit 31 and bit 32
and 40-bit results at the boundary between bit 15 and bit 16. Rounding can be specified as part of the instruction
syntax. When rounding is selected, the output register contains the rounded 16-bit result. The accumulator is never
rounded.
The accumulator uses an unbiased rounding scheme. The conventional method of biased rounding adds a one into
bit position 31 or 15 of the adder chain. This method causes a net positive bias because the midway value is always
rounded upward.
The accumulator eliminates this bias by forcing bit 32 or bit 16 in the result output to 0 when it detects this mid-
way point. Forcing this bit to 0 has the effect of rounding odd values in the discarded part of the result upward and
even values downward, yielding a large-sample bias of 0, assuming uniformly distributed values.
The following examples use x to represent any bit pattern (not all zeros). The example in the Unbiased Multiplier
Rounding figure shows a typical rounding operation for A0, but the example also applies to A1.
Typical rounding operation (A0.X | A0.W):
UNROUNDED VALUE:   XXXXXXXX | XXXXXXXX 00100101 1XXXXXXX XXXXXXXX
ADD 1 AT BIT 15:   ........ | ........ ........ 1....... ........
ROUNDED VALUE:     XXXXXXXX | XXXXXXXX 00100110 0XXXXXXX XXXXXXXX
Midway point, where bits 15:0 are exactly 0x8000 (A0.X | A0.W):
UNROUNDED VALUE:   XXXXXXXX | XXXXXXXX 01100110 10000000 00000000
ADD 1 AT BIT 15:   ........ | ........ ........ 1....... ........
A0 BIT 16 = 1:     XXXXXXXX | XXXXXXXX 01100111 00000000 00000000
ROUNDED VALUE:     XXXXXXXX | XXXXXXXX 01100110 00000000 00000000
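The unbiased rounding behavior at the bit 15/bit 16 boundary can be expressed as a small C model. This is a sketch under the assumption that the 40-bit accumulator value is held in an int64_t; the function name is hypothetical, and saturation of the extracted result is not modeled.

```c
#include <stdint.h>

/* Round a 40-bit accumulator value (held in an int64_t) at the boundary
   between bit 15 and bit 16, using round-half-to-even. */
static int64_t round_unbiased_bit16(int64_t acc)
{
    int64_t sum = acc + 0x8000;             /* add 1 at bit position 15      */

    if ((acc & 0xFFFF) == 0x8000) {         /* exactly midway between values */
        sum &= ~(int64_t)0x10000;           /* force bit 16 of the result to 0 */
    }
    return sum & ~(int64_t)0xFFFF;          /* clear the discarded low bits  */
}
```

Applied to the midway example above, an input whose low 24 bits are 0x668000 rounds to 0x660000 (the even neighbor), while 0x678000 rounds to 0x680000.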
Biased Rounding
The round-to-nearest method also returns the number closest to the original. However, by convention, an original
number lying exactly halfway between two numbers always rounds up to the larger of the two. For example, when
rounding the three-bit, two's-complement fraction 0.25 (binary 0.01) to the nearest two-bit, two's-complement
fraction, this method returns 0.5 (binary 0.1). The original fraction lies exactly midway between 0.5 and 0.0 (binary
0.0), so this method rounds up. Because it always rounds up, this method is called biased rounding.
The RND_MOD bit in the ASTAT register enables biased rounding. When the RND_MOD bit is cleared, the RND option
in multiplier instructions uses the normal, unbiased rounding operation, as discussed in Unbiased Rounding.
When the RND_MOD bit is set (=1), the processor uses biased rounding instead of unbiased rounding. When operat-
ing in biased rounding mode, all rounding operations performed on 72-bit results with A0.W set to 0x80000000
round up, rather than only rounding odd values up. Similarly, all rounding operations performed on 40-bit results
with A0.L/A1.L set to 0x8000 round up. For an example of biased rounding, see Table 2-7 Biased Rounding in
Multiplier Operation.
Biased rounding affects 40-bit results only when the A0.L/A1.L register contains 0x8000 and 72-bit results only
when A0.W contains 0x80000000. All other rounding operations work normally. This mode allows more efficient
implementation of bit-specified algorithms that use biased rounding, such as the Global System for Mobile Com-
munications (GSM) speech compression routines.
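For comparison, biased rounding at the same boundary simply adds a one at bit 15 and discards the low bits, so the result differs from unbiased rounding only when the low 16 bits are exactly 0x8000. The following C sketch models this for a 40-bit value held in an int64_t; the function name is hypothetical, and the model ignores saturation.

```c
#include <stdint.h>

/* Biased (round-half-up) rounding at the boundary between bit 15 and bit 16,
   modeling the behavior selected when RND_MOD is set. */
static int64_t round_biased_bit16(int64_t acc)
{
    int64_t sum = acc + 0x8000;        /* add 1 at bit position 15     */
    return sum & ~(int64_t)0xFFFF;     /* clear the discarded low bits */
}
```

An input whose low 24 bits are 0x668000 rounds up to 0x670000 here, whereas unbiased rounding would return 0x660000.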
Truncation
Another common way to reduce the significant bits representing a number (from N bits to M bits) is to simply mask
off the N - M lower bits. This process is known as truncation and results in a relatively large bias. Instructions that do
not support rounding revert to truncation. The ASTAT.RND_MOD bit has no effect on truncation.
R3.L = R4 (RND) ;
performs biased rounding at bit 16, depositing the result in a half word (R3.L).
R3.L = R4 + R5 (RND12) ;
performs an addition of two 32-bit numbers, does biased rounding at bit 12, and deposits the result in a half word
(R3.L).
R3.L = R4 + R5 (RND20) ;
performs an addition of two 32-bit numbers, does biased rounding at bit 20, and deposits the result in a half word
(R3.L).
ASTAT Register
The Arithmetic Status register (ASTAT) provides information about the result of an operation. The processor up-
dates the status bits in ASTAT, indicating the status of the most recent ALU, MAC, or shifter operation. For more
information, see Arithmetic Status Register.
ALU Operations
Primary ALU operations occur on ALU0, while parallel operations occur on ALU1, which performs a subset of
ALU0 operations.
The Inputs and Outputs of Each ALU table describes the possible inputs and outputs of each ALU.
Combining operations in both ALUs can result in four 16-bit results, two 32-bit results, or two 40-bit results gener-
ated in a single instruction.
R3.H = R1.H + R2.L (NS) ;
adds the 16-bit contents of R1.H (R1 high half) to the contents of R2.L (R2 low half) and deposits the result in
R3.H (R3 high half) with no saturation ((NS)).
R3 = R1 +|- R2 (S) ;
adds the 16-bit contents of R2.H (R2 high half) to the contents of R1.H (R1 high half) and deposits the result in
R3.H (R3 high half) with saturation ((S)). The instruction also subtracts the 16-bit contents of R2.L (R2 low half)
from the contents of R1.L (R1 low half) and deposits the result in R3.L (R3 low half) with saturation. For more
information, see 16-bit MAC Data Flow Details.
R3 = R1 + R2 (NS) ;
adds the 32-bit contents of R2 to the 32-bit contents of R1 and deposits the result in R3 with no saturation.
P3 = P1 + SP ;
adds the 32-bit contents of P1 to the 32-bit contents of SP and deposits the result in P3. Notice that the saturation
qualifier is not supported when using the pointer register file, as ASTAT bits are not affected by ALU operations on
pointer register file registers.
P3 = R1 + R2 (NS) ;
is an illegal instruction, with or without the saturation qualifier, as it attempts to source the data register file for an
operation that is writing to the pointer register file.
R3 = R1 + R2, R4 = R1 - R2 (NS) ;
adds the 32-bit contents of R2 to the 32-bit contents of R1 and deposits the result in R3 with no saturation ((NS)).
It also subtracts the 32-bit contents of R2 from that of R1 and deposits the result in R4 with no saturation.
A specialized form of this instruction uses the 40-bit ALU result registers as input operands, creating the sum and
difference of the A0 and A1 registers. For example:
R3 = A0 + A1, R4 = A0 - A1 (S);
transfers the saturated 32-bit sum of the accumulators to the R3 register and the saturated 32-bit difference between
the accumulators to the R4 register.
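The effect of the saturating accumulator sum and difference can be sketched in C as follows. This is a behavioral model under the assumption that each 40-bit accumulator value is held in an int64_t; the function names are hypothetical, and status bit updates are not modeled.

```c
#include <stdint.h>

/* Clamp a wide signed value to the 32-bit signed range. */
static int32_t saturate32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Model of: R3 = A0 + A1, R4 = A0 - A1 (S);
   sum receives the R3 value and diff receives the R4 value. */
static void acc_sum_diff_sat(int64_t a0, int64_t a1, int32_t *sum, int32_t *diff)
{
    *sum  = saturate32(a0 + a1);
    *diff = saturate32(a0 - a1);
}
```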
MAC0 and MAC1 execute fixed-point instructions which operate on 16-bit fixed-point data and produce 32-bit
results that may be added to or subtracted from a 40-bit accumulator.
Inputs are treated as fractional, fractional complex, integer, or integer complex unsigned or two's-complement data.
Multiplier instructions include:
• Multiplication
• Multiply-accumulate with addition (with optional rounding)
• Multiply-accumulate with subtraction (with optional rounding)
• Dual versions of the above operations using 16-bit operands
MAC Operation
Each of the 16-bit multipliers, MAC0 and MAC1, has two 32-bit inputs from which it derives the two 16-bit oper-
ands. For single multiply-accumulate instructions, these operands can be any of the R[n] data registers. Each mul-
tiplier accumulates results in its accumulator register, A1 or A0, which can be saturated to 32 or 40 bits. The multi-
plier result can also be written directly to a 16- or 32-bit destination register with optional rounding.
The 32-bit multiplier, MAC10, has two 32-bit operands which can be any of the R[n] data registers. The multipli-
er accumulates results in the register pair A1:0, which are saturated to 72 bits. The 64-bit multiplier result can also
be written directly to a data register pair or it can be rounded or truncated to 32-bits and written to an individual
data register.
Each multiplier instruction has options that specify whether the inputs are in integer or fractional format, with the
format of the result matching that of the inputs. In MAC0, the inputs are either both signed or both unsigned,
whereas MAC10 and MAC1 each support a mixed-mode option.
Complex multiplication instructions executed by MAC10 also have options that specify whether the format of both
of the inputs is complex signed integer or complex signed fractional. For signed fractional inputs, the multiplier au-
tomatically left shifts the result by one bit to remove the redundant sign bit. Unsigned fractional, integer, and mixed
modes do not perform a shift for sign bit correction.
For more information regarding multiplier instructions, see MAC Instruction Options.
When MAC0 or MAC1 writes to its respective accumulator register, the 32-bit result is deposited into the lower bits
of the combined accumulator register (A0.H/A0.L and A1.H/A1.L), and the MSB is sign-extended into the upper
eight bits of the register (A0.X/A1.X).
The results of 32-bit fixed-point and complex multiplication instructions utilize the A1:0 accumulator register pair
as a 64-bit meta-register. These MAC10 operations write the lower 32 bits of the 64-bit fixed-point result to A0.W
and the upper 32 bits to A1.W. Because MAC1 supplies the most significant 32 bits of the result, the resulting value
can be either sign- or zero-extended into A1.X. However, because MAC0 is providing only the lower 32 bits of the
result, A0.X is not used and is always set to zero. MAC10 writes the real part of complex results to A0 and the
imaginary part to A1.
The accumulator pair can be initialized by transferring data from a data register pair using a dual-register move. For
example, these instructions transfer 64-bit values into the 72-bit accumulator:
A1 = R1 (X), A0 = R0 (X); /* sign-extend R1:0 into A1:0 */
A1 = R1 (Z), A0 = R0 (Z); /* zero-extend R1:0 into A1:0 */
A1 = A0 = 0; /* A1:0 = 0 */
• If an overflow or underflow occurs during the operation, the saturate operation sets the specified result register
to the maximum positive or negative value. For more information, see Saturating MAC Results on Overflow.
• The NS option prevents saturation. When an overflow or underflow has occurred, the specified result register is
set to the low-order bits of the full result. The NS option is only supported for integer multiplications of 32-bit
operands.
• ASTAT.V (bit 24) and ASTAT.VS (bit 25) are set if the overflow occurs in extracting the accumulator result to
a register, with ASTAT.VS being the sticky version of the ASTAT.V bit which must be cleared explicitly by
application code.
The 32-bit MAC has two 32-bit inputs, performs a 32-bit fixed-point or complex multiplication, and either stores
the result to the 72-bit meta-register comprised of the accumulator register pair A1:0 or extracts to a 32-bit register
or to a 64-bit register pair.
For complex calculations, the 32-bit real and imaginary results are passed to 40-bit adder/subtracters, which may be
used to modify the values in the accumulator registers, with the real part in A0 and the imaginary part in A1. Alter-
natively, the result may be written directly to a R[n] data register.
For fixed-point calculations, the 64-bit product is passed to a 72-bit adder/subtracter, which may be used to modify
the values in the 72-bit meta-register comprised of the accumulator register pair A1:0, which is a concatenation of
two 32-bit registers (A0.W and A1.W) and an 8-bit register (A1.X). For example:
A1:0 += R2 * R3 ;
In this instruction, the multiply is performed, and the results are added to the previous value in the 72-bit A1:0
accumulator register pair. Alternatively, the new product could also be passed directly to any of the R[n] data regis-
ters.
Figure: 32-bit MAC data flow (data register file R0 to R7, the 32-bit MAC, and the A1:0 accumulator pair, with 32-bit buses to and from memory)
The MAC may operate without the accumulation function. If accumulation is not used, the result can be directly
stored to any of the R[n] data registers or to an accumulator register. The destination can be an individual 32-bit
register or a 64-bit register pair. If the destination register is 32 bits, then the data that is extracted from the multipli-
er is the most useful information, which is dependent on the data type of the input:
• Fractional operands - the upper half of the result, which contains the sign information and the most significant
bits of the fractional data, is extracted and stored in the 32-bit destination register.
• Integer operands - the lower half of the result is extracted and stored in the 32-bit destination register.
• Complex fractional operands - the upper half of the real result is extracted and stored in the lower half of the
32-bit destination register, and the upper half of the imaginary result is extracted and stored in the upper half
of the 32-bit destination register.
• Complex integer operands - the lower half of the real result is extracted and stored in the lower half of the 32-
bit destination register, and the lower half of the imaginary result is extracted and stored in the upper half of
the 32-bit destination register.
For example:
R0 = R1 * R2 (FU) ;
The (FU) qualifier indicates that the inputs are unsigned fractional data. This instruction deposits the upper 32 bits
of the multiplication result (by default, with rounding and saturation) into R0.
This instruction uses unsigned integer operands, as designated by the (IU) qualifier:
R1 = R2 * R3 (IU, NS) ;
The lower 32 bits of the multiplication result are deposited into R1 (without saturation, as designated by the (NS)
qualifier).
This instruction is an example of a multiply being stored to a 64-bit register pair:
R1:0 = R1 * R2 ;
Regardless of the operand type, this instruction computes a 64-bit multiplication result (by default, with saturation)
and deposits the upper 32 bits into R1 and the lower 32 bits into R0.
This instruction uses complex fractional operands:
R1 = cmul(R2, R3) ;
The upper 16 bits of the real part of the multiplication result are stored to R1.L, and the upper 16 bits of the
imaginary part of the result go to R1.H.
This instruction uses complex signed integer operands, as designated by the (IS) qualifier:
R1 = cmul(R2, R3)(IS);
The lower 16 bits of the real part of the multiplication result are stored to R1.L, and the lower 16 bits of the imagi-
nary part of the result go to R1.H.
This instruction is an example of a complex multiply being stored to a 64-bit register pair:
R1:0 = cmul(R2, R3) ;
The full 32 bits of the real part of the multiplication result are stored to R0, and the full 32 bits of the imaginary
part of the result go to R1.
For backward compatibility, the Blackfin+ processor also supports a two-operand version of the 32-bit multiply in-
struction:
R0 *= R1 ;
Note that the assumptions are that the input operands are signed integers ((IS)) and that the result will not be
saturated ((NS)).
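The difference between extracting the lower half of a product and splitting it across a register pair can be illustrated with a short C model. The sketch covers only the unsigned, non-saturating case and does not model rounding, saturation, or the fractional left shift; the function names are hypothetical.

```c
#include <stdint.h>

/* Lower 32 bits of the unsigned product: the part kept for integer operands
   when saturation is disabled, as in R1 = R2 * R3 (IU, NS). */
static uint32_t mul32_iu_ns(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Split an unsigned 64-bit product across a register pair: the upper 32 bits
   go to the odd register and the lower 32 bits go to the even register. */
static void mul32_unsigned_pair(uint32_t a, uint32_t b,
                                uint32_t *r_odd, uint32_t *r_even)
{
    uint64_t product = (uint64_t)a * b;
    *r_odd  = (uint32_t)(product >> 32);
    *r_even = (uint32_t)product;
}
```

For example, multiplying 0xFFFFFFFF by 2 gives the 64-bit product 0x1FFFFFFFE, so the pair form holds 0x00000001 in the odd register and 0xFFFFFFFE in the even register, while the single-register integer form keeps only 0xFFFFFFFE.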
Figure: 16-bit MAC data flow (data register file R0 to R7 with high and low halves, operand selection, MAC1 and MAC0 with accumulators A1 and A0, and 32-bit buses from memory)
Figure: Four combinations (A, B, C, and D) of MAC0 operands selected from the high and low halves of registers Rm and Rp, with results accumulated in A0
NOTE: As shown in the figure, the inputs to the MAC must come from two different registers. If the two 32-bit
registers contain the same data, the equivalent of squaring a half-register or multiplying the upper half and
lower half of the same register becomes possible.
The 32-bit product is passed to a 40-bit adder/subtracter, which can modify the contents of the accumulator register
by the computed result or pass the product directly to the data register file. The 40-bit A0 and A1 accumulator
registers are each comprised of a 32-bit register (A0.W and A1.W, respectively) and an 8-bit register (A0.X and
A1.X, respectively). For example:
A1 += R3.H * R4.H ;
In this instruction, MAC1 performs a multiply of two 16-bit inputs and modifies the current 40-bit A1 accumulator
content by the computed product.
Figure: Destination register bit layout
The upper 16 bits of the MAC0 multiplication result (by default, with rounding and saturation) are deposited into
the lower half of R0.
Figure: A0 accumulator (A0.X = 0x00, A0.H, and A0.L) and destination register bit layout
The lower 16 bits of the MAC1 multiplication result (with saturation) are deposited into the upper half of R0.
Finally, for a 16-bit multiply storing to a 32-bit register:
R0 = R1.L * R2.L ;
Regardless of the input operand type, the 32-bit MAC0 multiplication result (with saturation) is stored to R0.
input operands. Dual-MAC operations are frequently referred to as vector operations because data can be arranged
and stored to registers such that vector computations are possible.
An example of a dual multiply-accumulate instruction is:
A1 += R1.H * R2.L, A0 += R1.L * R2.H ;
This instruction represents two multiply-accumulate operations performed simultaneously by the core:
• In the first operation, the A1 accumulator denotes use of MAC1. The high half of R1 is multiplied by the low
half of R2, and the product is then used to modify the previous content of the A1 accumulator.
• In the second operation, the A0 accumulator denotes use of MAC0. The low half of R1 is multiplied by the
high half of R2, and the product is then used to modify the previous content of the A0 accumulator.
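The dual multiply-accumulate above can be modeled in C as two independent signed fractional MAC operations. This is a sketch under several assumptions: each 40-bit accumulator is held in an int64_t, the default signed fractional format is used (product shifted left by one bit), the accumulator saturates at 40 bits, and the special case of 0x8000 multiplied by 0x8000 is not modeled. All names are hypothetical.

```c
#include <stdint.h>

#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)   /* largest 40-bit signed value  */
#define ACC40_MIN (-ACC40_MAX - 1)          /* smallest 40-bit signed value */

/* Convert a 16-bit pattern to its two's-complement value portably. */
static int16_t to_s16(uint16_t u)
{
    return (int16_t)((u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u);
}

static int16_t half_hi(uint32_t reg) { return to_s16((uint16_t)(reg >> 16)); }
static int16_t half_lo(uint32_t reg) { return to_s16((uint16_t)(reg & 0xFFFFu)); }

/* One signed fractional multiply-accumulate step with 40-bit saturation. */
static int64_t mac_frac16(int64_t acc, int16_t x, int16_t y)
{
    int64_t product = (int64_t)x * y * 2;   /* 1.15 x 1.15, left-shifted by one */
    acc += product;
    if (acc > ACC40_MAX) acc = ACC40_MAX;
    if (acc < ACC40_MIN) acc = ACC40_MIN;
    return acc;
}

/* Model of: A1 += R1.H * R2.L, A0 += R1.L * R2.H ; */
static void dual_mac(uint32_t r1, uint32_t r2, int64_t *a1, int64_t *a0)
{
    *a1 = mac_frac16(*a1, half_hi(r1), half_lo(r2));   /* MAC1 */
    *a0 = mac_frac16(*a0, half_lo(r1), half_hi(r2));   /* MAC0 */
}
```

With R1 = 0x40002000 (0.5 and 0.25) and R2 = 0x10004000 (0.125 and 0.5) and both accumulators initially zero, the model leaves 0x20000000 (0.25) in A1 and 0x04000000 (0.03125) in A0.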
The results of the MAC operations may be written to registers in a number of ways:
• as a pair of 16-bit halves
• as a pair of 32-bit registers
• as an independent 16-bit register half
• as an independent 32-bit register
For example, consider the case of writing to a pair of 16-bit destination register halves:
R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L) ;
Each 40-bit accumulator is packed into a 16-bit register half. The result from MAC1 (A1) must be transferred to a
high half of a destination register (R3.H), and the result from MAC0 (A0) must be transferred to the low half of the
same destination register (R3.L).
The data format of the input operands determines the correct bits to extract from the accumulator to deposit into
the 16-bit destination register. See 16-bit Multiply Without Accumulate.
When writing to a pair of 32-bit destination registers:
R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L) ;
In this case, the same multiply-accumulate results in the 40-bit accumulators are instead packed into two 32-bit data
registers, with the MAC1 results going to R3 and the MAC0 results going to R2. The destination registers must be
in defined pairs (R[1:0], R[3:2], R[5:4], or R[7:6]), with the MAC1 result targeting the higher register in
the pair and the MAC0 result targeting the lower register in the pair.
Mixed modes are also supported. For example:
R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L ;
This instruction is an example of one accumulator being transferred to a data register while the other is used to
compute the multiply-accumulate without being transferred to a data register. Either a 16- or a 32-bit register may
be specified as the destination register.
default
No option; input data format is signed fractional.
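(IS)
Input data operand format is signed integer.
(FU)
Input data operand format is unsigned fractional.
(IU)
Input data operand format is unsigned integer.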
(T)
Input data operand format is signed fractional. When copying to the destination register half, the lower 16 bits
of the accumulator content are truncated.
(TFU)
Input data operand format is unsigned fractional. When copying to the destination register half, the lower 16
bits of the accumulator content are truncated.
(ISS2)
(IH)
This option indicates integer multiplication with high half-word extraction. The accumulator is saturated at
32 bits, and bits 31:16 of the accumulator are rounded and copied into the destination register half.
(W32)
Input data operand format is signed fractional with no extension bits in the accumulators at 32 bits. Left-shift
correction of the product is performed, as required. This option is used for legacy GSM speech vocoder algo-
rithms written for 32-bit accumulators. For this option only, this special case applies:
0x8000 x 0x8000 = 0x7FFF
(M)
Operation uses mixed-multiply mode. Valid only for MAC1 versions of the instruction. Multiplies a signed
fractional operand by an unsigned fractional operand with no left-shift correction, where the first operand is
signed and the second is unsigned. MAC0 performs an unmixed multiplication of signed fractional operands
unless another format is specified (i.e., MAC0 executes the specified signed/signed or unsigned/unsigned mul-
tiplication). The (M) option can be used alone or be paired with another format option.
(NS)
Operation is non-saturating. When copying the accumulator contents to a destination register, the low-order
bits are copied if the whole value will not fit in the destination. The (NS) option can only be used in conjunc-
tion with integer format options.
Shifter Operations
The shifter instructions (>>>, >>, <<, ASHIFT, LSHIFT, ROT) can be used in various ways, depending on the underly-
ing arithmetic requirements. The ASHIFT and >>> instructions represent an arithmetic shift, while the LSHIFT,
<<, and >> instructions represent a logical shift.
The arithmetic shift and logical shift operations can be further broken into subsections. Instructions that are intend-
ed to operate on 16-bit single or paired numeric values (as would occur in many DSP algorithms) can use the in-
structions ASHIFT and LSHIFT. These are typically three-operand instructions.
Instructions that are intended to operate on a 32-bit register value and use two operands, such as those instructions
frequently generated by the compiler, use the >>> and >> instructions.
Arithmetic shift, logical shift, and rotate instructions can obtain the shift argument from a register or directly from
an immediate value in the instruction. For details about shifter instructions, see Shifter Instruction Summary.
Two-Operand Shifts
Two-operand shift instructions shift an input register and deposit the result into the same register.
Immediate Shifts
An immediate shift instruction shifts the input bit pattern to the right (downshift) or left (upshift) by a given num-
ber of bits. Immediate shift instructions use the data value in the instruction itself to control the magnitude and
direction of the shift operation.
For example, consider the case where R0 contains the value 0x0000_B6A3. A 4-bit downshift operation could be as
follows:
R0 >>= 0x04 ;
Register Shifts
Register-based shifts use a register to hold the magnitude of the shift. The entire 32-bit register is used as the shift
magnitude, and if this value exceeds 31, the shift result is 0.
For example, the following sequence performs a 4-bit upshift:
R0 = 0x0000B6A3 ; // Load value to be shifted
R2 = 0x4 ; // Set shift magnitude to 4
R0 <<= R2 ; // Perform left shift by 4
As a result of this sequence, the 0x0000B6A3 value in R0 is upshifted 4 bits and stored back to R0 as 0x000B_6A30.
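The two-operand register shift rule, including the zero result for magnitudes above 31, can be modeled in C as shown below. The explicit range check matters in a host model because shifting a 32-bit value by 32 or more is undefined behavior in C. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of: Rx <<= Ry ; (logical left shift by a 32-bit register magnitude) */
static uint32_t shift_left_reg(uint32_t value, uint32_t magnitude)
{
    return (magnitude > 31u) ? 0u : (value << magnitude);
}

/* Model of: Rx >>= Ry ; (logical right shift by a 32-bit register magnitude) */
static uint32_t shift_right_logical_reg(uint32_t value, uint32_t magnitude)
{
    return (magnitude > 31u) ? 0u : (value >> magnitude);
}
```

For the sequence above, shift_left_reg(0x0000B6A3, 4) returns 0x000B6A30. The same input shifted right by 4 returns 0x00000B6A.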
Three-Operand Shifts
Three-operand shifter instructions shift an input register and deposit the result into a destination register.
Immediate Shifts
Immediate shift instructions use the data value in the instruction itself to control the magnitude and direction of the
shifting operation.
The following is an example of a 4-bit downshift applied to a 32-bit value:
R0 = 0x0000B6A3 ; // Load value to be shifted
R1 = R0 >> 0x04 ; // Perform right shift by 4
As a result of this sequence, the 32-bit 0x0000_B6A3 value in R0 is right-shifted by four and stored to R1 as
0x0000_0B6A.
Register Shifts
Register-based shifts use a data register to hold the shift magnitude. For ASHIFT, LSHIFT and ROT instructions
performing register-based shifts, the shift magnitude must be in the lowest six bits of a low data register half
(R[n].L). The upper 10 bits of R[n].L are masked off and ignored.
The following is an example of a register-based 4-bit logical upshift:
R0 = 0x0000B6A3 ; // Load value to be shifted
R2.L = 0x4 ; // Load shift magnitude
R1 = LSHIFT R0 by R2.L ; // Perform shift
As the shift magnitude is positive, this sequence results in a logical left shift of the 0x0000_B6A3 input data by four,
zero-filling the vacated lower four bits and storing the 0x000B_6A30 result to R1.
For a register-based 4-bit arithmetic downshift, an example sequence is:
R0 = 0xB6A30000 ; // Load value to be shifted
R2.L = -0x4 ; // Load shift magnitude
R1 = ASHIFT R0 by R2.L ; // Perform shift
As the shift magnitude is negative, this sequence results in an arithmetic right shift of the 0xB6A3_0000 input data
by four, sign-extending through the vacated upper four bits and storing the 0xFB6A_3000 result to R1.
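The two register-based examples above can be checked with a small Python model. It is a sketch under stated assumptions: the six-bit magnitude is treated as a signed value (-32 through 31), which is consistent with the negative-magnitude example but is not spelled out in this section, and saturation options are not modeled. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_magnitude(reg_low_half):
    """Sign-extend the lowest six bits of a 16-bit register half."""
    m = reg_low_half & 0x3F          # the upper 10 bits are ignored
    return m - 64 if m & 0x20 else m


def lshift(value, reg_low_half):
    """Logical shift: positive magnitude shifts left, negative shifts right."""
    m = shift_magnitude(reg_low_half)
    value &= MASK32
    if m >= 0:
        return (value << m) & MASK32
    return value >> -m               # vacated upper bits are zero-filled


def ashift(value, reg_low_half):
    """Arithmetic shift: right shifts replicate the sign bit (bit 31)."""
    m = shift_magnitude(reg_low_half)
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    if m >= 0:
        return (signed << m) & MASK32
    return (signed >> -m) & MASK32   # vacated upper bits are sign-filled


if __name__ == "__main__":
    print(hex(lshift(0x0000B6A3, 0x0004)))   # R2.L = 0x4
    print(hex(ashift(0xB6A30000, 0xFFFC)))   # R2.L = -0x4 as a 16-bit value
```

Here 0xFFFC is the 16-bit two's-complement encoding of the -0x4 magnitude loaded into R2.L in the arithmetic example.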
The ROT instruction uses the shifter to shift the input operand through the Arithmetic Status register's Condition
Code bit (ASTAT.CC). When the shift is performed, the ASTAT.CC bit is inserted into the data between bit 0 and
bit 31 of the original data. For example:
R0 = 0xABCDEF12 ; // Load value to rotate through CC
R2.L = 0x4 ; // Set rotate magnitude
R1 = ROT R0 by R2.L ; // Perform rotation
Assuming that ASTAT.CC was 0 entering the above sequence, the positive magnitude in R2 results in a 4-bit left
rotation being applied to the 0xABCD_EF12 input data, with the ASTAT.CC bit value of 0 appearing between bit 0
and bit 31 of the input data. The resulting data pattern of 0xBCDE_F125 is stored to R1. Note that the ASTAT.CC
bit is included in the result at bit 3, followed by the b#101 from bits 31:29 of the input data. The input data bit 28
(which is 0) is now in ASTAT.CC.
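The rotation through ASTAT.CC behaves like a rotation of a 33-bit ring made of CC and the 32 data bits. The Python sketch below models that view and reproduces the example above. It assumes the same signed six-bit magnitude convention as the other register-based shifts (positive rotates left, negative rotates right); the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK33 = (1 << 33) - 1


def rot(value, reg_low_half, cc):
    """Rotate `value` through CC; returns (result, new_cc)."""
    m = reg_low_half & 0x3F          # lowest six bits hold the magnitude
    if m & 0x20:
        m -= 64                      # assumed signed: -32 through 31
    # Place CC above bit 31 to form a 33-bit ring.
    ring = ((cc & 1) << 32) | (value & MASK32)
    if m >= 0:
        ring = ((ring << m) | (ring >> (33 - m))) & MASK33
    else:
        n = -m
        ring = ((ring >> n) | (ring << (33 - n))) & MASK33
    return ring & MASK32, ring >> 32


if __name__ == "__main__":
    result, new_cc = rot(0xABCDEF12, 0x0004, 0)
    print(hex(result), new_cc)
```

With CC = 0 on entry, the model returns 0xBCDE_F125 with CC = 0, matching the text. Rotating that result back by -4 restores the original data.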
When programming, header files containing #define statements provide constant definitions for specific bits in
memory-mapped registers. It is important to examine the definition techniques used in these header files because the
constant definitions do not contain the position of the bit; rather, these header files define bit masks. A constant
definition in a header file working with bit masks might be set to 0x20 to describe bit five of a register. The BITPOS
macro provided by the Blackfin processor assembler helps when working with bit mask definitions and bit manipu-
lation instructions. For example, the following assembly code uses a BITPOS macro with a BITTST instruction:
#define BITFIVE 0x20
CC = BITTST ( R5, BITPOS ( BITFIVE ) ) ;
The BITPOS macro parses the BITFIVE definition at program build-time, identifying the lowermost set bit to be
the fifth bit and changing the instruction passed to the assembler to:
CC = BITTST ( R5, 5 ) ;
This will result in the ASTAT.CC bit being set to the value of bit 5 of the R5 register. For detailed information about
BITPOS, see the CrossCore Embedded Studio Assembler and Preprocessor Manual.
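The mask-to-position conversion that BITPOS performs at build time can be illustrated with a short Python sketch. This is not the assembler's implementation, only a model of "find the lowermost set bit" as described above; `bitpos` and `bittst` are hypothetical helper names.

```python
def bitpos(mask):
    """Return the position of the lowermost set bit of a non-zero bit mask."""
    if mask <= 0:
        raise ValueError("mask must have at least one bit set")
    # mask & -mask isolates the lowest set bit.
    return (mask & -mask).bit_length() - 1


def bittst(register_value, position):
    """Model of CC = BITTST(Rx, position): the value of the selected bit."""
    return (register_value >> position) & 1


if __name__ == "__main__":
    BITFIVE = 0x20
    print(bitpos(BITFIVE))
    print(bittst(0x00000020, bitpos(BITFIVE)))
```

For the BITFIVE mask of 0x20, `bitpos` yields 5, which is the bit position that the BITTST instruction requires.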
In this example, R0 contains the source data for the operation; the results shown below imply R0 = 0xAABBCCDD.
Some field within this register will be the source or destination of the operation prior to the result being
transferred to the instruction's 32-bit destination register, as governed by the R1 calibration register
(0x0333100C when assembled from the fields below), which is structured as follows:
• R1[7:0] - length of the bit-field of interest. In this case, the field is 12 bits wide (0x0C).
• R1[15:8] - bit location where the field of interest begins. In this case, the 12-bit field begins in the R0 source
data register at bit 16 (0x10), thus defining the field of interest to be R0[27:16].
• R1[31:16] - up to a 16-bit right-justified data field, required by the DEPOSIT operation (0x0333).
The EXTRACT instruction simply reads the field of interest from the source data and places it into the destination
register with a mandatory zero-extension ((Z)) or sign-extension ((X)) applied to it. One of these qualifiers must
always be associated with the operation, as follows:
R3 = EXTRACT ( R0 , R1.L ) ( Z ) ; // R3 = 0x00000ABB
R3 = EXTRACT ( R0 , R1.L ) ( X ) ; // R3 = 0xFFFFFABB
As described, the syntax includes the 32-bit destination (R3), the 32-bit source data argument (R0), and the 16-bit
calibration argument (R1.L). In both cases, the 12-bit 0xABB bit-field of interest contained in R0[27:16] is read
from the R0 source register and is then either zero-extended (0x00000ABB) or sign-extended (0xFFFFFABB) before
being stored to the R3 destination register.
The DEPOSIT instruction takes the R0 source register data and replaces the field of interest defined by the lower
portion of the calibration register (R1[15:0]) with the data in the upper portion of the calibration register
(R1[31:16]) before storing the result into the destination register. There is both a non-extending and a sign-ex-
tending version of the instruction, as follows:
R3 = DEPOSIT ( R0 , R1 ) ; // R3 = 0xA333CCDD
R3 = DEPOSIT ( R0 , R1 ) ( X ) ; // R3 = 0x0333CCDD
As described, the syntax includes the 32-bit destination (R3), the 32-bit source data argument (R0), and the 32-bit
calibration argument (R1). In both cases, the 12-bit 0x333 bit-field defined in the upper half of the R1 calibration
register replaces the bit-field of interest in R0[27:16]. The field is either deposited in place, leaving the
remaining R0 bits unchanged (0xA333CCDD), or deposited and sign-extended through the upper bits (0x0333CCDD, because
the most significant bit of the 0x333 field is 0) before being stored to the R3 destination register.
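The field arithmetic behind EXTRACT and DEPOSIT can be checked with the Python sketch below. It follows the calibration register layout given in this section (length in bits 7:0, position in bits 15:8, deposit data in bits 31:16). The source values R0 = 0xAABBCCDD and R1 = 0x0333100C are inferred from the results quoted in the text, and the function names are hypothetical. Limits the hardware places on field length or position are not modeled.

```python
MASK32 = 0xFFFFFFFF


def _field(pattern):
    """Decode (length, position, mask) from the calibration value."""
    length = pattern & 0xFF            # R1[7:0]
    position = (pattern >> 8) & 0xFF   # R1[15:8]
    return length, position, (1 << length) - 1


def extract(source, pattern_low, sign_extend):
    """Model of EXTRACT(source, R1.L) with the (Z) or (X) qualifier."""
    length, position, mask = _field(pattern_low)
    field = (source >> position) & mask
    if sign_extend and length and field >> (length - 1):
        field |= MASK32 & ~mask        # replicate the field's top bit
    return field


def deposit(source, pattern, sign_extend):
    """Model of DEPOSIT(source, R1), optionally with the (X) qualifier."""
    length, position, mask = _field(pattern)
    data = (pattern >> 16) & mask      # right-justified field in R1[31:16]
    result = (source & ~(mask << position) & MASK32) | (data << position)
    if sign_extend:
        above = MASK32 & ~((1 << (position + length)) - 1)
        sign = (data >> (length - 1)) & 1 if length else 0
        result = (result & ~above) | (above if sign else 0)
    return result & MASK32


if __name__ == "__main__":
    r0, r1 = 0xAABBCCDD, 0x0333100C
    print(hex(extract(r0, r1 & 0xFFFF, False)), hex(extract(r0, r1 & 0xFFFF, True)))
    print(hex(deposit(r0, r1, False)), hex(deposit(r0, r1, True)))
```

The four calls reproduce the 0x00000ABB, 0xFFFFFABB, 0xA333CCDD, and 0x0333CCDD results listed in the examples.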
Packing Operation
The shifter also supports a series of packing and unpacking instructions. Consider the case where:
• R0 contains 0x11223344
• R1 contains 0x55667788
Packing operations return:
R2 = PACK(R0.L, R0.H); /* R2 = 0x33441122 */
R3 = PACK(R1.L, R0.H); /* R3 = 0x77881122 */
R4 = BYTEPACK(R0, R1); /* R4 = 0x66882244 */
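For the corresponding byte-unpacking operation (BYTEUNPACK) applied to the same R1 and R0 source data, the value of
the I0 register determines what is returned to the R6 and R7 destination registers, as follows: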
• I0 = 0: R6 = 0x00110022, R7 = 0x00330044
• I0 = 1: R6 = 0x00880011, R7 = 0x00220033
• I0 = 2: R6 = 0x00770088, R7 = 0x00110022
• I0 = 3: R6 = 0x00660077, R7 = 0x00880011
For more details, see the Spread 8-Bit to 16-Bit (ByteUnPack) instruction description and the Pack 8-Bit to 32-Bit
(BytePack) instruction description.
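The packing results above can be reproduced with the following Python sketch. The PACK and BYTEPACK models follow the three results quoted in the text. The BYTEUNPACK model is inferred from the I0 list above (I0 selects a byte offset into the R1:R0 pair) and does not cover any instruction options, so treat it as an assumption to be checked against the instruction reference pages. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack(high_half, low_half):
    """Model of PACK(a, b): a becomes the upper 16 bits, b the lower 16 bits."""
    return ((high_half & 0xFFFF) << 16) | (low_half & 0xFFFF)


def bytepack(src0, src1):
    """Model of BYTEPACK: gather the low byte of each 16-bit half."""
    return (
        (((src1 >> 16) & 0xFF) << 24)
        | ((src1 & 0xFF) << 16)
        | (((src0 >> 16) & 0xFF) << 8)
        | (src0 & 0xFF)
    )


def byteunpack(r1, r0, i0):
    """Spread four bytes, selected by I0, into two 16-bit-per-byte words."""
    pair = ((r1 & MASK32) << 32) | (r0 & MASK32)
    four = (pair >> (8 * (i0 & 3))) & MASK32
    first = (((four >> 24) & 0xFF) << 16) | ((four >> 16) & 0xFF)
    second = (((four >> 8) & 0xFF) << 16) | (four & 0xFF)
    return first, second


if __name__ == "__main__":
    r0, r1 = 0x11223344, 0x55667788
    print(hex(pack(r0 & 0xFFFF, r0 >> 16)))
    print(hex(bytepack(r0, r1)))
    print([tuple(hex(v) for v in byteunpack(r1, r0, i)) for i in range(4)])
```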
Name Description
ASTAT Arithmetic Status Register
Data Registers
There are eight 32-bit R[n] data registers for use in computations and data moves. Each may be accessed as a 32-bit
entity or as a pair of independent 16-bit registers, denoted as the low register half (R[n].L) or the high register half
(R[n].H). See the appropriate instruction reference pages for details.
[Register diagram: R[n] data register. DATA[31:0] (R/W), Generic Data; reset value shown as X in all 32 bit positions.]
Accumulator 0 Register
The processor has two dedicated, 40-bit accumulator registers, A0 and A1. A0 may be accessed via its 16-bit low half
(A0.L), its 16-bit high half (A0.H), or its 8-bit extension (A0.X) register (see the associated A0X register documenta-
tion for details). A0 can also be accessed as a 32-bit register (A0.W), which extracts the lower 32 bits of the accumu-
lator.
The A0 and A1 accumulator registers may be combined to hold an 80-bit complex result or a 72-bit fixed-point
result. The combined accumulator register is defined to be A1:0, where the least significant 32 bits are in A0.W, the
next 32 bits are in A1.W, and the 8-bit overflow from a 32-bit fixed-point operation or the most significant eight
bits from a complex operation are in A1.X (see the associated A1X register documentation for details).
[Register diagram: A0 accumulator register. DATA[31:0] (R/W), Accumulator 0 Data; reset value 0x0000_0000.]
Accumulator 1 Register
The processor has two dedicated, 40-bit accumulator registers, A0 and A1. A1 may be accessed via its 16-bit low half
(A1.L), its 16-bit high half (A1.H), or its 8-bit extension (A1X) register (see the associated Accumulator 1 Extension
register documentation for details). A1 can also be accessed as a 32-bit register (A1.W), which extracts the lower 32
bits of the accumulator.
The accumulator registers may be combined to hold an 80-bit complex result or a 72-bit fixed-point result. The
combined accumulator register is defined to be A1:0, where the least significant 32 bits are in A0.W, the next 32 bits
are in A1.W, and the 8-bit overflow from a 32-bit fixed-point operation or the most significant eight bits from a
complex operation are in A1X.
[Register diagram: A1 accumulator register. DATA[31:0] (R/W), Accumulator 1 Data; reset value 0x0000_0000.]
[Register diagram: A0 extension register. DATA[7:0] (R/W), Accumulator 0 Extension Data; reset value 0x00.]
[Register diagram: A1 extension register. DATA[7:0] (R/W), Accumulator 1 Extension Data; reset value 0x00.]
[Register diagram: ASTAT register. CC (R/W), Conditional Code Status Bit; all bits shown reset to 0.]
[Figure: processor mode and state transition diagram. States shown: User mode (application level code), Supervisor
mode, Emulation mode, Idle state, and Reset state. Transition labels: IDLE instruction, Wakeup, Interrupt, RTE,
Emulation Event (1), RST Active, RST Inactive.]
(1) Normal exit from Reset is to Supervisor mode. However, emulation hardware may
have initiated a reset. If so, exit from Reset is to Emulation.
User Mode
The processor is in User mode when it is not in Reset or Idle state, when it is not servicing an interrupt, NMI,
exception, or emulation event, and when the SACC bit of the SYSCFG register is zero. User mode is used to process
application level code that does not require explicit access to system registers. Any attempt to access restricted system
registers causes an exception event. The Registers Accessible in User Mode table lists the registers that may be accessed
in User mode.
Pointer and DAG Registers: P[5:0], SP, FP, I[3:0], M[3:0], L[3:0], B[3:0]
Sequencer and Status Registers: RETS, LC[1:0], LT[1:0], LB[1:0], ASTAT, CYCLES, CYCLES2
Protected Memory
Additional memory locations can be protected from User mode access. A Cacheability Protection Lookaside Buffer
(CPLB) entry can be created and enabled. See Memory Management Unit in the Memory chapter for further infor-
mation.
NOTE: A return instruction causes the processor to enter User mode only if the SACC bit of the SYSCFG register
is set to zero. When this bit is set to one, the processor remains in Supervisor mode after all event
handlers have been exited.
The processor remains in User mode until one of these events occurs:
• An interrupt, NMI, or exception event invokes Supervisor mode.
• An emulation event invokes Emulation mode.
• A reset event invokes the Reset state.
Supervisor Mode
Supervisor mode has full, unrestricted access to all processor system resources, including all emulation resources, un-
less a CPLB has been configured to prevent it and is enabled. See Memory Management Unit in the Memory chapter
for further details.
The processor services all interrupt, NMI, and exception events in Supervisor mode and remains in Supervisor mode
on return from all event handlers if the SACC bit of the SYSCFG register is set to one.
The stack pointer referenced by the SP register alias and modified by stack push/pop instructions is the Supervisor
Stack Pointer while an event is being serviced (IPEND is non-zero), and the User Stack Pointer when executing at
task level (IPEND is zero). This is the case irrespective of whether the processor is running in Supervisor or User
mode.
Only Supervisor mode can use the register alias USP, which always references the User Stack Pointer. There is no
unique alias for the Supervisor Stack Pointer, so this register can only be referenced within an event handler using
the general stack pointer alias, SP.
Normal processing begins in Supervisor mode from the Reset state. The processor transitions from the Reset state to
Supervisor mode, servicing the reset event, where it remains until an emulation event or return instruction occurs to
change the mode. Before the return instruction is issued, the RETI register must be loaded with a valid return ad-
dress.
Non-OS Environments
For non-OS environments, application code should remain in Supervisor mode so that it can access all core and
system resources. On leaving the Reset state, the processor initiates operation by servicing the reset event. Emulation
is the only event that can preempt this activity; therefore, lower priority events cannot be processed.
The simplest method of keeping the processor in Supervisor mode and allowing lower priority events to be process-
ed is to set the SACC bit of the SYSCFG register before returning from the reset event with an RTI instruction. Prior
to executing the RTI instruction, RETI must be loaded with the address of the code to be executed after leaving all
event handlers.
Earlier Blackfin processors did not have a SACC bit, so it was necessary to execute all code in the lowest priority
interrupt (IVG15). Events and interrupts are described further in the Events and Interrupts section of the Program
Sequencer chapter. The interrupt handler for IVG15 is set to the application code's starting address, and then the
low-priority interrupt is forced using the RAISE 15 instruction. The IVG15 interrupt is not serviced until the
return from the reset event and any pending interrupts with intermediate priorities have been serviced. Therefore,
before executing the RTI instruction to return from the reset event, RETI is loaded with the address of a loop that
executes in User mode until the IVG15 interrupt is serviced.
START:
/* Task level code executes in Supervisor mode */
Code written for older Blackfin processors must remain at the lowest interrupt level (IVG15) in order to stay in
Supervisor mode, as shown in the Staying in Supervisor Mode Coming Out of Reset (Legacy) example.
/* Staying in Supervisor Mode Coming Out of Reset (Legacy) */
P0.L = lo(EVT15) ; /* Point to IVG15 in Event Vector Table */
P0.H = hi(EVT15) ;
P1.L = lo(START) ; /* Point to start of User code */
P1.H = hi(START) ;
[P0] = P1 ; /* Place the address of START in IVG15 of EVT */
P0.L = lo(IMASK) ;
R0 = [P0] ;
R1.L = lo(EVT_IVG15) ;
R0 = R0 | R1 ;
[P0] = R0 ; /* Set (enable)IVG15 bit in IMASK register */
RAISE 15 ; /* Invoke IVG15 interrupt */
P0.L = lo(WAIT_HERE) ;
P0.H = hi(WAIT_HERE) ;
RETI = P0 ; /* RETI loaded with return address */
RTI ; /* Return from Reset Event */
Emulation Mode
The processor enters Emulation mode if Emulation mode is enabled and either of these conditions is met:
• An external emulation event occurs.
• The EMUEXCPT instruction is issued.
The processor remains in Emulation mode until the emulation service routine executes an RTE instruction. If the
SACC bit of the SYSCFG register is zero and no interrupts are pending when the RTE instruction executes, the pro-
cessor switches to User mode. Otherwise, the processor switches to Supervisor mode.
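The mode selected on return from emulation depends on only two inputs, which the Python sketch below makes explicit. It is a restatement of the rule in the paragraph above for illustration, not a model of the sequencer; the function name and the string labels are hypothetical.

```python
def mode_after_rte(sacc, interrupts_pending):
    """Mode entered when the emulation service routine executes RTE.

    sacc: value of the SACC bit of SYSCFG (0 or 1).
    interrupts_pending: True if any interrupt is pending when RTE executes.
    """
    if sacc == 0 and not interrupts_pending:
        return "User"
    return "Supervisor"


if __name__ == "__main__":
    for sacc in (0, 1):
        for pending in (False, True):
            print(sacc, pending, mode_after_rte(sacc, pending))
```

Only the combination SACC = 0 with no pending interrupts returns the processor to User mode.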
Idle State
Idle state stops all processor activity at the user's discretion, usually to conserve power during lulls in activity. No
processing occurs during the Idle state. The Idle state is invoked by an IDLE instruction or an STI IDLE instruc-
tion. The IDLE instruction notifies the processor hardware that the Idle state is requested, whereas STI IDLE also
enables interrupts in a manner that avoids race conditions.
The processor remains in the Idle state until a peripheral or external device, such as a SPORT or the Real-Time
Clock (RTC), generates an interrupt that requires servicing.
In Example Code for Transition to Idle State, core interrupts are disabled before the device intended to wake the
core from Idle is programmed, and the STI IDLE instruction is executed. When all the pending processes have
completed, the core re-enables interrupts and disables its clocks. The use of the combined STI IDLE instruction to
enter Idle state and enable interrupts ensures that any interrupt will bring the core out of the Idle state and termi-
nate the idle instruction, rather than interrupting before the idle instruction has begun execution.
Reset State
Reset state initializes the core logic. During Reset state, application programs and the operating system do not exe-
cute, and clocks are stopped.
The core remains in the Reset state as long as system logic asserts the RESET signal. Upon deassertion, the core
completes the reset sequence and switches to Supervisor mode with event system priority 1, where it executes code
found at an address supplied by the system. Refer to the Reset Control Unit (RCU) chapter in the Hardware
Reference Manual for details.
The only method to enter the Reset state is by receiving a RESET signal from the system. The RAISE 1 instruction
will execute the code addressed by EVT1 at event system priority 1, but it does not actually reset the core. In both
cases, an RTI instruction will exit the priority 1 event. In neither case is a return address automatically saved in
RETI, so the register must be explicitly loaded within the event handler prior to executing the RTI instruction.
The Core State Upon Reset table summarizes the state of the core upon reset.
[Register diagram: STRICT (R/W), Strict Supervisor Access; reset value 0x0000_0100.]
4 Program Sequencer
This chapter describes the Blackfin+ processor program sequencing and interrupt processing modules. For informa-
tion about instructions that control program flow, see the program flow control instruction reference pages. For in-
formation about instructions that control interrupt processing, see the external event management chapter. Discus-
sion of derivative-specific interrupt sources can be found in the hardware reference for the specific part.
Introduction
In the processor, the program sequencer controls program flow, constantly providing the address of the next instruc-
tion to be executed. Program flow in the chip is mostly linear, with the processor executing program instructions
sequentially.
The linear flow varies occasionally when the program uses non-sequential program structures, such as those illustrat-
ed in the program flow variations figure. Non-sequential structures direct the processor to execute an instruction
that is not at the next sequential address. These structures include:
• Loops - one sequence of instructions executes several times with zero overhead (no latency between the loop
bottom instruction and the loop top instruction).
• Subroutines - an intentional vector from sequential flow to execute instructions from another part of memory
before resuming from where the vector was placed.
• Jumps - an intentional vector from sequential flow to execute instructions from another part of memory that
does not return to where the vector was placed.
• Interrupts and Exceptions - run-time events that trigger a vector to a specified subroutine that gets executed
before returning flow to the application at the point at which the vector occurred.
• Idle - this instruction causes the core to stop executing the application and hold its current state until an inter-
rupt occurs, at which point the programmed vector for that interrupt is taken to service the interrupt before
returning flow to after the IDLE; instruction and resuming execution of the application code.
[Figure: program flow variations. Recoverable labels: IRQ, CALL, INSTRUCTION, IDLE, RTS, RTI.]
[Figure: sequencer block diagram. Core Event Controller (EVT0 through EVT15, IMASK, ILAT, IPEND) with inputs for
emulation, reset, NMI, exceptions, hardware errors, and the core timer. Program Sequencer registers: SYSCFG,
SEQSTAT, RETS, RETI, RETX, RETN, RETE, LC0, LC1, LT0, LT1, LB0, LB1, CYCLES, CYCLES2. Other blocks: address
arithmetic unit, program counter, fetch counter, loop comparators, L1 instruction memory, instruction alignment
unit, instruction decoder, loop buffers, and JTAG test, debug and emulation. Buses: RAB (32), PREG (32), IAB (32),
IDB (64). Clock: CCLK.]
Sequencer-Related Registers
The Non-Memory-Mapped Sequencer Registers table lists core registers associated with the sequencer. Except for
the PC register, all sequencer-related registers are directly readable and writable by move instructions, for example:
SYSCFG = R0 ;
P0 = RETI ;
Manually pushing or popping registers to or from the stack is done using the explicit instructions:
[--SP] = Rn ; /* for push */
Rn = [SP++] ; /* for pop */
Similarly, all non-memory-mapped sequencer registers can be pushed to and popped from the system stack:
[--SP] = CYCLES ; /* for push */
CYCLES = [SP++] ; /* for pop */
In addition to these core sequencer registers, there is a set of memory-mapped registers that interact closely with the
program sequencer, such as the Event Vector Table (EVTx) registers. For information about these interrupt control
registers, see Events and Interrupts. Although the registers of the Core Event Controller are memory-mapped, they
still connect to the same 32-bit Register Access Bus (RAB) and perform in the same way; however, the System Event
Controller (SEC) registers reside in the SYSCLK domain. For debug and test registers, see the processor hardware
reference manual.
Instruction Pipeline
The program sequencer determines the next instruction address by examining both the current instruction being
executed and the current state of the processor. If no conditions require otherwise, the processor executes instruc-
tions from memory in sequential order by incrementing the look-ahead address.
The processor has a 10-stage instruction pipeline, shown in the Stages of Instruction Pipeline table.
The program sequencer also controls stalling and invalidating the instructions in the pipeline. Multi-cycle instruc-
tion stalls occur between the IF3 and DEC stages. DAG and sequencer stalls occur between the DEC and AC
stages. Computation and register file stalls occur between the DF2 and EX1 stages. Data memory stalls occur be-
tween the EX1 and EX2 stages.
NOTE: The sequencer ensures that the pipeline is fully interlocked and that all the data hazards are hidden from
the programmer.
Multi-cycle instructions behave as multiple single-cycle instructions being issued from the decoder over several clock
cycles. For example, the Push Multiple or Pop Multiple instruction can push or pop from 1 to 14 Dregs and/or
Pregs, and the instruction remains in the decode stage for a number of clock cycles equal to the number of registers
being accessed.
Multi-issue instructions are 64 bits wide and consist of one 32-bit instruction and two 16-bit instructions. All three
instructions execute in the same number of cycles as the slowest of the three.
Any non-sequential program flow can potentially decrease the processor's instruction throughput. Non-sequential
program operations include:
• Jumps
• Subroutine calls and returns
• Interrupts and returns
• Loops
Branches
One type of non-sequential program flow that the sequencer supports is branching. A branch occurs when a JUMP
or CALL instruction begins execution at a new location other than the next sequential address. For descriptions of
how to use the JUMP and CALL instructions, see the program flow control instruction reference pages. Briefly:
• A JUMP or a CALL instruction transfers program flow to another memory location. The difference between a
JUMP and a CALL is that a CALL automatically loads the return address into the RETS register. The return
address is the next sequential address after the CALL instruction. This load makes the address available for the
CALL instruction's matching return instruction (RTS;), allowing easy return from the subroutine.
• A return instruction causes the sequencer to fetch the instruction at the return address, which is stored in one
of the RETx registers. The types of return instructions include:
• return from subroutine (RTS), associated with the RETS register
• return from interrupt (RTI), associated with the RETI register
• return from exception (RTX), associated with the RETX register
• return from emulation (RTE), associated with the RETE register
• return from non-maskable interrupt (RTN), associated with the RETN register
• A JUMP instruction can be conditional, depending on the status of the ASTAT.CC bit. These instructions are
immediate and may not be delayed. The program sequencer can evaluate the ASTAT.CC bit to decide whether
or not to execute a branch. If no condition is specified, the branch is always taken. Conditional JUMP instruc-
tions use static branch prediction to reduce the branch latency caused by the length of the pipeline, when dy-
namic branch prediction is not enabled.
Branches can be direct or indirect. A direct branch address is embedded in the instruction itself (e.g., JUMP 0x30),
whereas an indirect branch gets its address from the contents of a Preg (e.g., JUMP(P3)). Both direct and indirect
branches can be PC-relative or absolute.
Each of the supported jump lengths has its own version of the JUMP instruction, the exact form of which is differen-
tiated by appending the appropriate modifier and argument:
• Short jumps (requires 13-bit offset): JUMP.S 0xnnnn;
• Long jumps (requires 25-bit offset): JUMP.L 0xnnnnnnn;
• Extra-long jumps (requires a 32-bit offset): JUMP.XL 0xnnnnnnnn;
The assembler will replace JUMP instructions with the appropriate JUMP.S, JUMP.L or JUMP.XL instructions.
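The choice among the three jump forms comes down to whether the PC-relative offset fits the signed field width. The Python sketch below illustrates one way that selection could work. It assumes that offsets are signed byte offsets that must be even, and that the shortest form that fits is chosen; neither detail is stated in this section, and the function names are hypothetical rather than part of any assembler API.

```python
def fits_pcrel(offset, bits):
    """True if `offset` is even and fits a signed field `bits` wide."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 2     # largest even value in range
    return offset % 2 == 0 and low <= offset <= high


def select_jump(offset):
    """Pick the shortest JUMP form whose offset field can hold `offset`."""
    for mnemonic, bits in (("JUMP.S", 13), ("JUMP.L", 25), ("JUMP.XL", 32)):
        if fits_pcrel(offset, bits):
            return mnemonic
    raise ValueError("offset is odd or out of range")


if __name__ == "__main__":
    for offset in (0x30, 4094, 4096, -4098, 1 << 24):
        print(offset, select_jump(offset))
```

Under these assumptions, a 13-bit field covers -4096 through +4094 bytes and a 25-bit field covers -16,777,216 through +16,777,214 bytes.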
Rather than hard-coding jump target addresses, symbolic addresses can be used in assembly source files. Symbolic
addresses are called labels and are marked by a trailing colon. See the CrossCore Embedded Studio Assembler and Pre-
processor Manual for details.
JUMP mylabel ;
/* skip any code placed here */
mylabel:
/* continue to fetch and execute instructions beginning here */
Direct jumps to an absolute address are also supported. The target address is taken directly from the 32-bit immedi-
ate value in the instruction. For an absolute jump, use JUMP.A 0xnnnnnnnn.
Subroutines
Subroutines are code sequences that are invoked by a CALL instruction. Assuming the stack pointer SP has been
initialized properly, a typical scenario could look like the following:
/* parent function */
R0 = 0x1234 (Z); /* CCES compiler passes 1st argument in R0 */
CALL my_function;
Due to the syntax of the push-multiple and pop-multiple instructions requiring that the higher registers appear first,
the CCES compiler uses the upper data and pointer registers for local purposes and the lower registers to pass argu-
ments and store return values. See the address arithmetic unit chapter for more details on stack management, as well
as the CrossCore Embedded Studio Compiler and Library Manual for register usage.
The CALL instruction not only redirects the program flow to the my_function routine, it also writes the address of
the instruction following the CALL instruction into the RETS register such that the RETS register holds the address
where program execution resumes after the RTS instruction executes. In the above example, this is the address of the
[P0]=R0; instruction. The return address is not passed to any stack in the background; rather, the RETS register
functions as a single-entry hardware stack. This scheme enables "leaf functions" (subroutines that do not contain
further CALL instructions) to execute with less overhead, as no bus transfers need to be performed. If a subroutine
calls other functions, it must save the content of the RETS register explicitly, most likely via stack operations:
/* parent function */
CALL function_a;
/* continue here after the call */
JUMP somewhere_else;
NOTE: It is recommended that assembly programs meet the same conventions used by the C/C++ compiler.
The CCES compiler passes up to three arguments in registers and utilizes the stack to pass any arguments beyond
three that are defined in the function prototype. The following assembly example shows how to pass and return
arguments using the stack:
_parent:
R0 = 1; /* load argument 1 */
R1 = 3; /* load argument 2 */
[--SP] = R0; /* push argument 1 to stack */
[--SP] = R1; /* push argument 2 to stack */
CALL _sub; /* call subroutine */
R1 = [SP++]; /* R1 = 4 */
R0 = [SP++]; /* R0 = 2 */
_parent.end:
_sub:
[--SP] = FP; /* save frame pointer */
FP = SP; /* create new frame */
[--SP] = (R7:5); /* save clobbered registers R7, R6, and R5 */
Since the stack pointer SP is modified inside the subroutine for local stack operations, the frame pointer FP is used
to save the original state of SP. Because the 32-bit frame pointer itself must be pushed onto the stack first, the FP is
four bytes beyond the original SP address.
The Blackfin+ instruction set features a pair of instructions that provides cleaner and more efficient functionality
than the above example, LINK and UNLINK. These multi-cycle instructions perform multiple operations that can be
best explained by the equivalent code sequences shown in the table.
The following subroutine is similar to the previous example, except the LINK and UNLINK instructions are used.
This means that the RETS register is also saved to the stack to enable nested subroutine calls, and the SP can option-
ally be adjusted to allow for local information to be stored on the stack. Because of this, another 32-bit value is being
pushed to the stack, which means the value stored to FP is now eight bytes from the original SP address instead of
four. Additionally, since no local frame is required to accommodate local variables on the stack, the LINK instruction
gets the parameter "0", as the SP does not get adjusted:
_sub2:
LINK 0; /* creates new frame, saves RETS and FP */
[--SP] = (R7:5); /* save clobbered registers R7, R6, and R5 */
R6 = [FP+8]; /* R6 = 3, from the stack push of R1 in _parent */
R7 = [FP+12]; /* R7 = 1, from the stack push of R0 in _parent */
R5 = R6 + R7; /* processing */
R6 = R6 - R7;
If subroutines require local, private, and/or temporary variables, the stack can be used. The LINK instruction takes a
parameter that specifies the size of the stack memory required for this. The following example provides two local 32-
bit variables and initializes them to zero when the routine is entered:
_sub3:
LINK 8; /* save FP/RETS, allocate 8 stack bytes for local data */
[--SP] = (R7:0, P5:0); /* save all potentially clobbered registers */
R7 = 0 (Z); /* set initialization value to 0 */
[FP-4] = R7; /* initialize 1st local variable on stack */
[FP-8] = R7; /* initialize 2nd local variable on stack */
/* code goes here */
(R7:0, P5:0) = [SP++]; /* restore preserved registers */
UNLINK; /* restore SP, FP, and RETS */
RTS;
_sub3.end:
For more information, see the LINK and UNLINK instruction reference pages.
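The frame offsets used in the _sub2 example (arguments at FP+8 and FP+12) can be verified with a small Python model of the stack. The sketch assumes the commonly documented equivalent sequences: LINK pushes RETS, pushes FP, copies SP to FP, and then subtracts the frame size from SP, while UNLINK reverses these steps. Confirm these against the instruction reference pages. The `Core` class, the addresses, and the return address value are hypothetical.

```python
class Core:
    """Minimal model of SP, FP, RETS, and a word-addressed stack."""

    def __init__(self, sp, fp=0):
        self.sp, self.fp, self.rets = sp, fp, 0
        self.mem = {}                    # address -> 32-bit word

    def push(self, value):               # [--SP] = value
        self.sp -= 4
        self.mem[self.sp] = value

    def pop(self):                       # value = [SP++]
        value = self.mem[self.sp]
        self.sp += 4
        return value

    def link(self, frame_size):          # assumed LINK equivalent
        self.push(self.rets)
        self.push(self.fp)
        self.fp = self.sp
        self.sp -= frame_size

    def unlink(self):                    # assumed UNLINK equivalent
        self.sp = self.fp
        self.fp = self.pop()
        self.rets = self.pop()


def demo():
    core = Core(sp=0x1000, fp=0x2000)
    core.push(1)                         # argument 1 (R0 in _parent)
    core.push(3)                         # argument 2 (R1 in _parent)
    core.rets = 0x400                    # hypothetical address set by CALL
    sp_at_entry = core.sp
    core.link(0)
    r6 = core.mem[core.fp + 8]           # R6 = [FP+8]
    r7 = core.mem[core.fp + 12]          # R7 = [FP+12]
    gap = sp_at_entry - core.fp          # distance from the entry SP to FP
    core.unlink()
    return r6, r7, gap, core.sp == sp_at_entry, core.fp, core.rets


if __name__ == "__main__":
    print(demo())
```

In this model FP ends up eight bytes below the SP value at subroutine entry, and UNLINK restores SP, FP, and RETS to their entry values.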
Conditional Processing
The Blackfin+ processors support conditional processing through conditional jump and move instructions. Condi-
tional processing is described in the following sections:
• Conditional Code Status Bit
• Conditional Branches
• Branch Prediction
• Speculative Instruction Fetches
• Conditional Register Move
AV0 = CC;
These nine ways of accessing the ASTAT.CC bit are used to control program flow. The branch is explicitly separated
from the instruction that sets the arithmetic status bits. A single bit resides in the instruction encoding that specifies
the interpretation for the value of ASTAT.CC. The interpretation is to "branch on true" or "branch on false".
The comparison operations have the form CC = expr, where expr involves a pair of registers of the same type (e.g.,
Dreg or Preg) or a single register and a small immediate constant. The small immediate constant is a 3-bit signed
number (-4 through 3) for signed comparisons and a 3-bit unsigned number (0 through 7) for unsigned compari-
sons.
The sense of ASTAT.CC is determined by equal (==), less than (<), and less than or equal to (<=) operators. There
are also bit test operations that test whether or not a bit in a 32-bit data register is set.
Conditional Branches
The sequencer supports conditional branches. Conditional branches are JUMP instructions whose execution branch-
es or continues linearly, depending on the value of the CC bit. The target of the branch is a PC-relative address from
the location of the instruction, plus or minus an offset. The PC-relative offset is an 11-bit immediate value that
must be a multiple of two (imm11m2), thus providing an effective dynamic range of -1024 to +1022 bytes.
For example, the following instruction tests the CC status bit and, if it is set, jumps to the location identified by
the label dest_address:
IF CC JUMP dest_address ;
Similarly, a branch can also be taken when the CC bit is not set:
IF !CC JUMP other_addr ;
NOTE: Take care when conditional branches are followed by load operations. For more information, see the
Load/Store instruction reference pages.
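The reach of a conditional branch follows directly from the imm11m2 encoding described above. The following C sketch (helper names are hypothetical) checks whether a byte offset, or a branch/target address pair, is within range. It assumes the offset is measured from the address of the branch instruction itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the byte offset is even and within -1024 to +1022 (imm11m2). */
static bool fits_cond_branch_offset(int64_t byte_offset)
{
    if (byte_offset % 2 != 0) {
        return false;               /* offsets must be a multiple of two */
    }
    return byte_offset >= -1024 && byte_offset <= 1022;
}

/* True if a conditional branch at branch_addr can reach target_addr. */
static bool cond_branch_reaches(uint32_t branch_addr, uint32_t target_addr)
{
    int64_t offset = (int64_t)target_addr - (int64_t)branch_addr;
    return fits_cond_branch_offset(offset);
}
```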
Branch Prediction
Branches can be accelerated if the processor predicts the target of an upcoming branch instruction before it has been
committed and fetches instructions from the target sooner, thus reducing the number of instructions that need to be
aborted. Prediction can be performed dynamically based upon the location and direction of the branches the pro-
cessor has previously executed, or prediction may be static based upon information supplied by the programmer.
Blackfin+ processors support dynamic branch prediction. Software can check for this capability by testing the
BPRED bit of the FEATURE0 MMR. Older Blackfin implementations which do not have a FEATURE0 MMR do not
support dynamic branch prediction.
Dynamic branch prediction is enabled by setting the SYSCFG.BPEN bit. Once enabled, the branch latency of all
branches, whether taken or not taken, is significantly reduced. The latency is never longer than the latency of the
branch had it been statically mispredicted.
The dynamic predictor comes up in a state in which it can immediately start executing. There may be some delay
before it can improve the latency of branches, but software does not need to do anything other than set
SYSCFG.BPEN to enable it. The static prediction bit in conditional branch instructions is used to initialize the dy-
namic prediction. Prediction is validated before the instruction at the branch address is committed, so self-modify-
ing code and code overlays will work as expected with the predictor enabled.
Additional control registers are provided for test, verification, and tuning of the dynamic branch predictor. Software
should not need to use these registers to achieve reasonable performance in the general case. For more information
on the dynamic branch predictor, see Dynamic Branch Prediction.
Statistically, dynamic branch prediction is far more successful than static prediction and can lead to significant im-
provements in program performance without code modification. However, the predictor consumes power and is a
feature that is a candidate to be disabled in low-power applications.
When dynamic branch prediction is disabled or not available, the sequencer supports static branch prediction to
accelerate execution of conditional branches. These branches are executed based on the state of the ASTAT.CC bit.
In the EX2 stage, the sequencer compares the actual ASTAT.CC bit value to the predicted value. If the value was
mispredicted, the branch is corrected, and the correct address is available for the WB stage of the pipeline.
The branch latency for statically predicted conditional branches is as follows:
• Correctly predicts a non-taken branch: 0 CCLK cycles
• Mispredicts a non-taken branch: 8 CCLK cycles
• Correctly predicts a taken branch: 4 CCLK cycles
• Mispredicts a taken branch: 8 CCLK cycles
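The latencies listed above reduce to a simple rule: any misprediction costs 8 CCLK cycles; otherwise a taken branch costs 4 cycles and a non-taken branch costs none. The following C sketch expresses that rule. The function name and parameters are illustrative only.

```c
#include <stdbool.h>

/* Latency in CCLK cycles of a statically predicted conditional branch. */
static unsigned static_branch_latency(bool predicted_taken, bool actually_taken)
{
    if (predicted_taken != actually_taken) {
        return 8u;                      /* mispredicted, either direction */
    }
    return actually_taken ? 4u : 0u;    /* correctly predicted */
}
```

A routine like this can be used in a host-side cost model to compare static prediction choices for a branch with a known taken/not-taken profile.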
For all unconditional branches, the branch target address computed in the AC stage of the pipeline is sent to the
Instruction Fetch Address (IFA) bus at the beginning of the DF1 stage. All statically predicted unconditional
branches have a latency of 4 CCLK cycles. Consider the example in the Branch Prediction table:
BP RAM
The Blackfin+ fetch unit operates on instruction data that is 64 bits wide and 64-bit-aligned in memory. Each 64-
bit segment is referred to as a line, with one line being fetched for each instruction fetch. The BP is designed to
predict two possible branches for each line.
The BP Table RAM can be viewed as two-way set associative. Each row will hold the data for two branches or one
line. The first branch learned for a line will be associated with Set 0 and the second branch with Set 1. Additional
branches will be added by alternating and overwriting Set 0 or Set 1. The data in each set can be divided into two
32-bit parts, the TAG and the TARG. The TAG section contains the branch source information, and the TARG
contains the branch destination (target address).
The BP uses 64 bits for each table entry in Set 1 or Set 0. The LRU bit points to the oldest branch table entry that
was accessed for each line. This data is used to determine which set to overwrite when a new branch is to be learned.
It is written to both the TAG0 and TAG1 portions of the row whenever either Set 0 or Set 1 is written. The other
bit assignments for the branch TAG and TARG portions of table entries can be seen in the BP Table Entry Struc-
ture table.
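The replacement behavior described above can be modeled in a few lines of C. This is a behavioral sketch only: the structure layout and names are hypothetical, the LRU bit is assumed to be 0 after table initialization, and it is assumed to point at the other set after each write.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint32_t tag;    /* branch source information */
    uint32_t targ;   /* branch destination (target address) */
} bp_set_t;

typedef struct {
    bp_set_t set[2]; /* Set 0 and Set 1 of one table row */
    unsigned lru;    /* index of the set to overwrite next */
} bp_row_t;

/* Learn a new branch: overwrite the set selected by the LRU bit,
 * then point the LRU bit at the other set. Returns the set written. */
static unsigned bp_row_learn(bp_row_t *row, uint32_t tag, uint32_t targ)
{
    unsigned victim = row->lru & 1u;

    row->set[victim].valid = true;
    row->set[victim].tag   = tag;
    row->set[victim].targ  = targ;
    row->lru = victim ^ 1u;
    return victim;
}
```

With a zeroed row, successive learns land in Set 0, Set 1, Set 0, and so on, which matches the alternating overwrite order described in this section.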
NOTE: Once the BP_CFG.CLRBP bit is set, flushing the BP Table requires 150 core clock (CCLK) cycles, which
must be accounted for in software before re-enabling branch prediction (i.e., setting the SYSCFG.BPEN
bit again).
The BP_CFG MMR also provides the enable bits for each of the branch types that is supported. When the enable
bits are set to 1, the BP will execute requests to learn branches of that specific type and add them to the BP Table.
When set to 0, new branches for the specified type will not be learned; however, branches of this type that are al-
ready in the BP Table will continue to be predicted and updated.
The Branch Instructions Supported by BP table shows the branch instructions supported by the BP. The BP type
code which is stored with the branch TAG data is shown following each instruction type.
Prediction for all branch types should be enabled for best average performance. However, prediction for individual
branch types may be disabled to fine-tune BP operation for a specific application. Empirical testing may identify a
more optimal configuration for specific applications, but experimentation towards that goal should begin by
measuring program performance with all the enable bits set.
BP Store Buffers
The BP receives information from the sequencer pipeline control logic that tells it when to learn and update data
about branches that are being executed. This information arrives at several places in the pipe and on different
phases of the clock, so the BP must align and store the data in order to enter it into the BP Table RAM in a
single access. The strategy of executing table management operations between instruction fetches, whose timing is
not easily predictable, also requires the BP to buffer data before it is entered into the table.
To meet these requirements, the BP has two data store buffers. These buffers store the data coming from the se-
quencer that is used to load the TAG and TARG portions of the BP Table entry. They also store information such as
whether to learn or update, as well as data required for updating prediction states.
Each store buffer is managed by a three-value state machine: idle, waiting for additional data, or full. The store buf-
fer enters the full state once it has all of the data that it needs to complete each type of table access operation. When
the buffer is full, it generates a request to the table state control machine to move its data to the table. It waits in the
full state until the table state control machine accepts its data and begins writing it to the table. The store buffer
machine can then move to idle, waiting, or full with a new buffer full of data.
The BP has additional store buffer logic which controls the next buffer to load and the order that buffer requests are
fed to the table state control machine.
BP Table Control
The table control state machine manages four types of table accesses that can be requested by the store buffers:
• Learn access - creates a new entry in the table and writes TAG and TARG data to either Set 0 or Set 1 of the
table row indicated by bits 9:3 of the branch source address found in the TAG.
• Update access - changes the prediction values in the TAG fields of Set 0 or Set 1 based on the branch source
address and information provided by the sequencer when the predicted branch is executed and an update is
requested.
• Instruction mispredict access - occurs when the type of a prediction does not match the type expected by the
sequencer. When this occurs, the sequencer requests an instruction mispredict access and provides the offend-
ing source address. The table control state machine then executes an instruction mispredict access, which sets
the Valid bit to 0 in the TAG field of the appropriate entry in the table. This prevents further predictions from
this entry.
• Address mispredict access - occurs when the type of prediction matches what the sequencer expects, but the
target address does not. When this occurs, the sequencer requests an address mispredict access and provides the
offending source and correct target address. The table control state machine then executes an address mispre-
dict access, which updates the TARG field of the appropriate entry in the table with the correct target address.
This ensures that future predictions to this source address will point to the correct target address.
NOTE: While the learn and update accesses are all that should be necessary for static code, self-modifying code
and code overlays can cause the BP Table to contain outdated branch entries which will cause the BP to
send incorrect predictions to the sequencer. Supporting these use cases requires the instruction and ad-
dress mispredict accesses.
NOTE: The LRU bit is toggled and updated for each of the four types of table accesses.
To support the four types of table accesses, the table control state machine has seven states:
• Idle - occurs when there are no access requests from the store buffers.
• Check - moves the data from a requesting store buffer to a local table control buffer and executes a RAM read
using bits 9:3 of the source address from the local buffer for the RAM address. The state machine may remain
in the check state for several cycles until a non-fetch cycle is available for the read to execute in.
• Process - entered once the check state read completes. Data from the check state read is used during this state to
determine if the branch in the local buffer was found in the table, calculate new prediction values if the access
is an update, set the next valid bit to 0 if the access is an instruction mispredict, and set the next value for the
LRU bit. The process state always requires only one cycle, and instruction fetches and predictions can be per-
formed while it is executing.
Following the process state, the state machine will move into one of the remaining four write states (one for each of
the learn, update, instruction mispredict, and address mispredict access types contained in the local buffer, as de-
tailed above). Each of these access types will execute a RAM write using data obtained from the local buffer or calcu-
lated during the process cycle. The four different states enable different RAM write control lines, depending on the
type of data which needs to be written for each access type. The state machine may remain in any of these four write
states for several cycles until a non-fetch cycle is available for the write to execute in. Upon completion of the write,
the state machine will return to idle or can move to the check state and begin processing the next store buffer request
immediately.
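The seven states and their transitions can be summarized as a next-state function. The following C sketch is a simplified behavioral model with hypothetical names; it advances one cycle per call and does not model the case where the check state read causes the write to be suppressed.

```c
#include <stdbool.h>

typedef enum {
    TC_IDLE, TC_CHECK, TC_PROCESS,
    TC_WRITE_LEARN, TC_WRITE_UPDATE,
    TC_WRITE_INSTR_MISPREDICT, TC_WRITE_ADDR_MISPREDICT
} tc_state_t;

typedef enum {
    ACC_LEARN, ACC_UPDATE, ACC_INSTR_MISPREDICT, ACC_ADDR_MISPREDICT
} bp_access_t;

/* Next state after one cycle.
 * request_pending: a store buffer is requesting a table access.
 * fetch_cycle:     the RAM is busy with an instruction fetch this cycle. */
static tc_state_t tc_next(tc_state_t state, bp_access_t access,
                          bool request_pending, bool fetch_cycle)
{
    switch (state) {
    case TC_IDLE:
        return request_pending ? TC_CHECK : TC_IDLE;
    case TC_CHECK:
        /* The RAM read needs a non-fetch cycle. */
        return fetch_cycle ? TC_CHECK : TC_PROCESS;
    case TC_PROCESS:
        /* Always one cycle, regardless of fetch activity. */
        switch (access) {
        case ACC_LEARN:            return TC_WRITE_LEARN;
        case ACC_UPDATE:           return TC_WRITE_UPDATE;
        case ACC_INSTR_MISPREDICT: return TC_WRITE_INSTR_MISPREDICT;
        case ACC_ADDR_MISPREDICT:  return TC_WRITE_ADDR_MISPREDICT;
        }
        return TC_IDLE;            /* not reached for valid access values */
    default:
        /* One of the four write states: the RAM write needs a non-fetch cycle. */
        if (fetch_cycle) {
            return state;
        }
        return request_pending ? TC_CHECK : TC_IDLE;
    }
}
```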
The local control buffer managed by the table control state machine stores the same data as a store buffer. The
requesting store buffer is freed as soon as the check state is entered, so up to three table access requests may be
in flight at any given time (one in the local control buffer and one in each of the two store buffers).
The table control state machine waits in the check and write states until a non-fetch cycle is available. To ensure this
wait is not indefinite, the number of sequential fetch cycles is counted. If a threshold is exceeded, the sequencer is
requested to hold off an instruction fetch for one cycle. The STMOUTVAL field of the BP_CFG register holds the
threshold value, and the STMOUTCNTR field of the BP_STAT register holds the current value of the counter.
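The timeout mechanism can be pictured as a saturating wait counter. The following C sketch is an assumption-laden model, not a description of the hardware: it assumes the counter restarts on any non-fetch cycle and after a hold-off is requested, and that "exceeded" means strictly greater than the threshold. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t threshold;  /* models BP_CFG.STMOUTVAL */
    uint32_t counter;    /* models BP_STAT.STMOUTCNTR */
} bp_timeout_t;

/* Advance one cycle while a table access is waiting for the RAM.
 * Returns true when the sequencer should hold off one instruction fetch. */
static bool bp_timeout_step(bp_timeout_t *t, bool fetch_cycle)
{
    if (!fetch_cycle) {
        t->counter = 0;          /* the waiting access can proceed */
        return false;
    }
    t->counter++;
    if (t->counter > t->threshold) {
        t->counter = 0;          /* hold-off requested; start counting again */
        return true;
    }
    return false;
}
```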
All types of table accesses perform a table read when in the check state. If the access is a learn request, a matching
entry is not expected to be found in the table. If a matching entry is found, the DFL bit in the BP_STAT register is
set, and the new data is not written to the table. This condition can occur when multiple learn requests are issued by
the sequencer as a result of a delayed entry of the first request into the branch table. If the access is an update,
instruction mispredict, or address mispredict and the entry is not found, the NFL bit in the BP_STAT register is set,
and no data is written to the table. This can occur when a table entry is overwritten with a new branch entry just
after it is used to make a prediction. Both of these conditions can occur during normal BP operation. The DFL and
NFL bits are sticky and are cleared by writing 1 to the CLRDFL and CLRNFL bits of the BP_CFG register, respectively.
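The outcome of the check state read can be summarized as follows. This C sketch models only the decision and the sticky status bits; the type and function names are hypothetical, and the flags are plain booleans rather than real register fields.

```c
#include <stdbool.h>

typedef enum {
    REQ_LEARN, REQ_UPDATE, REQ_INSTR_MISPREDICT, REQ_ADDR_MISPREDICT
} bp_req_t;

typedef struct {
    bool dfl;   /* models BP_STAT.DFL (sticky) */
    bool nfl;   /* models BP_STAT.NFL (sticky) */
} bp_stat_flags_t;

/* Returns true if the table write should proceed after the check read. */
static bool bp_check_result(bp_req_t req, bool entry_found, bp_stat_flags_t *stat)
{
    if (req == REQ_LEARN) {
        if (entry_found) {
            stat->dfl = true;    /* learn request found an existing entry */
            return false;
        }
        return true;
    }
    if (!entry_found) {
        stat->nfl = true;        /* update or mispredict found no entry */
        return false;
    }
    return true;
}

/* Models the CLRDFL and CLRNFL write-1-to-clear bits. */
static void bp_clear_flags(bp_stat_flags_t *stat, bool clrdfl, bool clrnfl)
{
    if (clrdfl) stat->dfl = false;
    if (clrnfl) stat->nfl = false;
}
```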
Table Initialization
The BP Table RAM contains LRU bits, valid bits, and prediction value bits that are used for control purposes. These
bits must be in a known state before the BP Table can be used for predictions, so the entire table is initialized when-
ever the core is reset. The initialization process for the BP Table is triggered whenever RESET is asserted. All of the
entries in the BP Table are written with 0s, one row at a time, which requires approximately 150 core clock cycles.
During this interval, all other types of accesses to the table are blocked; hence, no prediction, learning, or register
table access is possible.
During initialization, the BP Table is unavailable, but the store buffers are allowed to operate, which means that
learning and update requests will be loaded to the store buffers. They will remain in the buffers until the BP Table is
done initializing and is ready to accept store buffer requests, at which point the last two BP operations requested by
the sequencer during the initialization period will be loaded to the BP Table. The operations requested by the se-
quencer prior to the last two requests are lost.
The BP also provides the capability to re-initialize the BP Table while the core is not in the Reset state. This feature
may be useful in situations where code overlays are being swapped, where it may be more cycle-efficient to remove
stale branches from the table before learning branches associated with the new code block. While this avoids the
need to individually correct stale branches through instruction or address mispredict operations, the usefulness
of this approach must be evaluated on a case-by-case basis.
To reinitialize the BP Table, set the self-clearing BP_CFG.CLRBP bit. Writing this BP Table Clear bit triggers the
same initialization process that occurs when RESET is asserted, and the BP will begin predicting and learning with a
clean BP Table after the required delay.
NOTE: Software must accommodate the required delay such that no branch instruction associated with any of the
enabled BP_CFG bits gets executed while the BP Table is resetting.
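When budgeting for the required delay, it can help to convert the flush time of approximately 150 CCLK cycles into wall-clock time. The following C sketch does that arithmetic. The helper names are hypothetical, and the core clock frequency is an input supplied by the caller rather than a value taken from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define BP_TABLE_FLUSH_CCLK_CYCLES 150u  /* approximate flush time from this section */

/* Flush time in nanoseconds, rounded up, for a core clock given in Hz.
 * cclk_hz must be nonzero. */
static uint64_t bp_flush_time_ns(uint32_t cclk_hz)
{
    uint64_t scaled = (uint64_t)BP_TABLE_FLUSH_CCLK_CYCLES * 1000000000u;
    return (scaled + cclk_hz - 1u) / cclk_hz;
}

/* True once enough core clock cycles have elapsed since CLRBP was set. */
static bool bp_flush_done(uint32_t cclk_cycles_elapsed)
{
    return cclk_cycles_elapsed >= BP_TABLE_FLUSH_CCLK_CYCLES;
}
```

For an illustrative 400 MHz core clock, bp_flush_time_ns(400000000) evaluates to 375 ns. Because the cycle count is approximate, software may want to add margin.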
Sequencer BP Requests
The sequencer produces four types of requests which are loaded to the store buffers and then used by the BP to
modify the BP Table: learn, update, instruction mispredict, and address mispredict. The sequencer provides the data
for these requests at two different points during pipe execution, as described next.
The BP receives the earliest requests in pipe stage "E"; these are referred to as mid-pipe requests, and the targets
for these branches are received in pipe stages "F" and "G". The second group of requests occurs in pipe stage "J";
these are referred to as late-pipe requests, and the targets for these branches are received at the same time (in
pipe stage "J").
Mid-pipe requests include learns for most static branches, including jumps, calls, and returns from subroutines. Dy-
namic branches or conditional branches (BRCCs) which are predicted taken (BP argument) are also learned at the
mid-pipe point. BRCCs which are not predicted taken (no BP argument) are not learned mid-pipe. The Instruction
Mispredict request is also asserted at mid-pipe.
Update requests for all branch types occur late-pipe. BRCCs which are not predicted taken (no BP argument) and
are taken (mispredicted) are learned at this point. BRCCs which are predicted taken (BP argument) and are not
taken (mispredicted) were learned at mid-pipe and are not re-learned at this point. The first prediction for these
branches will be incorrect but will be updated to the proper value in the update following the first prediction. Ad-
dress Mispredict requests are asserted by the sequencer in pipe stage “H”. The new target value for address mispre-
dicts isn’t asserted until pipe stage “J”, so the BP treats Address Mispredicts as late-pipe requests.
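For a conditional branch (BRCC) that is not yet in the BP Table, the point at which it is learned depends on its static prediction and its outcome. The following C sketch captures the rules above. The names are hypothetical, and the "not learned" result for a branch that is neither predicted taken nor taken is inferred from the text rather than stated outright.

```c
#include <stdbool.h>

typedef enum { LEARN_NONE, LEARN_MID_PIPE, LEARN_LATE_PIPE } learn_point_t;

/* Where a not-yet-learned BRCC is learned.
 * has_bp_argument: the branch is statically predicted taken (BP argument).
 * taken:           the branch is actually taken. */
static learn_point_t brcc_learn_point(bool has_bp_argument, bool taken)
{
    if (has_bp_argument) {
        return LEARN_MID_PIPE;   /* learned mid-pipe whether or not it is taken */
    }
    return taken ? LEARN_LATE_PIPE : LEARN_NONE;
}
```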
The sequencer performs update requests to modify the prediction values in the table based on whether the BP dy-
namic branch predictions were correct or not. Updates are also used to monitor the number of predictions made and
to modify the LRU bit in the BP Table entry so that older entries are overwritten first. By default, the sequencer will
generate an update for every BP prediction it receives, including predictions for static branches such as uncondition-
al jumps, calls and returns from subroutine. The BP has two alternative modes in which the table is updated less
frequently, which are selected by setting the SKUPD or SKUPDLRU bits in the BP_CFG register. Only one of these bits
should be set at a time.
• Skip Update mode is enabled when the SKUPD bit of the BP_CFG register is set. This mode causes updates to
be skipped if the prediction code for a predicted branch is strongly taken or strongly not taken. Since the pre-
diction code is set to strongly taken for all static branches, updates will be eliminated for all static branches.
Updates for BRCCs will be eliminated if the prediction code is strongly taken or strongly not taken and the
prediction was not mispredicted. Mispredicted BRCCs generate an update request regardless of the update
mode so the prediction value can be updated.
• Skip Update LRU mode is enabled when the SKUPDLRU bit of the BP_CFG register is set. This mode causes
updates to be skipped if the prediction code for a predicted branch is strongly taken or strongly not taken and
the predicted branch is not the oldest (LRU) in the table for a specific line. The additional LRU qualification
results in the newest accessed branch in the table being kept longer. This should increase the frequency of pre-
dictions for branches located near the current PC.
The Skip Update and Skip Update LRU modes sacrifice the accuracy of prediction to reduce the frequency of up-
dates. An excessive number of updates might increase the delay before branches are actually learned and impact the
frequency of instruction fetches, so this can be a net benefit; however, the default settings are expected to perform
best in general.
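The three update policies can be compared side by side as a single decision function. The following C sketch is a behavioral model with hypothetical names; the enumerator values are arbitrary and do not represent the hardware encoding of the prediction code.

```c
#include <stdbool.h>

typedef enum {
    PRED_STRONG_NOT_TAKEN, PRED_WEAK_NOT_TAKEN,
    PRED_WEAK_TAKEN, PRED_STRONG_TAKEN
} pred_code_t;

typedef enum { UPD_DEFAULT, UPD_SKUPD, UPD_SKUPDLRU } upd_mode_t;

/* True if the sequencer generates an update request for this prediction.
 * is_lru_entry: the predicted branch is the oldest (LRU) entry for its line. */
static bool bp_update_requested(upd_mode_t mode, pred_code_t code,
                                bool mispredicted, bool is_lru_entry)
{
    bool strong = (code == PRED_STRONG_TAKEN) || (code == PRED_STRONG_NOT_TAKEN);

    if (mispredicted) {
        return true;             /* mispredictions update in every mode */
    }
    switch (mode) {
    case UPD_SKUPD:
        return !strong;          /* skip when strongly predicted */
    case UPD_SKUPDLRU:
        return !(strong && !is_lru_entry);
    default:
        return true;             /* default: update on every prediction */
    }
}
```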
Idle to Full because all of the required data is available in the same cycle. When the state moves to Full, the interac-
tion with the BP Table Control state machine is the same as in the mid-pipe request case. The store buffer data is
moved, and the store buffer state machine can return to Idle.
There are several exceptions to this basic operation due to sequencer operation or to reduce the number of cycles
that it takes for a request to move through the store buffers and be incorporated into the BP Table.
1. When a late-pipe request is sent to the BP before or at the same time as the target for a mid-pipe request - the
state machine would have moved from Idle to Wait when the mid-pipe request was received. If a late-pipe re-
quest is received at this point, it is given a higher priority because late-pipe requests will typically cause a change
of flow (COF), which means the mid-pipe request will be killed and its target never fetched or learned. When
the late-pipe request is received, the store buffer loads the data as in a normal late-pipe request, and the state
machine moves to Full. The transfer of the late-pipe request then moves to the BP Table in the normal fashion.
The mid-pipe data is overwritten, and the mid-pipe request is dropped.
2. Mid-pipe transactions can be lost if another branch which is not supported by the BP causes a COF to occur
while the state machine is in Wait - interrupts are a good example of this. When a mid-pipe operation starts,
the state machine moves to Wait. If an interrupt COF occurs before the mid-pipe target is fetched and sent to
the BP, the correct target will never be sent to the BP to complete the mid-pipe learn. In this case, the non-
learned COF causes the mid-pipe request to be dropped, and the state returns to Idle. This scenario does not
occur for late-pipe requests because all of the data is presented in the same cycle and a late-pipe learn will not
occur in the same cycle as any other non-supported branch COF. This means that the BP has everything it
needs to complete the late-pipe request, so it executes without a problem.
3. If requests from the sequencer come close together, the store buffer state machine does not have to return to
Idle before starting a new request - it is possible for the state machine to move from Full to Wait or from Full
to Full again with the appropriate new mid- or late-pipe requests. If the state machine is Full when a new
request is started, and the request to move the current data to the BP Table Control local buffer has not been
accepted, the current data will be overwritten. The number of times that data is overwritten is an important
measure of store buffer performance, so overwrites are included as one of the BP event monitoring parameters.
4. Mid-pipe instruction mispredicts are treated differently than other mid-pipe requests - instruction mispredicts
occur when a branch which is no longer valid is predicted. It is important to correct this problem quickly to
prevent the branch from being predicted a second time and incurring the unneeded fetch stalls again. To speed
up these requests, the instruction mispredict request is treated as a late-pipe request even though it occurs at
mid-pipe. When the request is received, it causes the store buffer state machine to move from Idle to Full. This
is possible since the target address normally associated with mid-pipe requests is not needed to execute the re-
quest. Once the state machine is Full, the request moves to the BP Table Control buffer in the usual fashion.
The special handling of these requests allows them to move through the store buffers in one cycle.
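The basic operation and the four exceptions above can be condensed into a next-state function for a single store buffer. The following C sketch uses hypothetical event names, does not model the arbitration between the two store buffers, and assumes that a new mid-pipe request received in Wait simply restarts the wait, a case the text does not describe.

```c
typedef enum { SB_IDLE, SB_WAIT, SB_FULL } sb_state_t;

typedef enum {
    EV_MID_PIPE_REQUEST,   /* request received; target arrives later */
    EV_MID_PIPE_TARGET,    /* target for the pending mid-pipe request */
    EV_LATE_PIPE_REQUEST,  /* request and target arrive together */
    EV_INSTR_MISPREDICT,   /* mid-pipe, but handled like a late-pipe request */
    EV_UNSUPPORTED_COF,    /* change of flow not learned by the BP, e.g. interrupt */
    EV_ACCEPTED            /* table control has taken the buffer contents */
} sb_event_t;

static sb_state_t sb_next(sb_state_t state, sb_event_t event)
{
    switch (event) {
    case EV_MID_PIPE_REQUEST:
        return SB_WAIT;    /* from Full, unaccepted data is overwritten */
    case EV_MID_PIPE_TARGET:
        return (state == SB_WAIT) ? SB_FULL : state;
    case EV_LATE_PIPE_REQUEST:
    case EV_INSTR_MISPREDICT:
        return SB_FULL;    /* overrides a pending mid-pipe request */
    case EV_UNSUPPORTED_COF:
        return (state == SB_WAIT) ? SB_IDLE : state;  /* mid-pipe request dropped */
    case EV_ACCEPTED:
        return (state == SB_FULL) ? SB_IDLE : state;
    }
    return state;
}
```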
The final aspect of store buffer operation which needs to be discussed is the interaction among them. Store buffers
interact in two ways:
1. The order that requests are sent to the BP Table Control buffer for execution - the most recently loaded or
newest store buffer request is sent to the BP Table Control buffer first. If a request has already been asserted but
has not been accepted by the BP Table Control state machine, it will be de-asserted, and the newer request will
be asserted. The older request may be re-asserted when the newer request is accepted by the BP Table Control
state machine. This policy results in getting branches close to the current PC into the BP Table as quickly as
possible, thus increasing the probability of these branches being predicted in tight loops and increasing BP effi-
ciency.
2. The order in which sequencer requests are loaded to the buffers, which is governed by several policies:
a. The first buffer loaded after reset is always Store Buffer 0.
b. When a store buffer is written and its state goes to Full, the other buffer is selected as the next store buffer
to be written to, which results in loads being alternated between the store buffers. The contents of the
oldest store buffer will be overwritten if all of the requests have not been executed to the BP Table when a
new request is received. This behavior is seen most frequently and occurs when requests are not close to-
gether.
c. Store buffers which have just had their data accepted by the BP Table Control state machine will now be
considered empty and will be loaded next. This policy is in effect for only one cycle per request, when the
BP Table Control state machine informs the store buffers that it has completed a request and can accept
data for the next request. This cycle is typically the cycle after a BP Table write, which is the last cycle of
the completed BP Table request. Once this single cycle is completed, control of buffer loading reverts to
the policy of alternating buffers. This policy reduces the amount of older store buffer data which is over-
written and dropped, thus increasing BP efficiency. It occurs when branches are spaced close together and
a new request has been allowed to move to the BP Table ahead of a request that is waiting. Several new
requests may be loaded and executed to the table before an older request has an opportunity to be loaded.
Due to this policy, the order and timing of store buffer loads depends on the exact cycle when requests are
completed by the BP Table Control state machine. The execution of requests by the BP Table Control
state machine is also strongly a function of when instructions are fetched. Since the timing of instruction
fetches is highly variable, the net effect is that timing and loading of store buffer requests is also highly
variable. This policy can also cause older requests to be delayed a long time before actually being entered
into the table.
There is one other case where store buffer loading is affected. This is the first exception case described above, which
occurs when a late-pipe request is sent to the BP before or at the same time as the target for a mid-pipe request. As
previously noted, the late-pipe request will override the mid-pipe request, which will change the data in the store
buffer, move the buffer state to Full, and cause this buffer to be marked as the newest buffer. This will also change
the order of buffer requests executed to the BP Table even though it did not change the next buffer which was select-
ed to be loaded.
Loading the most recent requests to the BP Table first is also important in achieving quick table loads when han-
dling instruction mispredict requests. When an instruction mispredict request is received, it is the newest request
and is thus sent to the table immediately, which allows the BP to guarantee a one-cycle throughput through the store
buffers for this type of request.
The order and timing for loading sequencer requests to the store buffers affects when branch entries enter the BP
Table, thus affecting BP predictions and program performance. As has been discussed, the order, timing, and poten-
tial overwriting of requests to the store buffers are highly variable. To provide some visibility and debug capability
for this process, the Store Buffer Full (ST0FULL and ST1FULL) state bits for each of the store buffer state machines
have been brought out to the BP Status register (BP_STAT). These bits are updated every cycle and can be used to
observe store buffer operation.
BP Predictions
The BP Table is checked for branch hits each time an instruction fetch is executed. Checks are triggered within the
BP in pipe stage “A”, and predictions are available approximately 1 ½ cycles after the instruction fetch is started.
The first step in generating a prediction is to determine if there are branch entries found in the line that is being
fetched, and the BP supports predictions for up to two branch entries per line. This requires that the BP also deter-
mine where each branch instruction is located within the line. Since each line is 64 bits (eight bytes) and the mini-
mum instruction size is two bytes, the maximum number of instructions per line is four, thus requiring two bits to
locate the start of an instruction or instruction offset (SOF) within a line. These bits correspond to bits 2:1 of the
instruction address, as the two-byte minimum requirement for an op-code makes bit 0 irrelevant.
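The address fields used by the BP can be extracted with simple shifts and masks. The following C sketch uses hypothetical helper names. The SOF comes from bits 2:1 as described above, the line base follows from the 64-bit alignment of each line, and the row index uses bits 9:3, which this chapter gives for the branch source address in table accesses.

```c
#include <stdint.h>

/* Base address of the 64-bit (eight-byte) line containing addr. */
static uint32_t bp_line_base(uint32_t addr)
{
    return addr & ~UINT32_C(7);
}

/* BP Table row index: bits 9:3 of the address (7 bits). */
static uint32_t bp_row_index(uint32_t addr)
{
    return (addr >> 3) & 0x7Fu;
}

/* Start-of-instruction offset within the line: bits 2:1 of the address. */
static uint32_t bp_sof(uint32_t addr)
{
    return (addr >> 1) & 0x3u;
}
```

For example, address 0x1236 lies in the line at 0x1230, maps to row 0x46, and has an SOF of 3.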
When an instruction fetch is evaluated for a prediction, the BP compares the 22-bit TAGs and the two-bit SOFs in
both Sets 0 and 1 of the appropriate BP Table row with the corresponding bits in the fetch address. If the TAGs
match, the SOF of the fetch address is less than or equal to the SOF of the entry, and the Valid bit is set, then the
entry is a branch hit. If the TAGs match and the Valid bit is set, but the SOF of the fetch address is greater than
the SOF of the entry, then there is no hit.
If there are no branch hits for a given fetch address, then there will be no predictions for that instruction fetch.
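The hit condition for a single table entry can be written as one boolean expression. In the following C sketch, the structure and function names are hypothetical, and the TAG and SOF values are assumed to have already been extracted from the entry and from the fetch address.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;  /* Valid bit of the entry */
    uint32_t tag;    /* 22-bit TAG of the branch source address */
    uint32_t sof;    /* two-bit SOF of the branch within its line */
} bp_tag_entry_t;

/* True if the entry is a branch hit for the given fetch address fields. */
static bool bp_is_hit(const bp_tag_entry_t *entry,
                      uint32_t fetch_tag, uint32_t fetch_sof)
{
    return entry->valid
        && entry->tag == fetch_tag
        && fetch_sof <= entry->sof;
}
```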
If there is a hit in either Set 0 or Set 1 (but not both), then the data from the set which produced the hit will be used
to generate the prediction. In this case, a mux on the BP outputs will be switched to send the required SOF, type,
and target address data from the set which hit to the sequencer. The final prediction evaluation, however, depends
on the prediction value found in the BP Table for the branch entry. If the prediction value is strongly or weakly
taken, the hit creates a Taken prediction, and the sequencer is signaled to fetch the TARG value provided by the BP.
If the prediction value is strongly or weakly not taken, the hit creates a Not Taken prediction, and the sequencer is
not signaled to fetch the target. Not Taken predictions are useful to monitor BP performance but are not sent to the
sequencer, so no COF or BP Table updates occur.
A more complex prediction process happens when two hits are found in the BP Table for a given fetch address.
When this occurs, the prediction is determined using the SOFs and prediction values for each branch entry. The
cases which are possible are discussed below. In each case, the TAGs are assumed to match the fetch address, and the
Valid bits are set for both BP Table entries. Branch A may be in Set 0 or 1 with Branch B located in the other set.
1. Fetch SOF <= Branch A SOF < Branch B SOF - the prediction value for Branch A is Strongly or Weakly Tak-
en, and the value for Branch B is a "don’t care". In this case, a Taken prediction will be issued for Branch A,
which has the lower offset and will be first in the pipe. Since it is predicted Taken, it will be fetched to avoid
the fetch overhead caused by its COF. The presence of and prediction value for Branch B are not factors be-
cause Branch B comes after Branch A in the pipe and will be killed when Branch A is executed.
2. Fetch SOF <= Branch A SOF < Branch B SOF - the prediction value for Branch A is Strongly or Weakly Not
Taken, and the value for Branch B is Strongly or Weakly Taken. In this case, a Taken prediction will be issued
for Branch B. Branch A has the lower offset, so it will be first in the pipe, but it is not taken; therefore, a
prediction and a fetch are not necessary. Branch B will be executed because Branch A is not taken. Since
Branch B will be executed and is predicted taken, issuing the prediction for it will be useful in avoiding the
fetch overhead caused by its COF.
3. Branch A SOF < Fetch SOF <= Branch B SOF - the prediction value for Branch A is a "don’t care", and the
value for Branch B is Strongly or Weakly Taken. In this case, a Taken prediction will be issued for Branch B.
This happens when a branch with a target address offset that is between that of Branch A and B is executed.
Branch A has a lower offset, but it will not be in the pipe and will never be executed, so there is no need to
predict it. Branch B is predicted taken and will be executed, so a prediction for Branch B will be useful in
avoiding the fetch overhead caused by its COF.
4. Fetch SOF <= Branch A SOF < Branch B SOF - the prediction value for Branches A and B are both Strongly
or Weakly Not Taken. In this case, a Not Taken prediction will be issued for Branch A because it has the lower
SOF. A prediction will not be created for Branch B because it will be fetched when Branch A is fetched; and,
since it is not taken, a fetch and prediction will not be required to load its target.
5. Branch A SOF < Fetch SOF <= Branch B SOF - the prediction values for Branches A and B are both Strongly
or Weakly Not Taken. In this case, a Not Taken prediction will be issued for Branch B. The prediction will not
be issued for Branch A because the SOF of the target address is greater than the SOF for Branch A. The SOF
of the target address is less than or equal to the SOF for Branch B, so Branch B is a valid hit, and a Not Taken
prediction is issued for it.
6. Branch A SOF < Branch B SOF < Fetch SOF - the prediction values for Branches A and B are both "don’t
care". In this situation, no predictions are issued because no branches exist in the line beyond Branch B.
Once the correct set and prediction type has been identified, the BP prediction mux is switched to the appropriate
set, and the prediction signals and data are sent to the sequencer, as previously described for the single-hit case.
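The six two-hit cases above can be condensed into a small decision function. The following C sketch is only a behavioral model of the rules as listed, not a description of the actual hardware; the type names, the numeric encoding of the prediction codes, and the function `bp_select` are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Prediction codes as named in the text; the numeric encoding is an assumption. */
typedef enum {
    STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN
} pred_code_t;

typedef enum { PRED_NONE, PRED_TAKEN, PRED_NOT_TAKEN } pred_kind_t;

typedef struct {
    unsigned    sof;    /* offset of the branch within the fetched line */
    pred_code_t code;   /* prediction value stored in the BP Table entry */
} bp_entry_t;

typedef struct {
    pred_kind_t kind;   /* type of prediction issued */
    int         entry;  /* 0 = Branch A, 1 = Branch B, -1 = no prediction */
} bp_result_t;

static bool is_taken(pred_code_t c)
{
    return c == WEAKLY_TAKEN || c == STRONGLY_TAKEN;
}

/* Both entries are assumed valid with matching TAGs; a must have the lower SOF. */
bp_result_t bp_select(unsigned fetch_sof, bp_entry_t a, bp_entry_t b)
{
    assert(a.sof < b.sof);

    if (fetch_sof <= a.sof) {                       /* both branches are in the pipe */
        if (is_taken(a.code)) return (bp_result_t){ PRED_TAKEN, 0 };      /* case 1 */
        if (is_taken(b.code)) return (bp_result_t){ PRED_TAKEN, 1 };      /* case 2 */
        return (bp_result_t){ PRED_NOT_TAKEN, 0 };                        /* case 4 */
    }
    if (fetch_sof <= b.sof) {                       /* only Branch B is in the pipe */
        if (is_taken(b.code)) return (bp_result_t){ PRED_TAKEN, 1 };      /* case 3 */
        return (bp_result_t){ PRED_NOT_TAKEN, 1 };                        /* case 5 */
    }
    return (bp_result_t){ PRED_NONE, -1 };                                /* case 6 */
}
```

For example, with Branch A at offset 2 (Weakly Not Taken) and Branch B at offset 6 (Strongly Taken), a fetch at offset 0 yields a Taken prediction for Branch B, which corresponds to case 2.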
As presented in the BP RAM Design section, there are four values for the Prediction Code which can be assigned to
each Branch Table entry: Strongly Taken, Weakly Taken, Weakly Not Taken, and Strongly Not Taken. When a stat-
ic branch such as a CALL or RTS is learned, its prediction value is assigned as Strongly Taken. This value does not
change when updates are performed for these entries. When a conditional branch (BRCC) is learned, its initial pre-
diction value is assigned as Weakly Taken. This value is recomputed each time the branch is predicted, and an up-
date is executed for the entry.
Predictions which are predicted Taken and are Taken (not mispredicted) cause the value to move one prediction
value towards Strongly Taken. Once the Prediction Code is set to Strongly Taken, it will remain in that state
through taken updates until a Mispredict update occurs, which causes the state to move to Weakly Taken.
Predictions which are predicted Taken and are Not Taken (mispredicted) cause the value to move one prediction
value towards Strongly Not Taken. Once the Prediction Code is set to Strongly Not Taken, it will remain in that
state until a Mispredict update occurs, which causes the state to move to Weakly Not Taken.
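The Prediction Code behaves like a two-bit saturating counter. The C sketch below models that reading of the text; the numeric encoding and the function names `bp_learn` and `bp_update` are assumptions, and the model simply treats every update as driven by the actual branch outcome, which is a simplification of the update conditions described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Encoding is an assumption: a larger value means "more strongly taken". */
typedef enum {
    STRONGLY_NOT_TAKEN = 0, WEAKLY_NOT_TAKEN = 1, WEAKLY_TAKEN = 2, STRONGLY_TAKEN = 3
} pred_code_t;

/* Initial value when a branch is learned: static branches (CALL, RTS) start
 * Strongly Taken, conditional branches (BRCC) start Weakly Taken. */
pred_code_t bp_learn(bool is_static)
{
    return is_static ? STRONGLY_TAKEN : WEAKLY_TAKEN;
}

/* One update step: move one value toward the observed outcome, saturating at
 * either end. Entries for static branches never change. */
pred_code_t bp_update(pred_code_t code, bool is_static, bool was_taken)
{
    assert((int)code >= 0 && (int)code <= 3);
    if (is_static)
        return code;
    if (was_taken)
        return (code == STRONGLY_TAKEN) ? code : (pred_code_t)(code + 1);
    return (code == STRONGLY_NOT_TAKEN) ? code : (pred_code_t)(code - 1);
}
```

In this model a conditional branch that is learned and then taken once moves from Weakly Taken to Strongly Taken, and a single misprediction from there returns it to Weakly Taken.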
Branch predictions are not directly affected by memory stalls. The BP will check for a branch hit each time an in-
struction is fetched. If a hit is found and a prediction made, the BP will hold the prediction signals and data on its
outputs until the next instruction fetch. If the new fetch produces a hit, the prediction signals and data will be
changed to reflect the new prediction. If the new fetch does not produce a hit, the control signals are de-asserted to
indicate that there is no prediction. The data, however, is a "don’t care" and may or may not change depending on
the previous and current data fetched from the BP Table. When an instruction fetch occurs and is stalled, the BP
will check for a prediction in the first cycle of the fetch. The results of this check are held on the BP outputs through
the stall until the next fetch, thus allowing the BP data to be used at the end of the stall when it is required by the
sequencer. The only effect that memory stalls have on the BP is that they change the timing of when fetches occur.
As discussed, BP efficiency is affected by the number and timing of instruction fetches, so it is possible to see a
change in BP operation when there is a high density of memory stalls.
In general, BP predictions occur whenever there is an instruction fetch, with one notable exception. When the
BP makes a prediction, the target address of the branch is fetched in the next cycle. The branch
fetched may span into the next address, so the subsequent address must also be fetched. This additional fetch is
referred to as the trailer; when it is required, the policy described above changes. In this case,
when a Taken prediction is made, the target address and the trailer address are always pre-fetched into the pipeline.
Since the trailer address is only used for branches which span the two addresses and the branch is taken,
branches which occur later in the trailer are never executed. Because they are never executed, there is no need to
predict them; therefore, the BP neither checks nor makes predictions for trailer addresses which are fetched.
The BP is responsible for learning branches and detecting when the branches that it has learned are being fetched,
but it is not always responsible for providing the target address of the branch. For RTS branches, the target address
can come from either an internally-maintained eight-deep call return stack or directly from the RETS register.
Even without the shown CSYNC instruction, the sequence is fully functional, but the internal behavior of the pro-
cessor changes if it is omitted. For example, if P0 were 0x8000_0000 entering this sequence, the CALL instruction
would not execute. The presence of the CSYNC instruction before it guarantees that the pipe doesn't advance beyond
the CSYNC instruction, so no fetches are performed to satisfy the CALL instruction's dependency on the P0 content.
However, if the CSYNC instruction were removed from this sequence, the instruction fetch from P0 would still
happen to satisfy the CALL instruction. Since address 0x8000_0000 resides in DDR memory space, the sequence
would attempt to trigger an instruction fetch from that location. If the DDR controller were not yet initialized
properly, the conditional instruction fetch could trigger a hardware error; thus, the CSYNC instruction is recom-
mended. See the load/store instruction reference pages for details on related data load topics.
Hardware Loops
The sequencer supports a mechanism of zero-overhead looping, meaning that there are no cycle penalties when
wrapping from the loop bottom to the loop top. The sequencer contains two loop units, each containing three regis-
ters. Each loop unit has a Loop Top register (LT0, LT1), a Loop Bottom register (LB0, LB1), and a Loop Count
register (LC0, LC1).
Zero-overhead loops are most conveniently written with the LOOP/LOOP_END construct. The loop start is marked
with a LOOP, LOOPZ or LOOPLEZ instruction, and the end of the loop is marked with a LOOP_END pseudo-instruc-
tion. The following code example shows a loop that contains two instructions and iterates 32 times.
LOOP LC0 = 32 ;
R5 = R0 + R1(ns) || R2 = [P2++] || R3 = [I1++] ;
R5 = R5 + R2 ;
LOOP_END ;
Loops that begin with the LOOP instruction decrement and test the counter at the end of the loop, exiting the loop
if the decrement results in a count of zero. At least one iteration of the loop is always executed.
The LOOPZ and LOOPLEZ instructions test the counter before the first iteration of the loop and only execute the
first iteration if the counter is initially within range. The LOOPZ instruction jumps to the instruction after the
LOOP_END when the counter is initially zero. The LOOPLEZ instruction jumps to the instruction after the
LOOP_END when the counter is initially less than or equal to zero. When the counter is initially in range, then the
LOOPZ and LOOPLEZ instructions operate in the same way as the LOOP instruction. See the LSETUP and LOOP
instruction reference pages for operation details.
The following code shows two loops with an unknown iteration count. In the first loop, the LOOP instruction is
used, so at least one iteration is executed. In the second loop, the LOOPZ instruction is used, so the number of itera-
tions that are executed will match whatever is retrieved from the address pointed to by P4.
P5 = [P4]; /* Get loop count value from memory location in P4 */
LOOP LC1 = P5;
/* loop body executed at least once */
LOOP_END;
LOOPZ LC0 = P5;
/* loop body only executed if count is initially not 0 */
LOOP_END;
The assembler translates LOOP, LOOPZ, and LOOPLEZ instructions to LSETUP, LSETUPZ, and LSETUPLEZ instruc-
tions, respectively, which contain the PC-relative address of the final instruction in the loop. The LOOP_END pseu-
do-instruction is simply there to locate the end of the loop, so it does not get translated to any instruction at all.
Upon disassembly, the replacement LSETUP-type instruction is seen.
Two sets of zero-overhead loop registers implement loops, using hardware counters instead of software instructions
to evaluate loop conditions. After evaluation, processing branches to a new target address. Both sets of registers in-
clude the Loop Counter (LC), Loop Top (LT), and Loop Bottom (LB) registers. The Loop Registers table describes
the 32-bit loop register sets.
When an instruction at address X is executed, and X matches the contents of LB0, then the next instruction execut-
ed will be from the address in LT0. In other words, when PC == LB0, then an implicit jump to LT0 is executed.
The LC0 and LC1 registers are unsigned 32-bit registers, each supporting 2^32 - 1 iterations through the loop.
A loopback only occurs when the count is greater than or equal to two. If the count is non-zero, then the count is
decremented by one. For example, consider the case of a loop with two iterations. At the beginning, the count is
two. On reaching the first loop end, the count is decremented to one, and the program flow jumps back to the top
of the loop (to execute a second time). On reaching the end of the loop again, the count is decremented to zero, but
no loopback occurs because the body of the loop has already been executed twice.
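The loopback rule, together with the entry tests performed by LOOPZ and LOOPLEZ, can be expressed as a short behavioral model. This C sketch follows the wording above; the function names are assumptions, and the signed interpretation of the LOOPLEZ count is an assumption drawn from the "less than or equal to zero" description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Evaluate the loop counter when the loop-bottom instruction completes.
 * Returns true when the sequencer jumps back to the loop top. */
bool loop_bottom_step(uint32_t *lc)
{
    assert(lc != NULL);
    bool loopback = (*lc >= 2);   /* loop back only if the count is at least two */
    if (*lc != 0)
        (*lc)--;                  /* a non-zero count is decremented by one */
    return loopback;
}

/* Number of body executions for a loop started with LOOP (no entry test). */
uint32_t loop_iterations(uint32_t count)
{
    uint32_t lc = count;
    uint32_t passes = 1;          /* the body always runs at least once */
    while (loop_bottom_step(&lc))
        passes++;
    return passes;
}

/* LOOPZ skips the loop entirely when the count is initially zero. */
uint32_t loopz_iterations(uint32_t count)
{
    return (count == 0) ? 0 : loop_iterations(count);
}

/* LOOPLEZ skips the loop when the count, read as signed, is <= 0. */
uint32_t looplez_iterations(int32_t count)
{
    return (count <= 0) ? 0 : loop_iterations((uint32_t)count);
}
```

With these definitions, a count of zero produces one pass for LOOP and no passes for LOOPZ or LOOPLEZ, while any count in range produces the same number of passes for all three.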
The LSETUP, LSETUPZ, or LSETUPLEZ instructions can be used to load all three registers of a loop unit at once.
When executing one of these loop setup instructions, the program sequencer loads the address of the loop's last in-
struction into LBx and the address of the loop's first instruction into LTx. The bottom address of the loop is com-
puted from a PC-relative offset held in the instruction, which limits the maximum loop size to 2046 bytes. It is
recommended that the loop top address is the instruction after the loop setup instruction.
Each loop register can also be loaded individually with a register transfer, but this incurs a significant overhead if the
loop count is non-zero (the loop is active) at the time of the transfer.
For compatibility with earlier Blackfin processors, the LSETUP instruction without immediate count may contain a
start offset. With this form of the instruction, the loop top can be up to 30 bytes after the LSETUP instruction.
However, a four-cycle latency occurs on the first loopback if the LSETUP specifies a non-zero start offset.
A legacy form of the LOOP syntax is also supported. Using this syntax, a loop gets assigned a name. All loop instruc-
tions are enclosed between the LOOP_BEGIN and LOOP_END brackets.
LC0 = R0;
The processor supports a four-location instruction loop buffer that reduces instruction fetches while in loops. If the
loop code contains four or fewer instructions, then no fetches to instruction memory are necessary for any number
of loop iterations because the instructions are stored locally. The loop buffer effectively eliminates the instruction
fetch time in loops with more than four instructions by allowing fetches to take place while instructions in the loop
buffer are being executed.
The processor has no restrictions regarding which instructions can occur in a loop end position. All instructions,
including branches and calls, are allowed at the loop bottom location.
Two-Dimensional Loops
The processor features two Loop Units, each providing its own set of loop registers:
• LC[1:0] – the Loop Count registers
• LT[1:0] – the Loop Top address registers
• LB[1:0] – the Loop Bottom address registers
Therefore, two-dimensional loops are supported directly in hardware, consisting of an outer loop and a nested inner
loop.
NOTE: The outer loop is always represented by Loop Unit 0 (LC0, LT0, LB0), while Loop Unit 1 (LC1, LT1,
LB1) manages the inner loop.
To enable the two nested loops to end at the same instruction (LB1 equals LB0), Loop Unit 1 is assigned higher
priority than Loop Unit 0. A loopback caused by Loop Unit 1 on a particular instruction (PC==LB1, LC1>=2) will
prevent Loop Unit 0 from looping back on that same instruction, even if the address matches. Loop Unit 0 is al-
lowed to loop back only after LC1 is exhausted. Consequently, when no instructions appear after the inner loop
within the outer loop body, the outer loop must use LC0 while the inner loop uses LC1. The following example
shows a two-dimensional loop:
#define M 32
#define N 1024
P4 = M (Z);
P5 = N-1 (Z);
LOOP LC0 = P4;
R7 = 0 ;
MNOP || R2 = [I0++] || R3 = [I1++] ;
LOOP LC1 = P5;
R5 = R2 + R3 (NS) || R2 = [I0] || R3 = [I1++] ;
R7 = R5 + R7 (NS) || [I0++] = R5;
LOOP_END ;
R5 = R2 + R3 ;
R7 = R5 + R7 (NS) || [I0++] = R5 ;
[I2++] = R7 ;
LOOP_END ;
The above example processes an MxN data structure. The inner loop is unrolled and executes N-1 times. The outer
loop is not unrolled and still provides room for optimization.
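The priority rule for two loops that share a bottom address can be modeled in C as follows. This is a sketch under stated assumptions rather than a description of the sequencer: the structure and function names are hypothetical, the addresses are arbitrary, and the model assumes that Loop Unit 0 is evaluated on the same instruction on which LC1 is decremented to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t lc, lt, lb; } loop_unit_t;

/* Next PC after the instruction at pc completes. Loop Unit 1 is evaluated
 * first; Loop Unit 0 is evaluated only if Unit 1 does not loop back. */
uint32_t loop_next_pc(uint32_t pc, uint32_t fallthrough,
                      loop_unit_t *u0, loop_unit_t *u1)
{
    assert(u0 != NULL && u1 != NULL);
    if (pc == u1->lb && u1->lc != 0) {
        int loopback = (u1->lc >= 2);
        u1->lc--;
        if (loopback)
            return u1->lt;        /* inner loop wins; Unit 0 is left untouched */
    }
    if (pc == u0->lb && u0->lc != 0) {
        int loopback = (u0->lc >= 2);
        u0->lc--;
        if (loopback)
            return u0->lt;
    }
    return fallthrough;
}

/* Count executions of a bottom instruction shared by both loops. The
 * instruction at the outer loop top stands in for the inner loop setup. */
uint32_t shared_bottom_passes(uint32_t outer, uint32_t inner)
{
    const uint32_t outer_top = 0x100, inner_top = 0x104, bottom = 0x108, after = 0x10C;
    loop_unit_t u0 = { outer, outer_top, bottom };
    loop_unit_t u1 = { 0, inner_top, bottom };
    uint32_t pc = outer_top, passes = 0;

    while (pc != after) {
        if (pc == outer_top) {
            u1.lc = inner;        /* inner loop setup reloads LC1 on every outer pass */
            pc = inner_top;
        } else if (pc == inner_top) {
            pc = bottom;
        } else {
            passes++;
            pc = loop_next_pc(pc, after, &u0, &u1);
        }
    }
    return passes;
}
```

Under these assumptions the shared bottom instruction executes outer x inner times, for example six times for an outer count of 2 and an inner count of 3.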
Loop Unrolling
DSP algorithms are typically optimized for speed rather than for small code size. When fetching data from circular
buffers, loops are often unrolled in order to pass only N-1 times. The initial data fetch is executed before the loop is
entered. Similarly, the final calculations are done after the loop terminates, for example:
#define N 1024
global_setup:
/* Initialize DAG registers for 2 circular buffers, 1 in each of Banks A/B */
I0.H = 0x1180; I0.L = 0x0000; B0 = I0; L0 = N*2 (Z);
I1.H = 0x1190; I1.L = 0x0000; B1 = I1; L1 = N*2 (Z);
P5 = N-1 (Z);
algorithm:
A0 = 0 || R0.H = W[I0++] || R1.L = W[I1++];
LOOP LC0 = P5;
A0+= R0.H * R1.L || R0.H = W[I0++] || R1.L = W[I1++];
LOOP_END;
A0+= R0.H * R1.L;
As shown, the accumulator register is cleared while the first data elements are prefetched before the loop is iterated,
then the loop body performs the accumulation while fetching the next elements in a single instruction. This is iter-
ated for the length of the buffer minus one such that the last accumulation occurs after the loop completes. This
technique optimizes data fetching to exactly N times, and the Iregs are reset to their initial values when processing
is complete. As such, the algorithm can subsequently be executed multiple times without any need to re-initialize the
DAG registers.
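For readers more familiar with C, the same prefetch, loop, and final-accumulate structure can be sketched as below. This is an illustrative model only: the function names are hypothetical, the circular index helper stands in for the DAG circular addressing, and the accumulation uses plain integer products, so the fractional scaling and accumulator saturation of the MAC unit are deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Post-increment of a circular index, standing in for I0++ with L0 = buffer length. */
static size_t circ_next(size_t idx, size_t len)
{
    return (idx + 1 == len) ? 0 : idx + 1;
}

/* Multiply-accumulate over two circular buffers of n elements each.
 * *ix and *iy are the current circular indices and are advanced in place. */
int64_t mac_unrolled(const int16_t *x, const int16_t *y, size_t n,
                     size_t *ix, size_t *iy)
{
    assert(n >= 1 && *ix < n && *iy < n);

    int64_t acc = 0;                          /* A0 = 0 */
    int16_t a = x[*ix], b = y[*iy];           /* prefetch the first pair */
    *ix = circ_next(*ix, n);
    *iy = circ_next(*iy, n);

    for (size_t i = 0; i + 1 < n; i++) {      /* loop body runs N-1 times */
        acc += (int32_t)a * b;                /* accumulate the pair fetched earlier */
        a = x[*ix];                           /* fetch the next pair */
        b = y[*iy];
        *ix = circ_next(*ix, n);
        *iy = circ_next(*iy, n);
    }

    acc += (int32_t)a * b;                    /* final accumulation after the loop */
    return acc;
}
```

Each index advances exactly n times modulo n, so both indices end where they started, which mirrors the observation that the routine can be called again without re-initializing the DAG registers.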
[--SP] = LC0;
[--SP] = LB0;
[--SP] = LT0;
To pop the loop registers back off the stack, thus restoring them to the state they were in upon executing the above
code, the complementary restore code to insert into the function epilog is:
LT0 = [SP++];
LB0 = [SP++];
LC0 = [SP++];
Writes or pops to the loop registers cause some internal side-effects to re-initialize the loop hardware properly. The
hardware does not force the user to save and restore all three loop registers, as there might be cases where saving one
or two of them is sufficient. Consequently, every pop instruction in the example above may require the loop hard-
ware to re-initialize again, which takes multiple cycles because the loop buffers must also be pre-filled again.
To avoid unnecessary penalty cycles, the loop hardware follows these rules:
• Restoring LC0 and LC1 registers always re-initializes the loop hardware and causes a ten-cycle "replay" penalty.
• Restoring LT0, LT1, LB0, and LB1 performs in a single cycle, if the corresponding loop counter register is zero.
• If LCx is non-zero, every write to the LTx and LBx registers also attempts to re-initialize the loop hardware and
causes a ten-cycle penalty.
In terms of performance, there is a difference depending on the order that the loop registers are popped. For best
performance, restore the LCx registers last. Furthermore, it is recommended that interrupt service routines and glob-
al subroutines that contain hardware loops terminate their local loops cleanly; that is, do not artificially break the
loops, and do not execute return instructions within the loops. This guarantees that the LCx registers are 0 when
LTx and LBx registers are popped.
LT0 = [SP++];
LB0 = [SP++];
LC0 = [SP++]; /* This will cause a "replay" (a ten-cycle refetch) */
RTI;
If the handler uses Loop Unit 0, it is a good idea to have it leave LC0=0 at the end. Normally, this happens naturally,
as the loop is fully executed. When this is true, then LT0 and LB0 restores will not incur additional cycles. If LC0 is
non-zero when these restores occur, each pop will incur the ten-cycle "replay" penalty. Popping or writing LC0 al-
ways incurs this penalty.
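The effect of restore order on the penalty can be estimated with a small cost model. The C sketch below encodes only the three rules listed above and counts penalty cycles, not total cycles; the names `loop_write_penalty` and `restore_penalty` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REG_LC, REG_LT, REG_LB } loop_reg_t;

#define REPLAY_PENALTY 10u   /* cycles, per the rules in the text */

/* Penalty cycles for a single write to one loop register. *lc tracks the
 * current value of the loop counter of the same loop unit. */
unsigned loop_write_penalty(loop_reg_t reg, uint32_t value, uint32_t *lc)
{
    assert(lc != NULL);
    if (reg == REG_LC) {
        *lc = value;
        return REPLAY_PENALTY;               /* writing LCx always replays */
    }
    return (*lc != 0) ? REPLAY_PENALTY : 0u; /* LTx/LBx replay only if LCx != 0 */
}

/* Total penalty for restoring all three registers in the given order.
 * lc_before is LCx on entry to the restore sequence, lc_saved the value popped. */
unsigned restore_penalty(const loop_reg_t order[3], uint32_t lc_before, uint32_t lc_saved)
{
    uint32_t lc = lc_before;
    unsigned total = 0;
    for (int i = 0; i < 3; i++)
        total += loop_write_penalty(order[i], order[i] == REG_LC ? lc_saved : 0u, &lc);
    return total;
}
```

In this model, restoring LT0 and LB0 before LC0 with LC0 initially zero costs 10 penalty cycles, whereas restoring a non-zero LC0 first costs 30.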
NOTE: The word event describes all five types of activities shown above that can disrupt application code flow.
The Event Controller manages fifteen different events in all, as there are eleven of the Interrupt type that
can have dedicated handlers.
An interrupt is an event that asynchronously changes normal processor instruction flow. In contrast, an exception is a
software-initiated event whose effects are synchronous to program flow.
The event system is nested and prioritized. Consequently, several service routines may be active at any time, and a
low-priority event may be pre-empted by one of higher priority.
The processor employs a two-level event control mechanism. The Core Event Controller (CEC) works with the Sys-
tem Event Controller (SEC) to prioritize and control all system interrupts. The SEC provides mapping between the
many peripheral interrupt sources and the prioritized general-purpose interrupt inputs of the core, which can indi-
vidually be masked in the SEC. In addition to the dedicated handlers for many events (emulation, reset, NMI, ex-
ception, hardware error interrupt, and core timer interrupt), the CEC also supports nine general-purpose interrupts
(IVG7 - IVG15). It is recommended that at least the two lowest priority interrupts (IVG14 and IVG15) be reserved
for software interrupt handlers, leaving seven prioritized interrupt inputs (IVG7 - IVG13) to support the system.
NOTE: The SEC maps all system events to IVG11, thus leaving IVG7 to IVG10 and IVG12 to IVG15 free for
use as software interrupt handlers, with the first group having higher priority than all the system-related
interrupts and the second having lower priority. Refer to the Hardware Reference Manual for your pro-
cessor for a detailed description of the SEC.
The Core Event Mapping table shows the core events and their priority level, as seen by the core, including those
controlled by the SEC on IVG11. The Core Event Source column is sorted by priority from highest to lowest, such
that all the general-purpose interrupts are lower in priority than the rest, and these general-purpose interrupts are
also prioritized from IVG7 (highest) to IVG15 (lowest).
NOTE: The IPEND[4] bit is not associated with an event. It is used by the Core Event Controller to temporarily
disable interrupts on entry and exit to an interrupt service routine.
This register is read-only and accessible only in Supervisor mode. For more information, see the Interrupt Pending
Register.
cleared by hardware before the first instruction in the corresponding ISR is executed. This occurs at the point the
interrupt is accepted, where the CEC clears the ILAT[N] bit while simultaneously setting the corresponding
IPEND[N] bit, thus flagging that event to now be pending on the core (either actively being processed or nested at
some level).
The ILAT register can only be accessed in Supervisor mode. While reads are straightforward, writes to ILAT can be
used to manually clear latched events for cases where latched interrupt requests need to be cancelled rather than
serviced. To clear any ILAT[N] bit, first make sure that IMASK[N] == 0, then write ILAT[N] = 1.
The RAISE instruction can be used to set ILAT[15:5] and ILAT[2:1].
The EXCPT instruction can be used to set ILAT[3].
Only the JTAG TRST pin can control ILAT[0].
For more information, see the Interrupt Latch Register.
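The bit-level rules above can be summarized in a short C model. It is a sketch of the stated rules only, operating on plain integers rather than on the real registers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RAISE can set ILAT[15:5] and ILAT[2:1]; other bits are not reachable with it. */
bool raise_target_valid(unsigned n)
{
    return (n >= 5 && n <= 15) || n == 1 || n == 2;
}

/* Modeled value of ILAT after RAISE n. */
uint16_t ilat_after_raise(uint16_t ilat, unsigned n)
{
    assert(raise_target_valid(n));
    return (uint16_t)(ilat | (1u << n));
}

/* A latched event should only be cleared manually while it is masked,
 * that is, while IMASK[n] == 0. */
bool ilat_clear_permitted(uint16_t imask, unsigned n)
{
    assert(n < 16);
    return ((imask >> n) & 1u) == 0;
}
```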
Name      Event Class      Event Vector Register  MMR Location  Notes
RST       Reset            EVT1                   0x1FE0 2004   RAISE 1 vector. Not used by Reset Control Unit (RCU).
NMI       NMI              EVT2                   0x1FE0 2008
EVX       Exception        EVT3                   0x1FE0 200C
Reserved  Reserved         EVT4                   0x1FE0 2010   Reserved.
IVHW      Hardware Error   EVT5                   0x1FE0 2014
IVTMR     Core Timer       EVT6                   0x1FE0 2018
IVG7      GP Interrupt 7   EVT7                   0x1FE0 201C   User-Programmable Software/System Interrupt.
IVG8      GP Interrupt 8   EVT8                   0x1FE0 2020   User-Programmable Software/System Interrupt.
IVG9      GP Interrupt 9   EVT9                   0x1FE0 2024   User-Programmable Software/System Interrupt.
IVG10     GP Interrupt 10  EVT10                  0x1FE0 2028   User-Programmable Software/System Interrupt.
IVG11     GP Interrupt 11  EVT11                  0x1FE0 202C   System Interrupt for System Event Controller (SEC).
IVG12     GP Interrupt 12  EVT12                  0x1FE0 2030   User-Programmable Software/System Interrupt.
IVG13     GP Interrupt 13  EVT13                  0x1FE0 2034   User-Programmable Software/System Interrupt.
IVG14     GP Interrupt 14  EVT14                  0x1FE0 2038   User-Programmable Software/System Interrupt.
IVG15     GP Interrupt 15  EVT15                  0x1FE0 203C   User-Programmable Software/System Interrupt.
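The MMR locations listed for EVT1 through EVT15 are spaced four bytes apart, so an address can be computed from the event number. The C sketch below relies on that observation; the base constant is inferred from the listed addresses rather than taken from a register definition, and the name `evt_address` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inferred from the table: EVTn is located at 0x1FE02000 + 4 * n for n = 1..15. */
#define EVT_BASE 0x1FE02000u

uint32_t evt_address(unsigned n)
{
    assert(n >= 1 && n <= 15);    /* only EVT1 through EVT15 appear in the table */
    return EVT_BASE + 4u * n;
}
```

For instance, `evt_address(11)` yields 0x1FE0202C, the location listed for the IVG11 vector used by the SEC.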
When interrupt nesting is not enabled, there is no need to manually manage the RETI register, as the application
must always return to the application level before recognizing any events that were latched since vectoring to service
the interrupt, and the CEC takes care of this automatically.
If the service routine must be interruptible by a higher-priority interrupt, nesting of interrupts is required, as is
thoughtful management of the RETI register. Reads of the RETI register enable nesting of interrupts, and writes to
it disable nesting. Typically, RETI is simply pushed to the stack, which will both save its content and enable nesting
of interrupts; similarly, the complementary stack pop operation will disable nesting and restore the content. This
scheme enables the service routine to be broken down into both interruptible and non-interruptible sections:
isr:
[--SP] = (R7:0, P5:0); /* save core registers to stack */
[--SP] = ASTAT; /* save arithmetic status register to stack */
NOTE: If there is not a need for non-interruptible code inside the service routine, it is good programming prac-
tice to enable nesting immediately by pushing RETI to the stack in the first instruction of the ISR, thus
avoiding incurring unnecessary delays before higher-priority interrupt routines can be executed:
NOTE: Unlike the interrupt event, which requires manual configuration of interrupt nesting, each of the other event
types is enabled by the architecture to pre-empt anything of lower priority (e.g., an NMI event will always
interrupt an exception event, an exception event will always interrupt an interrupt event, etc.).
Emulation Interrupt
An emulation event causes the processor to enter Emulation mode, where instructions are read from the JTAG inter-
face. It is the highest priority interrupt to the core.
For detailed information about emulation, see the Debug chapter of the Blackfin+ Processor Hardware Reference
Manual.
Reset Interrupt
The Reset Control Unit (RCU) controls how the core enters and exits the reset state and supplies the software ad-
dress to which the core vectors upon exiting it. See the Hardware Reference Manual for a detailed description of the
RCU.
Executing the RAISE 1 instruction does not directly assert a core reset; rather, it simply creates an interrupt with a
priority level of one (exceeded only by the emulation interrupt above). The RAISE 1 instruction also does not save
the return address to a register, therefore it is not possible to automatically return from the interrupt vector once it is
taken.
Use of the EVT1 register to provide the vector address for the RAISE 1 instruction is enabled by clearing bit 15 of
the EVT_OVERRIDE register. Otherwise, with this bit set, the vector address is supplied directly by the RCU. A reset
signalled by the RCU always vectors to the address supplied by the RCU.
In earlier Blackfin processors, the RAISE 1 instruction caused a software reset. However, it is generally unsafe for a
core to transfer control to boot code, which assumes the core and system are coming out of reset, when in fact nothing
has been reset. If the former software reset functionality is desired in the Blackfin+ application, the following
software control is assumed:
• After booting, the EVT1 register is programmed with the ISR location for software interrupt level one.
• EVT_OVERRIDE[15] is cleared.
• When a RAISE 1 instruction is executed:
    • The ISR goes through the appropriate mechanisms via the RCU to shut off all core interfaces.
    • The RCU resets the core, seen by the core as an external reset.
The SEQSTAT.NSPECABT bit will be set upon entry to an NMI, reset, or emulation handler if a non-speculative
access such as a system MMR read or a read from I/O device memory was aborted by the event. The read will be
attempted again upon returning from the handler, which may not be desired if the read has side-effects (e.g., access-
ing a FIFO, etc.).
Exceptions
Exceptions are discussed in Hardware Errors and Exception Handling.
NOTE: It is a useful practice to reserve the two lowest priority interrupts (IVG15 and IVG14) as software inter-
rupt handlers.
Interrupt Processing
The following sections describe interrupt processing.
When multiple instructions need to be atomic or are too time-critical to be delayed by an interrupt, disable the
general-purpose interrupts, but be sure to re-enable them at the conclusion of the code sequence.
Servicing Interrupts
The Core Event Controller (CEC) utilizes the ILAT register as a single interrupt queueing element per event. The
appropriate ILAT[n] bit is set when an interrupt rising edge is detected (which takes two core clock cycles) and
cleared when the respective IPEND[n] register bit is set. The IPEND[n] bit indicates that the event vector has en-
tered the core pipeline. At this point, the CEC recognizes and queues the next rising edge event on the correspond-
ing interrupt input. The minimum latency from the rising edge transition of the general-purpose interrupt to the
IPEND[n] output assertion is three core clock cycles. However, the latency can be higher, depending on the core’s
activity level and state.
To determine when to service an interrupt, the controller logically ANDs the three quantities in ILAT[n],
IMASK[n], and the current processor priority level.
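The servicing decision can be pictured as a bitwise test followed by a priority comparison. The following C sketch is a simplified model under several assumptions: it considers only the interrupt events 5 through 15, it takes the current priority level from the lowest-numbered bit set in IPEND, it treats IPEND[4] as a global disable, and it does not model self-nesting. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IPEND_GLOBAL_DISABLE (1u << 4)
#define INTERRUPT_BITS       0xFFE0u      /* events 5..15 */

/* Index of the lowest set bit (highest priority), or 16 if none is set. */
static int lowest_set_bit(uint32_t v)
{
    for (int i = 0; i < 16; i++) {
        if (v & (1u << i))
            return i;
    }
    return 16;
}

/* Returns the event number (5..15) that would be serviced next, or -1 if
 * nothing can be serviced at the moment. */
int next_interrupt(uint16_t ilat, uint16_t imask, uint16_t ipend)
{
    if (ipend & IPEND_GLOBAL_DISABLE)
        return -1;                                    /* interrupts globally disabled */

    int current   = lowest_set_bit(ipend & ~IPEND_GLOBAL_DISABLE);
    int candidate = lowest_set_bit(ilat & imask & INTERRUPT_BITS);

    /* Latched AND unmasked AND strictly higher priority than the active event. */
    return (candidate < current) ? candidate : -1;
}
```

In this model a latched and unmasked IVG7 request pre-empts an active IVG11 handler once IPEND[4] is clear, while an IVG12 request stays latched until the IVG11 handler returns.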
Interrupt Nesting
If the processor takes a vector to service Event A, Event A becomes "active". In the absence of other events, the
processor will complete servicing of the active event and then resume the application at the point the event was serv-
iced. If, however, there are many events to handle and it is desired to handle certain events with higher priority in a
timely fashion, nesting of interrupts is required. Interrupt nesting allows the processor to continue to respond to
higher-priority events while servicing lower-priority events. In the given example, if the higher-priority Event B oc-
curred while servicing the active Event A, then nesting allows for the processor to immediately respond to Event B.
In this case, Event B becomes the active event, and Event A is nested. Several levels of nesting are possible, as descri-
bed in the Interrupt Pending Register (IPEND) section.
The highest-priority interrupt levels (emulation, reset, NMI, and exception) automatically support interrupt nesting.
Each can pre-empt another, provided its priority is higher, and each will pre-empt any of the interrupt events, which
are by definition of lower priority. For example, if an NMI occurs while executing an exception handler, the processor
will immediately vector to service the NMI and will pend completion of the exception handler until the NMI
event has been fully serviced. However, if an exception were to occur during an NMI handler, the processor would
not vector (instead, a double-fault condition will be raised). Conversely, if either an exception or an NMI were to
occur while processing an interrupt event, the processor would immediately vector to the appropriate handler, com-
plete servicing of that event, then return to the interrupt handler to complete servicing of the interrupt. But no
interrupt event could be configured to interrupt a higher-priority event such as an exception or NMI.
Unlike the higher-priority events that automatically support nesting, the interrupt events themselves can be pro-
grammed to optionally support interrupt nesting. For more information, see Return Registers and Instructions.
Non-Nested Interrupts
If interrupts do not require nesting, all interrupts are disabled while the interrupt service routine is executing, there-
by gating the servicing of any interrupts that occur after the vector is taken. This restriction does not apply to emu-
lation, NMI, and exception events, which will still be accepted by the system.
When the system does not need to support nested interrupts, there is no need to store the return address held in
RETI. Only the portion of the machine state used within the interrupt service routine must be saved to the stack. To
return from a non-nested interrupt service routine, only the RTI instruction must be executed, as the return address
is already held in the RETI register.
The Non-Nested Interrupt Handling figure shows an example of interrupt handling where interrupts are globally
disabled for the entire interrupt service routine.
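[Figure: Non-Nested Interrupt Handling. Pipeline diagram showing application instructions (A1 to A8) and service
routine instructions (through In and RTI) moving through the DEC, AC, DF1, DF2, EX1, and EX2 stages. Interrupts are
disabled during the entire interval in which the service routine executes.]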
Nested Interrupts
If interrupts require nesting, the return address for the interrupt currently being serviced (stored to RETI when the
vector is taken) must be explicitly saved prior to executing the higher-priority ISR and then subsequently restored
upon completion of it. Interrupt service routines that support nesting should enable nesting and save the content of
the RETI register in a single stack push instruction ([--SP] = RETI). This clears the global interrupt disable bit
IPEND[4], thus enabling interrupts again. After this, all registers that are modified by the interrupt service routine
should be saved to the stack. Processor state is stored in the Supervisor stack, not in the User stack; hence, the
instructions to push to and pop from the stack use the Supervisor stack.
The Nested Interrupt Handling figure illustrates how pushing RETI to the stack re-enables interrupts while in an
interrupt service routine, resulting in a short duration where interrupts are globally disabled.
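ISR:
[--SP] = RETI ; /* Enables interrupts and saves return address to stack */
[--SP] = ASTAT ;
[--SP] = (R7:0, P5:0) ; /* Context save */
/* Body of service routine */
(R7:0, P5:0) = [SP++] ; /* Context restore, in reverse order of the saves */
ASTAT = [SP++] ;
RETI = [SP++] ; /* Restores return address and disables interrupts until RTI completes */
RTI;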
The RTI instruction causes the return from an interrupt. The return address is popped into the RETI register from
the stack, an action that suspends interrupts from the time that RETI is restored until RTI finishes executing. The
suspension of interrupts prevents a subsequent interrupt from corrupting the RETI register.
Next, the RTI instruction clears the highest priority bit that is currently set in IPEND. The processor then jumps to
the address pointed to by the value in the RETI register and re-enables interrupts by clearing IPEND[4].
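The way IPEND changes over the life of a nested service routine can be traced with a few helper functions. This C sketch models only the IPEND behavior described in this chapter (IPEND[4] set on entry, cleared by a read of RETI, set again by a write to RETI, and cleared together with the highest-priority event bit by RTI); the function names are hypothetical and the model operates on plain integers, not on the actual register.

```c
#include <assert.h>
#include <stdint.h>

#define IPEND_GLOBAL_DISABLE (1u << 4)

/* Vectoring to interrupt event n: mark it pending and disable interrupts globally. */
uint16_t ipend_on_vector(uint16_t ipend, unsigned n)
{
    assert(n >= 5 && n <= 15);
    return (uint16_t)(ipend | (1u << n) | IPEND_GLOBAL_DISABLE);
}

/* Reading RETI (for example, [--SP] = RETI) re-enables interrupts. */
uint16_t ipend_on_reti_read(uint16_t ipend)
{
    return (uint16_t)(ipend & ~IPEND_GLOBAL_DISABLE);
}

/* Writing RETI (for example, RETI = [SP++]) disables interrupts again. */
uint16_t ipend_on_reti_write(uint16_t ipend)
{
    return (uint16_t)(ipend | IPEND_GLOBAL_DISABLE);
}

/* RTI clears the highest-priority (lowest-numbered) event bit and IPEND[4]. */
uint16_t ipend_on_rti(uint16_t ipend)
{
    uint32_t events = ipend & ~IPEND_GLOBAL_DISABLE;
    return (uint16_t)(events & (events - 1u));    /* clears the lowest set bit */
}
```

Starting from an idle core, vectoring to IVG11 gives 0x0810, pushing RETI gives 0x0800, popping RETI gives 0x0810 again, and RTI returns the modeled value to 0.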
As an example, assume that the SNEN (self-nesting enable) bit is set and the processor is servicing an interrupt
generated by the RAISE 14; instruction. Once the RETI register has been saved to the stack within the service routine,
another RAISE 14; instruction would force the processor to again vector to the beginning of the IVG14 interrupt
handler. This scheme becomes especially useful and is required when many events can share the same priority level, as
is the case with the SEC event being provided by default on IVG11. With the SEC monitoring all the system-level events
with programmable priority, it has the ability to queue and prioritize the events that are passed to the core, but it
continually passes these events at the same IVG11 priority level. If self-nesting were not supported by the core, each
event would have to be serviced to completion before the next could be accepted by the core from the SEC, regardless
of the event's programmed priority in the SEC. With self-nesting, the SEC can raise Event A on IVG11 and continue to
monitor for something of a higher priority. If something with higher priority is raised, it can signal the core to
interrupt servicing of the current IVG11 event in order to service the higher-priority system interrupt on
the same core interrupt IVG11 level.
Device-specific code may be separated from general SEC programming concerns by maintaining a table of routines
dedicated to each interrupt source. The IVG11 ISR is responsible for the following actions:
1. Reading the CEC_SID MMR to obtain the interrupt source ID (SID).
2. Writing CEC_SID to send the acknowledge signal to the SEC.
3. Calling a device-specific handler.
4. Writing the SEC Global End Register MMR (SEC_END.SID) to indicate the interrupt has been serviced.
The following code is an example of an ISR that performs these actions, which illustrates some programming
concerns to be aware of:
ivg11_sec_isr:
[--SP] = (P5:4) ;
P5 = [CEC_SID] ;
/* Writing any value to CEC_SID sends an ACK to the SEC. After acknowledgement, the value
in CEC_SID will change asynchronously if a higher-priority interrupt becomes active, so the
only reliable record of the current interrupt source is in P5 */
[CEC_SID] = R0 ;
/* The interrupt source is safely in P5, and the SEC will only interrupt with a higher-
priority notification. So it is safe to re-enable core interrupts by clearing IPEND[4],
which is a side-effect of saving RETI. */
[--SP] = RETI ;
[--SP] = RETS ;
/* Use interrupt source to index a table of handlers for each specific interrupt. */
P4 = specific_handlers;
P4 = P4 + (P5 << 2);
P4 = [P4];
CALL (P4);
/* Assume the handler has preserved the interrupt source in P5. */
RETS = [SP++] ;
RETI = [SP++] ;
/* Write interrupt source to SEC Global End register to indicate the core has finished
servicing the interrupt. The SEC may now raise lower-priority interrupts, so this should
happen with core interrupts disabled via IPEND[4] to prevent the lower-priority interrupt
being serviced before the higher-priority ISR has returned. Do this after RETI has been
restored, which implicitly sets IPEND[4]. */
[REG_SEC0_END] = P5 ;
(P5:4) = [SP++] ;
RTI ;
In practice, use of any device driver or interrupt support routine supplied by a third party will require the IVG11
ISR code it is designed to work with. Software designed to work with CCES requires the ISR included in the run-
time library, which is automatically linked at build-time.
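For readers who think in C, the four actions of the IVG11 ISR can be sketched as follows. This is a host-side model under stated assumptions, not the CCES run-time library ISR: the sec_model_t structure stands in for the memory-mapped registers, and MAX_SOURCES, sec_dispatch, and example_handler are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOURCES 4u              /* hypothetical handler table size */

typedef void (*sec_handler_t)(uint32_t sid);

/* Hypothetical stand-in for the registers used by the ISR. */
typedef struct {
    uint32_t cec_sid;               /* models CEC_SID */
    uint32_t sec_end;               /* models the SEC Global End register */
    unsigned ack_count;             /* counts acknowledge writes to CEC_SID */
} sec_model_t;

/* Example device-specific handler: records which source was serviced. */
uint32_t last_serviced = UINT32_MAX;
void example_handler(uint32_t sid)
{
    last_serviced = sid;
}

/* Returns 0 if a handler ran, -1 if no handler is registered for the source. */
int sec_dispatch(sec_model_t *sec, const sec_handler_t table[MAX_SOURCES])
{
    assert(sec != NULL && table != NULL);

    uint32_t sid = sec->cec_sid;    /* 1. read the source ID into a local copy */
    sec->ack_count++;               /* 2. any write to CEC_SID acknowledges */

    int status = -1;
    if (sid < MAX_SOURCES && table[sid] != NULL) {
        table[sid](sid);            /* 3. call the device-specific handler */
        status = 0;
    }

    sec->sec_end = sid;             /* 4. signal end of service from the copy */
    return status;
}
```

The local copy of the source ID mirrors the role of P5 in the assembly example: after the acknowledge, the value in CEC_SID may change, so only the copy is used for the table lookup and for the final write to the End register.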
Software Interrupts
Software cannot set bits of the ILAT register directly, as writes to ILAT cause a write-1-to-clear (W1C) operation.
Instead, use the RAISE instruction to set individual ILAT bits by software. It safely sets any of the ILAT bits with-
out affecting the rest of the register.
RAISE 14; /* fire software interrupt request */
The RAISE instruction must not be used to fire emulation events or exceptions, which are managed by the related
EMUEXCPT and EXCPT instructions. For details, see the external event management chapter.
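The difference between a write-1-to-clear register write and the effect of RAISE can be illustrated with a short model. The function names are hypothetical and the code only mimics the bit behaviour described above; it does not access the real ILAT register.

```c
#include <assert.h>
#include <stdint.h>

/* A write to a write-1-to-clear (W1C) register clears every bit that is
   written as 1 and leaves the other bits unchanged. */
uint32_t w1c_write(uint32_t reg, uint32_t value)
{
    return reg & ~value;
}

/* Models the effect of RAISE n on a latch register: only bit n is set. */
uint32_t raise_bit(uint32_t latch, unsigned n)
{
    assert(n < 32);
    return latch | (UINT32_C(1) << n);
}
```

In this model, raise_bit(0, 14) yields 0x4000, while w1c_write(0x4080, 1u << 14) yields 0x0080, which shows why an ordinary write cannot be used to set a latch bit.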
Often, the RAISE instruction is executed in interrupt service routines to degrade the interrupt priority. This enables
less urgent parts of the service routine to be interrupted even by low priority interrupts.
isr7: /* service routine for IVG7 */
...
/* execute high priority instructions here */
/* handshake with signalling peripheral */
RAISE 14;
RTI;
isr7.end:
isr14: /* service routine for IVG14 */
...
/* further process event initiated by IVG7 */
RTI;
isr14.end:
The example above may read data from any receiving interface, post it to a queue, and let the lower priority service
routine process the queue after the isr7 routine returns. Since IVG15 is used for normal program execution in
non-multitasking systems, IVG14 is often dedicated to software interrupt purposes.
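The queue that links the two service routines could look like the following sketch. It is a minimal single-producer, single-consumer ring buffer written for illustration; the type, the function names, and the depth QUEUE_SIZE are hypothetical, and production code on the target would additionally need volatile qualifiers or equivalent ordering guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 8u               /* hypothetical depth; power of two */

typedef struct {
    uint16_t data[QUEUE_SIZE];
    unsigned head;                  /* advanced only by the producer (isr7) */
    unsigned tail;                  /* advanced only by the consumer (isr14) */
} sample_queue_t;

/* Called from the high-priority routine. Returns 1 on success, 0 if full. */
int queue_post(sample_queue_t *q, uint16_t sample)
{
    assert(q != NULL);
    if (q->head - q->tail == QUEUE_SIZE) {
        return 0;
    }
    q->data[q->head % QUEUE_SIZE] = sample;
    q->head++;
    return 1;
}

/* Called from the low-priority routine. Returns 1 on success, 0 if empty. */
int queue_take(sample_queue_t *q, uint16_t *sample)
{
    assert(q != NULL && sample != NULL);
    if (q->head == q->tail) {
        return 0;
    }
    *sample = q->data[q->tail % QUEUE_SIZE];
    q->tail++;
    return 1;
}
```

In terms of the example, isr7 would call queue_post() before executing RAISE 14, and isr14 would call queue_take() in a loop until it returns 0.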
The code in Example Code for an Exception Handler uses the same principle to handle an exception with normal
interrupt priority level.
Figure: Interrupt timing comparison between other processors and the Blackfin processor, showing the clock,
instruction fetch, and data activity together with the points where the interrupt occurs and where it is serviced.
Writes to slow memory generally do not show this behavior, as the writes are deemed to be single cycle, being
immediately transferred to the write buffer for subsequent execution.
For detailed information about cache and memory structures, see the memory chapter.
SEQSTAT Register
The Sequencer Status register (SEQSTAT) contains information about the current state of the sequencer as well as
diagnostic information from the last event. SEQSTAT is accessible only in Supervisor mode. For more information,
see the Sequencer Status Register.
Exceptions (Events)
Exceptions are synchronous to the instruction stream. In other words, a particular instruction causes an exception
when it attempts to finish execution. No instructions after the offending instruction are executed before the excep-
tion handler takes effect.
Many of the exceptions are memory related. For example, an exception is given when a cacheability protection
lookaside buffer (CPLB) miss or protection violation occurs. Exceptions are also given when illegal instructions or
illegal combinations of registers are executed.
An excepting instruction may or may not commit before the exception event is taken, depending on whether it is a
service type or an error type exception.
An instruction causing a service type event will commit, and the address written to the RETX register will be the next
instruction after the excepting one. An example of a service type exception is the single step.
An instruction causing an error type event cannot commit, so the address written to the RETX register will be the
address of the offending instruction. An example of an error type event is a CPLB miss.
NOTE: Usually the RETX register contains the correct address to return to. To skip over an excepting instruction,
take care in case the next address is not simply the next linear address. This could happen when the ex-
cepting instruction is a loop end. In that case, the proper next address would be the loop top.
The EXCAUSE[5:0] field in the Sequencer Status register (SEQSTAT) is written whenever an exception is taken,
and indicates to the exception handler which type of exception occurred. Refer to the events table for a list of events
that cause exceptions.
ATTENTION: If an exception occurs in an event handler that is already servicing an exception, NMI, reset, or
emulation event, this will trigger a double fault condition, and the address of the excepting instruction will
be written to RETX.
Type: (E) Error, (S) Service. See Note 1.

Exception                       EXCAUSE[5:0]  Type  Notes/Examples
Instruction fetch CPLB          0x2B          E     Illegal instruction fetch access (memory protection
protection violation                                violation).
Instruction fetch CPLB miss     0x2C          E     CPLB miss on an instruction fetch.
Instruction fetch multiple      0x2D          E     More than one CPLB entry matches instruction fetch
CPLB hits                                           address.
Illegal use of supervisor       0x2E          E     Attempted to use a Supervisor register or instruction
resource                                            from User mode. Supervisor resources are registers and
                                                    instructions that are reserved for Supervisor use:
                                                    Supervisor only registers, all MMRs, and Supervisor only
                                                    instructions. This error code is also used for errors
                                                    that do not fit into any other category.

NOTE: (1) For services (S), the return address is the address of the instruction that follows the exception. For
errors (E), the return address is the address of the excepting instruction.
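An exception handler written in C might decode the cause along the following lines. This is a sketch under stated assumptions: it assumes that EXCAUSE[5:0] occupies the six least significant bits of the SEQSTAT value, it covers only the four codes listed in the table excerpt above, and all function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXCAUSE_MASK 0x3Fu          /* assumed position: SEQSTAT bits 5:0 */

typedef enum { EXC_SERVICE, EXC_ERROR } exc_type_t;

/* Extract the EXCAUSE field from a SEQSTAT value read by the handler. */
unsigned excause_from_seqstat(uint32_t seqstat)
{
    return (unsigned)(seqstat & EXCAUSE_MASK);
}

/* Name for the codes shown in the table excerpt; NULL for any other code. */
const char *excause_name(unsigned excause)
{
    assert(excause <= EXCAUSE_MASK);
    switch (excause) {
    case 0x2B: return "Instruction fetch CPLB protection violation";
    case 0x2C: return "Instruction fetch CPLB miss";
    case 0x2D: return "Instruction fetch multiple CPLB hits";
    case 0x2E: return "Illegal use of supervisor resource";
    default:   return NULL;
    }
}

/* Per Note 1: errors return to the excepting instruction, services return
   to the instruction that follows it. */
uint32_t exception_return_address(exc_type_t type,
                                  uint32_t excepting_addr,
                                  uint32_t next_addr)
{
    return (type == EXC_ERROR) ? excepting_addr : next_addr;
}
```

The next_addr parameter is passed in rather than computed because, as noted earlier, the instruction that follows an excepting instruction is not always at the next linear address.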
If an instruction causes multiple exceptions, the exception with the highest priority is registered first in SEQSTAT.
The exception priority is as listed in the exceptions by priority table. Once the highest priority exception is
handled, the next highest priority exception is registered and can be handled (and so on).
For example, suppose that the following instruction generates an instruction CPLB miss (0x2C) exception and a
data CPLB miss (0x26) exception. On execution of this instruction, the instruction CPLB miss exception is generated
first. After this exception is handled, the core executes the instruction again, and this time it generates the
data CPLB miss exception.
[P0] = R0 ;
/* generates an instruction CPLB miss and a data CPLB miss */
.SECTION L1_code;
except_handler:
[ except_save_sp ] = SP; /* save stack pointer */
SP = except_stack+EXCEPT_STACK_SIZE;
/* now safe to save registers in except_stack */
[--SP] = (R7:6, P5:4);
[--SP] = ASTAT;
/* place core of service routine here */
ASTAT = [SP++];
(R7:6, P5:4) = [SP++];
/* restore stack pointer before return */
SP = [ except_save_sp ];
RTX;
except_handler.end:
Similar considerations apply to parity error handlers. If the system stack could be in memory that caused the parity
error then it is necessary to switch to a stack in a different memory region, such as ECC protected L2 memory,
before saving any registers or risk a double parity error fault.
Because exceptions, NMIs, and emulation events have a dedicated return register, guarding the return address is op-
tional. Consequently, the PUSH and POP instructions for exceptions, NMIs, and emulation events do not affect the
interrupt system.
Note, however, the return instructions for exceptions ( RTX, RTN, and RTE) do clear the Least Significant Bit (LSB)
currently set in IPEND.
NOTE: When deferring the processing of an exception to lower priority interrupt IVGx, the system must guaran-
tee that IVGx is entered before returning to the application-level code that issued the exception. If a pend-
ing interrupt of higher priority than IVGx occurs, it is acceptable to enter the high priority interrupt be-
fore IVGx.
/* The entry point for an event is as follows. Here, processing is deferred to low priority
interrupt IVG12. Also, parameter passing would typically be done here. */
_EVENT1:
RAISE 12 ;
JUMP.S _EXIT ;
/* Entry for event at IVG13 */
_EVENT2:
RAISE 13 ;
JUMP.S _EXIT ;
/* Comments for other events */
/* At the end of handler, restore R7, P5:4 and ASTAT, and return. */
_EXIT:
ASTAT = [SP++] ;
(R7:7, P5:4) = [SP++] ;
SP = [EXCEPT_SAVED_SP] ;
RTX ;
_EVTABLE:
.byte2 addr_event1;
.byte2 addr_event2;
...
.byte2 addr_eventN;
/* The jump table EVTABLE holds 16-bit address offsets for each event. With offsets, this
code is position independent and the table is small.
+--------------+
| addr_event1 | _EVTABLE
+--------------+
| addr_event2 | _EVTABLE + 2
+--------------+
| . . . |
+--------------+
| addr_eventN | _EVTABLE + 2(N-1)
+--------------+
*/
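The offset-table idea can be expressed in C as shown below. This is only an illustration of the principle: the source does not state which address the 16-bit offsets are relative to, so the base parameter is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of a zero-based table entry; each entry is 16 bits wide. */
size_t entry_byte_offset(size_t index)
{
    return index * sizeof(uint16_t);
}

/* Resolve a zero-based table entry to an absolute address.
   Assumption: offsets are relative to `base`, for example the start of the
   handler code, so the table contents do not change if the code is moved. */
uint32_t resolve_event(uint32_t base, const uint16_t *table,
                       size_t count, size_t index)
{
    assert(table != NULL && index < count);
    return base + table[index];
}
```

Because each entry is two bytes rather than a full 32-bit address, the table is half the size of a table of absolute pointers, and only the base address changes when the code is relocated.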
The RETS register is not a memory-mapped register, but it is directly accessible using move instructions and can be
pushed to or popped from the system stack; however, the RETS register cannot be used as the source or destination
register for load/store or immediate load operations.
Register diagram (this layout appears five times in succession):
ADDR[31:0] (R/W), Return Address. Reset value: undefined.
Register diagram, loop top address:
ADDR[30:15] (R/W) in bits 31:16, Loop Top Address. Bit 0 resets to 0; all other bits are undefined at reset.
Register diagram, loop bottom address:
ADDR[31:0] (R/W), Loop Bottom Address. Reset value: undefined.
Register diagram, loop count:
COUNT[31:0] (R/W), Loop Count Value. Reset value: undefined.
System ID Register
The CEC_SID register contains the system ID of the interrupt that was most recently accepted by the processor
core. When the core accepts the interrupt, it sends an interrupt acknowledge to the SCI, causing the SCI to update
the value in its SEC_CSID register. For more information about interrupt system ID assignments, see the SCI sec-
tion of the processor hardware reference manual.
Register diagram: SID (R/W), System Interrupt ID Value, in the lower 16 bits. All bits reset to 0.
Context ID Register
The ICU_CID register is defined as part of the debug trace specification and holds a software-specified 32-bit con-
text ID. This context ID is captured by the program flow trace block for comparison (trace filtering) and may be
included in the trace packets.
Register diagram: VALUE[31:0] (R/W), Context ID Value. All bits reset to 0.
Register diagram: ADDR[31:0] (R/W), ISR Address for Core Event Handler. Reset value: undefined.
Register diagram (register name and fields not recoverable): reset value 0x0000 8000.
Register diagram: IVG8 (R/W1C), IVG 8 Latch, in the lower 16 bits. All bits reset to 0.
Register diagram (register name and fields not recoverable): reset value 0x0000 001F.
Register diagram (register name and fields not recoverable): reset value 0x0000 0010.
BP Configuration Register
The BP_CFG register configures branch predictor features such as enabling dynamic branch prediction for various
types of branch instructions, controlling updates to the branch prediction table, permitting access to entries in the
prediction table and prediction table memory, and clearing the branch prediction table.
Register diagram: CLRNFL (R/W), Clear Not Found Learn, in the lower 16 bits; other fields are not recoverable.
Reset value: 0x1676 0000.
BP Status Register
The BP_STAT register indicates the status of the branch predictor state machine, store buffer, and current operation.
Register diagram (fields not recoverable): all bits reset to 0.
Each Blackfin+ core features a dedicated timer. Unlike other peripherals, the core timer resides inside the Blackfin+
core and runs at the core clock (CCLK) rate. The core timer is typically used as a system tick clock for generating
periodic operating system interrupts.
TMR Features
The core timer is a programmable 32-bit interval timer which can generate periodic interrupts. Core timer features
include:
• 32-bit timer with 8-bit prescaler
• Operates at core clock (CCLK) rate
• Dedicated high-priority interrupt channel
• Single-shot or continuous operation
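Figure: Core timer block diagram, showing CCLK, the timer enable and prescale logic (TMREN), the load logic, the
32-bit count register with decrement, the TCOUNT zero condition, and the timer interrupt (TINT).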
External Interfaces
The core timer does not directly interact with any external pins on the device.
Internal Interfaces
The core timer is accessed through the 32-bit register access bus (RAB). The core clock (CCLK) is the source clock
for the module. The dedicated interrupt request of the timer is a higher priority than requests from all other periph-
erals.
TMR Operation
The software initializes the timer count (TCOUNT) register before the timer is enabled. The TCOUNT register can be
written directly, but writes to the timer period (TPERIOD) register also pass through to TCOUNT.
When the timer is enabled by setting the TCNTL.EN bit, the TCOUNT register is decremented once every TSCALE + 1
CCLK cycles. When the value of the TCOUNT register reaches 0, the core timer generates an interrupt and the
TCNTL.INT bit is set.
If the TCNTL.AUTORLD bit is set, then hardware automatically reloads the TCOUNT register with the contents of the
TPERIOD register, and the count begins again. If the TCNTL.AUTORLD bit is not set, the timer stops operation.
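The relationship between TSCALE, TCOUNT, and the interrupt interval can be worked through with a small calculation. The sketch below is an idealized model derived only from the description above (one decrement every TSCALE + 1 CCLK cycles); actual hardware timing may differ by a few cycles, and the function names and the example clock rate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CCLK cycles needed for a counter loaded with `tcount` to reach zero,
   assuming one decrement every (tscale + 1) CCLK cycles. */
uint64_t cycles_until_zero(uint32_t tcount, uint8_t tscale)
{
    return (uint64_t)tcount * ((uint64_t)tscale + 1u);
}

/* Period value that gives approximately `tick_hz` interrupts per second
   for a core clock of `cclk_hz`, rounded down. */
uint32_t period_for_tick(uint32_t cclk_hz, uint32_t tick_hz, uint8_t tscale)
{
    assert(tick_hz != 0);
    uint64_t divisor = ((uint64_t)tscale + 1u) * tick_hz;
    return (uint32_t)(cclk_hz / divisor);
}
```

For example, under this model a 400 MHz core clock with a prescale value of 3 needs a period of 100000 to produce a 1 kHz system tick, because 100000 decrements of 4 cycles each take 400000 CCLK cycles.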
Clear the TCNTL.PWR bit to put the core timer into low-power mode, which disables clocks to the core timer to
reduce power consumption. Before using the timer, first set the TCNTL.PWR bit to restore clocks to the timer unit,
and then set the TCNTL.EN bit to enable the core timer.
Interrupt Processing
The core timer's dedicated interrupt request is a higher priority than interrupt requests from all other peripherals.
The request goes directly to the core event controller (CEC), thus bypassing the system event controller (SEC)
entirely. As such, interrupt processing is completely in the CCLK domain.
NOTE: The core timer interrupt request is edge-sensitive. Hardware clears it automatically as soon as the interrupt
is serviced.
The TCNTL.INT bit indicates that the core timer has generated an interrupt. Programs must write a 0 (not W1C) to
clear it, though this write is optional. The core timer module does not provide any further interrupt enable bit.
When the timer is enabled, interrupts can be masked in the CEC.
Register diagram (register name and fields not recoverable): all bits reset to 0.
Register diagram: CNT[31:0] (R/W), Timer Count. All bits reset to 0.
Register diagram: PERIOD[31:0] (R/W), Timer Period. All bits reset to 0.
Register diagram: SCALE (R/W), Scaling factor, in the lower 16 bits. All bits reset to 0.
Like most digital signal processor (DSP) and reduced instruction set computer (RISC) platforms, the Blackfin+ pro-
cessors have a load/store architecture. Computation operands and results are always represented by core registers.
Prior to computation, data is loaded from memory into core registers, and results are stored back by explicit move
operations. The Address Arithmetic Unit (AAU) provides all the required support to keep data transport between
memory and core registers efficient and seamless. Having a separate arithmetic unit for address calculations prevents
the data computation block from being burdened by address operations. Not only can the load and store operations
occur in parallel to data computations, but memory addresses can also be calculated at the same time.
The AAU uses Data Address Generators (DAGs) to generate addresses for data moves to and from memory. By gen-
erating addresses, the DAGs let programs refer to addresses indirectly, using a DAG register instead of an absolute
address. The figure shows the AAU block diagram.
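Figure: AAU block diagram. The address arithmetic unit contains the DAG0 and DAG1 blocks with the I3:0, L3:0, B3:0,
and M3:0 registers, and the pointer registers P5:0, FP, and SP, with 32-bit connections to the RAB and to the DA0,
DA1, and PREG buses.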
• Bit-reversed carry address - provides a bit-reversed carry address during a data move without reversing the
stored address
The AAU comprises two DAGs, nine pointer registers, four index registers and four complete sets of related modify,
base, and length registers. These registers (shown in the AAU figure) hold the values that the DAGs use to generate
addresses. The types of registers are:
• Index (I[3:0]) registers. Unsigned 32-bit index registers hold an address pointer to memory. For example, the
R3 = [I0]; instruction loads the data value found at the memory location pointed to by the I0 register.
Index registers can be used for 16- and 32-bit memory accesses.
• Modify (M[3:0]) registers. Signed 32-bit modify registers provide the increment, or step size, by which an
index register is modified after a register move. For example, the R0 = [I0 ++ M1]; instruction directs the
DAG to output the address in register I0, load the contents of the memory address pointed to by the I0
register into the R0 register, and then modify the value of the I0 register by the value in the M1 register.
• Base (B[3:0]) and length (L[3:0]) registers. Unsigned 32-bit base and length registers set up the starting
address and length of a buffer, respectively. Each B/L pair is always grouped with a corresponding index register.
For example, I3, B3, and L3 are used collectively to handle a single buffer, but any modify register can be used
to update the dedicated index register. If the length register is set to 0, the buffer is unbound and linear. If the
length register is non-zero, the buffer is bound and circular, meaning that index modification beyond the end
of the buffer will wrap back to the base address. For more information on circular buffers, see Addressing
Circular Buffers.
• Pointer registers. The core has six general-purpose (P[5:0]) pointer registers, a Frame Pointer (FP) register, a
User (Mode) Stack Pointer (USP) register, and a Stack Pointer (SP) register. Each is a 32-bit pointer register
holding the value of an address in memory and can be manipulated and used in various instructions. For exam-
ple, the R3 = [P0]; instruction loads the R3 register with the data found at the memory location pointed to
by the P0 register. The pointer registers have no effect on circular buffer addressing and can be used for 8-, 16-,
and 32-bit memory accesses. For added mode protection, the SP register is only accessible in Supervisor mode,
and the USP register is an alias that is either implicitly accessed via the stack pointer from User mode or explic-
itly accessed from Supervisor mode.
Data Address Registers        Pointer Registers
I0   L0   B0   M0             P0
I1   L1   B1   M1             P1
I2   L2   B2   M2             P2
I3   L3   B3   M3             P3
                              P4
                              P5
                              User SP
                              Supervisor SP
                              FP
Consider this instruction:
R0 = [ P3++ ] ;
This instruction fetches the 32-bit word pointed to by the address in the P3 register and places it in the 32-bit R0
register. It then post-increments P3 by four to point to the data after the 32-bit word that was just fetched.
Now consider this instruction:
R0.L = W [ I3++ ];
The W modifier in this instruction indicates that this is a 16-bit access to the address pointed to by the I3 register.
As such, the destination register must be a 16-bit entity, in this case the low half of one of the data registers (R0.L),
and the address in the I3 register is post-incremented by two after the access is made.
Finally, there is the byte access:
R0 = B [ P3++ ] (Z) ;
This instruction fetches a byte that is pointed to by the address in the P3 register, places it in the destination regis-
ter, R0, and then post-increments the address in P3 by one. Unlike the previous 16-bit move instruction that chose a
destination register of the same width as the access, the 32-bit destination register is satisfied by this instruction's
inclusion of the zero extension (Z), which fills the upper 24 bits with zeros. Sign-extension (X) of bit 7 through the
upper 24 bits of the register is also supported.
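The effect of the (Z) and (X) options on a byte load with post-increment can be mirrored in C. This is a behavioural sketch only; memory is modelled as a byte array, the pointer register as an index into it, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models R = B [ P++ ] (Z) : zero-extend the byte, then advance P by one. */
uint32_t load_byte_z(const uint8_t *mem, uint32_t *p)
{
    assert(mem != NULL && p != NULL);
    uint32_t value = mem[*p];
    *p += 1;
    return value;
}

/* Models R = B [ P++ ] (X) : copy bit 7 through the upper 24 bits. */
uint32_t load_byte_x(const uint8_t *mem, uint32_t *p)
{
    assert(mem != NULL && p != NULL);
    uint32_t value = mem[*p];
    *p += 1;
    if (value & 0x80u) {
        value |= 0xFFFFFF00u;
    }
    return value;
}
```

For a byte value of 0x80, the zero-extended result is 0x00000080 and the sign-extended result is 0xFFFFFF80; bytes with bit 7 clear give the same result in both cases.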
Instructions using index registers can use either a modify register or a small immediate value (+/- 2 or 4) as the
modifier. Instructions using pointer registers use a small immediate value or another pointer register as the modifier.
For more details, see AAU Instruction Summary.
There are no restrictions on data alignment. A 32-bit word can be fetched from any address, and the four contigu-
ous bytes starting from the specified address are fetched. Similarly, a 16-bit access may be to any two adjacent ad-
dresses in memory. The byte order of the memory is little-endian, so the lower addressed byte always contains the
least significant bits of the stored value.
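The little-endian byte order and the absence of alignment restrictions can be illustrated with a portable C model. The byte array and the function names are hypothetical; the sketch assembles each value byte by byte so that it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit little-endian load from any byte address in the modelled memory. */
uint16_t load16_le(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    return (uint16_t)(mem[addr] | ((uint16_t)mem[addr + 1] << 8));
}

/* 32-bit little-endian load: the lowest address holds the least
   significant byte of the value. */
uint32_t load32_le(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With memory bytes 0x11, 0x22, 0x33, 0x44, 0x55 at addresses 0 through 4, a 32-bit load from address 0 returns 0x44332211 and a 32-bit load from the unaligned address 1 returns 0x55443322.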
• In User mode, any reference to SP (e.g., the R0 = [ SP++ ]; stack pop instruction) implicitly uses the USP
as the effective address to access.
• In Supervisor mode, the same reference to SP uses the Supervisor Stack Pointer as the effective address to ac-
cess. To manipulate the User Stack Pointer from code running in Supervisor mode, explicitly use the register
alias USP. When the processor is in Supervisor mode, a move from the USP register (e.g., R0 = USP;) moves
the current User Stack Pointer into R0. The USP register alias can only be used in Supervisor mode.
The following load/store instructions use FP and SP:
• FP-indexed load/store, which extends the addressing range for 16-bit encoded load/stores
• Stack push/pop instructions, including those for pushing and popping multiple registers
• Link/unlink instructions implicitly use both, as they control stack frame space and manage the frame pointer
register (FP) for that space
R0 = [ I2 ] ;
loads a 32-bit value from the address pointed to by I2 and stores it in the 32-bit destination register R0.
R0.H = W [ I2 ] ;
loads a 16-bit value from the address pointed to by I2 and stores it in the 16-bit destination register R0.H.
B [ P1 ] = R0 ;
stores the 8-bit value from the least significant byte of the R0 register to the address pointed to by the P1 register.
loads a 16-bit word into a 32-bit destination register from an address pointed to by the P1 pointer register. The
pointer is then incremented by two, and the word is zero-extended to fill the 32-bit destination register.
Auto-decrement works the same way by decrementing the address after the access. For example:
R0 = [ I2-- ] ;
loads a 32-bit value into the destination register and decrements the index register by four.
Post-modify Addressing
Post-modify addressing uses the value in the index or pointer registers as the effective address and then modifies it by
the contents of another register. Pointer registers are modified by other pointer registers, whereas index registers are
modified by modify registers. Post-modify addressing does not support the pointer registers as destination registers,
nor does it support byte-addressing. For example:
R5 = [ P1++P2 ] ;
loads a 32-bit value into the R5 register from the memory location pointed to by the P1 register. The value in the P2
register is then added to the value in the P1 register.
R2 = W [ P4++P5 ] (Z) ;
loads a 16-bit word from the memory location pointed to by the P4 register into the low half of the destination
register R2, zero-filling it to 32 bits. The value in the P5 register is then added to the value in the P4 register.
R2 = [ I2++M1 ] ;
loads a 32-bit word from the address pointed to by I2 into the destination register R2. The value in the I2
index register is then modified by the value in the M1 modify register.
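Post-modify addressing can be pictured as "use the address, then add the modifier". The following C model shows this for a 32-bit load and for a zero-extended 16-bit load. It is a sketch with hypothetical names: memory is a little-endian byte array, the address register is passed by pointer, and circular buffering is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models Rd = [ Pa ++ Pb ] or Rd = [ Ia ++ Mb ] for a linear buffer:
   load from the current address, then add the modifier to the address. */
uint32_t load32_postmodify(const uint8_t *mem, uint32_t *addr, int32_t modifier)
{
    assert(mem != NULL && addr != NULL);
    uint32_t a = *addr;
    uint32_t value = (uint32_t)mem[a]
                   | ((uint32_t)mem[a + 1] << 8)
                   | ((uint32_t)mem[a + 2] << 16)
                   | ((uint32_t)mem[a + 3] << 24);
    *addr = a + (uint32_t)modifier;     /* unsigned wraparound is intended */
    return value;
}

/* Models Rd = W [ Pa ++ Pb ] (Z): 16-bit load, zero-extended to 32 bits. */
uint32_t load16z_postmodify(const uint8_t *mem, uint32_t *addr, int32_t modifier)
{
    assert(mem != NULL && addr != NULL);
    uint32_t a = *addr;
    uint32_t value = (uint32_t)mem[a] | ((uint32_t)mem[a + 1] << 8);
    *addr = a + (uint32_t)modifier;
    return value;
}
```

The modifier is signed, so a negative value steps the address backwards after the access, which is useful for walking a buffer in reverse or with a stride.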
Direct Addressing
Direct addressing uses the immediate value field in the instruction as the effective address. The location addressed
does not depend upon the contents of any register. The source or destination may be a pointer register, a data regis-
ter, or a data register half. Both 8- and 16-bit read operations may either sign- or zero-extend the value
into the upper bits of the destination register. For example:
[ 0x100 ] = SP ;
stores the stack pointer register in the 32-bit word at address 0x100.
Sometimes, the address can be specified as a symbolic value defined at the assembler level:
R0 = B [ myvar ] (Z) ;
This instruction loads a byte from an address identified by the symbolic value myvar, which might be defined with
a .VAR directive in either a native or compiler-produced assembly source file.
Direct address instructions are convenient for accessing memory-mapped registers, which are always at defined ad-
dresses. They also enable context saving without the need to modify any core registers. For example, a CPLB miss
handler could be written to switch to a private stack in a known safe region of memory to protect against the case
where the stack pointer itself caused the CPLB miss to begin with:
.SECTION scratchpad;
.ALIGN 4;
.VAR save_sp, safe_stack[BIG_ENOUGH];
...
[ save_sp ] = SP; // save stack pointer
SP = safe_stack+BIG_ENOUGH; // make it point to the safe stack
// now registers can be saved into the safe area
[ --SP ] = ( R7:5, P5:4 );
[ --SP ] = ASTAT;
...
// restore registers
ASTAT = [SP++];
( R7:5, P5:4 ) = [ SP++ ];
SP = [ save_sp ]; // restore stack pointer
RTX;
pointer falls outside the buffer's range, the DAG automatically adjusts the value to wrap the index pointer to a
location that is in the buffer.
The starting address that the DAG wraps around to is called the buffer's base address (base register). There are no
restrictions on the value of the base address for circular buffers that contain 8-bit data. Circular buffers that
contain 16- or 32-bit data must be 16-bit-aligned or 32-bit-aligned, respectively. Circular buffering uses
post-modify addressing.
Figure: Circular buffer addressing example with LENGTH = 11, BASE ADDRESS = 0x0, and MODIFIER = 4. The columns of
the figure show the sequence of locations accessed in one pass; the sequence repeats on subsequent passes.
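The access sequence for a circular buffer with a length of 11, a base address of 0, and a modifier of 4 can be reproduced with a short model of the wrap rule. This is a sketch rather than a description of the DAG hardware: the function name is hypothetical, and it assumes that the index starts inside the buffer and that the magnitude of the modifier does not exceed the buffer length.

```c
#include <assert.h>
#include <stdint.h>

/* Post-modify update of an index register for a buffer described by
   `base` and `length`. A length of 0 selects linear addressing. */
uint32_t circ_post_modify(uint32_t index, int32_t modify,
                          uint32_t base, uint32_t length)
{
    int64_t next = (int64_t)index + modify;
    if (length == 0) {
        return (uint32_t)next;              /* unbound, linear buffer */
    }

    int64_t lo = (int64_t)base;
    int64_t hi = (int64_t)base + length;    /* first address past the buffer */
    assert((int64_t)index >= lo && (int64_t)index < hi);
    assert(modify <= (int64_t)length && modify >= -(int64_t)length);

    if (next >= hi) {
        next -= length;                     /* wrap past the end */
    } else if (next < lo) {
        next += length;                     /* wrap before the base */
    }
    return (uint32_t)next;
}
```

Starting from index 0, repeated calls with a modifier of 4 visit 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7 and then return to 0, so every location is accessed once per pass.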
addressing feature permits repeatedly subdividing data sequences and storing this data in bit-reversed order. For
detailed information about bit-reversed addressing, see the description of the modify/increment instruction.
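Bit-reversed carry addition can be modelled as an ordinary addition performed on bit-reversed operands, with the result reversed back, so that carries propagate from the most significant bit toward the least significant bit. The sketch below applies this across all 32 bits; the function names are hypothetical, and the model is an illustration of the principle rather than a specification of the (BREV) instruction option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of the 32 bits in x. */
uint32_t reverse32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Add `modify` to `index` with the carry propagating in reverse direction. */
uint32_t brev_add(uint32_t index, uint32_t modify)
{
    return reverse32(reverse32(index) + reverse32(modify));
}

/* Fill `out` with `count` successive bit-reversed indices. */
void brev_sequence(uint32_t *out, size_t count, uint32_t start, uint32_t modify)
{
    assert(out != NULL);
    uint32_t index = start;
    for (size_t k = 0; k < count; k++) {
        out[k] = index;
        index = brev_add(index, modify);
    }
}
```

With a start of 0 and a modifier of 4 (half of an 8-entry sequence), the model produces 0, 4, 2, 6, 1, 5, 3, 7, which is the familiar bit-reversed ordering used when subdividing data for an 8-point FFT.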
The Addressing Modes table summarizes the addressing modes. In the table, an asterisk (*) indicates the processor
supports the addressing mode.
Instruction
Preg = [ FP - uimm7m4 ] ;
Dreg = [ Preg ] ;
Dreg = [ Preg ++ ] ;
Dreg = [ Preg -- ] ;
Dreg = [ Preg + uimm6m4 ] ;
Dreg = [ Preg + uimm17m4 ] ;
Dreg = [ Preg - uimm17m4 ] ;
Dreg = [ Preg ++ Preg ] ;
Dreg = [ FP - uimm7m4 ] ;
Dreg = [ uimm32 ] ;
Dreg = [ Ireg ] ;
Dreg = [ Ireg ++ ] ;
Dreg = [ Ireg -- ] ;
Dreg = [ Ireg ++ Mreg ] ;
Dreg = W [ Preg ] (Z) ;
Dreg = W [ Preg ++ ] (Z) ;
Dreg = W [ Preg -- ] (Z) ;
Dreg = W [ Preg + uimm5m2 ] (Z) ;
Dreg = W [ Preg + uimm16m2 ] (Z) ;
Dreg = W [ Preg - uimm16m2 ] (Z) ;
Dreg = W [ Preg ++ Preg ] (Z) ;
Dreg = W [ uimm32 ] (Z) ;
Dreg = W [ Preg ] (X) ;
Dreg = W [ Preg ++ ] (X) ;
Dreg = W [ Preg -- ] (X) ;
Dreg = W [ Preg + uimm5m2 ] (X) ;
Dreg = W [ Preg + uimm16m2 ] (X) ;
Dreg = W [ Preg - uimm16m2 ] (X) ;
Dreg = W [ Preg ++ Preg ] (X) ;
Dreg = W [ uimm32 ] (X) ;
Dreg_hi = W [ Ireg ] ;
Dreg_hi = W [ Ireg ++ ] ;
Dreg_hi = W [ Ireg -- ] ;
Dreg_hi = W [ Preg ] ;
Dreg_hi = W [ Preg ++ Preg ] ;
Dreg_hi = W [ uimm32 ] ;
Dreg_lo = W [ Ireg ] ;
Dreg_lo = W [ Ireg ++ ] ;
Dreg_lo = W [ Ireg -- ] ;
Dreg_lo = W [ Preg ] ;
Dreg_lo = W [ Preg ++ Preg ] ;
Dreg_lo = W [ uimm32 ] ;
Dreg = B [ Preg ] (Z) ;
Dreg = B [ Preg ++ ] (Z) ;
Dreg = B [ Preg -- ] (Z) ;
Dreg = B [ Preg + uimm15 ] (Z) ;
Dreg = B [ Preg - uimm15 ] (Z) ;
Dreg = B [ uimm32 ] (Z) ;
Dreg = B [ Preg ] (X) ;
Dreg = B [ Preg ++ ] (X) ;
Dreg = B [ Preg -- ] (X) ;
Dreg = B [ Preg + uimm15 ] (X) ;
Dreg = B [ Preg - uimm15 ] (X) ;
Dreg = B [ uimm32 ] (X) ;
[ Preg ] = Preg ;
[ Preg ++ ] = Preg ;
[ Preg -- ] = Preg ;
[ Preg + uimm6m4 ] = Preg ;
[ Preg + uimm17m4 ] = Preg ;
[ Preg - uimm17m4 ] = Preg ;
[ FP - uimm7m4 ] = Preg ;
[ uimm32 ] = Preg ;
[ Preg ] = Dreg ;
[ Preg ++ ] = Dreg ;
[ Preg -- ] = Dreg ;
[ Preg + uimm6m4 ] = Dreg ;
[ Preg + uimm17m4 ] = Dreg ;
[ Preg - uimm17m4 ] = Dreg ;
[ Preg ++ Preg ] = Dreg ;
[ FP - uimm7m4 ] = Dreg ;
[ uimm32 ] = Dreg ;
[ Ireg ] = Dreg ;
[ Ireg ++ ] = Dreg ;
[ Ireg -- ] = Dreg ;
[ Ireg ++ Mreg ] = Dreg ;
W [ Ireg ] = Dreg_hi ;
W [ Ireg ++ ] = Dreg_hi ;
W [ Ireg -- ] = Dreg_hi ;
W [ Preg ] = Dreg_hi ;
W [ Preg ++ Preg ] = Dreg_hi ;
W [ uimm32 ] = Dreg_hi ;
W [ Ireg ] = Dreg_lo ;
W [ Ireg ++ ] = Dreg_lo ;
W [ Ireg -- ] = Dreg_lo ;
W [ Preg ] = Dreg_lo ;
W [ Preg ++ Preg ] = Dreg_lo ;
W [ uimm32 ] = Dreg_lo ;
W [ Preg ] = Dreg ;
W [ Preg ++ ] = Dreg ;
W [ Preg -- ] = Dreg ;
W [ Preg + uimm5m2 ] = Dreg ;
W [ Preg + uimm16m2 ] = Dreg ;
W [ Preg - uimm16m2 ] = Dreg ;
W [ uimm32 ] = Dreg ;
B [ Preg ] = Dreg ;
B [ Preg ++ ] = Dreg ;
B [ Preg -- ] = Dreg ;
B [ Preg + uimm15 ] = Dreg ;
B [ Preg - uimm15 ] = Dreg ;
B [ uimm32 ] = Dreg ;
Preg = imm7 (X) ;
Preg = imm16 (X) ;
Preg = uimm32;
Preg += Preg (BREV) ;
Ireg += Mreg (BREV) ;
Preg = Preg << 2 ;
Preg = Preg >> 2 ;
Preg = Preg >> 1 ;
Preg = Preg + Preg << 1 ;
Preg = Preg + Preg << 2 ;
Preg -= Preg ;
Ireg -= Mreg ;
Name Description
L[n] Length (Circular Buffer) Register (n = 0 - 3)
Pointer Register
There are six 32-bit general-purpose pointer registers P[n] that are primarily used for load/store operations. Al-
though pointer registers are primarily used for address calculations, these registers may also be used for general inte-
ger arithmetic with a limited set of arithmetic operations; however, unlike computations involving data registers
(R[n]), pointer register arithmetic does not affect the ASTAT status bits.
[Register diagram: ADDR[31:0] (R/W), Memory Address. All bits are shown as X.]
[Register diagram: ADDR[31:0] (R/W), Stack Frame Address. All bits are shown as X.]
To speed up context switching, there are two stack pointer registers, a User stack pointer (USP) and a Supervisor
stack pointer (SP). In assembly code, only the SP syntax is used, as the correct stack pointer register will be used
based on whether the processor is in Supervisor or User mode.
The LINK and UNLINK instructions, which control stack frame space, implicitly use and modify the SP register.
[Register diagram: ADDR[31:0] (R/W), Stack Address. All bits are shown as X.]
[Register diagram: ADDR[31:0] (R/W), User Mode Stack Address. All bits are shown as X.]
[Register diagram: ADDR[31:0] (R/W), Buffer Index Address. All bits are shown as X.]
[Register diagram: MODIFY[31:0] (R/W), Buffer Index Modify Value. All bits are shown as X.]
[Register diagram: BASE[31:0] (R/W), Buffer Base Address. All bits are shown as X.]
[Register diagram: LENGTH[31:0] (R/W), Circular Buffer Length. All bits are shown as X.]
7 Memory
Blackfin+ processors support a hierarchical memory model with different performance and size parameters, depend-
ing on the memory location within the hierarchy. Level 1 (L1) instruction and data memories interconnect closely
and efficiently with the Blackfin+ core to achieve the best performance. Separate blocks of L1 memory can be ac-
cessed simultaneously through multiple bus systems. Instruction memory is separated from data memory, but unlike
classic Harvard architectures, all L1 memory blocks are accessed by a unified addressing scheme. Portions of L1
memory can be configured to function as cache memory.
The Blackfin+ processors feature on-chip Level 2 (L2) memory, which can freely store both instructions and data
but takes more core clock cycles to access than L1 memory. They also provide an external memory space (including
asynchronous memory for static RAM devices and synchronous memory for dynamic RAM, such as DDR SDRAM devices).
This chapter discusses the architecture and principles of L1 memories, as well as memory protection and caching
mechanisms. For memory sizes, locations, and definitions for both L2 and off-chip memory interfaces, refer to the
specific Blackfin+ Processor Hardware Reference.
Memory Architecture
Blackfin+ processors have a unified 4 GB address range that spans a combination of on-chip and off-chip memory
and memory-mapped I/O resources. Of this range, some of the address space is dedicated to internal, on-chip re-
sources, populated as:
• Level-1 (Core) Static Random Access Memories (L1 SRAM)
• Level-2 (Core) Static Random Access Memories (L2 SRAM)
• A set of memory-mapped registers (MMRs)
• A boot Read-Only Memory (ROM)
The Processor Memory Architecture figure shows a block diagram.
[Figure: Processor Memory Architecture. Block diagram showing the core processor and L1 memory in the core clock
(CCLK) domain, linked by instruction, load data, and store data buses, and, in the system clock (SCLK) domain, the
DMA controller, EBIU, ROM, non-DMA peripherals, DMA peripherals, and external memory devices. The labeled buses are
the DMA core bus (DCB), external access bus (EAB), DMA external bus (DEB), peripheral access bus (PAB), DMA access
bus (DAB), and external port bus (EPB). Bus widths of 64, 32, and 16 bits are marked in the figure.]
dedicated instruction SRAM. L1 instruction memory cannot normally be accessed by load or store instructions by
the core.
Typical L1 data memory is divided into three blocks. Two, known as block A and block B, are equal-sized and may
be configured as cache or SRAM, and the third, known as block C or scratchpad, is a small region of dedicated data
SRAM often used for system stacks and heaps. Concurrent accesses by the core or DMA to separate blocks proceed
in parallel, whereas concurrent accesses to the same block may be stalled due to a sub-bank conflict.
The L1 memory system contains special-purpose memory spaces, such as cache tags and parity bits, to which direct
access by load or store instructions is normally restricted. An Extended Data Access mode is supported, in which all
restricted memory spaces including L1 instruction memory may be directly accessed by suitably privileged software.
L1 Instruction Memory
L1 instruction memory is a contiguous region of the address space which may only be used for storing instructions.
On all Blackfin+ processors, a subset of this space must be used as directly addressable SRAM, but a dedicated portion
of this L1 instruction memory space can optionally be configured as instruction cache memory. Control bits
in the L1IM_ICTL register can be used to configure this portion of L1 instruction memory as a 4-way set-associative
instruction cache.
The L1 instruction memory content is not accessible by core load or store operations during normal operation. This
memory's content may be read and modified using either DMA or by loads and stores in Extended Data Access
mode.
L1 Instruction SRAM
The processor core reads instruction memory via a 64-bit instruction fetch bus. All addresses from this bus are 64-
bit aligned. Each instruction fetch can return any combination of 16-, 32- and 64-bit instructions (i.e., four 16-bit
instructions, two 16-bit instructions and one 32-bit instruction, two 32-bit instructions, or one 64-bit instruction).
The pointer registers and index registers, which are described in the Address Arithmetic Unit chapter, may only access
L1 instruction memory directly when Extended Data Access is enabled (see Extended Data Access). Otherwise, a direct
access to an address in instruction memory SRAM space generates an exception. Write access to the L1 instruction
SRAM memory may also be made through the 64-bit system DMA port.
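The instruction mixes that exactly fill one 64-bit fetch, as listed above, can be enumerated with a short Python sketch. The function name is hypothetical, and the model only counts whole instructions within a single fetch; it does not model instructions that straddle two fetches.

```python
def fetch_combinations(fetch_bits=64, sizes=(16, 32, 64)):
    """List the instruction-size mixes (in bits) that exactly fill one fetch.

    Each mix is reported once, in non-decreasing size order, so the
    ordering of instructions within the fetch is not distinguished.
    """
    results = []

    def extend(remaining, smallest, combo):
        if remaining == 0:
            results.append(tuple(combo))
            return
        for size in sizes:
            # Only append sizes >= the last one chosen to avoid duplicates.
            if smallest <= size <= remaining:
                extend(remaining - size, size, combo + [size])

    extend(fetch_bits, sizes[0], [])
    return results


for combo in fetch_combinations():
    print(combo)
# prints (16, 16, 16, 16), then (16, 16, 32), then (32, 32), then (64,)
```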
Typically, the SRAM is implemented as a collection of single-ported sub-banks. Writes to one sub-bank may proceed
in parallel with reads from another, effectively making the instruction memory dual-ported.
L1 Instruction Cache
For information about cache terminology, see Terminology.
A portion of the L1 instruction memory can be configured as a 16 KB 4-way set-associative instruction cache, fea-
turing a cache line size of 32 bytes. To improve the average access latency for critical code sections, each line of the
cache can be locked independently. When the memory is configured as cache, it can only be accessed directly by
loads and stores if Extended Data Access mode is enabled.
When cache is enabled, only memory pages further specified as cacheable by the cacheability protection lookaside
buffers (CPLBs) are cached. When CPLBs are enabled, any memory location that is accessed must have an associat-
ed page definition available, else a CPLB exception is generated. CPLBs are described in Memory Protection and
Properties.
The figures in Cache Lines show the organization of the Blackfin+ processor instruction cache memory.
NOTE: The entire code sequence above must not be in a cacheable region of memory.
Cache Lines
As shown in the Instruction Cache Organization figure, the cache consists of a collection of cache lines. Each cache
line is made up of a tag component and a data component.
• The tag component incorporates a 20-bit address tag, replacement policy bits (including a Priority bit), and a
Valid bit.
• The data component is made up of four 64-bit words of instruction data.
The tag and data components of cache lines are stored in the tag and data memory arrays, respectively.
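[Figure: Instruction Cache Organization. The 32-bit address is divided at bits 31:12 (address tag), 11:5, and 4:0.
Each line in a way (WAY 3 is shown) holds PRIO, REPL, and V bits, a 20-bit TAG, and 4 x 64 bits of data
(WD 3 to WD 0).]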
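The address split used by the instruction cache follows from the figures quoted in this section (16 KB, 4 ways, 32-byte lines, 20-bit tag). The Python sketch below derives the field boundaries from those numbers. The names are hypothetical, and the interpretation of bits 11:5 as the set index and bits 4:0 as the byte offset within a line is an assumption based on the cache geometry.

```python
CACHE_BYTES = 16 * 1024   # instruction cache size
WAYS = 4                  # 4-way set-associative
LINE_BYTES = 32           # four 64-bit words per line

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits: address bits 4:0
INDEX_BITS = SETS.bit_length() - 1          # 7 bits: address bits 11:5
TAG_BITS = 32 - OFFSET_BITS - INDEX_BITS    # 20 bits: address bits 31:12


def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x00012345)
print(hex(tag), index, offset)   # prints 0x12 26 5
```

Two addresses with the same set index but different tags compete for the four ways of that set.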
view of cacheability in the memory, system code that enables, disables and configures cache must respect the follow-
ing guidelines:
• bypass mode must not be cacheable
• execute a SSYNC; instruction before enabling and after disabling cache bypass mode
Because the IFLUSH instruction is used to invalidate a specific address in the memory map and its corresponding
cache line, it is most useful when the buffer being invalidated is less than the cache size.
Finally, larger portions of the cache can be invalidated by writing directly to the Valid bits while Extended Data
Access is enabled (see Extended Data Access).
L1 Data Memory
L1 data memory is organized as a number of distinct blocks, each of which constitutes a separate contiguous region
of the address space. Accesses to different blocks are guaranteed not to collide, whereas accesses within a single block
will not collide if they are to different sub-banks. When there are no collisions, the following L1 data traffic could
occur in a single core clock cycle:
• Two 32-bit data accesses (two loads or one load and one store)
• One DMA I/O, up to 32 bits
• One 64-bit cache fill or victim access
There are three blocks of L1 data SRAM, blocks A, B and C. Parts of the two larger blocks, A and B, may be config-
ured as data cache, as controlled by bits in the L1DM_DCTL register.
The processor cannot fetch instructions directly from L1 data memory.
L1 Data SRAM
L1 data SRAM is directly addressable by the processor core and by the DMA controller. Core accesses are performed
with no stalls, so long as access collisions are avoided; therefore, configuring the whole of L1 data memory as SRAM
is potentially the most efficient way to use it. However, this is at the cost of additional software complexity when
data structures are larger than available L1 memory. In a common use case, DMA engines transfer data between L1
and the system while the core processes data previously loaded to L1.
To optimize memory performance, the programmer must ensure that stalls due to collisions are avoided. The Black-
fin+ architecture guarantees that accesses to different blocks do not collide. For example, a tight loop which loads
two operands per cycle will not incur stalls due to the loads if the operands are placed in separate blocks.
Depending on the memory microarchitecture, two accesses to the same block may not collide. However, a collision
due to two loads in a parallel issue instruction will not take more cycles than the additional cycles resulting from
executing one of the loads in a separate instruction.
Collisions between the core and DMA are less common because DMA runs at system clock speeds, which are some
fraction of the core clock. DMA accesses are also usually delayed behind core accesses, but after some delay a fairness
algorithm ensures that the DMA gets access to L1. If possible, DMA accesses should be to a different block than
concurrent core accesses.
L1 Data Cache
For definitions of cache terminology, see Terminology.
Each of the A and B blocks of L1 data memory can be configured to serve as a 16 KB 2-way set associative data
cache (up to 32 KB total), featuring a cache line length of 32 bytes. When the memory is configured as cache, it can
only be accessed directly by loads and stores if Extended Data Access mode is enabled.
If cache is enabled in the L1DM_DCTL.CFG[1:0] bits, data CPLBs must also be enabled by setting the
L1DM_DCTL.ENCPLB bit. Only memory pages specified as cacheable by data CPLBs will be cached. The default
behavior when data CPLBs are disabled is for nothing to be cached.
Access to core MMR space is not controlled by the data CPLBs, so this region cannot be configured as cacheable.
• If the state of the line is modified (dirty), then the cache contains the only valid copy of the data. This is copied
back to external memory before the new tag and data is written to the cache.
• For a given large region of memory, data in the first 16 KB of that memory (offset 0x0000 - 0x3FFF) will
be cached only in data block B. Data in the next 16 KB address range (offset 0x4000 - 0x7FFF) will be
cached only in data block A, and so on.
• If DCBS = 1, A[23] selects data block A instead of data block B. With DCBS = 1, the system functions more
like two independent 16 KB caches, each being 2-way set-associative and serving alternating blocks of 8 MB
source memory regions. Data block B caches all data accesses for the first 8 MB of the memory address range,
with each access vying for the two line entries. Likewise, data block A caches data located above 8 MB and
below 16 MB, and so on.
For example, if DCBS = 1 and the application utilizes a 1 MB data buffer located entirely in the first 8 MB of
memory, it is effectively served by only half the cache, as the 2-Way set associative 16 KB cache associated with data
block B is the only cache memory it can target. In this instance, the application never derives any benefit from data
block A.
However, if the application is working from two data sets located at least 8 MB apart in memory, closer control over
how the cache maps to the data is possible. For example, if the program is doing a series of dual-MAC operations in
which both DAGs are accessing data on every cycle, the DAG0 data set can be mapped to one 8 MB region of
memory while the DAG1 data is mapped to another, thus ensuring that:
• DAG0 gets its data from data block A for all of its accesses, and
• DAG1 gets its data from data block B.
This arrangement causes the core to use both data buses for cache line transfers and achieves the maximum data
bandwidth between the cache and the core.
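The bank selection described above can be expressed as a single address-bit test. The Python sketch below is a model under stated assumptions: the function name is hypothetical, A[23] is the select bit when DCBS = 1 as the text states, and A[14] is assumed for DCBS = 0 because the mapping alternates every 16 KB. In both cases a set bit selects data block A and a clear bit selects data block B.

```python
def data_cache_block(addr, dcbs):
    """Return which L1 data block ('A' or 'B') caches the given address."""
    select_bit = 23 if dcbs else 14   # 8 MB or 16 KB granularity
    return "A" if (addr >> select_bit) & 1 else "B"


# DCBS = 0: blocks alternate every 16 KB.
print(data_cache_block(0x3FFF, dcbs=0), data_cache_block(0x4000, dcbs=0))      # prints B A

# DCBS = 1: blocks alternate every 8 MB.
print(data_cache_block(0x7FFFFF, dcbs=1), data_cache_block(0x800000, dcbs=1))  # prints B A

# A 1 MB buffer entirely inside the first 8 MB only ever uses block B.
blocks = {data_cache_block(a, dcbs=1) for a in range(0, 0x100000, 32)}
print(blocks)   # prints {'B'}
```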
The Data Cache Mapping figure shows an example of how mapping is performed when DCBS = 1.
[Figure: Data Cache Mapping (DCBS = 1). Successive 8 MB regions of memory are shown mapped onto WAY0 and WAY1 of the
data banks.]
memory, system code that enables, disables and operates in data cache bypass mode should be preceded by and
followed by an SSYNC instruction.
Because the L1 memories are separated into instruction and data memories, the CPLB entries are also divided be-
tween instruction and data CPLBs. Sixteen CPLB entries and one default descriptor are used for instruction fetch
requests (ICPLBs). Another sixteen CPLB entries are used for data transactions (DCPLBs). The ICPLBs and
DCPLBs are enabled by setting the appropriate bits in the L1 Instruction Memory Control (L1IM_ICTL) and L1
Data Memory Control (L1DM_DCTL) registers, respectively.
Data accesses to system and core MMR space are never controlled by CPLBs. If the data CPLBs are enabled, data
accesses to all other memory spaces (including extended data accesses, when enabled) are controlled by the data CPLBs.
Instruction CPLB
The Instruction CPLB (ICPLB) governs instruction fetches. Each of the 16 ICPLB page descriptors consists of a
pair of 32-bit values:
• L1IM_ICPLB_ADDR[n] defines the start address of the page described by the ICPLB descriptor. For more
information, see Instruction Memory CPLB Address Registers.
• L1IM_ICPLB_DATA[n] defines the properties of the page described by the ICPLB descriptor. For more
information, see Instruction Memory CPLB Data Registers.
The L1IM_ICPLB_DFLT register provides default properties should no valid page descriptor match. For more
information, see Instruction Memory CPLB Default Settings Register.
NOTE: To ensure proper behavior and future compatibility, all reserved bits in the L1IM_ICPLB_DATA[n] and
L1IM_ICPLB_DFLT registers must be set to 0 whenever these registers are written.
Data CPLB
The Data CPLB (DCPLB) governs data accesses by load and store instructions. Each of the 16 DCPLB page de-
scriptors consists of a pair of 32-bit values:
• L1DM_DCPLB_ADDR[m] defines the start address of the page described by the DCPLB descriptor. For
more information, see the Data Memory CPLB Address Registers.
• L1DM_DCPLB_DATA[m] defines the properties of the page described by the DCPLB descriptor. For more
information, see the Data Memory CPLB Data Registers.
The L1DM_DCPLB_DFLT register provides default properties should no valid page descriptor match. For more
information, see the Data Memory CPLB Default Settings Register.
CAUTION: OTP memory does not support burst transfers, which are required for cache line fills. As such,
OTP memory should not be covered by a cache-enabled DCPLB. If it is, the OTP controller returns an error
when a read access is attempted.
NOTE: To ensure proper behavior and future compatibility, all reserved bits in the L1DM_DCPLB_DATA[m] and
L1DM_DCPLB_DFLT registers must be set to 0 whenever these registers are written.
• If cacheable: write-through or write-back determines whether data writes propagate directly to memory or
are deferred until the cache line is reallocated.
• If non-cacheable: I/O device space or regular memory. Data reads from I/O device space are non-specula-
tive. This is suitable for use with memory-mapped devices with read side-effects, but it is considerably less
efficient than a regular memory access and should not be used indiscriminately.
• Protection properties:
• Supervisor write access permission: enables or disables writes to this page when in Supervisor mode (ap-
plies to data pages only).
• User write access permission: enables or disables writes to this page when in User mode (applies to data
pages only).
• User read access permission: enables or disables reads from this page when in User mode.
• CPLB entry status:
• Valid: the processor ignores the CPLB page descriptor unless this bit is set.
• Dirty: the data in this page in memory has changed since the CPLB was last loaded. Writes to a page
without this bit set cause a CPLB protection exception. Software is responsible for setting the bit to enable
writes to the page and for propagating any subsequent modifications to the page further down the memo-
ry hierarchy. Ensure this bit is always set in data CPLBs if it is not required to track modifications to indi-
vidual pages.
• Lock: keep this entry in the MMU, and do not participate in CPLB replacement policy. This bit is ignor-
ed by the hardware and reserved for use by software implementing the CPLB replacement policy.
NOTE: The L1DM_DSTAT and L1IM_ISTAT registers are valid only while in the faulting exception service rou-
tine.
CPLB Management
Use of CPLBs is optional. If all L1 memory is configured as SRAM and no memory protection is required by the
application, then CPLBs need not be enabled. However, CPLBs must be used if cache or memory protection is re-
quired.
NOTE: Before caches are enabled, the MMU and its supporting data structures must be set up and enabled.
Upon reset, CPLBs are disabled, and the Memory Management Unit (MMU) is not used. CPLBs are enabled separately
for instruction fetches (by setting the L1IM_ICTL.ENCPLB bit) and data accesses (by setting the
L1DM_DCTL.ENCPLB bit).
Once the CPLBs are enabled, an exception occurs when the Blackfin+ processor issues a memory operation for
which no valid CPLB page descriptor exists and the default CPLB register indicates EOM (exception on miss). This
exception places the processor into Supervisor mode and vectors to the MMU exception handler. The handler is
typically part of the operating system (OS) kernel that implements the CPLB replacement policy.
The MMR storage locations for CPLB entries are limited to 16 page descriptors for instruction fetches and 16 page
descriptors for data load and store operations.
For small and/or simple memory models, it may be possible to define a set of CPLB page descriptors combined with
defaults that fit into these 32 entries, cover the entire addressable space, and never need to be replaced. This type of
definition is referred to as a static memory management model.
However, operating environments commonly define more CPLB descriptors (to cover the addressable memory and
I/O spaces) than will fit into the available on-chip CPLB MMRs. When this happens, a Page Descriptor Table is
used, which stores all the potentially required CPLB descriptors. The specific format for the Page Descriptor Table is
not defined as part of the Blackfin+ processor architecture. Different operating systems, which have different memo-
ry management models, can implement Page Descriptor Table structures that are consistent with the OS require-
ments. This allows adjustments to be made between the level of protection afforded versus the performance attrib-
utes of the memory-management support routines.
NOTE: Before CPLBs are enabled, valid CPLB page descriptors (defaults) must be in place for both the Page De-
scriptor Table and the MMU exception handler. The LOCK bits of these CPLB page descriptors are com-
monly set so that they are not inadvertently replaced in software.
The MMU exception handler uses the faulting address to index into the Page Descriptor Table structure to find the
correct CPLB descriptor data to load into one of the on-chip CPLB page descriptor register pairs. If all on-chip
registers contain valid CPLB entries, the handler selects one of the descriptors to be replaced, and the new descriptor
information is loaded. Before loading new descriptor data into any CPLBs, the corresponding group of sixteen
CPLBs must be disabled by clearing the ENCPLB bit in either L1DM_DCTL or L1IM_ICTL.
After the new CPLB page descriptor is loaded, the CPLB is re-enabled, the exception handler returns, and the fault-
ing memory operation is restarted. This operation should now find a valid CPLB descriptor for the requested ad-
dress, and it should proceed normally.
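The miss-handling sequence described above can be sketched as a small Python model. Everything here is illustrative: the class and field names are hypothetical, the Page Descriptor Table is modeled as a flat list because its format is left to the operating system, and the round-robin victim choice that skips locked entries is just one possible software replacement policy.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    start: int            # page start address
    size: int             # page size in bytes
    valid: bool = True
    locked: bool = False  # software-only hint: do not replace

    def covers(self, addr):
        return self.valid and self.start <= addr < self.start + self.size


class CplbModel:
    def __init__(self, page_table, entries=16):
        self.page_table = page_table      # all descriptors the OS knows about
        self.slots = [None] * entries     # models the on-chip register pairs
        self.enabled = True               # models the ENCPLB bit
        self.next_victim = 0

    def lookup(self, addr):
        return any(d is not None and d.covers(addr) for d in self.slots)

    def handle_miss(self, fault_addr):
        """Load the descriptor covering fault_addr and return the slot used."""
        new = next((d for d in self.page_table if d.covers(fault_addr)), None)
        if new is None:
            raise LookupError("no page descriptor covers the faulting address")
        slot = self._choose_slot()
        self.enabled = False              # disable the group before loading
        self.slots[slot] = new
        self.enabled = True               # re-enable, then restart the access
        return slot

    def _choose_slot(self):
        for i, d in enumerate(self.slots):
            if d is None or not d.valid:
                return i                  # prefer an unused entry
        for _ in range(len(self.slots)):
            i = self.next_victim
            self.next_victim = (i + 1) % len(self.slots)
            if not self.slots[i].locked:
                return i                  # replace the next unlocked entry
        raise RuntimeError("all CPLB entries are locked")


table = [Descriptor(0x0000, 0x1000, locked=True),
         Descriptor(0x1000, 0x1000),
         Descriptor(0x2000, 0x1000)]
mmu = CplbModel(table, entries=2)
print(mmu.handle_miss(0x0010), mmu.handle_miss(0x1800), mmu.handle_miss(0x2004))
# prints 0 1 1
```

In the usage example, the third miss replaces slot 1 because slot 0 holds a locked descriptor, which mirrors the common practice of locking the descriptors that cover the handler and the table itself.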
A single instruction may generate an instruction fetch as well as one or two data accesses. It is possible that more
than one of these memory operations references data for which there is no valid or default CPLB page descriptor. In
this case, the exceptions are prioritized and serviced in this order:
• Instruction page miss
• Data page miss using DAG0
• Data page miss using DAG1
CrossCore® Embedded Studio provides an MMU exception handler and automatic generation of the Page Descrip-
tor Table structure. Please refer to the Cache and CPLBs section of the System Run-Time Documentation.
service routine can also inspect the CPLB entries to infer the cause of the fault. For more information, see the Data
Memory CPLB Status Register and the Instruction Memory CPLB Status Register .
The DCPLB Fault Address (L1DM_DCPLB_FAULT_ADDR) and ICPLB Fault Address
(L1IM_ICPLB_FAULT_ADDR) registers hold the address that caused a fault in L1 data memory and L1 instruction
memory, respectively. For more information, see the Data Memory CPLB Fault Address Register and the Instruction
Memory CPLB Fault Address Register .
L1 Parity Protection
The following sections provide details of L1 parity error support on Blackfin+ processors.
The core will signal a double fault error to the SEC if a parity error is detected while servicing a Reset or Emulation
event. An NMI is not raised if a Parity Error is detected while servicing an NMI, but the Parity Error Status registers
are updated.
If a parity error is detected while servicing an NMI or a higher-priority event, the Parity Error Status registers are set.
When the processor is at thread level or servicing a lower-priority event and bits in the LOCATION or BYTELOC
fields are set, then an NMI is raised. As such, it is recommended that the NMI handler read and write back the
Parity Error Status registers to clear these bits for all NMI events associated with parity errors. If any further parity
errors are detected while still in the NMI handler, the status registers will update again, thus causing the processor to
vector again to the NMI handler immediately upon exiting it to handle the new error(s).
Parity bits are read into the low-order bit of each byte. The value of the remaining bits may not be zero and cannot
be relied upon.
Direct access to parity bits for L1 cache tags and dirty bits is possible only when extended data access is enabled.
Knowledge of the memory microarchitecture is required to do this (see Extended Data Access to L1 Caches).
L1 Initialization Requirements
If parity checking is to be used, software must write all locations of L1 after each processor power-up. This includes
initial device power-up, as well as subsequent exits from the Hibernate state. Normally, this process takes place in the
processor's boot code. These writes are necessary to initialize the otherwise random states of parity bits to legitimate
states. All of L1 must be initialized, rather than just those locations expected to be read, otherwise speculative access-
es have the potential to trigger unintended parity errors.
To allow the processor to be able to initialize L1 without the risk of triggering these unwanted parity errors, L1
parity error-checking on reads is disabled by default. Writes are always performed with parity bits calculated and
stored. This mode is controlled by the Read Parity Checking Enable (RDCHK) bit in the L1IM_ICTL and
L1DM_DCTL registers. This bit is deasserted by hardware reset and must be set by software (after L1 initialization) to
enable read parity checking. Initialization of L1 may be achieved through any combination of DMA and processor
L1 accesses.
When the cache is enabled, all the tags are written to. As such, enabling the cache also initializes the parity bits
protecting the tags. However, enabling the cache does not initialize the parity bits for the SRAM holding the cache
data arrays. These must be initialized prior to enabling the cache as part of the L1 SRAM initialization process de-
scribed above.
.section NONCACHED_ECC_PROTECTED_CODE;
.extern nmi_handler;
nmi_handler:
/* switch to ECC-protected stack */
[saved_sp] = SP;
SP = nmi_handler_stack + BIG_ENOUGH;
/* save other registers on ECC-protected stack */
/* If program used system MMRs or memory with read side-effects, check for non-speculative
access abort. */
R7 = SEQSTAT;
CC = BITTST(R7, BITP_SEQSTAT_NSPECABT);
IF CC JUMP unrecoverable_error;
/* If there are other sources of NMI, check for external NMI which vectors to the same
handler as parity. */
CC = BITTST(R7, BITP_SEQSTAT_SYSNMI);
IF CC JUMP nmi_handler;
/* CPARINT indicates parity error simultaneous with exception or interrupt, not necessarily
recoverable. */
CC = BITTST(R7, BITP_SEQSTAT_CPARINT);
IF CC JUMP unrecoverable_error;
/* If DMA may have read L1 or Write-back cache is enabled, check for parity error on system
read */
CC = BITTST(R7, BITP_SEQSTAT_PEIX);
IF CC JUMP unrecoverable_error;
CC = BITTST(R7, BITP_SEQSTAT_PEDX);
IF CC JUMP unrecoverable_error;
R6 = R7;
BITCLR(R6, BITP_L1DM_DCTL_CFG+1);
BITCLR(R6, BITP_L1DM_DCTL_CFG);
[REG_L1DM_DCTL] = R6; /* disable cache */
CSYNC;
[REG_L1DM_DCTL] = R7; /* re-enable cache */
CSYNC; /* wait for cache to reinitialize */
JUMP return_from_parity_error;
parity_in_instruction_L1:
R7 = [REG_L1IM_IPERR_STAT];
/* clear the error by writing BYTELOC and LOCATION */
[REG_L1IM_IPERR_STAT] = R7;
/* test for error in cache TAG or MOD */
R5 = ~BITM_L1DM_DPERR_STAT_LOCATION;
R5 = R7 & R5;
CC = R5 == 0;
IF !CC JUMP parity_error_in_instruction_cache;
/* compute the error address */
R6 = [REG_L1DM_SRAM_BASE_ADDR];
R5 = BITM_L1DM_DPERR_STAT_ADDRESS;
R5 = R7 & R5;
R6 = R6 + R5;
/* if address is in non-cache SRAM goto reload */
return_from_parity_error:
/* restore registers */
SP = [saved_sp]; /* restore stack */
RTN;
Load/Store Operation
The Blackfin+ processor architecture supports the RISC concept of a Load/Store machine. This is the characteristic
of RISC architectures whereby memory operations (loads and stores) are intentionally separated from the
arithmetic functions that use the targets of the memory operations. The separation is made because memory operations,
particularly instructions that access off-chip memory or I/O devices, often take multiple cycles to complete
and would normally halt the processor, preventing an instruction execution rate of one instruction per cycle.
In write operations, the store instruction is considered complete as soon as it executes, even though many cycles may
elapse before the data is actually written to an external memory or I/O location. This arrangement allows the pro-
cessor to execute one instruction per clock cycle, and it implies that the synchronization between when writes com-
plete and when subsequent instructions execute is not guaranteed. Moreover, this synchronization is considered un-
important in the context of most memory operations.
Interlocked Pipeline
In the execution of instructions, the Blackfin+ processor architecture implements an interlocked pipeline. When a
load instruction executes, the target register of the read operation is marked as busy until the value is returned from
the memory system. If a subsequent instruction tries to access this register before the new value is present, the pipe-
line will stall until the memory operation completes. This stall guarantees that instructions that require the use of
data resulting from the load do not use the previous or invalid data in the register, even though instructions are
allowed to start executing before the memory read completes.
This mechanism allows the execution of independent instructions between the load and the instructions that use the
read target without requiring the programmer or compiler to know how many cycles are actually needed for the
memory read operation to complete. If the instruction immediately following the load uses the same register, it sim-
ply stalls until the value is returned. Consequently, it operates as the programmer expects. However, if four other
instructions are placed after the load but before the instruction that uses the same register, all of them execute, and
the overall throughput of the processor is improved.
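The effect of placing independent instructions between a load and its first use can be sketched with a deliberately simplified model. The function name and the idea of a single fixed load latency are assumptions made for illustration; the real latency depends on the memory being accessed and is not specified here.

```c
#include <stdint.h>

/* Simplified interlock model (illustrative assumption, not cycle-accurate):
 *   load_latency - cycles the consumer would stall if it immediately
 *                  followed the load
 *   independent  - instructions placed between the load and its first use,
 *                  each assumed to take one cycle and not to touch the
 *                  load's target register
 * Returns the number of cycles the first dependent instruction stalls. */
static inline uint32_t interlock_stall_cycles(uint32_t load_latency,
                                              uint32_t independent)
{
    return (independent >= load_latency) ? 0u : load_latency - independent;
}
```

Under this model, every independent instruction scheduled after the load removes one stall cycle until the stall disappears entirely, which is the throughput gain described above.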
Alignment
Non-aligned memory operations are supported. Loads and stores with addresses which are not a multiple of the data
size access the bytes at sequential addresses starting with the address passed to the instruction, as expected. This may
generate multiple memory read or write operations, but generally the instruction will not take more cycles than the
equivalent two aligned loads and stores.
Aligned addresses are required in special circumstances, such as access to MMRs, I/O device space, exclusive loads
and stores, and extended data access. In these cases, an address which is not a multiple of the data size causes a Mis-
aligned Address exception.
For backward compatibility, some instructions in the quad 8-bit group and those used with the DISALGNEXCPT
instruction do not cause alignment exceptions, but ignore the low order bits of a non-aligned address to access
aligned data.
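The two alignment behaviors described above (a check that the address is a multiple of the data size, and the legacy behavior of ignoring the low-order address bits) can be sketched in C. The helper names are hypothetical and the sketch assumes the access size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of the access size.
 * Assumes size is a power of two (for example 1, 2, or 4 bytes). */
static inline bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}

/* Address that is accessed when the low-order bits of a non-aligned
 * address are ignored, as for the backward-compatible instructions. */
static inline uint32_t force_aligned(uint32_t addr, uint32_t size)
{
    return addr & ~(size - 1u);
}
```

For example, a 32-bit access to 0x1002 is not aligned; an access that requires alignment would fault, while an instruction that ignores the low-order bits would access 0x1000.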
write does not stall the pipeline if it takes more cycles to propagate the value out to memory. This behavior could
cause a read that occurs in the program source code after a write in the program flow to actually return its value
before the write has been completed. This ordering provides significant performance advantages in the operation of
most memory instructions.
If the branch is taken, then the load is flushed from the pipeline, and any results that are in the process of being
returned can be ignored. Conversely, if the branch is not taken, the memory will have returned the correct value
earlier than if the operation were stalled until the branch condition was resolved.
Store operations never access memory speculatively because this could cause modification of a memory value before
it is determined whether or not the instruction should have executed.
before signalling that the data is available. Similarly, when writing to or reading from non-memory locations such as
off-chip I/O device registers, the order of how read and write operations complete is also significant. For example, a
read of a status register may depend on a write to a control register. If the address is the same, the read would return
a value from the store buffer rather than from the actual I/O device register, and the order of the read and write at
the register may be reversed. Both these phenomena could cause undesirable side-effects in the intended operation of
the program and peripheral.
Redundant memory reads can also be an issue. Interruptible load behavior can cause multiple memory read opera-
tions where only one was intended. For most memory accesses, multiple reads of the same memory address have no
side-effects; however, for some off-chip memory-mapped devices such as peripheral data FIFOs, reads are destructive
(i.e., each time the device is read, the FIFO advances and the data cannot be recovered and re-read). The redundant
memory reads due to speculation will also cause problems if the load from a peripheral with destructive read behav-
ior, such as a FIFO, is subsequently aborted.
Speculation can also be a problem where a load from an illegal address is aborted. A redundant memory read opera-
tion from a non-L1 address which does not map to any memory or device in the system will cause an External
Memory Addressing error. Note that the load might be aborted because it is in the shadow of a conditional jump
that tests whether or not the address is valid.
In summary, the following hazards exist:
• Reordering of an externally visible write with another externally visible action.
• Reordering of an externally visible read and write to the same location.
• Reordering of an externally visible read and write to different locations.
• Caches and write buffers can prevent memory operations becoming externally visible at all.
• Destructive reads that are not generated because they are serviced from the write buffer or cache.
• Repeated destructive reads due to interruptible loads.
• Unintended destructive reads due to speculative loads.
• Unintended access to illegal addresses causing spurious error interrupts.
The Blackfin+ architecture provides a number of solutions to these problems. Synchronization instructions (CSYNC
or SSYNC) may be used to impose a precise ordering at the points in the code where it is required while generally
retaining the benefits of weak ordering.
Cacheability properties may be specified in the CPLBs, which control the external visibility of memory operations and specify some regions as I/O device space. Loads from MMRs and I/O device space are never executed speculatively and are non-interruptible. All reads and writes to MMRs are strongly ordered.
CPLBs may be used to avoid spurious illegal address exceptions: the memory read operation is not initiated for a load that causes a page miss exception, and the exception itself is suppressed if the load is speculative.
Synchronizing Instructions
When strong ordering of loads and stores is required, as may be the case for sequential accesses to shared memory,
use the core or system synchronization instructions, CSYNC or SSYNC, respectively.
The CSYNC instruction ensures that all pending core operations have completed and that the store buffer between
the processor core and the L1 memories has been flushed before proceeding to the next instruction. Pending core
operations include any pending interrupts, speculative states (such as branch predictions), and exceptions. Consider
the following example code sequence:
IF CC JUMP away_from_here;
CSYNC;
R0 = [P0];
away_from_here:
Cache Coherency
For shared data, software must provide cache coherency support, as required. To accomplish this, use the FLUSH
instruction (see the FLUSH description in Data Cache Control Instructions) and/or explicit line invalidation (see
Data Cache Invalidation).
Whenever the external visibility of reads and writes is a concern, the cacheability properties specified in the CPLBs must be considered. A store to write-back L1 cache is only guaranteed to become visible externally after a FLUSH instruction is executed, whereas stores to write-through L1 cache are visible after the memory write completes.
If the memory region is written by another core or device, there is no automatic coherence mechanism to ensure
earlier values in the cache are invalidated. As such, a load operation might return stale data from the cache unless
explicit line invalidation is used to ensure coherency.
Commonly shared data which is updated by more than one writer should be maintained in non-cacheable regions.
Memory-Mapped Registers
A portion of the address space is reserved for Memory-Mapped Registers (MMRs), which is split into a region for
system MMRs and a region for core MMRs. System MMRs are located in the memory space from
0x20000000-0x2FFFFFFF, and core MMRs are mapped to 0x1FC00000-0x1FFFFFFF. Refer to the Blackfin
Processor Hardware Reference for more information.
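The two MMR address ranges given above can be expressed as a small classification helper. This is a sketch only; the type and function names are hypothetical, and the ranges are taken directly from the text.

```c
#include <stdint.h>

typedef enum {
    ADDR_NOT_MMR,     /* outside both MMR regions */
    ADDR_CORE_MMR,    /* 0x1FC00000-0x1FFFFFFF */
    ADDR_SYSTEM_MMR   /* 0x20000000-0x2FFFFFFF */
} mmr_kind_t;

/* Classify an address against the MMR ranges quoted in the text. */
static inline mmr_kind_t classify_mmr(uint32_t addr)
{
    if (addr >= 0x1FC00000u && addr <= 0x1FFFFFFFu)
        return ADDR_CORE_MMR;
    if (addr >= 0x20000000u && addr <= 0x2FFFFFFFu)
        return ADDR_SYSTEM_MMR;
    return ADDR_NOT_MMR;
}
```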
All MMRs are only accessible in Supervisor mode. Accesses to MMRs in User mode generate an Illegal Use of Su-
pervisor Resource exception. The same exception is also raised if a load or store to an MMR is issued in parallel with
another load or store. Loads from MMRs are non-speculative and non-interruptible. All loads and stores to each
MMR space are strongly ordered.
The core MMR space is located in the same memory region on every Blackfin+ core in a system. Core MMRs may
only be accessed by load and store instructions executed by the local core and are not accessible via DMA. Like non-
memory-mapped registers, the core MMRs connect to the 32-bit Register Access Bus (RAB) and are accessed at the
CCLK rate.
All core MMRs must be read and written with 32-bit-aligned accesses; however, some MMRs have fewer than 32
bits defined. In this case, the unused bits are reserved and must be written as zero when writing the register. System
MMRs connect through the system crossbars (SCBs) and must be accessed aligned to the data size. Accesses to non-
existent MMRs generate an Illegal Access exception, and writes to read-only MMRs are ignored.
Each chapter in this manual describing a portion of the processor architecture includes a description of any related
core MMRs. System MMRs are described in each chapter of the processor's hardware reference manual.
• Emulator hardware interrupt (IVG0): emulator hardware interrupts can interrupt I/O accesses. However, this
would only be during debugging sessions.
• Reset (IVG1): a reset event can interrupt non-speculative accesses. Since the core is being reset, this is expected
behavior.
• NMI (IVG2): Non-Maskable Interrupts (NMIs) come from three possible sources:
    • External events on the NMI pin
    • RAISE 2; instruction (which is committed before I/O starts, so it cannot interrupt a non-speculative access)
    • Parity errors (which can interrupt a non-speculative access if caused by DMA of cache victim traffic)
• Exception (IVG3): always related to an instruction executing. The non-speculative read mechanism is designed to allow all exceptions in the pipeline before the non-speculative read to be taken before the non-speculative read is placed on the bus. As such, there will never be an exception during a non-speculative read.
If a non-speculative read is interrupted, whether to an I/O device page or an MMR, the Non-Speculative Access Was
Aborted bit (SEQSTAT.NSPECABRT) is set. As the effect of the interrupted non-speculative read might be that a
read side-effect occurred but the read data was lost, this is a non-recoverable error condition.
/* critical section */
R1 = 0; /* unlocked value */
B[P0] = R1; /* unlock */
Semaphores controlling interaction between tasks on separate cores should be placed in non-cacheable, non-L1
memory that is accessible by both cores. In this case, the load and store exclusive instructions generate exclusive
transactions on the system bus, and the memory controller participates in a protocol that ensures that a store exclu-
sive instruction will fail if the memory location has been modified by another core since the corresponding load
exclusive instruction.
Semaphores controlling interaction between tasks on the same core may be placed in cacheable memory or L1
SRAM. In this case, the load and store exclusive instructions do not generate special memory transactions. The
SYNCEXCL instruction must be called in context switch code to clear any pending exclusive transactions and to pre-
serve the result of any store exclusive instructions in the CC bit of the preserved ASTAT register.
/* context switch */
SYNCEXCL;
[--SP] = ASTAT; /* saves store excl result if one was pending */
Interrupt handlers that are known not to use exclusive operations may leave the exclusive state unmodified. Any
pending exclusive write operations will complete and update the state in SEQSTAT which will be read by a
SYNCEXCL instruction upon returning from the interrupt.
SYNCEXCL should also be used in the exception handler to reset exclusive state on exceptions caused by load or store
exclusive instructions.
Load exclusive and store exclusive instructions must be aligned and may not be used with MMRs, I/O device space,
or extended data accesses.
Execution results for exclusive load instructions and exclusive store instructions vary, depending on whether the
memory addressed is shareable or non-shareable. The shareability of memory spaces is determined from the memory
space and the CPLB settings, as shown in the Memory Kinds table.
An exclusive load or exclusive store to an illegal memory location causes an exception. An exclusive load or exclusive
store to a non-shareable memory location succeeds, but the operation is not exclusive with respect to other cores. The
operation is exclusive with respect to other threads running on the core executing the instruction. An exclusive load
or exclusive store to a shareable memory location ensures exclusivity with respect to other cores by using exclusive
transactions on the memory bus. Exclusive transactions require hardware support in the memory device. If that sup-
port is not available, an uncached exclusive load from that memory will cause an exception.
/* critical section */
R1 = 0; /* unlock value */
B[P0] = R1; /* unlock */
NOTE: For more information, see the TESTSET instruction's reference page.
L1 Memory Microarchitecture
This section provides an overview of the L1 memory system in the Blackfin+ processors.
L1 Memory Access
Processor access to the L1 memory space is intended to occur in a single cycle. The L1 memory has four virtual
ports: DAG0 read, DAG1 read, Store, and DMA read/write. The memories used to implement L1 are physically
single-ported, so access conflicts are possible. To reduce the likelihood of such memory conflicts, L1 memory is divi-
ded into individually accessible 4 KB sub-banks, each having its own port multiplexor and a dedicated data bus. A
memory conflict will only occur when multiple ports request at least one byte from the same sub-bank in the same
cycle.
The incoming addresses for the four L1 ports are centrally decoded into sub-bank selects, which are then compared
for collisions. The collisions are prioritized, and the winner is allowed to access the memory. The losers receive a stall
and try again in the next cycle.
Load instructions present the read address to the memory system in pipeline stage F (DF1). The memory read oc-
curs in stage G (DF2). The result is returned to the processor's history buffer in stage H (EX1). Store instructions
are described in L1 Data Stores.
The DCPLB page descriptors generate exceptions in pipe stage G (DF2). The exception information is pipelined
along with the memory operation and deposited into the history buffer in pipe stage H (EX1).
DMA accesses use the same timing as DAG reads and writes, which is three cycles for reads and two cycles for
writes.
Address Range              Rows   4 Bytes          4 Bytes
0x1180 7000-0x1180 7FF8    512    SUB BANK 6 HI    SUB BANK 7 HI
0x1180 6000-0x1180 6FF8    512    SUB BANK 4 HI    SUB BANK 5 HI
0x1180 5000-0x1180 5FF8    512    SUB BANK 2 HI    SUB BANK 3 HI
0x1180 4000-0x1180 4FF8    512    SUB BANK 0 HI    SUB BANK 1 HI
0x1180 3000-0x1180 3FF8    512    SUB BANK 6 LO    SUB BANK 7 LO
0x1180 2000-0x1180 2FF8    512    SUB BANK 4 LO    SUB BANK 5 LO
0x1180 1000-0x1180 1FF8    512    SUB BANK 2 LO    SUB BANK 3 LO
0x1180 0000-0x1180 0FF8    512    SUB BANK 0 LO    SUB BANK 1 LO
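The sub-bank layout above implies a simple address decode: each 8-byte row is split into two 4-byte columns (even and odd sub-bank), and each 4 KB address group selects one sub-bank pair. The following sketch derives the sub-bank number for an address in the range shown and checks whether two accesses could collide. The macro and function names are hypothetical, and the decode is inferred from the layout rather than taken from a stated formula.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_LAYOUT_BASE 0x11800000u  /* first address in the layout above */
#define L1_LAYOUT_SIZE 0x00008000u  /* 32 KB covered by the layout */

/* Sub-bank (0..7) holding the byte at addr, or -1 if addr is outside
 * the range shown in the layout. */
static inline int l1_sub_bank(uint32_t addr)
{
    if (addr < L1_LAYOUT_BASE || addr - L1_LAYOUT_BASE >= L1_LAYOUT_SIZE)
        return -1;
    uint32_t pair = (addr >> 12) & 3u;  /* 4 KB group: sub-bank pair 0/1, 2/3, 4/5, 6/7 */
    uint32_t odd  = (addr >> 2) & 1u;   /* 4-byte column within the 8-byte row */
    return (int)(pair * 2u + odd);
}

/* True when both bytes live in the same sub-bank, so simultaneous
 * requests from different ports would conflict. */
static inline bool l1_same_sub_bank(uint32_t a, uint32_t b)
{
    int sa = l1_sub_bank(a);
    int sb = l1_sub_bank(b);
    return sa >= 0 && sa == sb;
}
```

Note that address bit 14 only selects the LO or HI half of a sub-bank, so addresses 0x11800000 and 0x11804000 map to the same sub-bank, whereas 0x11800000 and 0x11800004 do not.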
L1 Data Stores
Processor stores to data memory are more complicated than loads because the store address arrives in pipe stage F
(DF1), but the data does not arrive until pipe stage J (WB). As such, the Memory Controller must create a place-
holder for the address. When the data arrives, it is matched with the address. The address/data pair is then delivered
to L1 memory on the Store port. The block that records the write address is called the Read/Write Buffer.
The name Read/Write Buffer is derived from the fact that it handles uncached loads from the system, as well as all
stores. Each Read/Write buffer entry contains an address which describes a particular set of eight aligned memory
bytes, and data storage for those bytes. The number of bytes stored in a given buffer is determined by the datapath
width. When a new store enters pipe stage G (DF2), its address is compared to the addresses in all currently valid
Read/Write entries. If there is a match, then the existing buffer entry is used, otherwise a new buffer entry is alloca-
ted. The ID of the buffer entry is placed in a queue called the Store Queue. The Store Queue performs the function
of matching incoming store data with the proper Read/Write entry. When valid write data is present in the Read/
Write buffer, it gets scheduled for "draining" to L1 memory using the L1 Store port.
Write Gathering is supported. Consider a sequence of byte writes to L1 memory, starting with an aligned address. The first byte allocates a Read/Write entry, and the address of the second byte matches the Read/Write entry of the first. The two bytes therefore share the same Read/Write entry; this is called write gathering. The Read/Write buffer entry tracks which of the eight bytes have been loaded with valid data. Only these bytes are written to memory.
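Write gathering can be illustrated with a small software model of the Read/Write buffer. This is a sketch under stated assumptions: the buffer depth (RW_ENTRIES) and all type and function names are hypothetical, and the model covers only address matching and the per-byte valid tracking described above, not the Store Queue or draining.

```c
#include <stdint.h>
#include <string.h>

#define RW_ENTRIES 4  /* hypothetical depth; the real depth is not stated here */

typedef struct {
    uint32_t line_addr;   /* address of the 8-byte aligned block */
    uint8_t  valid_mask;  /* bit i set: data[i] holds valid store data */
    uint8_t  data[8];
    uint8_t  in_use;
} rw_entry_t;

typedef struct {
    rw_entry_t e[RW_ENTRIES];
} rw_buffer_t;

static void rw_init(rw_buffer_t *b)
{
    memset(b, 0, sizeof *b);
}

/* Record a byte store. Reuses an entry whose address matches (gathering),
 * otherwise allocates the first free entry.
 * Returns the entry index used, or -1 if no entry is available. */
static int rw_store_byte(rw_buffer_t *b, uint32_t addr, uint8_t value)
{
    uint32_t line = addr & ~7u;
    int slot = -1;

    for (int i = 0; i < RW_ENTRIES; i++) {
        if (b->e[i].in_use && b->e[i].line_addr == line) {
            slot = i;               /* address match: gather into this entry */
            break;
        }
        if (!b->e[i].in_use && slot < 0)
            slot = i;               /* remember the first free entry */
    }
    if (slot < 0)
        return -1;

    rw_entry_t *e = &b->e[slot];
    if (!e->in_use) {
        e->in_use = 1;
        e->line_addr = line;
        e->valid_mask = 0;
    }
    e->data[addr & 7u] = value;
    e->valid_mask |= (uint8_t)(1u << (addr & 7u));
    return slot;
}
```

Two byte stores to 0x1000 and 0x1001 land in the same entry with a valid mask of 0x03, while a store to 0x1008 falls in a different 8-byte block and allocates a new entry.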
Write Data Forwarding is supported. Consider a byte write, followed by a read of the same byte. The read operation
must wait for the write data to become available in the memory. Once the write data arrives at the memory system,
it often takes several cycles to get the data into the L1 memory sub-bank. Write forwarding short circuits this process
by delivering the write data to the history buffer as soon as it arrives at the memory controller.
plucked from the queue. When the read data returns, still tagged with the Read/Write ID, it is forwarded directly to
the history buffer.
System stores behave in a similar fashion to L1 stores. Multiple system stores into the same eight-byte aligned space
will gather into the same Read/Write buffer entry. Write drains to the system gasket are scheduled the same way as
L1 writes. When a system store causes a Read/Write entry to be allocated, subsequent system loads matching that
Read/Write address will use that entry's ID. This is necessary to ensure that the write data is properly forwarded to
the history buffer along with the bytes read from system memory.
System accesses may also be misaligned and result in a crossed access. In this case, two Read/Write entries will be
allocated. The two components of the crossed access are treated as separate transactions by the Read/Write buffer
and system gasket. The history buffer understands that crossed system reads are split and rejoins them before deliv-
ery to the core.
Reads of MMRs, I/O device space, and the read part of the TESTSET instruction are non-speculative and non-inter-
ruptible. These reads are not added to the System Read Queue. Instead, when the read operation is passed to the
history buffer in pipe stage H (EX1), a non-speculative request is sent to the sequencer. The sequencer will then
disable most interrupts and stall the pipeline below the non-speculative read while holding the non-speculative read
in stage H (EX1). Once this is done, the system read request is sent to the System Memory Interface. The non-
speculative read is then issued, and interrupts are re-enabled when the read data is returned.
NOTE: Do NOT set the SYSCFG.MEMSBYP bit when the system is programmed with a fractional
CCLK::SYSCLK ratio (M::N).
NOTE: Do NOT set the SYSCFG.MMRSBYP bit when the system is programmed with a fractional
CCLK::SYSCLK ratio (M::N).
L1 Cache Details
The upper 16 KB of L1 instruction SRAM, L1 data block A and L1 data block B, may be individually configured as
cache. When so configured, these 16 KB are reserved for cache lines fetched from non-L1 memory. The cache tags
occupy this memory space only when cache is enabled.
Both data block A and data block B can be configured as data cache. When both are enabled as cache, address bit 14 or address bit 23 of a cacheable load/store operation, as selected by L1DM_DCTL.DCBS, determines which data cache is searched for a particular cache line (see Data Cache Block Select).
When accessing a cache line present in L1 cache, the access timing is identical to a standard L1 memory access. On a
cache read, all lines in the set are read in parallel with the cache tag access, and the correct data is selected. On an
instruction fetch, all eight subbanks are read, and a data load may access up to four subbanks. These accesses will
collide with accesses to the other half of these subbanks, which remain in use as L1 SRAM.
from the address. When accessing cache tag and dirty arrays, PARSEL has no meaning, as the parity bits can be
directly read or written.
The following tables provide details of the data cache tags. Separate copies of each tag are maintained for accesses by
DAG0 and for accesses by DAG1.
A[18] PARSEL
A[17:14] 1111
Each access to the data cache dirty bits reads or writes the bits for both ways of two sets. Only even-addressed sets
can be addressed. As such, the following table always shows Set Index bit 0 and Way as zero.
Table 7-5: Data Cache Dirty Bits Extended Data Access Address
Address Bit Meaning
A[31:20] L1 Data Block A Address
A[18] PARSEL
A[17:11] 1110100
Table 7-6: Data Cache Dirty Bits Extended Data Access Value
Data Bit Value
D[31:8] 0
Cached data is stored in SRAM banks which may be accessed at their native address in ENX mode.
Table 7-7: Data Cache Data Extended Data Access Address and Mapping to L1 SRAM Subbank
L1 SRAM Address Bits Cache Location
A[31:21] L1 Data SRAM Address
A[20] Set Index Bit 8 Selects Block A or B
A[19] PARCTL
A[18] PARSEL
The 20-bit Tag, 9-bit Set Index, and 5-bit offset within the cache line can be combined to form the home address of
data in the cache.
Table 7-8: Data Address from Tag, Set Index, and Offset
L1DM_DCTL.CFG[1]  L1DM_DCTL.DCBS  Address
0                 0               Tag[19:12], Tag[11], Tag[10:3], Tag[2], Tag[1], Set[7:0], Offset[4:0]
0                 1               Tag[19:12], Tag[2], Tag[10:3], Tag[11], Tag[1], Set[7:0], Offset[4:0]
1                 0               Tag[19:12], Tag[11], Tag[10:3], Set[8], Tag[1], Set[7:0], Offset[4:0]
1                 1               Tag[19:12], Set[8], Tag[10:3], Tag[11], Tag[1], Set[7:0], Offset[4:0]
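The concatenations in Table 7-8 can be written out as a C function. This is a sketch, not a definitive implementation: the function name is hypothetical, and the bit positions are inferred from the field widths in the table (8 + 1 + 8 + 1 + 1 + 8 + 5 = 32 bits, most significant field first).

```c
#include <stdint.h>

/* Rebuild the home address of cached data per Table 7-8.
 *   cfg1   - L1DM_DCTL.CFG[1] (0 or 1)
 *   dcbs   - L1DM_DCTL.DCBS   (0 or 1)
 *   tag    - 20-bit tag
 *   set    - 9-bit set index (Set[8] is only used when cfg1 is 1)
 *   offset - 5-bit byte offset within the cache line */
static inline uint32_t dcache_home_address(unsigned cfg1, unsigned dcbs,
                                           uint32_t tag, uint32_t set,
                                           uint32_t offset)
{
    uint32_t tag11 = (tag >> 11) & 1u;
    uint32_t tag2  = (tag >> 2) & 1u;
    uint32_t set8  = (set >> 8) & 1u;
    uint32_t bit23;
    uint32_t bit14;

    if (dcbs) {
        bit23 = cfg1 ? set8 : tag2;
        bit14 = tag11;
    } else {
        bit23 = tag11;
        bit14 = cfg1 ? set8 : tag2;
    }

    return (((tag >> 12) & 0xFFu) << 24)  /* Tag[19:12]  -> A[31:24] */
         | (bit23 << 23)                  /* row-dependent -> A[23]  */
         | (((tag >> 3) & 0xFFu) << 15)   /* Tag[10:3]   -> A[22:15] */
         | (bit14 << 14)                  /* row-dependent -> A[14]  */
         | (((tag >> 1) & 1u) << 13)      /* Tag[1]      -> A[13]    */
         | ((set & 0xFFu) << 5)           /* Set[7:0]    -> A[12:5]  */
         | (offset & 0x1Fu);              /* Offset[4:0] -> A[4:0]   */
}
```

With both blocks configured as cache (CFG[1] = 1), Set[8] appears at address bit 14 when DCBS is 0 and at address bit 23 when DCBS is 1.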
The instruction cache has a slightly different format because it has four ways and 128 sets, unlike the data cache
(which has two ways and 256 sets).
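Both organizations describe the same 16 KB of storage, which a short calculation confirms. The 32-byte line size used here is an assumption inferred from the 5-bit offset within the cache line; the names are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* assumed from the 5-bit line offset */

/* Total data capacity of a cache with the given number of ways and sets. */
static inline uint32_t cache_bytes(uint32_t ways, uint32_t sets)
{
    return ways * sets * CACHE_LINE_BYTES;
}
```

The instruction cache (four ways, 128 sets) and one data cache block (two ways, 256 sets) each come to 16384 bytes.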
A[18] PARSEL
A[17:13] 11111
The Instruction Cache Tags extended data access value is the same as for data cache.
Table 7-10: Instruction Cache Data Extended Data Access Address and Mapping to L1 SRAM Subbank
L1 SRAM Address Bits Cache Location
A[31:20] L1 Instruction SRAM Address
A[19] PARCTL
A[18] PARSEL
A[13:12] Way
A[11:5] Set Index
Terminology
The following terminology is used to describe memory.
cache block
The smallest unit of memory that is transferred to/from the next level of memory from/to a cache memory as a
result of a cache miss.
cache hit
A memory access that is satisfied by a present entry in the cache with its Valid bit set.
cache line
Same as cache block.
cache miss
A memory access that does not match any valid entry in the cache.
direct-mapped
Cache architecture in which each line has only one place in which it can appear in the cache. Also described as 1-
Way associative.
dirty/modified
A state bit, stored along with the tag, indicating whether the data in the data cache line has been changed since it
was copied from the source memory and, therefore, needs to be updated in that source memory.
exclusive, clean
The state of a data cache line, indicating that the line is valid and that the data contained in the line matches that in
the source memory. The data in a clean cache line does not need to be written to source memory before it is re-
placed.
fully associative
Cache architecture in which each line can be placed anywhere in the cache.
index
Address portion that is used to select an array element (for example, a line index).
invalid
Describes the state of a cache line. When a cache line is invalid, a cache line match cannot occur.
little endian
The native data store format of the Blackfin processor. Words and half words are stored in memory (and registers)
with the least significant byte at the lowest byte address and the most significant byte in the highest byte address of
the data storage location.
replacement policy
The function used by the processor to determine which line to replace on a cache miss. In Blackfin+ processors, a
round-robin algorithm is employed (ways are iteratively cycled through as replacement is required).
set
A group of N-line storage locations in the Ways of an N-Way cache, selected by the INDEX field of the address (see
the Cache Lines figures).
set associative
Cache architecture that limits line placement to a number of sets (or Ways).
tag
Upper address bits, stored along with the cached data line, to identify the specific address source in memory that the
cached line represents.
valid
A state bit, stored with the tag, indicating that the corresponding tag and data are current/correct and can be used to
satisfy memory access requests.
victim
A dirty cache line that must be written to memory before it can be replaced to free space for a cache line allocation.
Way
An array of line storage elements in an N-Way cache (see the Cache Lines figures).
write back
A cache write policy, also known as copyback
NOTE: To ensure proper behavior and future compatibility, all reserved bits in this register must be cleared when-
ever this register is written.
For instruction fetch operations, L1IM_ICPLB_ADDR[n] defines the start address of the page described by the
ICPLB descriptor, and the associated L1IM_ICPLB_DATA[n] register defines the properties of the page described
by the ICPLB descriptor.
[Register diagram: ADDR[21:6] (R/W) in bits 31:16 and ADDR[5:0] (R/W) in the lower half-word, ICPLB Page Address. Reset value 0x0000 0000.]
[Register diagram: UREAD (R/W), Allow User Read (lower half-word); PSIZE (R/W), Page Size (upper half-word). Reset value 0x0000 0000.]
[Register diagram: ADDR[31:16] and ADDR[15:0] (R), Fault Address. Reset value 0x0000 0000.]
[Register diagram: CBYPASS (R/W), Cache Bypass Enable (lower half-word). Reset value 0x0000 0001.]
[Register diagram: FAULT (R), Fault Status (lower half-word). Reset value 0x0000 0000.]
NOTE: To ensure proper behavior and future compatibility, all reserved bits in this register must be cleared when-
ever this register is written.
For data fetch operations, L1DM_DCPLB_ADDR[n] defines the start address of the page described by the DCPLB
descriptor, and the associated L1DM_DCPLB_DATA[n] register defines the properties of the page described by the
DCPLB descriptor.
[Register diagram: ADDR[21:6] (R/W) in bits 31:16 and ADDR[5:0] (R/W) in the lower half-word, Address Value. Reset value 0x0000 0000.]
[Register diagram: UWRITE (R/W), User Mode Write (lower half-word); PSIZE (R/W), Page Size (upper half-word). Reset value 0x0000 0000.]
[Register diagram: SYSSWRITE (R/W), System Supervisor Mode Write (lower half-word). Reset value 0x0000 0000.]
0 Restricted
1 Permitted
10 L1UWRITE L1 User Mode Write.
(R/W) The L1DM_DCPLB_DFLT.L1UWRITE bit selects the default for User mode write ac-
cess to L1 memory, which determines whether to permit or restrict User mode write
access. If a write is attempted while restricted, the access generates a protection viola-
tion exception. This default setting is overridden by the
L1DM_DCPLB_DATA[n].UWRITE bit in a valid enabled DCPLB.
0 Restricted
1 Permitted
0 Restricted
1 Permitted
6 SYSUWRITE System User Mode Write.
(R/W) The L1DM_DCPLB_DFLT.SYSUWRITE bit selects the default for User mode write ac-
cess to system memory space, which determines whether to permit or restrict User
mode write access. If a write is attempted while restricted, the access generates a pro-
tection violation exception. This default setting is overridden by the
L1DM_DCPLB_DATA[n].UWRITE bit in a valid enabled DCPLB.
0 Restricted
1 Permitted
0 Restricted
1 Permitted
4 SYSEOM System Exception On Miss Disable.
(R/W) The L1DM_DCPLB_DFLT.SYSEOM bit disables access exception generation on a DAG
DCPLB miss to system memory space. Default access protection for system memory
space is controlled by the L1DM_DCPLB_DFLT.SYSCPROPS,
L1DM_DCPLB_DFLT.SYSUREAD, L1DM_DCPLB_DFLT.SYSUWRITE, and
L1DM_DCPLB_DFLT.SYSSWRITE bits.
0 Generate exception
1 Disable exception generation
2:0 SYSCPROPS System Cacheability Properties.
(R/W) The L1DM_DCPLB_DFLT.SYSCPROPS bits select the default system memory cacheability properties, which determine whether or not the region is cacheable in L1 memory. This default setting is overridden by the L1DM_DCPLB_DATA[n].CPROPS bits in a valid enabled DCPLB.
0 Non-cacheable
1 Write-back cacheable in L1
2 Non-cacheable
3 Write-back cacheable
4 I/O device space
5 Write-through cacheable in L1
6 Non-cacheable
7 Write-through cacheable in L1
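The SYSCPROPS enumeration above contains several encodings with the same meaning, so software that inspects the field may find it convenient to collapse them into a few categories. The following is a minimal sketch; the enum and function names are hypothetical, and the only facts used are the bit position (2:0) and the value list above.

```c
#include <stdint.h>

typedef enum {
    CPROPS_NON_CACHEABLE,   /* values 0, 2, 6 */
    CPROPS_WRITE_BACK,      /* values 1, 3 */
    CPROPS_WRITE_THROUGH,   /* values 5, 7 */
    CPROPS_IO_DEVICE        /* value 4 */
} cprops_kind_t;

/* Decode bits 2:0 (SYSCPROPS) of an L1DM_DCPLB_DFLT register value. */
static inline cprops_kind_t syscprops_kind(uint32_t dcplb_dflt)
{
    switch (dcplb_dflt & 0x7u) {
    case 1:
    case 3:
        return CPROPS_WRITE_BACK;
    case 5:
    case 7:
        return CPROPS_WRITE_THROUGH;
    case 4:
        return CPROPS_IO_DEVICE;
    default:
        return CPROPS_NON_CACHEABLE;
    }
}
```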
[Register diagram: ADDR[31:16] and ADDR[15:0] (R), Fault Address. Reset value 0x0000 0000.]
[Register diagram: DCBS (R/W), Data Cache Bank Select (lower half-word); ENX (R/W), Extended Data Access Enable (upper half-word). Reset value 0x0000 0001.]
[Register diagram: FAULT (R), CPLB Fault Indicator (lower half-word). Reset value 0x0000 0000.]
[Register diagram: ADDR (R), Address Value. Reset value 0x1180 0000.]
The instruction reference pages provide detailed information about the syntax and operation of each instruction in
the processor's instruction set. The reference groups the instructions by type and by operation. This grouping stems
from the portion of the processor core on which each instruction executes (see the Blackfin+ Core Block Diagram
figure). Because each instruction uses specific resources (portions of the processor architecture), understanding the
relationship between the instructions and the architecture helps you write efficient code and achieve optimum code
density (by applying instruction parallelism).
• Arithmetic Instructions -- execute within the data arithmetic unit
• Sequencer Instructions -- execute within the control unit
• Memory or Pointer Instructions -- execute within the address arithmetic unit
• Specialized Compute Instructions -- execute within the data arithmetic unit
[Figure: Blackfin+ Core Block Diagram, showing the address arithmetic unit (DAG0, DAG1, the I/L/B/M registers, P0-P5, SP, FP), the control unit (sequencer, align, decode, loop buffer), and the data registers R0-R7, barrel shifter, accumulators A0 and A1, and ASTAT]
NOTE: Arithmetic instructions generate status, indicating information about the result of the operation. For more
information, see Arithmetic Status Register. To optimize program execution, many 16- and 32-bit instructions
may be issued in parallel. For more information, see Issuing Parallel Instructions.
For more information about ADSP-BF70x processor family core architecture or memory infrastructure, see the
corresponding chapters of this text. For information about ADSP-BF70x processor peripherals, see the hardware
reference manual.
Each instruction reference page provides the following information:
• Syntax -- each section of a syntax table identifies the underlying instruction encoding (e.g., ALU Operations
(Dsp32Alu)). Each line of a syntax table defines the processor resource classes (e.g., a register type) that are
permitted in each syntax position. To see the list of resources in a particular class, follow the link for that
resource class from the syntax line (e.g., DDST0_HL).
• Data Flow -- for instructions with sophisticated data placement options, a data flow diagram is provided (not
provided for all instructions).
• Abstract -- brief (1-2 sentence) description of the instruction.
• Description -- provides a full description of the instruction, including execution options, instruction encoding
size, instruction parallelism (if applicable), any special applications, and effect on status flags (if applicable).
• ASTAT Flags -- a table (where applicable) detailing the status flags affected by the instruction's execution. For
instructions that do not affect status, this section is omitted.
• Example -- provides a code snippet that demonstrates the instruction and its options.
Arithmetic Instructions
The arithmetic instructions provide operations which execute on the data arithmetic unit in the processor core.
Users can take advantage of these instructions to add, subtract, divide, and multiply, as well as to calculate and store
absolute values, detect exponents, round, saturate, and return the number of sign bits.
Abstract
This instruction adds or subtracts two signed register halves.
See Also (Vectored 16-Bit Add or Subtract (AddSubVec16))
AddSub16 Description
The AddSub16 instruction adds or subtracts two source values and places the result in a destination register with or
without result saturation.
AddSub16 accepts any combination of upper and lower half-register operands, and places the results in the upper or
lower half of the destination register at the user's discretion.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
In the syntax, where SAT2 appears, substitute a saturation option (s or ns). See the Saturation topic in the
Introduction chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see the Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSub16 Example
/* If r0.l = 0x7000 and r7.l = r7.h = 0x2000, then . . . */
r4.l = r0.l + r7.l (ns) ; /* produces r4.l = 0x9000 (no saturation is enforced) */
r4.l = r0.l + r7.h (s) ; /* produces r4.l = 0x7FFF (saturated to maximum positive value) */
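The difference between the ns and s options can be checked with a small reference model. The following Python sketch is illustrative only and is not taken from the processor documentation; the function name add16 and its saturate parameter are hypothetical. It treats each operand as a raw 16-bit pattern, interprets it as a signed value, and either wraps the sum to 16 bits or clamps it to the signed 16-bit range.

```python
def add16(a, b, saturate):
    """Model a signed 16-bit add on raw 16-bit patterns (hypothetical helper)."""
    def to_signed(x):
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    total = to_signed(a) + to_signed(b)
    if saturate:
        # Clamp to the signed 16-bit range, as the (s) option does.
        total = max(-0x8000, min(0x7FFF, total))
    # Without saturation the result simply wraps to 16 bits.
    return total & 0xFFFF


print(hex(add16(0x7000, 0x2000, saturate=False)))
print(hex(add16(0x7000, 0x2000, saturate=True)))
```

With the operand values from the example, the wrapped sum reads as a negative number when interpreted as signed, which is why the saturating form is usually preferred for fractional signal data.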
Abstract
This instruction adds or subtracts two sets of two signed 16-bit vectors, and it deposits the results into two
destination registers. Optionally, the result of the additions can be saturated. Also, the y inputs can be "crossed"
so that instead of adding Rx.H + Ry.H, the crossed inputs allow for Rx.H + Ry.L (and so on). The output halves can
also be crossed on compute unit 0.
See Also (16-Bit Add or Subtract (AddSub16))
AddSubVec16 Description
The Vector Add / Subtract instruction simultaneously adds and/or subtracts two pairs of registered numbers. It then
stores the results of each operation into a separate 32-bit data register or 16-bit half register, according to the syntax
used. The destination register for each of the quad or dual versions must be unique.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
The AddSubVec16 instruction supports dual and quad 16-Bit operations.
In the syntax, where SX appears (for dual 16-bit operations), substitute a saturation and/or cross output option (s,
co, or sco). In the syntax, where SXA appears (for quad 16-bit operations), substitute one of the SX values or an
arithmetic shift right or left option (asr or asl). The shift options shown for quad 16-bit operations are scaling
options. See the Saturation topic in the Introduction chapter for a description of saturation behavior.
NOTE: A special application of the AddSubVec16 instruction is in FFT butterfly routines, in which each of the
registers is treated as a single complex number.
/* If r1 = 0x0003 0004 and r2 = 0x0001 0002, then . . . */
r0 = r2 +|- r1(co) ; /* . . . produces r0 = 0xFFFE 0004 */
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see the Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSubVec16 Example
r5 = r3 +|+ r4 ;
/* dual 16-bit operations, add|add */
r6 = r0 -|+ r1(s) ;
/* same as above, subtract|add with saturation */
r0 = r2 +|- r1(co) ;
/* add|subtract with half-word results crossed over in the destination register */
r7 = r3 -|- r6(sco) ;
/* subtract|subtract with saturation and half-word results crossed over in the
destination register */
r5 = r3 +|+ r4, r7 = r3-|-r4 ;
/* quad 16-bit operations, add|add, subtract|subtract */
r5 = r3 +|- r4, r7 = r3 -|+ r4 ;
/* quad 16-bit operations, add|subtract, subtract|add */
r5 = r3 +|- r4, r7 = r3 -|+ r4(asr) ;
/* quad 16-bit operations, add|subtract, subtract|add, with all results divided
by 2 (right shifted 1 place) before storing into destination register */
r5 = r3 +|- r4, r7 = r3 -|+ r4(asl) ;
/* quad 16-bit operations, add|subtract, subtract|add, with all results
multiplied by 2 (left shifted 1 place) before storing into destination register */
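The dual 16-bit forms can be modeled in a few lines. The Python sketch below is a hypothetical reference model, not ADI code: the name addsub_vec16 and its parameters are invented for illustration. It assumes that the first operator in the pair applies to the high halves and the second to the low halves, which is consistent with the worked co example in the description above (r2 +|- r1 giving 0xFFFE 0004). The quad forms and the asr/asl scaling options are not modeled.

```python
def _signed16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def addsub_vec16(rx, ry, ops, saturate=False, cross=False):
    """Model rd = rx <op>|<op> ry for 32-bit register values.

    ops is a two-character string such as "+-": the first character is
    assumed to apply to the high halves, the second to the low halves.
    """
    def apply(op, a, b):
        total = a + b if op == "+" else a - b
        if saturate:
            total = max(-0x8000, min(0x7FFF, total))
        return total & 0xFFFF

    hi = apply(ops[0], _signed16(rx >> 16), _signed16(ry >> 16))
    lo = apply(ops[1], _signed16(rx), _signed16(ry))
    if cross:
        # (co): swap the two half-word results in the destination.
        hi, lo = lo, hi
    return (hi << 16) | lo


# r0 = r2 +|- r1 (co) with r1 = 0x0003 0004 and r2 = 0x0001 0002
print(hex(addsub_vec16(0x00010002, 0x00030004, "+-", cross=True)))
```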
Abstract
This instruction adds a constant to a register. This instruction does not saturate on overflow.
See Also (32-bit Add or Subtract (AddSub32), 32-bit Add and Subtract (AddSub32Dual), 32-Bit Add or Subtract
with Carry (AddSubAC0))
AddImm Description
The Add Immediate instruction adds a constant value to a register without saturation.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddImm Example
r0 += 40 ; /* increment r0 value by 40 and store in r0 */
Abstract
This instruction adds or subtracts two signed registers. The ALU does not saturate the result by default.
See Also (32-bit Add and Subtract (AddSub32Dual), 32-Bit Add or Subtract with Carry (AddSubAC0), 32-bit Add
Constant (AddImm))
AddSub32 Description
The AddSub32 instruction adds or subtracts two source values and places the result in a destination register with or
without result saturation.
AddSub32 accepts any combination of register operands, and places the results in the destination register at the
user's discretion.
This instruction is encoded as a 16-bit instruction if the NSAT option is omitted. The 16-bit encoded instruction
takes up less memory space (over a 32-bit encoded instruction), but may not be issued in parallel with other
instructions.
When the NSAT option is included, the instruction is encoded as a 32-bit instruction. The 32-bit encoded
instruction can sometimes save execution time (over a 16-bit encoded instruction), because it can be issued in
parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
In the syntax, where NSAT appears, substitute a saturation option (s or ns). See the Saturation topic in the
Introduction chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSub32 Example
r5 = r2 + r1 ; /* 16-bit instruction length add, no saturation */
r5 = r2 + r1 (ns) ; /* same result as above, but 32-bit instruction length */
r5 = r2 + r1 (s) ; /* saturate the result */
Abstract
This instruction adds and subtracts two signed registers. The ALU does not saturate the result by default.
See Also (32-bit Add or Subtract (AddSub32), 32-Bit Add or Subtract with Carry (AddSubAC0), 32-bit Add
Constant (AddImm))
AddSub32Dual Description
The AddSub32Dual instruction simultaneously adds and/or subtracts two pairs of registered numbers. Then, the
instruction stores the results of each operation into a separate 32-bit data register with or without result saturation.
Each destination register must be unique.
AddSub32Dual accepts any combination of register operands, and places the results in the destination registers at
the user's discretion.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
In the syntax, where SAT appears, substitute a saturation option (s or ns). See the Saturation topic in the Introduction
chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSub32Dual Example
r2=r0+r1, r3=r0-r1 ; /* dual 32-bit operations */
r2=r0+r1, r3=r0-r1 (s) ; /* dual 32-bit operations with saturation */
Abstract
This instruction adds or subtracts two 32-bit numbers plus a carry bit. This operation is used to perform
multi-precision addition. Optionally, the user can saturate the result.
See Also (32-bit Add or Subtract (AddSub32), 32-bit Add and Subtract (AddSub32Dual), 32-bit Add Constant
(AddImm))
AddSubAC0 Description
The AddSubAC0 instruction adds or subtracts two source values plus a carry bit and places the result in a destina-
tion register with or without result saturation.
AddSubAC0 accepts any combination of register operands, and places the results in the destination register at the
user's discretion.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
In the syntax, where SAT appears, substitute a saturation option (s or ns). See the Saturation topic in the Introduction
chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSubAC0 Example
r5 = r2 + r1 + ac0; /* add with carry, no saturation implied */
r5 = r2 + r1 + ac0 (ns) ; /* same result as above */
r5 = r2 + r1 + ac0 (s) ; /* saturate the result */
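The multi-precision use mentioned in the abstract can be illustrated with a 64-bit addition built from two 32-bit additions. The Python sketch below is a hypothetical model, not ADI code; add32 and add64 are invented names. It assumes that the low-word addition leaves its carry out in a carry bit that the high-word addition then consumes, which is the role AC0 plays in the add-with-carry form.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """Return (32-bit sum, carry out) for a + b + carry_in."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high, low) 32-bit words."""
    lo, carry = add32(a_lo, b_lo)       # plain add, produces the carry
    hi, _ = add32(a_hi, b_hi, carry)    # add with carry, consumes it
    return hi, lo


hi, lo = add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(hex(hi), hex(lo))
```

The same pattern extends to wider values by chaining the carry through each successive word, from least significant to most significant.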
Abstract
This instruction adds the two signed accumulators together, then extracts the result to a register.
See Also (Accumulator Add or Subtract (AddSubAcc), Dual Accumulator Add and Subtract to Registers
(AddSubAccExt))
AddAccExt Description
The AddAccExt instruction increments the 40-bit A0 accumulator register by A1 with saturation at 40 bits, then
extracts the result into a 32-bit register with saturation at 32 bits.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddAccExt Example
r5 = (a0 += a1) ;
r2.l = (a0 += a1) ;
r5.h = (a0 += a1) ;
Abstract
This instruction adds or subtracts two signed accumulators. The ALU saturates the result on overflow.
See Also (Accumulator Add and Extract (AddAccExt), Dual Accumulator Add and Subtract to Registers
(AddSubAccExt))
AddSubAcc Description
The AddSubAcc instruction adds or subtracts two source values in the accumulator registers and places the result in
a destination accumulator register with or without result saturation.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
The syntax of this instruction provides optional saturation/sign-extension of the result.
• (W32) - signed saturate the result at 32 bits, sign extended
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSubAcc Example
a0 += a1 ; /* no saturation */
a0 += a1 (w32) ; /* signed saturate at 32 bits, sign extended */
a0 -= a1 ; /* no saturation */
a0 -= a1 (w32) ; /* signed saturate at 32 bits, sign extended */
Abstract
This instruction adds and subtracts the two accumulators together, then extracts the results.
See Also (Accumulator Add or Subtract (AddSubAcc), Accumulator Add and Extract (AddAccExt))
AddSubAccExt Description
The AddSubAccExt instruction simultaneously adds and subtracts the two 40-bit accumulator registers. Then, the
instruction stores the results of each operation into a separate 32-bit data register with or without result saturation.
Each destination register must be unique.
AddSubAccExt accepts any combination of register operands, and places the results in the destination registers at the
user's discretion.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
In the syntax, where SAT appears, substitute a saturation option (s or ns). See the Saturation topic in the Introduction
chapter for a description of saturation behavior.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSubAccExt Example
r4=a1+a0, r6=a1-a0 ;
/* dual 40-bit accumulator operations with no saturation, A0 added/subtracted from A1 */
r4=a0+a1, r6=a0-a1(s) ;
/* dual 40-bit accumulator operations with saturation, A1 subtracted from A0 */
Abstract
This instruction adds then shifts left one or two places. This instruction always saturates on overflow.
AddSubShift Description
The AddSubShift instruction combines an addition operation with a one- or two-place logical shift left. The left
shift accomplishes a x2 (for shift 1) or x4 (for shift 2) multiplication on sign-extended numbers. Saturation is not
supported.
This instruction does not intrinsically modify values that are strictly input. However, the destination register (DDST)
serves as both an input operand and the result destination, so the DDST is intrinsically modified.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AddSubShift Example
r3 = (r3+r2)<<1 ; /* r3 = (r3 + r2) * 2 */
r3 = (r3+r2)<<2 ; /* r3 = (r3 + r2) * 4 */
Bit Operations
These operations provide bitwise shift operations on register operands:
• Ones Count (Shift_Ones)
• Redundant Sign Bits (Shift_SignBits32)
• Redundant Sign Bits (Shift_SignBitsAcc)
• Bit Mux (BitMux)
• Bit Modify (Shift_BitMod)
• Bit Test (Shift_BitTst)
• Deposit Bits (Shift_Deposit)
• Extract Bits (Shift_Extract)
Shift (Dsp32Shf )
DREG_L Register Type = ones DREG Register Type
Abstract
This instruction counts the number of 1's in an XOP register.
See Also (Redundant Sign Bits (Shift_SignBits32), Redundant Sign Bits (Shift_SignBitsAcc))
Shift_Ones Description
The Shift_Ones instruction (one's-population count) loads the number of 1's contained in the source register
(DSRC1) into the lower half of the destination register (DDST_L).
The range of possible values loaded into DDST_L is 0 through 32.
The DDST_L and DSRC1 can be the same D-register. Otherwise, the One's-Population Count instruction does not
modify the contents of DSRC1.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Shift_Ones Example
r3.l = ones r7 ;
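The population count is easy to model for checking expected results. The Python sketch below is illustrative only; the function name ones is borrowed from the instruction mnemonic and is otherwise hypothetical. It counts the set bits in a 32-bit value, so the result always falls in the range 0 through 32 described above.

```python
def ones(value):
    """Count the 1 bits in the low 32 bits of value."""
    return bin(value & 0xFFFFFFFF).count("1")


print(ones(0x00000000))
print(ones(0xFFFFFFFF))
print(ones(0x00000080))
```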
Shift (Dsp32Shf )
DREG_L Register Type = signbits DREG Register Type
DREG_L Register Type = signbits DREG_L Register Type
DREG_L Register Type = signbits DREG_H Register Type
Abstract
This instruction returns the number of redundant sign bits. For example, if there are five sign bits, this instruction
returns 4. The result can then be used with ASHIFT to normalize the data.
See Also (Ones Count (Shift_Ones), Redundant Sign Bits (Shift_SignBitsAcc))
Shift_SignBits32 Description
The SignBits32 instruction returns the number of sign bits in a number, and can be used in conjunction with a shift
to normalize numbers. This instruction can operate on 16-bit or 32-bit input numbers.
• For a 16-bit input, Sign Bit returns the number of leading sign bits minus one, which is in the range 0 through
15. There are no special cases. An input of all zeros returns +15 (all sign bits), and an input of all ones also
returns +15.
• For a 32-bit input, Sign Bit returns the number of leading sign bits minus one, which is in the range 0 through
31. An input of all zeros or all ones returns +31 (all sign bits).
The result of the SignBits32 instruction can be used directly as the argument to an arithmetic shift instruction
(AShift) to normalize the number. Resultant numbers will be in the following formats (S == signbit, M == magnitude
bit).
16-bit: S.MMM MMMM MMMM MMMM
32-bit: S.MMM MMMM MMMM MMMM MMMM MMMM MMMM MMMM
In addition, the SignBits32 instruction result can be subtracted directly to form the new exponent.
The SignBits32 instruction does not implicitly modify the input value. For 32-bit and 16-bit input, the destination
register (DDST_L) and source sample register (DSRC1) can be the same D-register. Doing this explicitly modifies the
DSRC1.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Shift_SignBits32 Example
r2.l = signbits r7 ;
r1.l = signbits r5.l ;
r0.l = signbits r4.h ;
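The count and the normalization step can be sketched as a small model. The Python code below is a hypothetical illustration, not ADI code; the names signbits and normalize32 are invented. It follows the rule stated above: count the leading bits that equal the sign bit, then subtract one. Shifting left by that amount leaves exactly one sign bit at the top.

```python
def signbits(value, width):
    """Return the number of leading sign bits minus one for a 16- or 32-bit value."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count - 1


def normalize32(value):
    """Shift a 32-bit value left so that only one sign bit remains."""
    shift = signbits(value, 32)
    return (value << shift) & 0xFFFFFFFF, shift


print(signbits(0x04000000, 32))          # five leading sign bits
print(hex(normalize32(0x04000000)[0]))   # normalized value
```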
Shift (Dsp32Shf )
Abstract
This instruction returns the number of redundant sign bits. For example, if there are five sign bits, this instruction
returns 4. The result can then be used with ASHIFT to normalize the data.
See Also (Ones Count (Shift_Ones), Redundant Sign Bits (Shift_SignBits32))
Shift_SignBitsAcc Description
The SignBitsAcc instruction returns the number of sign bits in a number, and can be used in conjunction with a
shift to normalize numbers. This instruction can operate on 40-bit input numbers.
• For a 40-bit Accumulator input, Sign Bit returns the number of leading sign bits minus 9, which is in the range
-8 through +31. A negative number is returned when the result in the Accumulator has expanded into the
extension bits; the corresponding normalization will shift the result down to a 32-bit quantity (losing precision).
An input of all zeros or all ones returns +31.
The result of the SignBitsAcc instruction can be used directly as the argument to an arithmetic shift instruction
(AShift) to normalize the number. Resultant numbers will be in the following formats (S == signbit, M == magnitude
bit).
40-bit: SSSS SSSS S.MMM MMMM MMMM MMMM MMMM MMMM MMMM MMMM
In addition, the SignBitsAcc instruction result can be subtracted directly to form the new exponent.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Shift_SignBitsAcc Example
r6.l = signbits a0 ;
r5.l = signbits a1 ;
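The accumulator variant differs only in its width and its bias. The Python sketch below is a hypothetical model (signbits_acc is an invented name) of the rule stated above: count the leading sign bits of the 40-bit value and subtract 9, so a value that fits in 32 bits yields a non-negative result and a value that has grown into the extension bits yields a negative one.

```python
def signbits_acc(value):
    """Return the number of leading sign bits minus 9 for a 40-bit accumulator value."""
    value &= (1 << 40) - 1
    sign = (value >> 39) & 1
    count = 0
    for bit in range(39, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count - 9


print(signbits_acc(0x0000000000))   # all zeros
print(signbits_acc(0x0040000000))   # already normalized to 32 bits
print(signbits_acc(0x4000000000))   # uses every extension bit
```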
Shift (Dsp32Shf )
bitmux (DREG Register Type, DREG Register Type, a0) (asr)
bitmux (DREG Register Type, DREG Register Type, a0) (asl)
Abstract
This instruction merges two bit streams into the accumulator. Each time you call this instruction, it takes a single bit
from the two source registers, muxes them together, and deposits them into the accumulator. The streams can be
taken from the MSBs of the register pair and shifted into the LSBs of the accumulator (ASL) or taken from the LSBs
of the registers and deposited into the MSBs of the accumulator (ASR).
See Also (Bit Modify (Shift_BitMod), Bit Test (Shift_BitTst))
BitMux Description
The BitMux instruction merges bit streams.
The instruction has two versions, shift right and shift left. This instruction overwrites the contents of source 1
(DSRC1) and source 0 (DSRC0). See the Contents Before Shift table, A Shift Right Instruction table, and A Shift Left
Instruction table.
In the Shift Right version, the processor performs the following sequence.
1. Right shift Accumulator A0 by one bit. Right shift the LSB of DSRC1 into the MSB of the Accumulator.
2. Right shift Accumulator A0 by one bit. Right shift the LSB of DSRC0 into the MSB of the Accumulator.
In the Shift Left version, the processor performs the following sequence.
1. Left shift Accumulator A0 by one bit. Left shift the MSB of DSRC0 into the LSB of the Accumulator.
2. Left shift Accumulator A0 by one bit. Left shift the MSB of DSRC1 into the LSB of the Accumulator.
DSRC1 and DSRC0 must not be the same D-register.
Contents before shift: Accumulator A0: zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz
After a shift right instruction: Accumulator A0: yxzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz
After a shift left instruction: Accumulator A0: zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzyx
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
BitMux Example
bitmux (r2, r3, a0) (asr) ; /* right shift*/
• If
• R2=0b1010 0101 1010 0101 1100 0011 1010 1010
• R3=0b1100 0011 1010 1010 1010 0101 1010 0101
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0000 0111
then the Shift Right instruction produces:
• R2=0b0101 0010 1101 0010 1110 0001 1101 0101
• R3=0b0110 0001 1101 0101 0101 0010 1101 0010
• A0=0b1000 0000 0000 0000 0000 0000 0000 0000 0000 0001
bitmux (r3, r2, a0) (asl) ; /* left shift*/
• If
• R3=0b1010 0101 1010 0101 1100 0011 1010 1010
• R2=0b1100 0011 1010 1010 1010 0101 1010 0101
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0000 0111
then the Shift Left instruction produces:
• R2=0b1000 0111 0101 0101 0100 1011 0100 1010
• R3=0b0100 1011 0100 1011 1000 0111 0101 0100
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0001 1111
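The two shift sequences can be replayed with a short model to check the values in this example. The Python sketch below is hypothetical and not ADI code; the function names are invented. It assumes that the first operand in the syntax is DSRC1 and the second is DSRC0, an ordering that reproduces the results listed above, and it treats all shifts as logical shifts on 32-bit registers and a 40-bit accumulator.

```python
MASK32 = 0xFFFFFFFF
MASK40 = (1 << 40) - 1


def bitmux_asr(src1, src0, a0):
    """Shift-right form: return the updated (src1, src0, a0)."""
    a0 = (a0 >> 1) | ((src1 & 1) << 39)   # step 1: LSB of DSRC1 into MSB of A0
    a0 = (a0 >> 1) | ((src0 & 1) << 39)   # step 2: LSB of DSRC0 into MSB of A0
    return src1 >> 1, src0 >> 1, a0


def bitmux_asl(src1, src0, a0):
    """Shift-left form: return the updated (src1, src0, a0)."""
    a0 = ((a0 << 1) & MASK40) | ((src0 >> 31) & 1)   # step 1: MSB of DSRC0 into LSB
    a0 = ((a0 << 1) & MASK40) | ((src1 >> 31) & 1)   # step 2: MSB of DSRC1 into LSB
    return (src1 << 1) & MASK32, (src0 << 1) & MASK32, a0


# bitmux (r2, r3, a0) (asr) with the register values from the example
r2, r3, a0 = bitmux_asr(0xA5A5C3AA, 0xC3AAA5A5, 0x0000000007)
print(hex(r2), hex(r3), hex(a0))
```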
Abstract
This instruction takes the data register specified and clears, sets, or toggles a bit.
See Also (Bit Mux (BitMux), Bit Test (Shift_BitTst))
Shift_BitMod Description
The BitMod instruction includes BitSet, BitTgl, and BitClr forms:
• The BitSet (bit set) instruction sets the bit designated by the bit position (source immediate value, SRCI) in the
specified D-register destination (DDST). It does not affect other bits in the D-register.
The SRCI range of values is 0 through 31, where 0 indicates the LSB, and 31 indicates the MSB of the 32-bit
D-register.
• The BitTgl (bit toggle) instruction inverts the bit designated by SRCI in the specified D-register. The instruc-
tion does not affect other bits in the D-register.
The SRCI range of values is 0 through 31, where 0 indicates the LSB, and 31 indicates the MSB of the 32-bit
D-register.
• The BitClr (bit clear) instruction clears the bit designated by SRCI in the specified D-register. It does not affect
other bits in that register.
The SRCI range of values is 0 through 31, where 0 indicates the LSB, and 31 indicates the MSB of the 32-bit
D-register.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be
issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Shift_BitMod Example
BitSet Example
bitset (r2, 7) ; /* set bit 7 (the eighth bit from LSB) in R2 */
For example, if R2 contains 0x00000000 before this instruction, it contains 0x00000080 after the instruction.
BitTgl Example
bittgl (r2, 24) ; /* toggle bit 24 (the 25th bit from LSB) in R2 */
For example, if R2 contains 0xF1FFFFFF before this instruction, it contains 0xF0FFFFFF after the instruction.
Executing the instruction a second time causes the register to contain 0xF1FFFFFF.
BitClr Example
bitclr (r2, 3) ; /* clear bit 3 (the fourth bit from LSB) in R2 */
For example, if R2 contains 0xFFFFFFFF before this instruction, it contains 0xFFFFFFF7 after the instruction.
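The three forms map directly onto OR, XOR, and AND-NOT with a single-bit mask. The Python sketch below is a hypothetical model, not ADI code; the function names mirror the mnemonics, and the range check reflects the stated SRCI range of 0 through 31.

```python
MASK32 = 0xFFFFFFFF


def _bit(pos):
    """Return a single-bit mask, rejecting positions outside 0 through 31."""
    if not 0 <= pos <= 31:
        raise ValueError("bit position must be in the range 0 through 31")
    return 1 << pos


def bitset(reg, pos):
    return (reg | _bit(pos)) & MASK32


def bittgl(reg, pos):
    return (reg ^ _bit(pos)) & MASK32


def bitclr(reg, pos):
    return reg & ~_bit(pos) & MASK32


print(hex(bitset(0x00000000, 7)))
print(hex(bittgl(0xF1FFFFFF, 24)))
print(hex(bitclr(0xFFFFFFFF, 3)))
```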
Abstract
This instruction sets the CC bit if the specified condition is true. In the bittst case, the CC bit is set if the
specified bit is a 1. For the !bittst case, it is set if the bit is a zero.
See Also (Bit Mux (BitMux), Bit Modify (Shift_BitMod))
Shift_BitTst Description
The Bit Test instruction sets or clears the CC bit, based on the bit designated by the bit position (source immediate
value, SRCI) in the specified D-register destination (DDST).
One version tests whether the specified bit is set; the other tests whether the bit is clear. The instruction does not
affect other bits in the D-register.
The SRCI range of values is 0 through 31, where 0 indicates the LSB, and 31 indicates the MSB of the 32-bit D-
register.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be
issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Shift_BitTst Example
cc = bittst (r7, 15) ; /* test bit 15 TRUE in R7 */
For example, if R7 contains 0xFFFFFFFF before this instruction, CC is set to 1, and R7 still contains 0xFFFFFFFF
after the instruction.
cc = ! bittst (r3, 0) ; /* test bit 0 FALSE in R3 */
Shift (Dsp32Shf )
DREG Register Type = deposit (DREG Register Type, DREG Register Type)
DREG Register Type = deposit (DREG Register Type, DREG Register Type) (x)
Abstract
The bit field deposit instruction merges the background data in SRC1 with a foreground bit field in SRC0.h and
saves the result into DEST.
See Also (Extract Bits (Shift_Extract))
Shift_Deposit Description
The Bit Field Deposit instruction merges the background data in SRC1 with a foreground bit field in SRC0.h and
saves the result into DEST. The length of the bit field is stored in SRC0.b0 and the position of the field is stored in
SRC0.b1. The instruction takes the lower SRC0.b0 bits from SRC0.h and deposits them into SRC1 at bit SRC0.b1. The
(X) syntax sign-extends the field. Without sign extension, the bits above the inserted field are unchanged.
Field deposit
31 16 15 8 7 0
+-----------------+--------+--------+
src0: |xxxxxxxxxxxxxSNN | P | L | L--length, P--position
+-----------------+--------+--------+
+-----------------+-----------------+
src1 |bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbb|
+-----------------+-----------------+
+-----------------+-----------------+
dst0: |bbbbbbbbbbbbbbSNN bbbbbbbbbbbbbbbbb| SN--inserted field in src0
+-----------------+-----------------+ b--previous contents of src1
The Bit Field Deposit instruction merges the background bit field in the background register (DSRC1) with the fore-
ground bit field in the upper half of the foreground register (DSRC0) and saves the result into the destination register
(DDST). The user determines the length of the foreground bit field and its position in the background field.
The input register bit field definitions appear in the Input Register Bit Field Definitions table.
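Register   Bits 31-0
DSRC1      bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
DSRC0      nnnn nnnn nnnn nnnn xxxp pppp xxxL LLLL
b = background bit field, n = foreground bit field, p = position, L = length, x = unused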
The operation writes the foreground bit field of length L over the background bit field with the foreground LSB
located at bit p of the background.
There are a number of boundary cases related to Shift_Deposit instruction operation that should be considered.
• Unsigned syntax, L = 0: The architecture copies DSRC1 contents without modification into DDST. By defini-
tion, a foreground of zero length is transparent.
• Sign-extended, L = 0 and p = 0: This case loads 0x0000 0000 into DDST. The sign of a zero length, zero posi-
tion foreground is zero; therefore, sign-extended is all zeros.
• Sign-extended, L = 0 and p ≠ 0: The architecture copies the lower order bits of DSRC1 below position p into
DDST, then sign-extends that number. The foreground value has no effect. For instance, if:
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Shift_Deposit Example
Bit Field Deposit Unsigned
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field
• R3=0b0000 0000 0000 0000 0000 0111 0000 0011 where bits 31-16 are the foreground bit
field, bits 15-8 are the position, and bits 7-0 are the length
then the Bit Field Deposit (unsigned) instruction produces:
• R7=0b1111 1111 1111 1111 1111 1100 0111 1111
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field
• R3=0b0000 0000 1111 1010 0000 1101 0000 1001 where bits 31-16 are the foreground bit
field, bits 15-8 are the position, and bits 7-0 are the length
then the Bit Field Deposit (unsigned) instruction produces:
• R7=0b1111 1111 1101 1111 0101 1111 1111 1111
Bit Field Deposit Sign-Extended
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field
• R3=0b0101 1010 0101 1010 0000 0111 0000 0011 where bits 31-16 are the foreground bit
field, bits 15-8 are the position, and bits 7-0 are the length
then the Bit Field Deposit (sign-extended) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0001 0111 1111
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field
• R3=0b0000 1001 1010 1100 0000 1101 0000 1001 where bits 31-16 are the foreground bit
field, bits 15-8 are the position, and bits 7-0 are the length
then the Bit Field Deposit (sign-extended) instruction produces:
• R7=0b1111 1111 1111 0101 1001 1111 1111 1111
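The merge rule and the four results above can be cross-checked with a small reference model. The sketch below is written in Python, not Blackfin assembly, and the function name deposit and its parameters are hypothetical. It assumes that p and L are the five-bit fields shown in the DSRC0 layout, that L does not exceed 16, and that the (X) option takes its sign from the top bit of the deposited field, replacing every bit above it.

```python
MASK32 = 0xFFFFFFFF

def deposit(background, src0, sign_extend=False):
    """Hypothetical model of DEST = deposit (SRC1, SRC0), optionally with (X)."""
    length = src0 & 0x1F          # L: low five bits of SRC0.b0
    pos = (src0 >> 8) & 0x1F      # p: low five bits of SRC0.b1
    fg = (src0 >> 16) & 0xFFFF    # foreground bit field in SRC0.h
    field_mask = ((1 << length) - 1) << pos
    # Overwrite the L-bit window at position p with the low L foreground bits.
    result = ((background & ~field_mask) | ((fg << pos) & field_mask)) & MASK32
    if sign_extend:
        top = pos + length        # first bit position above the field
        if top == 0:
            return 0              # zero length at position zero: all zeros
        if top < 32:
            low = result & ((1 << top) - 1)
            if (low >> (top - 1)) & 1:
                low |= MASK32 & ~((1 << top) - 1)
            result = low
    return result

# The second unsigned example above: R4 = 0xFFFFFFFF, R3 = 0x00FA0D09.
print(hex(deposit(0xFFFFFFFF, 0x00FA0D09)))  # 0xffdf5fff
```

With L = 0 the field mask is empty, so the unsigned form returns the background unchanged, which matches the first boundary case listed in the description.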
Shift (Dsp32Shf )
DREG Register Type = extract (DREG Register Type, DREG_L Register Type) (z)
DREG Register Type = extract (DREG Register Type, DREG_L Register Type) (x)
Abstract
This instruction extracts specified bits from the SRC1 register and writes them to the low order bits of the destina-
tion register.
See Also (Deposit Bits (Shift_Deposit))
Shift_Extract Description
Extracts specified bits from the SRC1 register and writes them to the low order bits of the destination register. The
bit position is stored in SRC0.b1 and the length is stored in SRC0.b0. The field is either sign extended or zero
extended to fill the 32-bit output register: (Z) zero fills, (X) sign extends.
Field extraction
31 16 15 8 7 0
+--------+----+----+
src0: |xxxxxxxx| P | L | L--length, P--position, x--unused
+--------+----+----+
+--------+---------+
src1: |bbbbbbbbbbSNNbbbbb| SNN--field to extract, b--other contents of src1
+--------+---------+
with zero extension
+--------+---------+
dst0: |000000000000000SNN| 0--zero
+--------+---------+
with sign extension
+--------+---------+
dst0: |SSSSSSSSSSSSSSSSNN| S--sign bit of the extracted field
+--------+---------+
The Bit Field Extraction instruction moves only specific bits from the scene register (DSRC1) into the low-order bits
of the destination register (DDST). The user determines the length of the pattern bit field and its position in the
scene field using the pattern register (DSRC0_L).
The input register bit field definitions appear in the Input Register Bit Field Definitions table.
Register   Bit field
DSRC1      ssss ssss ssss ssss ssss ssss ssss ssss   (bits 31-0)
DSRC0_L    xxxp pppp xxxL LLLL                       (bits 15-0)
s = scene bit field, p = position, L = length, x = unused
The operation reads the pattern bit field of length L from the scene bit field, with the pattern LSB located at bit p of
the scene. See "Example", below, for more.
There are a number of boundary cases related to Shift_Extract instruction operation that should be considered.
If (p + L) > 32: In the zero-extended and sign-extended versions of the instruction, the architecture assumes that all
bits to the left of the DSRC1 are zero. In such a case, the user is trying to access more bits than the register actually
contains. Consequently, the architecture fills any undefined bits beyond the MSB of the DSRC1 with zeros.
The Bit Field Extraction instruction does not modify the contents of the two source registers. One of the source
registers can also serve as DDST.
The user has the choice of using the (X) option syntax to perform sign-extend extraction or the (Z) option syntax
to perform zero-extend extraction.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Shift_Extract Example
Bit Field Extraction Unsigned
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100 where bits 15-8 are the position, and bits
7-0 are the length
then the Bit Field Extraction (unsigned) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0000 0000 0111
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001 where bits 15-8 are the position, and bits
7-0 are the length
then the Bit Field Extraction (unsigned) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0001 0010 1110
Bit Field Extraction Sign-Extended
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100 where bits 15-8 are the position, and bits
7-0 are the length
then the Bit Field Extraction (sign-extended) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0000 0000 0111
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001 where bits 15-8 are the position, and bits
7-0 are the length
then the Bit Field Extraction (sign-extended) instruction produces:
• R7=0b1111 1111 1111 1111 1111 1111 0010 1110
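The extraction rule, including the (p + L) > 32 boundary case, can be sketched as a Python reference model. This is not Blackfin code; the name extract and its parameters are hypothetical, p and L are assumed to be the five-bit fields of the DSRC0_L layout, and a zero-length field is assumed to produce zero.

```python
MASK32 = 0xFFFFFFFF

def extract(scene, pattern_lo, sign_extend=False):
    """Hypothetical model of DEST = extract (SRC1, SRC0.L) with (Z) or (X)."""
    length = pattern_lo & 0x1F        # L: low five bits of the low byte
    pos = (pattern_lo >> 8) & 0x1F    # p: low five bits of the high byte
    # Bits beyond bit 31 of the scene read as zero, so a plain right shift
    # of the 32-bit value covers the (p + L) > 32 case.
    field = ((scene & MASK32) >> pos) & ((1 << length) - 1)
    if sign_extend and length > 0 and (field >> (length - 1)) & 1:
        field |= MASK32 & ~((1 << length) - 1)
    return field

# The second pair of examples above: R4 = 0xA5A5C3AA, position 13, length 9.
print(hex(extract(0xA5A5C3AA, 0x0D09)))        # 0x12e
print(hex(extract(0xA5A5C3AA, 0x0D09, True)))  # 0xffffff2e
```

Because missing upper bits read as zero, a field that runs past bit 31 always has a zero sign bit in this model, so the (X) form returns the same value as the (Z) form in that case.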
Comparison Operations
These operations provide 16- and 32-bit maximum/minimum comparison and array search operations on register
operands:
• Vectored 16-Bit Maximum (Max16Vec)
Abstract
This instruction calculates the maximum of one or two pairs of signed 16-Bit words.
See Also (Vectored 16-Bit Minimum (Min16Vec))
Max16Vec Description
The vector maximum instruction returns the maximum value (meaning the largest positive value, nearest to
0x7FFF) of the 16-bit half-word source registers to the dest_reg.
The instruction compares the upper half-words of src_reg_0 and src_reg_1 and returns that maximum to the
upper half-word of dest_reg. It also compares the lower half-words of src_reg_0 and src_reg_1 and returns that
maximum to the lower half-word of dest_reg. The result is a concatenation of the two 16-bit maximum values.
The vector maximum instruction does not implicitly modify input values. The dest_reg can be the same D-regis-
ter as one of the source registers. Doing this explicitly modifies that source register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Max16Vec Example
r7 = max (r1, r0) (v) ;
Abstract
This instruction calculates the minimum of each of two pairs of signed 16-bit words.
See Also (Vectored 16-Bit Maximum (Max16Vec))
Min16Vec Description
The Vector Minimum instruction returns the minimum value (the most negative value or the value closest to
0x8000) of the 16-bit half-word source registers to the dest_reg.
This instruction compares the upper half-words of src_reg_0 and src_reg_1 and returns that minimum to the
upper half-word of dest_reg. It also compares the lower half-words of src_reg_0 and src_reg_1 and returns
that minimum to the lower half-word of dest_reg. The result is a concatenation of the two 16-bit minimum val-
ues.
The input values are not implicitly modified by this instruction. The dest_reg can be the same D-register as one
of the source registers. Doing this explicitly modifies that source register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Min16Vec Example
r7 = min (r1, r0) (v) ;
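The lane-wise behavior of the (v) forms of max and min can be illustrated with a short Python model. It is a sketch only: the helper names to_s16, pack_lanes, max16_vec, and min16_vec are hypothetical, and the model covers the result value, not the ASTAT flags.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed half-word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pack_lanes(src0, src1, pick):
    """Apply pick to the upper and lower half-words independently."""
    hi = pick(to_s16(src0 >> 16), to_s16(src1 >> 16)) & 0xFFFF
    lo = pick(to_s16(src0), to_s16(src1)) & 0xFFFF
    return (hi << 16) | lo

def max16_vec(src0, src1):
    return pack_lanes(src0, src1, max)

def min16_vec(src0, src1):
    return pack_lanes(src0, src1, min)

# Upper lane compares 1 with -1, lower lane compares -1 with 2.
print(hex(max16_vec(0x0001FFFF, 0xFFFF0002)))  # 0x10002
print(hex(min16_vec(0x0001FFFF, 0xFFFF0002)))  # 0xffffffff
```

The two lanes never interact, which is why the result is described as a concatenation of two independent 16-bit results.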
Abstract
This instruction calculates the maximum of two signed 32-bit values.
See Also (32-Bit Minimum (Min32))
Max32 Description
The maximum instruction returns the maximum, or most positive, value of the source registers. The operation sub-
tracts src_reg_1 from src_reg_0 and selects the output based on the signs of the input values and the arithmet-
ic status bits.
The maximum instruction does not implicitly modify input values. The dest_reg can be the same D-register as
one of the source registers. Doing this explicitly modifies the source register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Max32 Example
r5 = max (r2, r3) ;
Abstract
This instruction calculates the minimum of two signed 32-bit values.
See Also (32-bit Maximum (Max32))
Min32 Description
The minimum instruction returns the minimum value of the source registers to the dest_reg. (The minimum value
of the source registers is the value closest to –∞.) The operation subtracts src_reg_1 from src_reg_0 and selects
the output based on the signs of the input values and the arithmetic status bits.
The minimum instruction does not implicitly modify input values. The dest_reg can be the same D-register as
one of the source registers. Doing this explicitly modifies the source register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Min32 Example
r5 = min (r2, r3) ;
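Because the 32-bit max and min instructions treat their operands as signed, a register holding 0x8000 0000 is the smallest possible value, not the largest. The Python sketch below shows that interpretation; the names to_s32, max32, and min32 are hypothetical, and the model returns only the selected value without modeling the internal subtraction or the status bits.

```python
MASK32 = 0xFFFFFFFF

def to_s32(x):
    """Interpret the low 32 bits of x as a signed word."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def max32(src0, src1):
    return max(to_s32(src0), to_s32(src1)) & MASK32

def min32(src0, src1):
    return min(to_s32(src0), to_s32(src1)) & MASK32

print(hex(max32(0x80000000, 0x00000001)))  # 0x1
print(hex(min32(0x80000000, 0x00000001)))  # 0x80000000
```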
Abstract
This instruction is used in a loop to locate a minimum or maximum in an array. For each compute unit, a value is
compared against the current signed maximum or minimum in the accumulator. Two values are tested at a time. The
current winner is stored in the accumulator, and the current value of P0 is written to the result register if the
comparison is true.
Search Description
This instruction is used in a loop to locate a maximum or minimum element in an array of 16-bit packed data. Two
values are tested at a time. The vector search instruction compares two 16-bit, signed half-words to values stored in
the Accumulators. Then, it conditionally updates each accumulator and destination pointer based on the compari-
son. Pointer register P0 is always the implied array pointer for the elements being searched.
More specifically, the signed high half-word of src_reg is compared in magnitude with the 16 low-order bits in
A1. If src_reg_hi meets the comparison criterion, then A1 is updated with src_reg_hi, and the value in point-
er register P0 is stored in dest_pointer_hi. The same operation is performed for src_reg_low and A0.
Based on the search mode specified in the syntax, the instruction tests for maximum or minimum signed values.
Values are sign extended when copied into the accumulator(s). See the examples for one way to implement the
search loop. After the vector search loop concludes, A1 and A0 hold the two surviving elements, and
dest_pointer_hi and dest_pointer_lo contain their respective addresses. The next step is to select the final
value from these two surviving elements.
Modes
The four supported compare modes are specified by the mandatory searchmode flag.
src_reg_hi
Compared to least significant 16 bits of A1. If compare condition is met, overwrites lower 16 bits of A1 and
copies P0 into dest_pointer_hi.
src_reg_lo
Compared to least significant 16 bits of A0. If compare condition is met, overwrites lower 16 bits of A0 and
copies P0 into dest_pointer_lo.
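The per-lane compare-and-update rule can be sketched in Python. Everything here is a hypothetical model, not Blackfin code: the state dictionary stands in for A1, A0, dest_pointer_hi, and dest_pointer_lo, and the word index stands in for the value of P0. The LT and LE modes appear in the listings in this section; GT and GE are included on the assumption that they are the remaining two of the four modes.

```python
COMPARE = {
    "GT": lambda new, cur: new > cur,
    "GE": lambda new, cur: new >= cur,
    "LT": lambda new, cur: new < cur,
    "LE": lambda new, cur: new <= cur,
}

def to_s16(x):
    """Interpret the low 16 bits of x as a signed half-word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def search_step(state, src_reg, p0, mode):
    """Compare both half-words of src_reg and update the winners in place."""
    test = COMPARE[mode]
    hi, lo = to_s16(src_reg >> 16), to_s16(src_reg)
    if test(hi, state["a1"]):
        state["a1"], state["ptr_hi"] = hi, p0
    if test(lo, state["a0"]):
        state["a0"], state["ptr_lo"] = lo, p0
    return state

def search_array(words, mode, start):
    """Run the step over packed 32-bit words; the index stands in for P0."""
    state = {"a1": start, "a0": start, "ptr_hi": None, "ptr_lo": None}
    for index, word in enumerate(words):
        search_step(state, word, index, mode)
    return state
```

With a strict comparison (LT or GT) an equal value does not replace the current winner, so the first occurrence survives; with LE or GE the last occurrence survives. On hardware, the address recorded depends on how far the parallel load has already advanced P0, which this model does not attempt to reproduce.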
This 32-bit instruction can be issued in parallel with the combination of one 16-bit length load instruction to the
P0 register and one 16-bit NOP. No other instructions can be issued in parallel with the vector search instruction.
Note the following legal and illegal forms.
(r1, r0) = search r2 (LT) || r2 = [p0++p3]; /* ILLEGAL */
(r1, r0) = search r2 (LT) || r2 = [p0++]; /* LEGAL */
Search Example
/* Initialize Accumulators with appropriate value for the type of search. */
r0.l=0x7fff ;
r0.h=0 ;
a0=r0 ; /* max positive 16-bit value */
a1=r0 ; /* max positive 16-bit value */
/* Initialize R2. */
r2=[p0++] ;
/* Assume P1 is initialized to the size of the vector length. */
LSETUP (loop_, loop_) LC0=P1>>1 ; /* set up the loop */
loop_: (r1,r0) = SEARCH R2 (LE) || R2=[P0++];
/* search for the last minimum in all but the
Conversion Operations
These operations provide absolute value, negate, pass, and saturate operations on register operands:
• Vectored 16-Bit Absolute Value (Abs2x16)
• 32-bit Absolute Value (Abs32)
• Accumulator0 Absolute Value (AbsAcc0)
• Accumulator Absolute Value (AbsAcc1)
• Accumulator Absolute Value (AbsAccDual)
• Vectored 16-bit Negate (Neg16Vec)
• 32-Bit Negate (Neg32)
• Accumulator0 Negate (NegAcc0)
• Accumulator1 Negate (NegAcc1)
• Dual Accumulator Negate (NegAccDual)
• Fractional 32-bit to 16-Bit Conversion (Pass32Rnd16)
• Accumulator0 32-Bit Saturate (ALU_SatAcc0)
• Accumulator1 32-Bit Saturate (ALU_SatAcc1)
• Dual Accumulator 32-Bit Saturate (ALU_SatAccDual)
Abstract
This instruction calculates the absolute value of the signed 16-bit input vector. Saturation only applies when the
input is 0x8000.
Abs2x16 Description
The vector absolute value instruction calculates the individual absolute values of the upper and lower halves of a
single 32-bit data register. The results are placed into a 32-bit dest_reg, using the following rules.
• If the input value is positive or zero, copy it unmodified to the destination.
• If the input value is negative, subtract it from zero and store the result in the destination.
This instruction saturates the result.
For example, as shown in the figure, if the source register contains the data shown, the destination register receives
the data shown.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Abs2x16 Example
/* If r1 = 0xFFFF 7FFF, then . . . */
r3 = abs r1 (v) ;
/* . . . produces 0x0001 7FFF */
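The two rules and the 0x8000 saturation case can be sketched as a Python model. The names to_s16, abs16_sat, and abs2x16 are hypothetical, and the model returns only the packed result without modeling ASTAT.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed half-word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def abs16_sat(half):
    """Absolute value of one half-word; 0x8000 saturates to 0x7FFF."""
    return min(abs(to_s16(half)), 0x7FFF)

def abs2x16(src):
    """Apply the saturating absolute value to each half independently."""
    return (abs16_sat(src >> 16) << 16) | abs16_sat(src)

print(hex(abs2x16(0xFFFF7FFF)))  # 0x17fff, matching the example above
print(hex(abs2x16(0x80008000)))  # 0x7fff7fff, both halves saturate
```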
Abstract
This instruction calculates the absolute value of the 32-bit input. Saturation only applies when the input is
0x80000000.
Abs32 Description
This instruction calculates the absolute value of a 32-bit register and stores it into a 32-bit dest_reg. Calculation
is done according to the following rules.
• If the input value is positive or zero, copy it unmodified to the destination.
• If the input value is negative, subtract it from zero and store the result in the destination. Saturation is
automatically performed with the instruction, so taking the absolute value of the largest-magnitude negative
number returns the largest-magnitude positive number.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Abs32 Example
r3 = abs r1 ;
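The saturation case for the 32-bit form is easiest to see in a model. The Python sketch below uses the hypothetical names to_s32 and abs32_sat and models only the result value.

```python
MASK32 = 0xFFFFFFFF

def to_s32(x):
    """Interpret the low 32 bits of x as a signed word."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def abs32_sat(src):
    """Saturating absolute value: 0x80000000 yields 0x7FFFFFFF."""
    return min(abs(to_s32(src)), 0x7FFFFFFF)

print(hex(abs32_sat(0xFFFFFFFF)))  # 0x1
print(hex(abs32_sat(0x80000000)))  # 0x7fffffff
```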
Abstract
This instruction calculates the absolute value of the A0 (accumulator 0) register.
See Also (Accumulator Absolute Value (AbsAcc1), Accumulator Absolute Value (AbsAccDual))
AbsAcc0 Description
This instruction takes the absolute value of a 40-bit input value in a register and produces a 40-bit result. Calcula-
tion is done according to the following rules.
• If the input value is positive or zero, copy it unmodified to the destination.
• If the input value is negative, subtract it from zero and store the result in the destination. Saturation is
automatically performed with the instruction, so taking the absolute value of the largest-magnitude negative
number returns the largest-magnitude positive number.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AbsAcc0 Example
a0 = abs a0 ;
a0 = abs a1 ;
Abstract
This instruction calculates the absolute value of the A1 (accumulator 1) register.
See Also (Accumulator0 Absolute Value (AbsAcc0), Accumulator Absolute Value (AbsAccDual))
AbsAcc1 Description
This instruction takes the absolute value of a 40-bit input value in a register and produces a 40-bit result. Calcula-
tion is done according to the following rules.
• If the input value is positive or zero, copy it unmodified to the destination.
• If the input value is negative, subtract it from zero and store the result in the destination. Saturation is
automatically performed with the instruction, so taking the absolute value of the largest-magnitude negative
number returns the largest-magnitude positive number.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AbsAcc1 Example
a1 = abs a0 ;
a1 = abs a1 ;
Abstract
This instruction calculates the absolute value of the A0 (accumulator 0) register and A1 (accumulator 1) register.
See Also (Accumulator0 Absolute Value (AbsAcc0), Accumulator Absolute Value (AbsAcc1))
AbsAccDual Description
This instruction performs the ABS operation on both accumulators by a single instruction, taking the absolute value
of 40-bit input values in two registers and producing two 40-bit results. Calculation is done according to the follow-
ing rules.
• If the input value is positive or zero, copy it unmodified to the destination.
• If the input value is negative, subtract it from zero and store the result in the destination. Saturation is
automatically performed with the instruction, so taking the absolute value of the largest-magnitude negative
number returns the largest-magnitude positive number.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
AbsAccDual Example
a1 = abs a1, a0=abs a0 ;
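For the accumulator forms the same rules apply at 40 bits. The Python sketch below models that width; the names to_s40, abs40_sat, and abs_dual are hypothetical, and accumulators are represented as plain 40-bit integers.

```python
MASK40 = (1 << 40) - 1

def to_s40(x):
    """Interpret the low 40 bits of x as a signed accumulator value."""
    x &= MASK40
    return x - (1 << 40) if x & (1 << 39) else x

def abs40_sat(acc):
    """Saturating absolute value at 40 bits."""
    return min(abs(to_s40(acc)), (1 << 39) - 1)

def abs_dual(a1, a0):
    """Model of the dual form: both accumulators are processed independently."""
    return abs40_sat(a1), abs40_sat(a0)

print(hex(abs40_sat(0x8000000000)))  # 0x7fffffffff
```

The dual form is simply the single-accumulator rule applied twice, so abs_dual returns a pair holding the new A1 and A0 values.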
Abstract
This instruction negates the input operands. A maximum negative input (0x8000) saturates to the maximum positive
value (0x7FFF).
Neg16Vec Description
The vector negate instruction returns the same magnitude with the opposite arithmetic sign, saturated for each 16-
bit half-word in the source. The instruction calculates by subtracting the source from zero.
For more information, see the Saturation section in the Introduction chapter.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Neg16Vec Example
r5 = -r3 (v) ;
/* R5.H becomes the negative of R3.H and R5.L becomes the negative of R3.L. */
/* If r3 = 0x0004 7FFF the result is r5 = 0xFFFC 8001 */
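A Python sketch of the saturating vector negate reproduces the result in the comment above. The names to_s16, neg16_sat, and neg16_vec are hypothetical, and only the result value is modeled.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed half-word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def neg16_sat(half):
    """Subtract the half-word from zero and saturate at 0x7FFF."""
    return min(-to_s16(half), 0x7FFF) & 0xFFFF

def neg16_vec(src):
    """Negate the upper and lower half-words independently."""
    return (neg16_sat(src >> 16) << 16) | neg16_sat(src)

print(hex(neg16_vec(0x00047FFF)))  # 0xfffc8001
print(hex(neg16_vec(0x80000000)))  # 0x7fff0000, upper half saturates
```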
Abstract
This instruction negates the input operands. If saturation is specified, the special case of negate (MAX_NEG_32) re-
turns MAX_POS_32. If not, it returns MAX_NEG_32.
Neg32 Description
The negate (two’s-complement) instruction returns the same magnitude with the opposite arithmetic sign. The in-
struction calculates by subtracting from zero.
The Dreg version of the negate (two’s-complement) instruction is offered with or without saturation. The only case
where the nonsaturating negate would overflow is when the input value is 0x8000 0000. The saturating version re-
turns 0x7FFF FFFF; the nonsaturating version returns 0x8000 0000.
In the syntax, where NSAT appears, substitute a saturation option (s or ns). See the Saturation topic in the Introduc-
tion chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Neg32 Example
r5 =-r0 ; /* default is no saturation */
r5 =-r0 (s) ; /* saturation */
r5 =-r0 (ns) ; /* no saturation */
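The difference between the (s) and (ns) options only shows for the input 0x8000 0000. The Python sketch below models both; the names to_s32 and neg32 and the saturate parameter are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_s32(x):
    """Interpret the low 32 bits of x as a signed word."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def neg32(src, saturate=False):
    """Subtract src from zero; optionally saturate instead of wrapping."""
    result = -to_s32(src)
    if saturate:
        result = min(result, 0x7FFFFFFF)
    return result & MASK32

print(hex(neg32(0x80000000)))                 # 0x80000000 (wraps)
print(hex(neg32(0x80000000, saturate=True)))  # 0x7fffffff
```

For every other input the two options give the same result in this model.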
Abstract
This instruction negates the input operands.
See Also (Accumulator1 Negate (NegAcc1), Dual Accumulator Negate (NegAccDual))
NegAcc0 Description
The negate (two’s-complement) instruction returns the same magnitude with the opposite arithmetic sign. The ac-
cumulator versions saturate the result at 40 bits. The instruction calculates by subtracting from zero.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
NegAcc0 Example
a0 =-a0 ;
a0 =-a1 ;
a1 = -a1 ;
Abstract
This instruction negates the input operands.
See Also (Accumulator0 Negate (NegAcc0), Dual Accumulator Negate (NegAccDual))
NegAcc1 Description
The negate (two’s-complement) instruction returns the same magnitude with the opposite arithmetic sign. The ac-
cumulator versions saturate the result at 40 bits. The instruction calculates by subtracting from zero.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
NegAcc1 Example
a1 =-a0 ;
a1 =-a1 ;
Abstract
This instruction negates the input operands.
See Also (Accumulator0 Negate (NegAcc0), Accumulator1 Negate (NegAcc1))
NegAccDual Description
The dual negate (two’s-complement) instruction returns the same magnitude with the opposite arithmetic sign for
each accumulator. The accumulator versions saturate the result at 40 bits. The instruction calculates by subtracting
from zero.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
NegAccDual Example
a1 =-a1, a0=-a0 ;
Abstract
This instruction converts a 32-bit, normalized-fraction number into a 16-Bit normalized-fraction number by adding
a round bit at bit 15, then saturating and extracting bits 31-16, then discarding bits 15-0. The instruction supports
only biased rounding, which adds a half LSB (bit 15) before truncating bits 15-0. The RND_MOD bit in the ASTAT
register has no bearing on the rounding behavior of this instruction.
Pass32Rnd16 Description
The round to half-word instruction rounds a 32-bit, normalized-fraction number into a 16-bit, normalized-fraction
number by extracting and saturating bits 31–16, then discarding bits 15–0. The instruction supports only biased
rounding, which adds a half LSB (in this case, bit 15) before truncating bits 15–0. The ALU performs the rounding.
The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of this instruction.
Fractional data types such as the operands used in this instruction are always signed.
For more information, see the Saturation section and the Rounding and Truncation section in the Introduction
chapter.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Pass32Rnd16 Example
/* If r6 = 0xFFFC FFFF, then rounding to 16-bits with . . . */
r1.l = r6 (rnd) ;
/* . . . produces r1.l = 0xFFFD */
/* If r7 = 0x0001 8000, then rounding . . . */
r1.h = r7 (rnd) ;
/* . . . produces r1.h = 0x0002 */
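Biased rounding to a half-word can be sketched in Python as add, saturate, then keep bits 31-16. The names to_s32 and round_to_half are hypothetical, and the model assumes saturation is applied to the 32-bit sum before the upper half is taken.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed word."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def round_to_half(src):
    """Add half an LSB at bit 15, saturate, and return bits 31-16."""
    total = to_s32(src) + 0x8000
    total = min(total, 0x7FFFFFFF)   # only positive overflow is possible
    return (total >> 16) & 0xFFFF

print(hex(round_to_half(0xFFFCFFFF)))  # 0xfffd
print(hex(round_to_half(0x00018000)))  # 0x2
print(hex(round_to_half(0x7FFF8000)))  # 0x7fff, saturated
```

The last call shows why saturation matters: without it, the round bit would carry into the sign bit and turn a large positive fraction into a negative one.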
Abstract
This instruction saturates the accumulator at 32 bits (a0.w). The resulting saturated value is sign extended into the
accumulator extension bits (a0.x).
See Also (Accumulator1 32-Bit Saturate (ALU_SatAcc1), Dual Accumulator 32-Bit Saturate (ALU_SatAccDual))
ALU_SatAcc0 Description
The saturate instruction saturates the 40-bit Accumulators at 32 bits. The resulting saturated value is sign extended
into the Accumulator extension bits.
For more information, see the Saturation section in the Introduction chapter.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bits 31-16: ... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
Bits 15-0:  ... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
ALU_SatAcc0 Example
a0 = a0 (s) ;
Abstract
This instruction saturates the accumulator at 32-bits (a1.w). The resulting saturated value is sign extended into the
accumulator extension bits (a1.x).
See Also (Accumulator0 32-Bit Saturate (ALU_SatAcc0), Dual Accumulator 32-Bit Saturate (ALU_SatAccDual))
ALU_SatAcc1 Description
The saturate instruction saturates the 40-bit Accumulators at 32 bits. The resulting saturated value is sign extended
into the Accumulator extension bits.
For more information, see the Saturation section in the Introduction chapter.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
ALU_SatAcc1 Example
a1 = a1 (s) ;
Abstract
This instruction saturates both accumulators at 32-bits (a0.w and a1.w). The resulting saturated values are sign
extended into the accumulator extension bits (a0.x and a1.x).
See Also (Accumulator0 32-Bit Saturate (ALU_SatAcc0), Accumulator1 32-Bit Saturate (ALU_SatAcc1))
ALU_SatAccDual Description
The dual saturate instruction saturates both 40-bit Accumulators at 32 bits. The resulting saturated values are sign
extended into the Accumulator extension bits.
For more information, see the Saturation section in the Introduction chapter.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
ALU_SatAccDual Example
a1 = a1 (s), a0 = a0 (s) ;
Logic Operations
These operations provide one's complement and other logic operations on register operands:
• 32-Bit One's Complement (Not32)
• 32-Bit Logic Operations (Logic32)
Abstract
This instruction performs logic operations on two 32-bit values. It does either an AND, OR, or XOR.
See Also (32-Bit One's Complement (Not32))
Logic32 Description
The AND instruction performs a 32-bit, bit-wise logical AND operation on the two source registers and stores the
results into the dest_reg. The instruction does not implicitly modify the source registers. The dest_reg and one
src_reg can be the same D-register; this operation explicitly modifies the src_reg.
The OR instruction performs a 32-bit, bit-wise logical OR operation on the two source registers and stores the re-
sults into the dest_reg. The instruction does not implicitly modify the source registers. The dest_reg and one
src_reg can be the same D-register; this operation explicitly modifies the src_reg.
The Exclusive-OR (XOR) instruction performs a 32-bit, bit-wise logical exclusive OR operation on the two source
registers and loads the results into the dest_reg.
The XOR instruction does not implicitly modify source registers. The dest_reg and one src_reg can be the
same D-register; this operation explicitly modifies the src_reg.
This 16-bit instruction may not be issued in parallel with other instructions.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Logic32 Example
r4 = r4 & r3 ; /* AND */
r4 = r4 | r3 ; /* OR */
r4 = r4 ^ r3 ; /* XOR */
Abstract
This instruction (NOT one's complement) toggles every bit in the 32-bit register.
See Also (32-Bit Logic Operations (Logic32))
Not32 Description
The NOT one’s-complement instruction toggles every bit in the 32-bit register. The instruction does not implicitly
modify the src_reg. The dest_reg and src_reg can be the same D-register. Using the same D-register as the
dest_reg and src_reg would explicitly modify the src_reg.
This 16-bit instruction may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Not32 Example
r3 = ~ r4 ; /* NOT */
Move Operations
These operations provide register move operations on register, half register, and accumulator register operands:
• Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE)
• Move 16-Bit Accumulator Section to Low Half Register (MvA0ToDregL)
• Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH)
• Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO)
• Move Register to Accumulator0 (MvAxToAx)
• Move Accumulator to Register (MvAxToDreg)
• Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL)
• Pass 8-Bit to 32-Bit Register Expansion (MvDregBToDreg)
• Move Register Half to 16-Bit Accumulator Section (MvDregHLToAxHL)
• Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX)
• Pass 16-Bit to 32-Bit Register Expansion (MvDregLToDreg)
• Move Register to Accumulator1 (MvDregToAx)
• Move Register to Accumulator0 & Accumulator1 (MvDregToAxDual)
• Move Register to Register (MvRegToReg)
• Conditional Move Register to Register (MvRegToRegCond)
• Dual Move Accumulators to Half Registers (ParaMvA1ToDregHwithMvA0ToDregL)
• Dual Move Accumulators to Register (ParaMvA1ToDregOwithMvA0ToDregE)
Abstract
This instruction moves a 32-bit section of an accumulator to an even register.
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to Low Half Register (MvA0ToDregL), Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH),
Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO), Move Accumulator to Register (MvAxToDreg),
Move Register to Accumulator0 (MvAxToAx))
MvA0ToDregE Description
The move accumulator register to even data register instruction copies the contents of the source accumulator regis-
ter into the destination even data register. The operation does not affect the source register contents.
In the syntax, where MMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), (iu), or (s2rnd). See the Saturation topic in the Introduction chapter for a de-
scription of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvA0ToDregE Example
r2 = a0 ; /* 32-bit move with saturation */
r0 = a0 (iss2) ; /* 32-bit move with scaling, truncation and saturation */
Abstract
This instruction moves a 16-bit section of an accumulator to a low half register (16-bit section of a data register).
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to High Half Register (MvA1ToDregH), Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE),
Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO), Move Accumulator to Register (MvAxToDreg),
Move Register to Accumulator0 (MvAxToAx))
MvA0ToDregL Description
The move accumulator register to low half data register instruction copies the contents of the source accumulator
register into the destination low half data register. The operation does not affect the source register contents.
In the syntax, where MMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (s2rnd), (t), or (tfu). See the Saturation topic in the Introduction
chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvA0ToDregL Example
r3.l = a0 ;
r7.l = a0 (fu) ; /* fractional unsigned format */
r2.l = a0 (s2rnd) ; /* signed fraction, scaled */
Abstract
This instruction moves a 16-bit section of an accumulator to a high half register (16-bit section of a data register).
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to Low Half Register (MvA0ToDregL), Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE),
Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO), Move Accumulator to Register (MvAxToDreg),
Move Register to Accumulator0 (MvAxToAx))
MvA1ToDregH Description
The move accumulator register to high half data register instruction copies the contents of the source accumulator
register into the destination high half data register. The operation does not affect the source register contents.
In the syntax, where MMLMMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (m), (m,fu), (m,ih), (m,is), (m,iss2), (m,iu), (m,s2rnd),
(m,t), (m,tfu), (s2rnd), (t), or (tfu). See the Saturation topic in the Introduction chapter for a description
of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvA1ToDregH Example
r3.h = a1 ;
r7.h = a1 (fu) ; /* fractional unsigned format */
r2.h = a1 (s2rnd) ; /* signed fraction, scaled */
Abstract
This instruction moves a 32-bit section of an accumulator to an odd register.
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to Low Half Register (MvA0ToDregL), Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH),
Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE), Move Accumulator to Register (MvAxToDreg),
Move Register to Accumulator0 (MvAxToAx))
MvA1ToDregO Description
The move accumulator register to odd data register instruction copies the contents of the source accumulator
register into the destination odd data register. The operation does not affect the source register contents.
In the syntax, where MMLMMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), (iu), (m), (m,fu), (m,is), (m,iss2), (m,iu), (m,s2rnd), or (s2rnd). See
the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvA1ToDregO Example
r3 = a1 ;
r7 = a1 (fu);
r1 = a1 (s2rnd);
Abstract
This instruction moves the contents of one accumulator register to the other accumulator register.
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to Low Half Register (MvA0ToDregL), Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH),
Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE), Move 32-Bit Accumulator Section to Odd
Register (MvA1ToDregO), Move Accumulator to Register (MvAxToDreg))
MvAxToAx Description
The move accumulator register to accumulator register instruction copies the contents of the source accumulator
register into the destination accumulator register. The operation does not affect the source register contents.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvAxToAx Example
a0 = a1 ;
a1 = a0 ;
Abstract
This instruction moves the value in the accumulator register to the selected data register.
See Also (Move 8-Bit Accumulator Section to Register Half (MvAxXToDregL), Move 16-Bit Accumulator Section
to Low Half Register (MvA0ToDregL), Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH),
Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE), Move 32-Bit Accumulator Section to Odd
Register (MvA1ToDregO), Move Register to Accumulator0 (MvAxToAx))
MvAxToDreg Description
The move accumulator register to data register instruction copies the contents of the source accumulator
register into the destination data register. The operation does not affect the source register contents.
In the syntax, where M32MMOD appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m), (m,is), (m,is,ns), (m,t), (t), or (tfu). See the
Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvAxToDreg Example
r3 = a0 (iu,ns); /* integer unsigned no saturate */
r4 = a1 (fu); /* fractional unsigned */
r2 = a0 (m); /* mixed mode */
r5 = a1 (tfu); /* fractional unsigned truncated */
Abstract
This instruction moves an 8-bit section of an accumulator to a low half register (16-bit section of a data register).
See Also (Move 16-Bit Accumulator Section to Low Half Register (MvA0ToDregL), Move 16-Bit Accumulator Section
to High Half Register (MvA1ToDregH), Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE),
Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO), Move Accumulator to Register (MvAxToDreg),
Move Register to Accumulator0 (MvAxToAx))
MvAxXToDregL Description
The move accumulator register extension instruction copies 8 bits from an accumulator extension source register into
a low half data register. The instruction does not affect the unspecified half of the destination register. It supports
only data registers and the accumulator.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvAxXToDregL Example
r7.l = a0.x ;
r0.l = a1.x ;
Abstract
This instruction copies the least significant 8 bits from the source register into the least significant 8 bits of the
destination and either sign-extends or zero-extends the value into the upper bits.
See Also (Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX), Move Register Half to 16-Bit
Accumulator Section (MvDregHLToAxHL), Pass 16-Bit to 32-Bit Register Expansion (MvDregLToDreg), Move
Register to Accumulator1 (MvDregToAx), Move Register to Accumulator0 & Accumulator1 (MvDregToAxDual))
MvDregBToDreg Description
The move data register byte to data register instruction converts a byte to a word (32 bits). It copies
the least significant 8 bits from a source register into the least significant 8 bits of a 32-bit register. The instruction
sign-extends (x) or zero-extends (z) the upper bits of the destination register. This instruction supports only data
registers.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), and this instruction may be
issued in parallel with certain other 16-bit instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvDregBToDreg Example
r7 = r2.b (x) ; /* sign extended */
r7 = r2.b (z) ; /* zero extended */
Abstract
This instruction moves a 16-bit section of a data register to a single 16-bit section of an accumulator.
See Also (Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX), Pass 16-Bit to 32-Bit Register
Expansion (MvDregLToDreg), Pass 8-Bit to 32-Bit Register Expansion (MvDregBToDreg), Move Register to Accu-
mulator1 (MvDregToAx), Move Register to Accumulator0 & Accumulator1 (MvDregToAxDual))
MvDregHLToAxHL Description
The move high/low half register to high/low half accumulator instruction copies 16 bits from a source register into
half of an accumulator register. The instruction does not affect the unspecified half of the destination register. It
supports only data registers and the accumulator.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvDregHLToAxHL Example
a0.l = r0.l ; /* least significant 16 bits of Dreg into least significant 16 bits of A0.W */
a0.h = r1.l ;
a1.l = r2.l ; /* least significant 16 bits of Dreg into least significant 16 bits of A1.W */
a1.h = r3.l ;
a0.l = r4.h ;
a0.h = r5.h ; /* most significant 16 bits of Dreg into most significant 16 bits of A0.W */
a1.l = r6.h ;
a1.h = r7.h ; /* most significant 16 bits of Dreg into most significant 16 bits of A1.W */
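The partial-write behavior described above (only the named half of Ax.W changes) can be sketched in C. The helper names below are hypothetical, and the model covers only the 32-bit Ax.W section, not the Ax.X extension bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers that model half-register reads of a data register. */
static uint16_t reg_lo(uint32_t r) { return (uint16_t)(r & 0xFFFFu); }
static uint16_t reg_hi(uint32_t r) { return (uint16_t)(r >> 16); }

/* Model of "ax.l = half": replace bits 15:0 of Ax.W, keep bits 31:16. */
static uint32_t write_acc_lo(uint32_t ax_w, uint16_t half)
{
    return (ax_w & 0xFFFF0000u) | (uint32_t)half;
}

/* Model of "ax.h = half": replace bits 31:16 of Ax.W, keep bits 15:0. */
static uint32_t write_acc_hi(uint32_t ax_w, uint16_t half)
{
    return (ax_w & 0x0000FFFFu) | ((uint32_t)half << 16);
}
```

For example, with Ax.W modelled as 0xAAAABBBB and a source register holding 0x12345678, write_acc_lo(0xAAAABBBB, reg_hi(0x12345678)) yields 0xAAAA1234, which corresponds to the form a0.l = r4.h in the example.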
Abstract
This instruction moves the 8 LSBs from a low half register (16-bit section of a data register) to an 8-bit section of an
accumulator.
See Also (Move Register Half to 16-Bit Accumulator Section (MvDregHLToAxHL), Pass 16-Bit to 32-Bit Register
Expansion (MvDregLToDreg), Pass 8-Bit to 32-Bit Register Expansion (MvDregBToDreg), Move Register to Accu-
mulator1 (MvDregToAx), Move Register to Accumulator0 & Accumulator1 (MvDregToAxDual))
MvDregLToAxX Description
The move low half data register to accumulator register extension instruction copies 8 bits from a low half data regis-
ter source into an accumulator extension register. The instruction does not affect the unspecified portion of the des-
tination register. It supports only data registers and the accumulator.
The accumulator extension registers A0.X and A1.X are defined only for the 8 low-order bits 7 through 0 of A0.X
and A1.X. This instruction truncates the upper byte of DDST0_L before moving the value into the accumulator
extension register (A0.X or A1.X).
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvDregLToAxX Example
a0.x = r1.l ;
a1.x = r4.l ;
Abstract
This instruction zero extends or sign extends a 16-bit register half and deposits it into a 32-bit destination register.
The X option signifies sign extension while the Z option signifies zero extension.
See Also (Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX), Move Register Half to 16-Bit
Accumulator Section (MvDregHLToAxHL), Pass 8-Bit to 32-Bit Register Expansion (MvDregBToDreg), Move
Register to Accumulator1 (MvDregToAx), Move Register to Accumulator0 & Accumulator1 (MvDregToAxDual))
MvDregLToDreg Description
The move low half register to data register instruction converts a half word (16 bits) to a word (32 bits). The
instruction copies the least significant 16 bits from a source register into the lower half of a 32-bit register and
sign- or zero-extends the upper half of the destination register. The operation supports only data registers. Zero
extension is appropriate for unsigned values. If used with signed values, a small negative 16-bit value will
become a large positive value.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), and this instruction may be
issued in parallel with certain other 16-bit instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvDregLToDreg Example
/* If r0.l = 0xFFFF, before move with zero extend ... */
r4 = r0.l (z) ; /* zero-extends; equivalent operation to r4.l = r0.l and r4.h = 0 */
/* . . . then r4 = 0x0000FFFF, after move with zero extend */
r4 = r0.l (x) ; /* sign-extends; with r0.l = 0xFFFF, r4 = 0xFFFFFFFF */
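The difference between the two extension options, and the pitfall of zero-extending a signed value, can be sketched in C. The function names are hypothetical and the model works on raw bit patterns only.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "rN = rM.l (z)": upper half of the result is cleared. */
static uint32_t zero_extend16(uint16_t lo)
{
    return (uint32_t)lo;
}

/* Model of "rN = rM.l (x)": upper half is filled with copies of bit 15. */
static uint32_t sign_extend16(uint16_t lo)
{
    return (lo & 0x8000u) ? (0xFFFF0000u | (uint32_t)lo) : (uint32_t)lo;
}
```

The half word 0xFFFF is -1 when read as a signed 16-bit value. sign_extend16(0xFFFF) keeps that meaning (0xFFFFFFFF, still -1 as a signed word), whereas zero_extend16(0xFFFF) gives 0x0000FFFF, which is 65535.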
Abstract
This instruction moves the contents of a data register to an accumulator register.
See Also (Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX), Move Register Half to 16-Bit
Accumulator Section (MvDregHLToAxHL), Pass 16-Bit to 32-Bit Register Expansion (MvDregLToDreg), Pass 8-
Bit to 32-Bit Register Expansion (MvDregBToDreg), Move Register to Accumulator0 & Accumulator1 (MvDreg-
ToAxDual))
MvDregToAx Description
The move data register to accumulator instruction copies 32 bits from a source register into the Ax.W section of an
accumulator register with zero- or sign-extension. The instruction does not affect the unspecified portion of the
destination register. It supports only data registers and the accumulator.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvDregToAx Example
a0 = r7 (z) ; /* move R7 to 32-bit A0.W, zero extended */
a1 = r3 (x) ; /* move R3 to 32-bit A1.W, sign-extended */
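A minimal C sketch of the 32-bit to 40-bit extension follows. It assumes the extension fills the 8 accumulator extension bits (Ax.X, bits 39:32); the function names are hypothetical, and the 40-bit result is returned in the low 40 bits of a uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "ax = rN (z)": Ax.W = rN, Ax.X = 0x00. */
static uint64_t dreg_to_acc_z(uint32_t r)
{
    return (uint64_t)r;
}

/* Model of "ax = rN (x)": Ax.W = rN, Ax.X = eight copies of bit 31. */
static uint64_t dreg_to_acc_x(uint32_t r)
{
    uint64_t ext = (r & 0x80000000u) ? 0xFFull : 0x00ull;
    return (ext << 32) | (uint64_t)r;
}
```

For a source value of 0x80000000, the zero-extended accumulator is 0x00 8000 0000 (a positive 40-bit value) and the sign-extended accumulator is 0xFF 8000 0000 (a negative 40-bit value).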
Abstract
This instruction moves the contents of two data registers to the accumulator registers (A0, A1).
See Also (Move Register Half (LSBs) to 8-Bit Accumulator Section (MvDregLToAxX), Move Register Half to 16-Bit
Accumulator Section (MvDregHLToAxHL), Pass 16-Bit to 32-Bit Register Expansion (MvDregLToDreg), Pass 8-
Bit to 32-Bit Register Expansion (MvDregBToDreg), Move Register to Accumulator1 (MvDregToAx))
MvDregToAxDual Description
The dual move data register to accumulator instruction copies 32 bits from a source register into the Ax.W section of
an accumulator register with zero- or sign-extension, and performs a second move in parallel as indicated. The
instruction does not affect the unspecified portion of the destination register. It supports only data registers and the
accumulator.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
MvDregToAxDual Example
a1 = r3 (z), a0 = r7 (z) ; /* move R3 to 32-bit A1.W (zero extended), move R7 to 32-bit
A0.W (zero extended) */
a1 = r3 (z), a0 = r7 (x) ; /* move R3 to 32-bit A1.W (zero extended), move R7 to 32-bit
A0.W (sign-extended) */
a1 = r3 (x), a0 = r7 (z) ; /* move R3 to 32-bit A1.W (sign-extended), move R7 to 32-bit
A0.W (zero extended) */
a1 = r3 (x), a0 = r7 (x) ; /* move R3 to 32-bit A1.W (sign-extended), move R7 to 32-bit
A0.W (sign-extended) */
Abstract
This instruction moves data from any register in the data arithmetic unit, address arithmetic unit, or control unit to
any other register in those units.
See Also (Conditional Move Register to Register (MvRegToRegCond))
MvRegToReg Description
The move any register to any register instruction copies from a source register into a destination register with zero-
or sign-extension. The instruction does not affect the unspecified portion of the destination register. It supports all
processor core registers. All moves from smaller to larger registers are sign extended.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), and this instruction may be
issued in parallel with certain other 16-bit instructions.
This instruction may be used in either User or Supervisor mode, except for cases where register access restrictions
only permit Supervisor mode.
MvRegToReg Example
r3 = r0 ;
r7 = p2 ;
r2 = a0 ;
a0.w = r7 ; /* move R7 to 32-bit A0.W */
r3 = a1.x ; /* move 8-bit A1.X to R3 with sign extension */
retn = p0 ; /* must be in Supervisor mode */
r7 = a0 ; /* move A0 to odd data register */
r2 = a1 ; /* move A1 to even data register */
Abstract
This instruction conditionally moves registers.
See Also (Move Register to Register (MvRegToReg))
MvRegToRegCond Description
The Move Conditional instruction moves source register contents into a destination register, depending on the value
of CC.
MvRegToRegCond Example
if cc r3 = r0 ; /* move if CC=1 */
if cc r2 = p4 ;
if cc p0 = r7 ;
if cc p2 = p5 ;
if ! cc r3 = r0 ; /* move if CC=0 */
if ! cc r2 = p4 ;
if ! cc p0 = r7 ;
if ! cc p2 = p5 ;
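The conditional move can be pictured as a select on the CC bit. The C sketch below is illustrative only: the function names are hypothetical, and the comparison inside max_via_cond_move merely stands in for whatever earlier instruction set CC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "if cc dest = src ;": dest changes only when CC is 1. */
static uint32_t move_if_cc(bool cc, uint32_t dest, uint32_t src)
{
    return cc ? src : dest;
}

/* Model of "if ! cc dest = src ;": dest changes only when CC is 0. */
static uint32_t move_if_not_cc(bool cc, uint32_t dest, uint32_t src)
{
    return cc ? dest : src;
}

/* Usage sketch: a branch-free maximum built from one compare and one
 * conditional move. */
static int32_t max_via_cond_move(int32_t a, int32_t b)
{
    bool cc = (a < b);  /* stands in for a compare that sets CC */
    int32_t r = a;
    if (cc) {
        r = b;          /* corresponds to: if cc r = b ; */
    }
    return r;
}
```

Because the destination is left untouched when the condition is not met, a conditional move can replace a short forward branch around a single register copy.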
Abstract
This dual move instruction moves the contents of the accumulator registers to half data registers.
See Also (Dual Move Accumulators to Register (ParaMvA1ToDregOwithMvA0ToDregE))
ParaMvA1ToDregHwithMvA0ToDregL Description
The dual move accumulator 1 to high half register with move accumulator 0 to low half register instruction provides
the combination of operations in the Move 16-Bit Accumulator Section to Low Half Register (MvA0ToDregL)
and Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH).
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
ParaMvA1ToDregHwithMvA0ToDregL Example
r3.h = a1 , r3.l = a0 ;
r2.h = a1 (s2rnd) , r7.l = a0 (fu) ; /* A1: signed fraction, scaled; A0: fractional unsigned format */
r7.h = a1 (fu) , r2.l = a0 (s2rnd) ; /* A1: fractional unsigned format; A0: signed fraction, scaled */
Abstract
This dual move instruction moves the contents of the accumulator registers to data registers.
See Also (Dual Move Accumulators to Half Registers (ParaMvA1ToDregHwithMvA0ToDregL))
ParaMvA1ToDregOwithMvA0ToDregE Description
The dual move accumulator 1 to odd register with move accumulator 0 to even register instruction provides
the combination of operations in the Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO) and
Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE).
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
ParaMvA1ToDregOwithMvA0ToDregE Example
r3 = a1 , r0 = a0 (iss2) ;
r7 = a1 (fu) , r2 = a0 ;
r1 = a1 (s2rnd) , r2 = a0 ;
Multiplication Operations
These operations provide multiply and multiply-accumulate operations on register and immediate value operands:
• 16 x 16-Bit MAC (Mac16)
• 16 x 16-Bit MAC with Move to Register (Mac16WithMv)
• 32 x 32-Bit MAC (Mac32)
• 32 x 32-Bit MAC with Move to Register (Mac32WithMv)
• Complex Multiply to Accumulator (Mac32Cmplx)
• Complex Multiply to Register (Mac32CmplxWithMv)
• Complex Multiply to Register with Narrowing (Mac32CmplxWithMvN)
• 16 x 16-Bit Multiply (Mult16)
• 32 x 32-Bit Multiply (Mult32)
• 32 x 32-Bit Multiply, Integer (MultInt)
• Dual 16 x 16-Bit MAC (ParaMac16AndMac16)
• Dual 16 x 16-Bit MAC with Move to Register (ParaMac16AndMac16WithMv)
• Dual 16 x 16-Bit MAC with Move to Register (ParaMac16WithMvAndMac16)
• Dual 16 x 16-Bit MAC with Moves to Registers (ParaMac16WithMvAndMac16WithMv)
• Dual 16 x 16-Bit MAC with Move to Register (ParaMac16AndMv)
• Dual 16 x 16-Bit MAC with Moves to Registers (ParaMac16WithMvAndMv)
• Dual 16 x 16-Bit Multiply (ParaMult16AndMult16)
• Dual Move to Register and 16 x 16-Bit MAC (ParaMvAndMac16)
• Dual Move to Register and 16 x 16-Bit MAC with Move to Register (ParaMvAndMac16WithMv)
Abstract
This multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores, adds,
or subtracts the product into a designated accumulator register with saturation. By default, the instruction treats all
operands as signed fractions with left-shift correction as required.
See Also (16 x 16-Bit MAC with Move to Register (Mac16WithMv))
Mac16 Description
The Multiply and Multiply-Accumulate to Accumulator instruction multiplies two 16-bit half-word operands. It
stores, adds or subtracts the product into a designated Accumulator with saturation.
The Multiply-and-Accumulate Unit 0 (MAC0) portion of the architecture performs operations that involve Accu-
mulator A0. MAC1 performs A1 operations.
By default, the instruction treats both operands of both MACs as signed fractions with left-shift correction as re-
quired.
In the syntax, where MMOD0 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), or (w32) .
In the syntax, where MMLMMOD0 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (m), (w32), (m,fu), (m,is), or (m,w32).
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
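The default signed-fraction behavior described above can be checked on a host machine with a small model. The Python sketch below is illustrative only and is not Analog Devices code. It assumes a 40-bit accumulator, a one-bit left shift that corrects the 1.15 x 1.15 product to 1.31 format, and saturation of the one product (0x8000 x 0x8000) that does not fit in 1.31. The function names are hypothetical.

```python
ACC_MAX = (1 << 39) - 1      # 0x7F FFFF FFFF, largest 40-bit signed value
ACC_MIN = -(1 << 39)         # 0x80 0000 0000, smallest 40-bit signed value


def to_signed16(half):
    """Interpret a 16-bit pattern as a signed two's-complement value."""
    half &= 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def saturate(value, lo, hi):
    """Clamp value to the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def frac_product(x_half, y_half):
    """1.15 x 1.15 signed product, left-shifted by one bit into 1.31 format."""
    product = (to_signed16(x_half) * to_signed16(y_half)) << 1
    # -1 x -1 would give +1, which 1.31 cannot hold (assumed to saturate).
    return saturate(product, -(1 << 31), (1 << 31) - 1)


def mac16(acc, x_half, y_half, op="="):
    """Model 'A = x * y', 'A += x * y' or 'A -= x * y' in the default mode."""
    product = frac_product(x_half, y_half)
    if op == "=":
        result = product
    elif op == "+=":
        result = acc + product
    elif op == "-=":
        result = acc - product
    else:
        raise ValueError("op must be '=', '+=' or '-='")
    return saturate(result, ACC_MIN, ACC_MAX)


a0 = mac16(0, 0x4000, 0x4000)           # 0.5 * 0.5 = 0.25
a0 = mac16(a0, 0x4000, 0x4000, "+=")    # 0.25 + 0.25 = 0.5
print(hex(a0))                          # 0.5 in 1.31 format
```

The model keeps the accumulator as a plain Python integer and clamps it after every operation, which mirrors the "with saturation" wording of the description without modeling the ASTAT overflow flags.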
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac16 Example
a0 = r3.h * r2.h ;        /* MAC0 only. Both operands are signed fractions. Load the product into A0. */
a0 += r6.h * r4.l (fu) ;  /* MAC0 only. Both operands are unsigned fractions. Add the product to A0. */
a0 -= r3.h * r2.h ;       /* MAC0 only. Both operands are signed fractions. Subtract the product from A0. */
a1 = r6.h * r4.l (fu) ;   /* MAC1 only. Both operands are unsigned fractions. Load the product into A1. */
a1 += r3.h * r2.h ;       /* MAC1 only. Both operands are signed fractions. Add the product to A1. */
a1 -= r6.h * r4.l (fu) ;  /* MAC1 only. Both operands are unsigned fractions. Subtract the product from A1. */
Abstract
This multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores, adds,
or subtracts the product into a designated accumulator register with saturation. By default, the instruction treats all
operands as signed fractions with left-shift correction as required.
See Also (16 x 16-Bit MAC (Mac16))
Mac16WithMv Description
The multiply and multiply-accumulate to half register (with move) instruction multiplies two 16-bit half-word oper-
ands. The instruction stores, adds or subtracts the product into a designated accumulator. Then, it copies 16 bits
(saturated at 16 bits) of the accumulator into a high or low half data register.
In the syntax, where MMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (s2rnd), (t), (tfu) .
In the syntax, where MMLMMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (m), (m,fu), (m,ih), (m,is), (m,iss2), (m,iu), (m,s2rnd),
(m,t), (m,tfu), (s2rnd), (t), (tfu) .
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
The fraction versions of this instruction (the default and (fu) options) transfer the accumulator result to the
destination register according to the Result to Destination Register (Default and (FU) Options) diagram.
The integer versions of this instruction (the (is) and (iu) options) transfer the Accumulator result to the destina-
tion register according to the Result to Destination Register ((IS) and (IU) Options) diagram.
The multiply-and-accumulate unit 0 (MAC0) portion of the architecture performs operations that involve accumu-
lator A0 and loads the results into the lower half of the destination data register. MAC1 performs A1 operations and
loads the results into the upper half of the destination data register.
All versions of this instruction that support rounding are affected by the RND_MOD bit in the ASTAT register when
they copy the results into the destination register. RND_MOD determines whether biased or unbiased rounding is
used.
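The difference between biased and unbiased rounding only shows up when the discarded low 16 bits are exactly halfway (0x8000). The Python sketch below is a host-side illustration, not a definitive description of the hardware. It assumes that biased rounding means round half up, that unbiased rounding means round half to even, and that the copy saturates to the signed 16-bit range. The function name and the `biased` flag are hypothetical; see the Rounding and Truncating topic for the actual RND_MOD encoding.

```python
def round_to_high_half(acc_value, biased):
    """Round a signed 1.31 value at bit 16 and return the upper 16 bits.

    biased=True  : round half up (add 0x8000, then drop the low 16 bits).
    biased=False : round half to even (assumed meaning of unbiased rounding).
    """
    rounded = acc_value + 0x8000
    if not biased and (acc_value & 0xFFFF) == 0x8000:
        # Exactly halfway: force the least significant kept bit to 0.
        rounded &= ~0x10000
    high = rounded >> 16
    # Saturate to the signed 16-bit range before the copy.
    return max(-0x8000, min(0x7FFF, high)) & 0xFFFF


# 0x0002 8000 is exactly halfway between 0x0002 and 0x0003.
print(hex(round_to_high_half(0x00028000, biased=True)))
print(hex(round_to_high_half(0x00028000, biased=False)))
```

Values that are not exactly halfway round the same way in both modes, so the two modes differ only in the long-run bias they introduce over many accumulations.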
[Figures: Result to Destination Register diagrams, one for the fraction options and one for the integer options.
Each has two panels, mapping the 40-bit A0 or A1 accumulator to the 32-bit destination register.]
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac16WithMv Example
r3.l = ( a0 = r3.h * r2.h ) ;
/* MAC0 only. Both operands are signed fractions. Load the product into A0, then copy to r3.l. */
r3.h = ( a1 += r6.h * r4.l ) (fu) ;
/* MAC1 only. Both operands are unsigned fractions. Add the product to A1, then copy to r3.h. */
Abstract
This instruction executes a multiply accumulate operation on 32-bit registers.
See Also (32 x 32-Bit MAC with Move to Register (Mac32WithMv))
Mac32 Description
The multiply-accumulate to accumulator instruction multiplies two 32-bit operands. It stores, adds or
subtracts the product into a designated accumulator with saturation.
The multiply-and-accumulate unit 0 (MAC0) portion of the architecture performs operations that involve Accumu-
lator A0. MAC1 performs A1 operations.
By default, the instruction treats both operands of both MACs as signed fractions with left-shift correction as re-
quired.
In the syntax, where M32MMOD0 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m), (m,is), or (m,is,ns) .
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac32 Example
a0 = r0 * r7 ; /* (default) fractional signed, place product in a0 */
a0 += r1 * r6 (fu) ; /* fractional unsigned, accumulate in a0 */
a0 -= r2 * r5 (is) ; /* integer signed, accumulate in a0 */
a1 = r3 * r4 (is,ns) ; /* integer signed (no saturation), place product in a1 */
Abstract
This multiply-accumulate instruction multiplies two 32-bit operands. Then, the instruction stores, adds, or
subtracts the product into a designated accumulator register with saturation and moves the result to a data
register or register pair. By default, the instruction treats all operands as signed fractions with left-shift
correction as required.
See Also (32 x 32-Bit MAC (Mac32))
Mac32WithMv Description
The multiply-accumulate with move instruction multiplies two 32-bit operands. It stores, adds or
subtracts the product into a designated accumulator with saturation. Then, the instruction moves the result to the
selected register or register pair.
The multiply-and-accumulate unit 0 (MAC0) portion of the architecture performs operations that involve Accumu-
lator A0. MAC1 performs A1 operations.
By default, the instruction treats both operands of both MACs as signed fractions with left-shift correction as re-
quired.
In the syntax, where MMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), (iu), or (s2rnd).
In the syntax, where MMLMMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), (iu), (m), (m,fu), (m,is), (m,iss2), (m,iu), (m,s2rnd), or (s2rnd).
In the syntax, where M32MMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m,is), (m,is,ns), (m,t), (t), or (tfu).
In the syntax, where M32MMOD appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m), (m,is), or (m,is,ns).
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac32WithMv Example
r0 = (a0 = r0 * r3) (fu) ;        /* MMODE options, place product in a0 and move it to EVEN data register */
r2 = (a0 += r1 * r2) (is) ;       /* MMODE options, accumulate in a0 and move it to EVEN data register */
r4 = (a0 -= r2 * r1) (iss2) ;     /* MMODE options, accumulate in a0 and move it to EVEN data register */
r1 = (a1 = r3 * r0) (m,fu) ;      /* MMLMMODE options, place product in a1 and move it to ODD data register */
r3 = (a1 += r4 * r7) (m,is) ;     /* MMLMMODE options, accumulate in a1 and move it to ODD data register */
r5 = (a1 -= r5 * r6) (m,s2rnd) ;  /* MMLMMODE options, accumulate in a1 and move it to ODD data register */
r1 = (a0 = r6 * r5) (t) ;         /* M32MMOD1 options, place product in a0 and move it to any data register */
r2 = (a0 += r7 * r4) (tfu) ;      /* M32MMOD1 options, accumulate in a0 and move it to any data register */
r3 = (a0 -= r0 * r3) (m,is,ns) ;  /* M32MMOD1 options, accumulate in a0 and move it to any data register */
r4 = (a1 = r1 * r2) (iu,ns) ;     /* M32MMOD1 options, place product in a1 and move it to any data register */
r5 = (a1 += r2 * r1) (fu) ;       /* M32MMOD1 options, accumulate in a1 and move it to any data register */
r6 = (a1 -= r3 * r0) (m,is) ;     /* M32MMOD1 options, accumulate in a1 and move it to any data register */
r1:0 = (a0 = r4 * r7) (fu) ;      /* M32MMOD options, place product in a0 and move it to register pair */
r3:2 = (a0 += r5 * r6) ;          /* M32MMOD options, accumulate in a0 and move result to register pair */
r5:4 = (a0 -= r6 * r5) (m) ;      /* M32MMOD options, accumulate in a0 and move result to register pair */
r7:6 = (a1 = r7 * r4) (is,ns) ;   /* M32MMOD options, place product in a1 and move it to register pair */
r5:4 = (a1 += r0 * r3) (iu,ns) ;  /* M32MMOD options, accumulate in a1 and move result to register pair */
r3:2 = (a1 -= r1 * r2) (is) ;     /* M32MMOD options, accumulate in a1 and move result to register pair */
Abstract
This instruction executes a complex multiply-accumulate operation, placing the results in an accumulator register.
See Also (Complex Multiply to Register (Mac32CmplxWithMv), Complex Multiply to Register with Narrowing
(Mac32CmplxWithMvN))
Mac32Cmplx Description
The multiply-accumulate complex values instruction performs a number of parallel multiply-accumulate operations
to produce complex results. To understand the operations, it is important to understand the placement of the
imaginary part and real part of the data. Let operand A = (Ar + j*Ai), operand B = (Br + j*Bi) and result
C = (Cr + j*Ci), where Ai (the imaginary part) is stored in the most significant 16 bits of a 32-bit register, and
Ar (the real part) is stored in the least significant 16 bits. Other notations (such as Bi, Ci, and others) are
similarly defined regarding data placement of imaginary and real parts. Complex multiplication and complex
multiplication of conjugates are defined as follows:
C = A x B:    Cr = Ar*Br - Ai*Bi,  Ci = Ar*Bi + Ai*Br
C = A x B*:   Cr = Ar*Br + Ai*Bi,  Ci = Ai*Br - Ar*Bi
C = A* x B*:  Cr = Ar*Br - Ai*Bi,  Ci = -(Ar*Bi + Ai*Br)
This complex multiply syntax for placing the product in the accumulator registers corresponds to the commented
operations:
a1:0 = cmul(r1,r0);    /* r1 times r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.l * r0.h) + (r1.h * r0.l), a0 = (r1.l * r0.l) - (r1.h * r0.h) */
a1:0 = cmul(r1,r0*);   /* r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.h * r0.l) - (r1.l * r0.h), a0 = (r1.l * r0.l) + (r1.h * r0.h) */
a1:0 = cmul(r1*,r0*);  /* conjugate of r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = - [ (r1.l * r0.h) + (r1.h * r0.l) ], a0 = (r1.l * r0.l) - (r1.h * r0.h) */
In the syntax, where CMODE appears, substitute a complex multiply mode for the accumulator copy format option:
default (none) or (is).
Default operation is signed fraction multiplication. Multiply 1.15 * 1.15 to produce 1.31 results after left-shift
correction for each of the four partial products. Add or subtract corresponding partial products for real and
imaginary part of result. Saturate results between minimum -1 and maximum 1 - 2^-31. The resulting hexadecimal
range is minimum 0x8000 0000 through maximum 0x7FFF FFFF. This operation uses signed fraction rounding.
If the (is) option is used, the operation is signed integer multiplication. Multiply 16.0 * 16.0 to produce 32.0
results. No shift correction. Saturate integer results between minimum -2^31 and maximum 2^31 - 1.
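The partial-product arithmetic above can be tried out with a host-side model. The Python sketch below is an assumption-laden illustration, not the hardware definition: it forms the four partial products with unbounded integers, applies the one-bit left shift in the default (fraction) mode, and saturates each 32-bit result once at the end. Whether the hardware saturates intermediate values is not modeled. The function names and the `conj_a`, `conj_b`, and `integer` parameters are hypothetical; only the three conjugate combinations listed above appear in this reference.

```python
def to_signed16(half):
    """Interpret a 16-bit pattern as a signed two's-complement value."""
    half &= 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def unpack(reg):
    """Split a 32-bit register into (real, imaginary) signed 16-bit parts."""
    return to_signed16(reg), to_signed16(reg >> 16)


def saturate32(value):
    """Clamp value to the signed 32-bit range."""
    return max(-(1 << 31), min((1 << 31) - 1, value))


def cmul(a_reg, b_reg, conj_a=False, conj_b=False, integer=False):
    """Return (a1, a0), the imaginary and real parts of the complex product."""
    ar, ai = unpack(a_reg)
    br, bi = unpack(b_reg)
    if conj_a:
        ai = -ai
    if conj_b:
        bi = -bi
    shift = 0 if integer else 1          # left-shift correction, fraction mode only
    real = (ar * br - ai * bi) << shift
    imag = (ar * bi + ai * br) << shift
    return saturate32(imag), saturate32(real)


# (3 + 4j) * (1 + 2j) = -5 + 10j in integer mode.
print(cmul(0x00040003, 0x00020001, integer=True))
# (3 + 4j) * conjugate(1 + 2j) = 11 - 2j in integer mode.
print(cmul(0x00040003, 0x00020001, conj_b=True, integer=True))
```

Returning the pair in (a1, a0) order matches the register-pair notation a1:0 used in the syntax, with the imaginary part first.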
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac32Cmplx Example
a1:0 = cmul(r1,r0) ;          /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real) */
a1:0 = cmul(r7,r3*) (is) ;    /* integer signed complex multiply; place complex product in a1 (imaginary) and a0 (real) */
a1:0 = cmul(r2*,r4*) ;        /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real) */
a1:0 += cmul(r3*,r1*) (is) ;  /* integer signed complex mac; add complex product to a1 (imaginary) and a0 (real) */
a1:0 += cmul(r5,r2*) ;        /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real) */
a1:0 += cmul(r6,r1) ;         /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real) */
a1:0 -= cmul(r2,r1) (is) ;    /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real) */
a1:0 -= cmul(r3*,r7*) ;       /* fractional signed complex mac; subtract complex product from a1 (imaginary) and a0 (real) */
a1:0 -= cmul(r4,r5*) (is) ;   /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real) */
Abstract
This instruction executes a complex multiply-accumulate operation, placing the results in a register or register pair.
See Also (Complex Multiply to Accumulator (Mac32Cmplx), Complex Multiply to Register with Narrowing
(Mac32CmplxWithMvN))
Mac32CmplxWithMv Description
The multiply-accumulate complex values instruction performs a number of parallel multiply-accumulate operations
to produce complex results with a move. The product of the multiplication is placed in a pair of data registers.
Alternatively, the instruction may accumulate the result in the accumulator registers, then move the result to a
pair of data registers. To understand the operations, it is important to understand the placement of the imaginary
part and real part of the data. Let operand A = (Ar + j*Ai), operand B = (Br + j*Bi) and result C = (Cr + j*Ci),
where Ai (the imaginary part) is stored in the most significant 16 bits of a 32-bit register, and Ar (the real part)
is stored in the least significant 16 bits. Other notations (such as Bi, Ci, and others) are similarly defined
regarding data placement of imaginary and real parts. Complex multiplication and complex multiplication of
conjugates are defined as follows:
C = A x B:    Cr = Ar*Br - Ai*Bi,  Ci = Ar*Bi + Ai*Br
C = A x B*:   Cr = Ar*Br + Ai*Bi,  Ci = Ai*Br - Ar*Bi
C = A* x B*:  Cr = Ar*Br - Ai*Bi,  Ci = -(Ar*Bi + Ai*Br)
This complex multiply syntax for placing the product in the accumulator registers corresponds to the commented
operations:
a1:0 = cmul(r1,r0);    /* r1 times r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.l * r0.h) + (r1.h * r0.l), a0 = (r1.l * r0.l) - (r1.h * r0.h) */
a1:0 = cmul(r1,r0*);   /* r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.h * r0.l) - (r1.l * r0.h), a0 = (r1.l * r0.l) + (r1.h * r0.h) */
a1:0 = cmul(r1*,r0*);  /* conjugate of r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = - [ (r1.l * r0.h) + (r1.h * r0.l) ], a0 = (r1.l * r0.l) - (r1.h * r0.h) */
In the syntax, where CMODE appears, substitute a complex multiply mode for the accumulator copy format option:
default (none) or (is).
Default operation is signed fraction multiplication. Multiply 1.15 * 1.15 to produce 1.31 results after left-shift
correction for each of the four partial products. Add or subtract corresponding partial products for real and
imaginary part of result. Saturate results between minimum -1 and maximum 1 - 2^-31. The resulting hexadecimal
range is minimum 0x8000 0000 through maximum 0x7FFF FFFF.
If the (is) option is used, the operation is signed integer multiplication. Multiply 16.0 * 16.0 to produce 32.0
results. No shift correction. Saturate integer results between minimum -2^31 and maximum 2^31 - 1.
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac32CmplxWithMv Example
r7:6 = cmul(r1,r0) ;                 /* fractional signed complex multiply; place complex product in r7 (imaginary) and r6 (real) */
r5:4 = cmul(r7,r3*) (is) ;           /* integer signed complex multiply; place complex product in r5 (imaginary) and r4 (real) */
r1:0 = cmul(r2*,r4*) ;               /* fractional signed complex multiply; place complex product in r1 (imaginary) and r0 (real) */
r7:6 = a1:0 = cmul(r1,r0) ;          /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r7:6 */
r5:4 = a1:0 = cmul(r7,r3*) (is) ;    /* integer signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r5:4 */
r1:0 = a1:0 = cmul(r2*,r4*) ;        /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r1:0 */
r5:4 = a1:0 += cmul(r3*,r1*) (is) ;  /* integer signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r5:4 */
r1:0 = a1:0 += cmul(r5,r2*) ;        /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r1:0 */
r3:2 = a1:0 += cmul(r6,r1) ;         /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r3:2 */
r7:6 = a1:0 -= cmul(r2,r1) (is) ;    /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r7:6 */
r1:0 = a1:0 -= cmul(r3*,r7*) ;       /* fractional signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r1:0 */
r3:2 = a1:0 -= cmul(r4,r5*) (is) ;   /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r3:2 */
Abstract
This instruction executes a complex multiply-accumulate operation, placing the results in a register or register pair
with narrowing.
See Also (Complex Multiply to Accumulator (Mac32Cmplx), Complex Multiply to Register
(Mac32CmplxWithMv))
Mac32CmplxWithMvN Description
The multiply-accumulate complex values instruction performs a number of parallel multiply-accumulate operations
to produce complex results with a narrowing move. The product of the multiplication is placed in a data register.
Alternatively, the instruction may accumulate the result in the accumulator registers, then move the result to a
data register. To understand the operations, it is important to understand the placement of the imaginary part and
real part of the data. Let operand A = (Ar + j*Ai), operand B = (Br + j*Bi) and result C = (Cr + j*Ci), where Ai
(the imaginary part) is stored in the most significant 16 bits of a 32-bit register, and Ar (the real part) is stored
in the least significant 16 bits. Other notations (such as Bi, Ci, and others) are similarly defined regarding data
placement of imaginary and real parts. Complex multiplication and complex multiplication of conjugates are
defined as follows:
C = A x B:    Cr = Ar*Br - Ai*Bi,  Ci = Ar*Bi + Ai*Br
C = A x B*:   Cr = Ar*Br + Ai*Bi,  Ci = Ai*Br - Ar*Bi
C = A* x B*:  Cr = Ar*Br - Ai*Bi,  Ci = -(Ar*Bi + Ai*Br)
This complex multiply syntax for placing the product in the accumulator registers corresponds to the commented
operations:
a1:0 = cmul(r1,r0);    /* r1 times r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.l * r0.h) + (r1.h * r0.l), a0 = (r1.l * r0.l) - (r1.h * r0.h) */
a1:0 = cmul(r1,r0*);   /* r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = (r1.h * r0.l) - (r1.l * r0.h), a0 = (r1.l * r0.l) + (r1.h * r0.h) */
a1:0 = cmul(r1*,r0*);  /* conjugate of r1 times the conjugate of r0; imaginary product in a1, real product in a0 */
/* a1 = - [ (r1.l * r0.h) + (r1.h * r0.l) ], a0 = (r1.l * r0.l) - (r1.h * r0.h) */
In the syntax, where NARROWING_CMODE appears, substitute a complex multiply mode for the accumulator copy
format option: default (none), (is), or (t).
Default operation is signed fraction multiplication. Multiply 1.15 * 1.15 to produce 1.31 results after left-shift
correction for each of the four partial products. Add or subtract corresponding partial products for real and
imaginary part of result. Saturate results between minimum -1 and maximum 1 - 2^-31. The resulting hexadecimal
range is minimum 0x8000 0000 through maximum 0x7FFF FFFF. This operation uses signed fraction rounding. Round
the 1.31 format value at bit 16 (the RND_MOD bit in the ASTAT register controls the rounding), then extract the
high 16 bits to produce a 1.15 result. The result is between minimum -1 and maximum 1 - 2^-15 (or, expressed in
hex, between minimum 0x8000 and maximum 0x7FFF).
If the (is) option is used, the operation is signed integer multiplication. Multiply 16.0 * 16.0 to produce 32.0
results. No shift correction. Saturate integer results between minimum -2^31 and maximum 2^31 - 1. This operation
uses signed integer saturation. Saturate 32-bit integer values at bit 15 and extract the low 16 bits to produce a
result between minimum -2^15 and maximum 2^15 - 1.
If the (t) option is used, the operation is signed fraction multiplication with truncation. Multiply 1.15 * 1.15 to
produce 1.31 results after left-shift correction for each of the four partial products. Add or subtract corresponding
partial products for real and imaginary part of result. Saturate results between minimum -1 and maximum 1 - 2^-31.
The resulting hexadecimal range is minimum 0x8000 0000 through maximum 0x7FFF FFFF. This operation uses
signed fraction truncation. Truncate the 1.31 format values for the real and imaginary parts of the result at bit 16
(perform no rounding), then extract the high 16 bits to produce a 1.15 result. The result is between minimum -1
and maximum 1 - 2^-15 (or, expressed in hex, between minimum 0x8000 and maximum 0x7FFF).
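The three narrowing behaviors described above differ only in how a 32-bit result is reduced to 16 bits. The Python sketch below models that step on a host machine. It is an illustration under stated assumptions, not the hardware definition: biased rounding is taken to mean round half up and unbiased rounding to mean round half to even, and the `narrow` and `pack` names and the `mode` and `biased` parameters are hypothetical.

```python
def narrow(value32, mode="default", biased=True):
    """Reduce one saturated signed 32-bit result to a 16-bit pattern."""
    if mode == "is":
        # Integer: saturate to the signed 16-bit range, keep the low 16 bits.
        return max(-(1 << 15), min((1 << 15) - 1, value32)) & 0xFFFF
    if mode == "t":
        # Fraction with truncation: drop the low 16 bits, no rounding.
        return (value32 >> 16) & 0xFFFF
    if mode != "default":
        raise ValueError("mode must be 'default', 'is' or 't'")
    # Fraction with rounding at bit 16, then extract the high 16 bits.
    rounded = value32 + 0x8000
    if not biased and (value32 & 0xFFFF) == 0x8000:
        rounded &= ~0x10000              # exactly halfway: round to even
    return max(-0x8000, min(0x7FFF, rounded >> 16)) & 0xFFFF


def pack(imag32, real32, mode="default", biased=True):
    """Place the narrowed imaginary part in the high half, real in the low half."""
    return (narrow(imag32, mode, biased) << 16) | narrow(real32, mode, biased)


print(hex(narrow(0x12348000, "t")))         # truncation keeps 0x1234
print(hex(narrow(0x12348000, "default")))   # round half up gives 0x1235
print(hex(pack(10, -5, "is")))              # integer parts 10 and -5 packed
```

Packing the imaginary part into the high half follows the data placement convention stated in the description, where the most significant 16 bits of a register hold the imaginary part.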
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mac32CmplxWithMvN Example
r6 = cmul(r1,r0) ;                 /* fractional signed complex multiply; place complex product in r6.h (imaginary) and r6.l (real) */
r4 = cmul(r7,r3*) (is) ;           /* integer signed complex multiply; place complex product in r4.h (imaginary) and r4.l (real) */
r0 = cmul(r2*,r4*) ;               /* fractional signed complex multiply; place complex product in r0.h (imaginary) and r0.l (real) */
r6 = a1:0 = cmul(r1,r0) ;          /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r6 */
r4 = a1:0 = cmul(r7,r3*) (is) ;    /* integer signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r4 */
r0 = a1:0 = cmul(r2*,r4*) ;        /* fractional signed complex multiply; place complex product in a1 (imaginary) and a0 (real); move to r0 */
r4 = a1:0 += cmul(r3*,r1*) (is) ;  /* integer signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r4 */
r0 = a1:0 += cmul(r5,r2*) ;        /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r0 */
r2 = a1:0 += cmul(r6,r1) ;         /* fractional signed complex mac; add complex product to a1 (imaginary) and a0 (real); move to r2 */
r6 = a1:0 -= cmul(r2,r1) (is) ;    /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r6 */
r0 = a1:0 -= cmul(r3*,r7*) ;       /* fractional signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r0 */
r2 = a1:0 -= cmul(r4,r5*) (is) ;   /* integer signed complex mac; subtract complex product from a1 (imaginary) and a0 (real); move to r2 */
Abstract
This instruction multiplies two 16-bit half word operands. It stores the product directly into a designated data
register or half data register with saturation. The accumulators are not affected.
See Also (32 x 32-Bit Multiply, Integer (MultInt), 32 x 32-bit Multiply (Mult32))
Mult16 Description
The multiply 16-bit operands instruction multiplies the two 16-bit operands and stores the result directly into the
destination register with saturation.
NOTE: This instruction is similar to the multiply-accumulate instructions, except that the multiply 16-bit
operands instruction does not affect the accumulators.
Operations performed by the multiply-and-accumulate unit 0 (MAC0) portion of the architecture load their 16-bit
results into the lower half of the destination data register; 32-bit results go into an even numbered data register.
Operations performed by MAC1 load their results into the upper half of the destination data register or an odd
numbered data register.
In 32-bit result syntax (result goes to a 32-bit data register), the MAC performing the operation is determined by
the destination data register. Instructions placing results in even-numbered data registers (R6, R4, R2, or R0) execute
on MAC0 and may use MMODE options. Instructions placing results in odd-numbered data registers (R7, R5, R3, or
R1) execute on MAC1 and may use MMLMMODE options. For example, 32-bit result operations with the (m) option
may only be performed using odd-numbered data register destinations.
In 16-bit result syntax (result goes to a 16-bit half data register), the MAC performing the operation is determined
by the destination data register half. Instructions placing results in low-half data registers (R7.L through R0.L) exe-
cute on MAC0 and may use MMOD1 options. Instructions placing results in high-half data registers (R7.H through
R0.H) execute on MAC1 and may use MMLMMOD1 options. For example, 16-bit result operations using the (m)
option may only be performed using high-half data register destinations.
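The destination rules in the two preceding paragraphs amount to a small lookup. The Python sketch below restates them as a helper that returns the MAC unit and the option set for a given destination operand. It is only a reading aid: the function name is hypothetical, and it encodes nothing beyond the rules stated above.

```python
import re


def mac_for_destination(dest):
    """Return (MAC unit, option set) for a Mult16 destination such as 'r3.l'."""
    match = re.fullmatch(r"r([0-7])(?:\.([lh]))?", dest.strip().lower())
    if match is None:
        raise ValueError(f"not a data register or half register: {dest!r}")
    number, half = int(match.group(1)), match.group(2)
    if half == "l":
        return "MAC0", "MMOD1"        # 16-bit result, low half
    if half == "h":
        return "MAC1", "MMLMMOD1"     # 16-bit result, high half
    if number % 2 == 0:
        return "MAC0", "MMODE"        # 32-bit result, even-numbered register
    return "MAC1", "MMLMMODE"         # 32-bit result, odd-numbered register


for dest in ("r3.l", "r3.h", "r6", "r7"):
    print(dest, mac_for_destination(dest))
```

Because the (m) option appears only in the MMLMMOD1 and MMLMMODE sets, the helper also shows at a glance why (m) requires a high-half or odd-numbered destination.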
The versions of this instruction that produce 16-bit results are affected by the RND_MOD bit in the ASTAT register
when they copy the results into the 16-bit destination register. RND_MOD determines whether biased or unbiased
rounding is used. RND_MOD controls rounding for all versions of this instruction that produce 16-bit results except
the (is), (iu) and (iss2) options.
In the syntax, where MMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (s2rnd), (t), or (tfu).
In the syntax, where MMLMMOD1 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (ih), (is), (iss2), (iu), (m), (m,fu), (m,ih), (m,is), (m,iss2), (m,iu), (m,s2rnd),
(m,t), (m,tfu), (s2rnd), (t), or (tfu).
In the syntax, where MMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), or (s2rnd).
In the syntax, where MMLMMODE appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (iss2), (m), (m,fu), (m,is), (m,iss2), (m,s2rnd), or (s2rnd).
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mult16 Example
r3.l = r3.h * r2.h ; /* MAC0. Both operands are signed fractions. */
r3.h = r6.h * r4.l (fu) ; /* MAC1. Both operands are unsigned fractions. */
r6 = r3.h * r4.h ; /* MAC0. Signed fraction operands, results saved as 32 bits. */
Abstract
This instruction executes multiply operations on 32-bit registers and on register pairs.
See Also (32 x 32-Bit Multiply, Integer (MultInt), 16 x 16-Bit Multiply (Mult16))
Mult32 Description
The multiply 32-bit operands instruction multiplies two 32-bit operands. It stores the product into a
designated data register or data register pair with saturation.
In the syntax, where M32MMOD2 appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m), (m,is), (m,is,ns), (m,t), (t), or (tfu).
In the syntax, where MM32MMOD appears, substitute a MAC mode for the accumulator copy format option: default
(none), (fu), (is), (is,ns), (iu), (iu,ns), (m), (m,is), or (m,is,ns).
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S AV1  AV0S AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Mult32 Example
r0 = r0 * r3 (fu) ; /* fractional unsigned, place product in r0 */
r1:0 = r4 * r7 (fu) ; /* fractional unsigned, place product in r1:0 register pair */
Abstract
This instruction performs a C-style, modulo 2^32 multiply of two 32-bit operands with no saturation.
See Also (16 x 16-Bit Multiply (Mult16), 32 x 32-bit Multiply (Mult32))
MultInt Description
The multiply 32-Bit operands instruction multiplies two 32-bit data registers (dest_reg and multiplier_register) and
saves the product in dest_reg. The instruction mimics multiplication in the C language and effectively performs
Dreg1 = (Dreg1 * Dreg2) modulo 2^32. Since the integer multiply is modulo 2^32, the result always fits in a 32-bit
dest_reg, and overflows are possible but not detected. The overflow status bit in the ASTAT register is never set.
Users are required to limit input numbers to ensure that the resulting product does not exceed the 32-bit dest_reg
capacity. If overflow notification is required, users should write their own multiplication macro with that capability.
Accumulators A0 and A1 are unchanged by this instruction.
The multiply 32-bit operands instruction does not implicitly modify the number in multiplier_register.
This instruction might be used to implement the congruence method of random number generation according to:
X[n+1] = (a × X[n]) mod 2^32
MultInt Example
r3 *= r0 ; /* equivalent to r3 = r3 * r0 */
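The modulo behavior described above can be modeled outside the processor. The following Python sketch is a behavioral model, not Blackfin code. It assumes the operands are plain 32-bit values; the names mult_int and lcg_next and the multiplier 69069 are illustrative choices, not part of this reference.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, as a 32-bit dest_reg does


def mult_int(dreg1: int, dreg2: int) -> int:
    """Model of `Dreg1 *= Dreg2`: product modulo 2**32, no saturation, no overflow flag."""
    return (dreg1 * dreg2) & MASK32


def lcg_next(x: int, a: int = 69069) -> int:
    """One step of X[n+1] = (a * X[n]) mod 2**32; `a` is an assumed example multiplier."""
    return mult_int(x, a)
```

Because only the low 32 bits are kept, an overflowing product such as 0x10000 * 0x10000 silently becomes 0 in this model, which is the undetected overflow case that the description warns about.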
Abstract
This dual multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores,
adds, or subtracts the product into a designated accumulator register with saturation. By default, the instruction
treats all operands as signed fractions with left-shift correction as required. A second MAC operation occurs in paral-
lel.
See Also (Dual 16 x 16-Bit MAC with Moves to Registers (ParaMac16WithMvAndMac16WithMv), Dual 16 x 16-
Bit MAC with Move to Register (ParaMac16AndMac16WithMv), Dual 16 x 16-Bit MAC with Move to Register
(ParaMac16WithMvAndMac16))
ParaMac16AndMac16 Description
The dual multiply and multiply-accumulate to accumulator instruction is a dual (two instances issued in parallel) of
the 16 x 16-Bit MAC (Mac16) instruction. For more information about instruction operation, see that instruction's
reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16AndMac16 Example
a1 += r3.h * r2.h , a0 = r3.h * r2.h ;
a1 -= r6.h * r4.l (fu) , a0 += r6.h * r4.l (fu) ;
a1 = r6.h * r4.l (fu) , a0 -= r3.h * r2.h ;
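As a rough illustration of the first example line, the Python sketch below models two MAC operations that share the same operands. It assumes the default signed fraction format (1.15 operands with the product shifted left by one bit), a 40-bit saturating accumulator, and clamping of the -1.0 x -1.0 product. These details belong to the Mac16 reference page and are assumptions here, and all function names are hypothetical.

```python
ACC_MAX = (1 << 39) - 1   # assumed largest value of a 40-bit signed accumulator
ACC_MIN = -(1 << 39)      # assumed smallest value of a 40-bit signed accumulator


def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed half word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_product(x: int, y: int) -> int:
    """Signed 1.15 x 1.15 product with the one-bit left-shift correction (1.31 result)."""
    p = (to_s16(x) * to_s16(y)) << 1
    return min(p, 0x7FFFFFFF)  # assumption: -1.0 * -1.0 clamps to the largest 1.31 value


def sat40(v: int) -> int:
    """Saturate to the assumed 40-bit accumulator range."""
    return max(ACC_MIN, min(ACC_MAX, v))


def dual_mac(a1: int, a0: int, r3h: int, r2h: int) -> tuple:
    """Model of `a1 += r3.h * r2.h , a0 = r3.h * r2.h`; returns (new_a1, new_a0)."""
    p = frac_product(r3h, r2h)
    return sat40(a1 + p), sat40(p)  # the old a0 value is overwritten, not accumulated
```

In this model, 0.5 x 0.5 (0x4000 x 0x4000) yields 0x20000000, which is 0.25 in 1.31 format.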
Abstract
This dual multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores,
adds, or subtracts the product into a designated accumulator register with saturation. By default, the instruction
treats all operands as signed fractions with left-shift correction as required. A second MAC operation occurs in paral-
lel.
See Also (Dual 16 x 16-Bit MAC (ParaMac16AndMac16), Dual 16 x 16-Bit MAC with Moves to Registers (Para-
Mac16WithMvAndMac16WithMv), Dual 16 x 16-Bit MAC with Move to Register (ParaMac16WithMvAnd-
Mac16))
ParaMac16AndMac16WithMv Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with an instance of the 16 x 16-Bit MAC (Mac16) instruction (using MAC1) and an instance of the 16 x 16-Bit
MAC with Move to Register (Mac16WithMv) instruction (using MAC0). For more information about instruction
operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16AndMac16WithMv Example
a1 += r6.h * r4.l (fu) , r3.l = ( a0 = r3.h * r2.h ) ;
Abstract
This dual multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores,
adds, or subtracts the product into a designated accumulator register with saturation. By default, the instruction
treats all operands as signed fractions with left-shift correction as required. A second MAC operation occurs in paral-
lel.
See Also (Dual 16 x 16-Bit MAC (ParaMac16AndMac16), Dual 16 x 16-Bit MAC with Moves to Registers (Para-
Mac16WithMvAndMac16WithMv), Dual 16 x 16-Bit MAC with Move to Register
(ParaMac16AndMac16WithMv))
ParaMac16WithMvAndMac16 Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with an instance of the 16 x 16-Bit MAC with Move to Register (Mac16WithMv) instruction (using MAC1) and an
instance of the 16 x 16-Bit MAC (Mac16) (using MAC0). For more information about instruction operation, see
that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16WithMvAndMac16 Example
r3.h = (a1 += r6.h * r4.l) (fu) , a0 = r3.h * r2.h ;
Abstract
This dual multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruction stores,
adds, or subtracts the product into a designated accumulator register with saturation. By default, the instruction
treats all operands as signed fractions with left-shift correction as required. A second MAC operation occurs in paral-
lel.
See Also (Dual 16 x 16-Bit MAC (ParaMac16AndMac16), Dual 16 x 16-Bit MAC with Move to Register (Para-
Mac16AndMac16WithMv), Dual 16 x 16-Bit MAC with Move to Register (ParaMac16WithMvAndMac16))
ParaMac16WithMvAndMac16WithMv Description
The dual multiply and multiply-accumulate to accumulator (with move) instruction is a dual (two instances issued
in parallel) of the 16 x 16-Bit MAC with Move to Register (Mac16WithMv) instruction. For more information
about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation input operands. The instructions must NOT use the same data registers for results.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16WithMvAndMac16WithMv Example
r1 = (a1 = r3 * r0) (m,fu) , r0 = (a0 = r0 * r3) (fu) ;
r3 = (a1 += r4 * r7) (m,is) , r2 = (a0 += r1 * r2) (is) ;
r1 = (a1 = r3 * r0) (m,fu) , r4 = (a0 -= r2 * r1) (iss2) ;
Abstract
This dual move and multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruc-
tion stores, adds, or subtracts the product into a designated accumulator register with saturation. By default, the
instruction treats all operands as signed fractions with left-shift correction as required. A second (independent) move
operation occurs in parallel with the MAC operation.
See Also (Dual 16 x 16-Bit MAC with Moves to Registers (ParaMac16WithMvAndMv))
ParaMac16AndMv Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with an instance of the 16 x 16-Bit MAC (Mac16) instruction (using MAC1) and either an instance of the Move
16-Bit Accumulator Section to Low Half Register (MvA0ToDregL) instruction (using MAC0) or an instance of the
Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE) instruction (using MAC0). For more informa-
tion about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16AndMv Example
a1 += r6.h * r4.l (fu) , r3.l = a0 ;
a1 += r6.h * r4.l (fu) , r2 = a0 ;
Abstract
This dual move and multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruc-
tion stores, adds, or subtracts the product into a designated accumulator register with saturation. By default, the
instruction treats all operands as signed fractions with left-shift correction as required. A second (independent) move
operation occurs in parallel with the MAC operation.
See Also (Dual 16 x 16-Bit MAC with Move to Register (ParaMac16AndMv))
ParaMac16WithMvAndMv Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with an instance of the 16 x 16-Bit MAC with Move to Register (Mac16WithMv) instruction (using MAC1) and
either an instance of the Move 16-Bit Accumulator Section to Low Half Register (MvA0ToDregL) instruction (using
MAC0) or an instance of the Move 32-Bit Accumulator Section to Even Register (MvA0ToDregE) instruction (us-
ing MAC0). For more information about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMac16WithMvAndMv Example
r1.h = (a1 += r6.h * r4.l) (fu) , r3.l = a0 ;
r1.h = (a1 += r6.h * r4.l) (fu) , r2 = a0 ;
r2 = (a1 += r6.h * r4.l) (fu) , r3.l = a0 ;
r0 = (a1 += r6.h * r4.l) (fu) , r2 = a0 ;
Abstract
This instruction executes two parallel multiply operations on 16-bit registers.
ParaMult16AndMult16 Description
The dual multiply 16-bit operands instruction is a dual (two instances issued in parallel) of the 16 x 16-Bit Multiply
(Mult16) instruction. One of the parallel issue instructions executes on MAC1 with its results placed in a high half
data register. The other parallel issue instruction executes on MAC0 with its results placed in a low half data register.
For more information about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMult16AndMult16 Example
r3.h = r6.h * r4.l (fu) , r3.l = r3.h * r2.h ;
Abstract
This dual move and multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruc-
tion stores, adds, or subtracts the product into a designated accumulator register with saturation. By default, the
instruction treats all operands as signed fractions with left-shift correction as required. A second (independent) move
operation occurs in parallel with the MAC operation.
See Also (Dual Move to Register and 16 x 16-Bit MAC with Move to Register (ParaMvAndMac16WithMv))
ParaMvAndMac16 Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with either an instance of the Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH) instruction
(using MAC1) or an instance of the Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO) instruc-
tion (using MAC1) and an instance of the 16 x 16-Bit MAC (Mac16) instruction (using MAC0). For more infor-
mation about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMvAndMac16 Example
r3.l = a1 , a0 += r6.h * r4.l (fu) ;
r2 = a1 , a0 += r6.h * r4.l (fu) ;
Dual Move to Register and 16 x 16-Bit MAC with Move to Register (ParaMvAndMac16WithMv)
General Form
Abstract
This dual move and multiply-accumulate instruction multiplies two 16-bit half word operands. Then, the instruc-
tion stores, adds, or subtracts the product into a designated accumulator register with saturation. By default, the
instruction treats all operands as signed fractions with left-shift correction as required. A second (independent) move
operation occurs in parallel with the MAC operation.
See Also (Dual Move to Register and 16 x 16-Bit MAC (ParaMvAndMac16))
ParaMvAndMac16WithMv Description
The dual multiply and multiply-accumulate to half register (with move) instruction is a parallel issue instruction
with either an instance of the Move 16-Bit Accumulator Section to High Half Register (MvA1ToDregH) instruction
(using MAC1) or an instance of the Move 32-Bit Accumulator Section to Odd Register (MvA1ToDregO) instruc-
tion (using MAC1) and an instance of the 16 x 16-Bit MAC with Move to Register (Mac16WithMv) instruction
(using MAC0). For more information about instruction operation, see that instruction's reference page.
The parallel issue instructions operate independently and may use the same (or different) data registers for the com-
putation operands.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
ParaMvAndMac16WithMv Example
r3.h = a1 , r1.h = (a0 += r6.h * r4.l) (fu) ;
r3 = a1 , r1.h = (a0 += r6.h * r4.l) (fu) ;
r3.h = a1 , r2 = (a0 += r6.h * r4.l) (fu) ;
r7 = a1 , r0 = (a0 += r6.h * r4.l) (fu) ;
Abstract
This instruction adds or subtracts two pointer registers.
See Also (32-bit Add then Shift (DagAddSubShift), 32-bit Add or Subtract Constant (DagAddImm), 32-bit Add
Shifted Pointer (PtrOp))
DagAdd32 Description
The DAG AddSub32 instruction adds or subtracts source pointer registers and places the result in a destination
pointer register.
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. For
more information, see Addressing Circular Buffers in the Address Arithmetic Unit (AAU) chapter. Unless circular
buffering is desired, disable this feature prior to issuing this instruction by clearing the length register (Lreg) corresponding
to the Ireg used in this instruction. For example, if using i2 to increment your address pointer, first
clear l2 to disable circular buffering. Failure to explicitly clear the corresponding Lreg beforehand can result in
unexpected Ireg values.
The circular address buffer registers (index, length, and base) are not initialized automatically by processor reset. The
recommended operation is that user software clears all the circular address buffer registers during boot-up to disable
circular buffering, then initializes these registers later, if needed.
When the bit reverse carry adder (BREV) is specified in the instruction syntax, the carry bit is propagated from left-
to-right, as shown in the Bit Addition Flow for the Bit Reverse (BREV) Case figure, instead of being propagated from
right-to-left (default operation). When bit reversal is used on the index register version of this instruction, circular
buffering is disabled to support operand addressing for FFT, DCT, and DFT algorithms. The pointer register ver-
sion of this instruction does not support circular buffering.
[Figure: adder chain for operand bits a0 ... an and b0 ... bn, with carry bits c0, c1, c2, ... cn passed between adjacent bit positions]
Figure 8-7: Bit Addition Flow for the Bit Reverse (BREV) Case
Programs typically use the index register and pointer register versions of this instruction to increment or decrement
indirect address pointers for load or store operations.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
DagAdd32 Example
p5 = p3 + p0 ; /* dest_Preg = src1_Preg + src0_Preg */
p3 -= p0 ; /* dest_Preg_new = dest_Preg_old - src_Preg */
i1 -= m2 ; /* dest_Ireg_new = dest_Ireg_old - src_Mreg */
p3 += p0 (brev) ; /* dest_Preg_new = dest_Preg_old + src_Preg (bit reversed carry, only) */
i1 += m1 ; /* dest_Ireg_new = dest_Ireg_old + src_Mreg */
i0 += m0 (brev) ; /* optional bit reverse carry, only */
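The bit-reversed carry described above can be sketched as a behavioral model in Python (not Blackfin code). The model reverses both operands, performs an ordinary add, and reverses the sum, which is equivalent to propagating the carry from left to right. The function names are hypothetical, and the model ignores any buffer base address that a real FFT routine would add.

```python
MASK32 = 0xFFFFFFFF


def bit_reverse32(x: int) -> int:
    """Reverse the order of the 32 bits in x."""
    return int(f"{x & MASK32:032b}"[::-1], 2)


def add_brev(a: int, b: int) -> int:
    """Add a and b with the carry propagating from the MSB toward the LSB."""
    return bit_reverse32((bit_reverse32(a) + bit_reverse32(b)) & MASK32)


def brev_sequence(modifier: int, count: int) -> list:
    """Start at 0 and repeatedly apply a model of `i0 += m0 (brev)`."""
    seq = [0]
    for _ in range(count - 1):
        seq.append(add_brev(seq[-1], modifier))
    return seq
```

For an 8-point transform, a modifier of 4 (half the transform size) makes the model visit the indices in bit-reversed order: 0, 4, 2, 6, 1, 5, 3, 7.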
Abstract
This instruction allows the user to add a constant to a register.
See Also (32-bit Add or Subtract (DagAdd32), 32-bit Add then Shift (DagAddSubShift), 32-bit Add Shifted Pointer
(PtrOp))
DagAddImm Description
The DAG AddImm instruction adds a constant value to (or subtracts it from) a source pointer register, then places the
result in a destination pointer register.
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. For
more information, see Addressing Circular Buffers in the Address Arithmetic Unit (AAU) chapter. Unless circular
buffering is desired, disable this feature prior to issuing this instruction by clearing the length register (Lreg) corresponding
to the Ireg used in this instruction. For example, if using i2 to increment your address pointer, first
clear l2 to disable circular buffering. Failure to explicitly clear the corresponding Lreg beforehand can result in
unexpected Ireg values.
The circular address buffer registers (index, length, and base) are not initialized automatically by processor reset. The
recommended operation is that user software clears all the circular address buffer registers during boot-up to disable
circular buffering, then initializes these registers later, if needed.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
DagAddImm Example
p5 += -8 ; /* Preg = Preg + constant */
i0 += 2 ; /* Ireg = Ireg + 2 */
i1 += 4 ; /* Ireg = Ireg + 4 */
i2 -= 2 ; /* Ireg = Ireg - 2 */
i0 -= 4 ; /* Ireg = Ireg - 4 */
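The following Python sketch illustrates why a stale length register can produce unexpected Ireg values. It is a simplified model, not Blackfin code. The wrap rule shown (subtract the length when the index passes the end of the buffer, add it when the index falls below the base) is an assumption made for illustration; the Addressing Circular Buffers topic is the authoritative description. The name ireg_add is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ireg_add(ireg: int, step: int, lreg: int = 0, breg: int = 0) -> int:
    """Model of `Ireg += step` with optional circular buffering (assumed wrap rule)."""
    new = ireg + step
    if lreg != 0:                  # a non-zero length register enables circular buffering
        if new >= breg + lreg:     # stepped past the end of the buffer: wrap back
            new -= lreg
        elif new < breg:           # stepped before the start of the buffer: wrap forward
            new += lreg
    return new & MASK32
```

With lreg cleared, ireg_add(0x2000, 4) gives the expected 0x2004. If lreg still holds 16 and breg still holds 0x1000 from earlier code, the same call returns 0x1FF4 in this model.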
Abstract
This instruction adds, then shifts left by one or two places. Saturation is not supported.
See Also (32-bit Add or Subtract (DagAdd32), 32-bit Add or Subtract Constant (DagAddImm), 32-bit Add Shifted
Pointer (PtrOp))
DagAddSubShift Description
The add with shift instruction adds two source pointer registers, then applies a one- or two-bit logical shift left. The
left shift accomplishes a x2 or x4 multiplication on sign-extended numbers.
The add with shift instruction does not intrinsically modify values that are strictly input. However, the dest_reg
serves as an input as well as the result, so the dest_reg is intrinsically modified.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
DagAddSubShift Example
p3 = (p3 + p2) << 1 ;
/* dest_reg = (dest_reg + src_reg) x 2 */
/* p3 = (p3 + p2) * 2 */
Abstract
This instruction adds or subtracts pointer and DAG registers.
See Also (32-bit Add or Subtract (DagAdd32), 32-bit Add then Shift (DagAddSubShift), 32-bit Add or Subtract
Constant (DagAddImm))
PtrOp Description
The shift with add instruction combines a one- or two-bit logical shift left with an addition operation.
The instruction provides a shift-then-add method that supports a rudimentary multiplier sequence useful for array
pointer manipulation.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
PtrOp Example
p3 = p0 + (p3 << 1) ;
/* p3 = (p3 * 2) + p0 */
/* adder_pntr + (src_pntr * 2) */
p3 = p0 + (p3 << 2) ;
/* p3 = (p3 * 4) + p0 */
/* adder_pntr + (src_pntr * 4) */
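As an illustration of the array pointer use mentioned in the description, the Python sketch below models the shift-then-add operation and uses it to compute the address of an element in an array of 32-bit words. It is a behavioral model, not Blackfin code. The names ptr_op and element_address and the example addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ptr_op(adder_pntr: int, src_pntr: int, shift: int) -> int:
    """Model of `dest = adder_pntr + (src_pntr << shift)` for a shift of 1 or 2."""
    if shift not in (1, 2):
        raise ValueError("only one- or two-bit shifts are modeled")
    return (adder_pntr + (src_pntr << shift)) & MASK32


def element_address(base: int, index: int) -> int:
    """Address of element `index` in an array of 32-bit words starting at `base`."""
    return ptr_op(base, index, 2)  # index * 4 bytes per element, plus the base
```

A shift of 1 scales the index for 16-bit elements in the same way, so ptr_op(0x1000, 3, 1) gives 0x1006 in this model.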
Abstract
This instruction shifts a pointer register by the specified number of bits.
LShiftPtr Description
The logical shift pointer instruction logically shifts a pointer register by a specified distance and direction.
Logical shifts discard any bits shifted out of the register and backfill vacated bits with zeros.
The logical shift pointer instruction does not implicitly modify the input src_pntr value. However, the dest_pntr
can be the same pointer register as src_pntr. Doing so explicitly modifies the source register.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LShiftPtr Example
p3 = p2 >> 1 ; /* pointer right shift by 1 */
p3 = p3 >> 2 ; /* pointer right shift by 2 */
p4 = p5 << 1 ; /* pointer left shift by 1 */
p0 = p1 << 2 ; /* pointer left shift by 2 */
Rotate Operations
These operations provide bitwise rotate operations on register and immediate value operands:
• 32-Bit Rotate (Shift_Rot32)
Shift (Dsp32Shf)
DREG Register Type = rot DREG Register Type by DREG_L Register Type
Shift Immediate (Dsp32ShfImm)
DREG Register Type = rot DREG Register Type by imm6 Register Type
Abstract
This instruction rotates a register through the CC bit a specified distance and direction. The CC bit is in the
rotate chain.
Shift_Rot32 Description
The rotate data register instruction rotates a data register through the CC bit a specified distance and direction. The
CC bit is in the rotate chain. Consequently, the first value rotated into the register is the initial value of the CC bit.
Rotation shifts all the bits either right or left. Each bit that rotates out of the register (the LSB for rotate right or the
MSB for rotate left) is stored in the CC bit, and the CC bit is stored into the bit vacated by the rotate on the opposite
end of the register.
Example initial state:
            31                                    0
D-register: 1010 1111 0000 0000 0000 0000 0001 1010
CC bit:     N (“1” or “0”)
The sign of the rotate magnitude determines the direction of the rotation.
• Positive rotate magnitudes produce Left rotations.
• Negative rotate magnitudes produce Right rotations.
Valid rotate magnitudes are –32 through +31, zero included. The rotate instruction masks and ignores bits that are
more significant than those allowed. The distance is determined by the lower 6 bits (sign extended) of the
shift_magnitude.
Unlike shift operations, the rotate instruction loses no bits of the source register data. Instead, it rearranges them in a
circular fashion. However, the last bit rotated out of the register remains in the CC bit, and is not returned to the
register. Because rotates are performed all at once and not one bit at a time, neither direction of rotation has an
advantage, regardless of the rotate magnitude. For instance, because the rotate chain is 33 bits long (32 register bits
plus the CC bit), a rotate right by two bits is no more efficient than a rotate left by 31 bits. Both methods produce
identical results in identical execution time.
This instruction rotates all 32 bits of the data register.
The instruction does not implicitly modify the src_reg values. Optionally, dest_reg can be the same data regis-
ter as src_reg. Doing this explicitly modifies the source register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Shift_Rot32 Example
r4 = rot r1 by 8 ; /* rotate left (Dreg = ROT Dreg BY imm6) */
r4 = rot r1 by -5 ; /* rotate right */
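The rotate-through-CC behavior can be sketched as a behavioral model in Python (not Blackfin code). The model treats the register and the CC bit as a single 33-bit chain with CC placed above bit 31, and takes the rotate distance from the sign-extended lower 6 bits of the magnitude, as described above. The function name rot32 is hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK33 = (1 << 33) - 1


def rot32(reg: int, cc: int, magnitude: int) -> tuple:
    """Rotate `reg` through `cc`; positive magnitudes rotate left. Returns (new_reg, new_cc)."""
    m = magnitude & 0x3F            # only the lower 6 bits are used
    if m & 0x20:                    # sign-extend the 6-bit field
        m -= 64
    chain = ((cc & 1) << 32) | (reg & MASK32)   # CC sits just above bit 31
    k = m % 33                      # a right rotate by n equals a left rotate by 33 - n
    chain = ((chain << k) | (chain >> (33 - k))) & MASK33
    return chain & MASK32, chain >> 32
```

Using the D-register value 0xAF00001A from the description with CC = 1, a rotate left by 1 gives 0x5E000035 with CC = 1, and a rotate right by 1 gives 0xD780000D with CC = 0 in this model.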
Shift (Dsp32Shf)
a0 = rot a0 by DREG_L Register Type
Abstract
This instruction rotates the accumulator through the CC bit a specified distance and direction. The CC bit is in the
rotate chain.
Shift_RotAcc Description
This instruction rotates the accumulator through the CC bit a specified distance and direction. The CC bit is in the
rotate chain. Consequently, the first value rotated into the register is the initial value of the CC bit and the last bit
rotated out ends up in CC. The sign of the rotate magnitude determines the direction of the rotation.
• Positive rotates left
• Negative rotates right
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Shift Operations
These operations provide arithmetic or logical shift operations on register and immediate value operands:
• 16-Bit Arithmetic Shift (AShift16)
• Vectored 16-Bit Arithmetic (AShift16Vec)
• Accumulator Arithmetic Shift (AShiftAcc)
• 32-Bit Arithmetic Shift (AShift32)
• 16-Bit Logical Shift (LShift16)
• Vectored 16-Bit Logical Shift (LShift16Vec)
• 32-Bit Logical Shift (LShift)
Shift (Dsp32Shf)
DREG_L Register Type = ashift DREG_L Register Type by DREG_L Register Type
DREG_L Register Type = ashift DREG_H Register Type by DREG_L Register Type
DREG_H Register Type = ashift DREG_L Register Type by DREG_L Register Type
DREG_H Register Type = ashift DREG_H Register Type by DREG_L Register Type
DREG_L Register Type = ashift DREG_L Register Type by DREG_L Register Type (s)
DREG_L Register Type = ashift DREG_H Register Type by DREG_L Register Type (s)
DREG_H Register Type = ashift DREG_L Register Type by DREG_L Register Type (s)
DREG_H Register Type = ashift DREG_H Register Type by DREG_L Register Type (s)
Shift Immediate (Dsp32ShfImm)
DREG_L Register Type = DREG_L Register Type AHSH4
DREG_L Register Type = DREG_H Register Type AHSH4
DREG_H Register Type = DREG_L Register Type AHSH4
DREG_H Register Type = DREG_H Register Type AHSH4
DREG_L Register Type = DREG_L Register Type AHSH4S
DREG_L Register Type = DREG_H Register Type AHSH4S
DREG_H Register Type = DREG_L Register Type AHSH4S
DREG_H Register Type = DREG_H Register Type AHSH4S
Abstract
This instruction shifts left or right and preserves the sign bit. For right shifts, the sign bit back-fills the left-most bit
vacated by the shift. For left shifts, if the shift causes the sign bit to be lost, the result will saturate to the maximum
positive or negative value depending on the lost sign bit.
See Also (Vectored 16-Bit Arithmetic (AShift16Vec))
AShift16 Description
The arithmetic shift low/high half data register destination instruction shifts a register's contents a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign) while preserving the sign
bit of the original number. This instruction provides arithmetic shift right, logical shift left, and arithmetic shift left
(with saturation) operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using ashift syntax support the following shift operations:
• For a negative shift_magnitude, ashift produces an arithmetic shift right with sign bit preservation. The
sign bit value back-fills the left-most bit positions vacated by the arithmetic shift right.
• For a positive shift_magnitude, ashift produces a logical shift left, but does not guarantee sign bit
preservation. If the positive shift_magnitude is too large, the ashift operation saturates the destination
register. A logical shift left that would otherwise lose non-sign bits off the left-hand side saturates to the maximum
positive or negative value instead.
NOTE: One may view the ashift operation as a multiplication or division operation. Viewed this way, the
shift_magnitude is the power of 2 multiplied by the src_reg number. Positive magnitudes cause
multiplication ( N x 2^n ), and negative magnitudes produce division ( N x 2^-n or N / 2^n ).
The versions of this instruction using >>> syntax only support arithmetic right shift operations using positive
shift_magnitude values.
The versions of this instruction using <<< syntax only support logical left shift operations using positive
shift_magnitude values.
The versions of this instruction using << with (s) syntax only support arithmetic shift left operations (with satura-
tion) using positive shift_magnitude values.
The Arithmetic Shift (16 Bit Destination Register) Operations table provides more detailed information about
arithmetic shift operations.
Where permitted (optional) or required, the saturation (s) option applies saturation to the result. For shift operations
without saturation enabled, values may be left-shifted so far that all the sign bits overflow and are lost. For shift
operations with saturation enabled, a left shift that would otherwise shift non-sign bits off the left-hand side saturates to
the maximum positive or negative value instead. The result always keeps the same sign as the pre-shifted value when
saturation is enabled.
See the Saturation topic in the Introduction chapter for a description of saturation behavior.
This 32-bit instruction can sometimes save execution time over a 16-bit encoded instruction, because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31   30   29   28   27   26   25   24   23   22   21   20   19    18   17    16
Flag  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1  AV0S  AV0
Bit   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
AShift16 Example
/* AShift16 syntax summary */
/* Dreg_lo_hi = ashift Dreg_lo_hi BY Dreg_lo (optional_sat) ; arithmetic or logical shift
with optional saturation*/
/* Dreg_lo_hi = Dreg_lo_hi >>> uimm4 (optional_sat) ; arithmetic shift right with optional
saturation*/
/* Dreg_lo_hi = Dreg_lo_hi <<< uimm4 ; logical shift left*/
/* Dreg_lo_hi = Dreg_lo_hi <<< 0 ; logical shift left*/
/* Dreg_lo_hi = Dreg_lo_hi << uimm4 (s) ; arithmetic shift left with saturation*/
/* Dreg_lo_hi = Dreg_lo_hi << 0 (s) ; arithmetic shift left with saturation*/
/* AShift16 syntax examples */
r3.l = r0.h >>> 7 ; /* arithmetic right shift, half-word */
r3.h = r0.h >>> 5 ; /* same as above; any combination of upper and lower half-words is
supported */
r3.l = r0.h >>> 7(s) ; /* arithmetic right shift, half-word, saturated */
r3.l = r0.h << 12 (s) ; /* arithmetic left shift */
r3.l = ashift r0.h by r7.l ; /* shift, half-word */
r3.h = ashift r0.l by r7.l ;
r3.h = ashift r0.h by r7.l ;
r3.l = ashift r0.l by r7.l ;
r3.l = ashift r0.h by r7.l(s) ; /* shift, half-word, saturated */
r3.h = ashift r0.l by r7.l(s) ; /* shift, half-word, saturated */
r3.h = ashift r0.h by r7.l(s) ;
r3.l = ashift r0.l by r7.l (s) ;
/* If r0.h = -64, then performing . . . */
r3.h = r0.h >>> 4 ;
/* . . . produces r3.h = -4, preserving the sign */
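The worked example above can be cross-checked off-target with a small behavioral model. The following Python sketch is illustrative only: the helper names (to_s16, ashr16, ashl16_sat) are hypothetical and are not part of any Analog Devices tool. It models only the immediate forms (>>> and << with (s)) on one 16-bit half register, assuming two's-complement values and assuming that saturation clamps to 0x7FFF or 0x8000 as described in the saturation text.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def ashr16(value, n):
    """Model of 'half = half >>> n': the sign bit back-fills the vacated bits."""
    # Python's >> on a negative int is already an arithmetic shift.
    return to_s16(value) >> n


def ashl16_sat(value, n):
    """Model of 'half = half << n (s)': clamp instead of losing significant bits."""
    shifted = to_s16(value) << n
    return max(-0x8000, min(0x7FFF, shifted))


# Mirrors the example: r0.h = -64, shifted right by 4, keeps its sign.
result = ashr16(-64, 4)
```

In this model, ashr16(0xFFC0, 4) and ashr16(-64, 4) are the same operation, because 0xFFC0 is the 16-bit pattern for -64.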
Shift (Dsp32Shf )
DREG Register Type = ashift DREG Register Type by DREG_L Register Type (v)
DREG Register Type = ashift DREG Register Type by DREG_L Register Type (v,s)
Shift Immediate (Dsp32ShfImm)
DREG Register Type = DREG Register Type AHSH4 (v)
DREG Register Type = DREG Register Type AHSH4VS
Abstract
This instruction shifts a 16-bit vector left or right by the value in the XOP register. When shifting right, the sign bit
is replicated. If saturation is specified, left shifts saturate if any of the bits shifted off do not match the
original sign bit.
See Also (16-Bit Arithmetic Shift (AShift16))
AShift16Vec Description
The arithmetic shift data register destination (vector) instruction performs two independent shifts, shifting the
contents of a register's low half and high half by a specified distance (shift_magnitude) and direction (based on syntax
and/or shift_magnitude sign) while preserving the sign bits of the original numbers. Although the two half-word
registers are shifted at the same time, the two numbers are kept separate. This instruction provides arithmetic
shift right and logical shift left operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using ashift syntax support the following shift operations:
• For a positive shift_magnitude, ashift produces an arithmetic shift right with sign bit preservation. The
sign bit value back-fills the left-most bit positions vacated by the arithmetic shift right.
• For a negative shift_magnitude, ashift produces a logical shift left, but does not guarantee sign bit
preservation. If the negative shift_magnitude is too large, the ashift operation saturates the destination
register. A logical shift left that would otherwise lose non-sign bits off the left-hand side saturates to the maximum
positive or negative value instead.
NOTE: One may view the ashift operation as a multiplication or division operation. Viewed this way, the
shift_magnitude is the power of 2 multiplied by the src_reg number. Positive magnitudes cause
multiplication (N × 2^n), and negative magnitudes produce division (N × 2^(-n) or N / 2^n).
The versions of this instruction using >>> syntax only support arithmetic right shift operations using positive
shift_magnitude values.
The versions of this instruction using <<< syntax only support logical left shift operations using positive
shift_magnitude values.
The versions of this instruction using << with (v,s) syntax only support logical shift left operations (with satura-
tion) using positive shift_magnitude values.
The Arithmetic Shift (16 Bit Destination Register) Operations table provides more detailed information about
arithmetic shift operations.
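As an illustration of the statement that the two half-word values are kept separate, the following Python sketch models a vector arithmetic right shift on a 32-bit value. The function name ashr16_vec is hypothetical, and the model covers only the immediate right-shift form with the (v) option; it is a behavioral sketch under those assumptions, not a description of the hardware implementation.

```python
def ashr16_vec(reg32, n):
    """Arithmetic right shift applied independently to both 16-bit halves."""

    def ashr16(half):
        # Sign-extend the 16-bit half, shift, then return to a 16-bit pattern.
        signed = half - 0x10000 if half & 0x8000 else half
        return (signed >> n) & 0xFFFF

    hi = ashr16((reg32 >> 16) & 0xFFFF)
    lo = ashr16(reg32 & 0xFFFF)
    return (hi << 16) | lo


# High half 0x0010 (16) and low half 0xFFC0 (-64), each shifted right by 4.
packed = ashr16_vec(0x0010FFC0, 4)
```

Note that the result differs from shifting the same pattern as a single 32-bit word: no bits cross from the high half into the low half, and the low half is sign-filled from its own bit 15.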
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
AShift16Vec Example
/* AShift16Vec syntax summary */
Dreg1 = ashift Dreg1 by Dreg0_lo (v) /* arithmetic/logical shift, vector (dual) */
Dreg1 = ashift Dreg1 by Dreg0_lo (v,s) /* arithmetic/logical shift, vector (dual) */
Dreg = Dreg <<< 0 (v) /* logical shift left, vector (dual) */
Dreg = Dreg <<< UImm4 (v) /* logical shift left, vector (dual) */
Dreg = Dreg >>> UImm4N (v) /* arithmetic shift right, vector (dual) */
Dreg = Dreg << 0 (v,s) /* logical shift left with saturation, vector (dual) */
Dreg = Dreg << UImm4 (v,s) /* logical shift left with saturation, vector (dual) */
Dreg = Dreg >>> UImm4N (v,s) /* arithmetic shift right with saturation, vector (dual) */
/* AShift16Vec syntax examples */
r4=r5>>>3 (v) ; /* arithmetic right shift immediate R5.H and R5.L by 3 bits */
r4=r5<<3 (v,s) ; /* left shift immediate R5.H and R5.L by 3 bits, with saturation */
r2=ashift r7 by r5.l (v) ;
/* arithmetically shift (right or left, depending on sign of r5.l) R7.H and R7.L by magnitude of
R5.L */
Abstract
This instruction shifts left or right and preserves the sign bit. For right shifts, the sign bit back-fills the left-most bit
vacated by the shift. For left shifts, if the shift causes the sign bit to be lost, the result will saturate to the maximum
positive or negative value depending on the lost sign bit.
See Also (Accumulator Arithmetic Shift (AShiftAcc))
AShift32 Description
The arithmetic shift data register destination instruction shifts a register's contents by a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign) while preserving the sign
bit of the original number. This instruction provides arithmetic shift right, logical shift left, and arithmetic shift left
(with saturation) operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using ashift syntax support the following shift operations:
• For a positive shift_magnitude, ashift produces an arithmetic shift right with sign bit preservation. The
sign bit value back-fills the left-most bit positions vacated by the arithmetic shift right.
• For a negative shift_magnitude, ashift produces a logical shift left, but does not guarantee sign bit
preservation. If the negative shift_magnitude is too large, the ashift operation saturates the destination
register. A logical shift left that would otherwise lose non-sign bits off the left-hand side saturates to the maximum
positive or negative value instead.
NOTE: One may view the ashift operation as a multiplication or division operation. Viewed this way, the
shift_magnitude is the power of 2 multiplied by the src_reg number. Positive magnitudes cause
multiplication (N × 2^n), and negative magnitudes produce division (N × 2^(-n) or N / 2^n).
The versions of this instruction using >>>= and >>> syntax only support arithmetic right shift operations using
positive shift_magnitude values.
The versions of this instruction using <<< syntax only support logical left shift operations using positive
shift_magnitude values.
The versions of this instruction using << with (s) syntax only support arithmetic shift left operations (with
saturation) using positive shift_magnitude values.
The Arithmetic Shift (32 Bit Destination Register) Operations table provides more detailed information about
arithmetic shift operations.
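The multiplication and division view of shifting can be checked with a short model. The Python sketch below is an assumption-laden illustration, not vendor code: the names to_s32, ashr32, and ashl32_sat are hypothetical, and only the immediate forms (>>> and << with (s)) on a 32-bit register are modeled.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def ashr32(value, n):
    """Model of 'Dreg = Dreg >>> n' (arithmetic shift right)."""
    return to_s32(value) >> n


def ashl32_sat(value, n):
    """Model of 'Dreg = Dreg << n (s)' (shift left, clamped on overflow)."""
    return max(-0x80000000, min(0x7FFFFFFF, to_s32(value) << n))


# A right shift by n matches floor division by 2**n, including for negatives.
quotient = ashr32(-64, 4)
# A left shift by n matches multiplication by 2**n while the result still fits.
product = ashl32_sat(3, 4)
```

One detail worth keeping in mind: an arithmetic right shift rounds toward negative infinity, so in this model ashr32(-5, 1) is -3 rather than -2.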
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
AShift32 Example
/* AShift32 syntax summary */
/* Dreg >>>= Dreg ; arithmetic shift right */
/* Dreg >>>= UImm5 ; arithmetic shift right */
/* Dreg = ashift Dreg by Dreg_lo (optional_sat) ; arithmetic/logical shift with optional
saturation */
/* Dreg = Dreg <<< 0 ; logical shift left */
/* Dreg = Dreg <<< UImm5 ; logical shift left */
/* Dreg = Dreg >>> UImm5N ; arithmetic shift right */
/* Dreg = Dreg << 0 (s) ; arithmetic shift left with saturation */
/* Dreg = Dreg << UImm5 (s) ; arithmetic shift left with saturation */
/* Dreg = Dreg >>> UImm5 ; arithmetic shift right */
Shift (Dsp32Shf )
a0 = ashift a0 by DREG_L Register Type
a1 = ashift a1 by DREG_L Register Type
Shift Immediate (Dsp32ShfImm)
a0 = a0 ASH5
a1 = a1 ASH5
Abstract
This instruction shifts left or right and preserves the sign bit. For right shifts, the sign bit back-fills the left-most bit
vacated by the shift.
See Also (32-Bit Arithmetic Shift (AShift32))
AShiftAcc Description
The arithmetic shift accumulator register destination instruction shifts a register's contents by a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign) while preserving the sign
bit of the original number. This instruction provides arithmetic shift right and logical shift left operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using ashift syntax support the following shift operations:
• For a positive shift_magnitude, ashift produces an arithmetic shift right with sign bit preservation. The
sign bit value back-fills the left-most bit positions vacated by the arithmetic shift right.
• For a negative shift_magnitude, ashift produces a logical shift left, but does not guarantee sign bit
preservation. If the negative shift_magnitude is too large, the ashift operation saturates the destination
register. A logical shift left that would otherwise lose non-sign bits off the left-hand side saturates to the maximum
positive or negative value instead.
NOTE: One may view the ashift operation as a multiplication or division operation. Viewed this way, the
shift_magnitude is the power of 2 multiplied by the src_reg number. Positive magnitudes cause
multiplication (N × 2^n), and negative magnitudes produce division (N × 2^(-n) or N / 2^n).
The versions of this instruction using >>> syntax only support arithmetic right shift operations using positive
shift_magnitude values.
The versions of this instruction using <<< syntax only support logical left shift operations using positive
shift_magnitude values.
The Arithmetic Shift (Accumulator Destination Register) Operations table provides more detailed information about
arithmetic shift operations.
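For the accumulator case, the same idea applies to a wider register. The sketch below assumes a 40-bit two's-complement accumulator and models only the >>> form; the names ACC_BITS, to_s40, and ashr_acc are hypothetical and chosen for this illustration.

```python
ACC_BITS = 40
ACC_MASK = (1 << ACC_BITS) - 1


def to_s40(x):
    """Interpret the low 40 bits of x as a signed two's-complement value."""
    x &= ACC_MASK
    return x - (1 << ACC_BITS) if x & (1 << (ACC_BITS - 1)) else x


def ashr_acc(acc, n):
    """Model of 'a0 = a0 >>> n', returned as a 40-bit pattern."""
    return (to_s40(acc) >> n) & ACC_MASK


# 0xFF_FFFF_FF00 is -256 as a 40-bit value; shifting right by 4 gives -16.
shifted = ashr_acc(0xFFFFFFFF00, 4)
```

The sign bit here is bit 39, so the back-fill comes from the accumulator's extension byte rather than from bit 31.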
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
AShiftAcc Example
/* AShiftAcc syntax summary */
/* a0 = ashift a0 by Dreg_lo ; arithmetic/logical shift */
/* a1 = ashift a1 by Dreg_lo ; arithmetic/logical shift */
/* a0 = a0 <<< 0 ; logical shift left */
Shift (Dsp32Shf )
DREG_L Register Type = lshift DREG_L Register Type by DREG_L Register Type
DREG_L Register Type = lshift DREG_H Register Type by DREG_L Register Type
DREG_H Register Type = lshift DREG_L Register Type by DREG_L Register Type
DREG_H Register Type = lshift DREG_H Register Type by DREG_L Register Type
Shift Immediate (Dsp32ShfImm)
DREG_L Register Type = DREG_L Register Type LHSH4
DREG_L Register Type = DREG_H Register Type LHSH4
DREG_H Register Type = DREG_L Register Type LHSH4
DREG_H Register Type = DREG_H Register Type LHSH4
Abstract
This instruction shifts a register half by the specified number of bits and returns the shifted value.
See Also (Vectored 16-Bit Logical Shift (LShift16Vec))
LShift16 Description
The logical shift low/high half data register destination instruction shifts a register's contents by a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign), discarding any bits shifted
out of the register and backfilling vacated bits with zeros. This instruction provides logical shift right and logical
shift left operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using lshift syntax support the following shift operations:
• For a positive shift_magnitude, lshift produces a logical shift right, discarding any bits shifted out of
the register and backfilling vacated bits with zeros.
• For a negative shift_magnitude, lshift produces a logical shift left, discarding any bits shifted out of the
register and backfilling vacated bits with zeros.
NOTE: Shift magnitudes that exceed the size of the destination register produce all zeros in the result. For
example, shifting a 16-bit register value by 20 bit places (a valid operation) produces 0x0000.
The versions of this instruction using >> syntax only support logical shift right operations using positive
shift_magnitude values.
The versions of this instruction using << syntax only support logical shift left operations using positive
shift_magnitude values.
The Logical Shift (16 Bit Destination Register) Operations table provides more detailed information about logical
shift operations.
This 32-bit instruction can sometimes save execution time over a 16-bit encoded instruction, because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
LShift16 Example
/* LShift16 syntax summary */
/* DDST_Lo_Hi = lshift DSRC1_Lo_Hi by DSRC0_Lo ; logical shift */
/* DDST_Lo_Hi = DSRC_Lo_Hi << 0 ; logical shift left */
/* DDST_Lo_Hi = DSRC_Lo_Hi << UImm4 ; logical shift left */
/* DDST_Lo_Hi = DSRC_Lo_Hi >> UImm4N ; logical shift right */
/* LShift16 syntax examples */
r3.l = r0.l >> 4 ; /* logical shift right, half-word register */
r3.l = r0.h >> 4 ; /* logical shift right; half-word register combinations are arbitrary */
r3.h = r0.l << 12 ; /* logical shift left, half-word register */
r3.h = r0.h << 14 ; /* logical shift left; half-word register combinations are arbitrary */
r3.l = lshift r0.l by r2.l ; /* logical shift, direction controlled by sign of R2.L */
r3.h = lshift r0.l by r2.l ;
/* If r0.h = -64 (or 0xFFC0), then performing . . . */
r3.h = r0.h >> 4 ;
/* . . . produces r3.h = 0x0FFC (or 4092), losing the sign */
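The loss of sign in the example above, and the all-zeros result for oversized shift magnitudes, can be reproduced with a minimal model. This Python sketch is illustrative only; lshr16 and lshl16 are hypothetical names, and only the immediate-style >> and << forms on a 16-bit half register are modeled.

```python
def lshr16(value, n):
    """Model of 'half = half >> n': zeros back-fill the vacated bits."""
    return (value & 0xFFFF) >> n


def lshl16(value, n):
    """Model of 'half = half << n': bits shifted out of the half are discarded."""
    return ((value & 0xFFFF) << n) & 0xFFFF


# Mirrors the example: r0.h = 0xFFC0 (-64), shifted right by 4.
unsigned_result = lshr16(0xFFC0, 4)
```

Because zeros are shifted in from the left, the 16-bit pattern for -64 becomes a positive value, which is the difference from the arithmetic (>>>) form.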
Shift (Dsp32Shf )
DREG Register Type = lshift DREG Register Type by DREG_L Register Type (v)
Shift Immediate (Dsp32ShfImm)
DREG Register Type = DREG Register Type LHSH4 (v)
Abstract
This instruction shifts a 16-bit vector left or right by the value in the XOP register.
See Also (16-Bit Logical Shift (LShift16))
LShift16Vec Description
The logical shift data register destination (vector) instruction performs two independent shifts, shifting the contents of
a register's low half and high half by a specified distance (shift_magnitude) and direction (based on syntax and/or
shift_magnitude sign), discarding any bits shifted out of the two half registers and backfilling vacated bits with
zeros. This instruction provides logical shift right and logical shift left operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using lshift syntax support the following shift operations:
• For a positive shift_magnitude, lshift produces a logical shift right, discarding any bits shifted out of
the register and backfilling vacated bits with zeros.
• For a negative shift_magnitude, lshift produces a logical shift left, discarding any bits shifted out of the
register and backfilling vacated bits with zeros.
NOTE: Shift magnitudes that exceed the size of the destination register produce all zeros in the result. For
example, shifting a 16-bit register value by 20 bit places (a valid operation) produces 0x0000.
The versions of this instruction using >> syntax only support logical shift right operations using positive
shift_magnitude values.
The versions of this instruction using << syntax only support logical shift left operations using positive
shift_magnitude values.
The Logical Shift (16 Bit Destination Register) Operations table provides more detailed information about logical
shift operations.
The dest_reg and src_reg are 32-bit data registers. The shift operations are applied to the 16-bit half
registers within the src_reg.
For 16-bit src_reg, valid shift magnitudes are –16 through +15, zero included.
The data register versions of this instruction shift 16 bits for half-word registers.
The half data register versions of this instruction do not implicitly modify the src_reg values. Optionally,
dest_reg may be the same data register as src_reg. Doing this explicitly modifies the source register.
This 32-bit instruction can sometimes save execution time over a 16-bit encoded instruction, because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
LShift16Vec Example
/* LShift16Vec syntax summary */
/* DDST = lshift DSRC1 by DSRC0_L (v) ; logical shift, vector (dual) */
/* DDST = DSRC << 0 (v) ; logical shift left, vector (dual) */
/* DDST = DSRC << UImm4 (v) ; logical shift left, vector (dual) */
/* DDST = DSRC >> UImm4N (v) ; logical shift right, vector (dual) */
/* LShift16Vec syntax examples */
r4=r5>>3 (v) ; /* logical right shift immediate R5.H and R5.L by 3 bits */
r4=r5<<3 (v) ; /* logical left shift immediate R5.H and R5.L by 3 bits */
r2=lshift r7 by r5.l (v) ;
/* logically shift (right or left, depending on sign of r5.l) R7.H and R7.L by magnitude of
R5.L */
Abstract
This instruction shifts a register by the specified number of bits and returns the shifted value.
LShift Description
The logical shift data register destination instruction shifts a register's contents by a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign), discarding any bits shifted out of the
register and backfilling vacated bits with zeros. This instruction provides logical shift right and logical shift left
operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using lshift syntax support the following shift operations:
• For a positive shift_magnitude, lshift produces a logical shift right, discarding any bits shifted out of
the register and backfilling vacated bits with zeros.
• For a negative shift_magnitude, lshift produces a logical shift left, discarding any bits shifted out of the
register and backfilling vacated bits with zeros.
NOTE: Shift magnitudes that exceed the size of the destination register produce all zeros in the result. For
example, shifting a 16-bit register value by 20 bit places (a valid operation) produces 0x0000.
The versions of this instruction using >>= and >> syntax only support logical shift right operations using positive
shift_magnitude values.
The versions of this instruction using <<= and << syntax only support logical shift left operations using positive
shift_magnitude values.
The Logical Shift (16 Bit Destination Register) Operations table provides more detailed information about logical
shift operations.
The versions of this instruction using >>= and <<= syntax are 16-bit instructions (which take up less memory space
than a 32-bit encoded instruction), but may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
LShift Example
/* LShift syntax summary */
/* DDST >>= DSRC ; logical shift right */
/* DDST <<= DSRC ; logical shift left */
/* DDST >>= SRCI ; logical shift right */
/* DDST <<= SRCI ; logical shift left */
/* DDST = lshift DSRC1 by DSRC0_L ; logical shift */
/* DDST = DSRC << 0 ; logical shift left */
/* DDST = DSRC << UImm5 ; logical shift left */
/* DDST = DSRC >> UImm5N ; logical shift right */
/* LShift syntax examples */
r3 >>= 17 ; /* logical shift right */
r3 <<= 17 ; /* logical shift left */
r3 = r6 >> 4 ; /* logical shift right, 32-bit word */
r3 = r6 << 4 ; /* logical shift left, 32-bit word */
r3 >>= r0 ; /* logical shift right */
r3 <<= r1 ; /* logical shift left */
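To make the difference between the logical and arithmetic right shifts concrete for 32-bit registers, the following Python sketch applies both to the same bit pattern. It is a behavioral illustration under the assumption of 32-bit two's-complement registers; lshr32 and ashr32 are hypothetical helper names, not assembler or library functions.

```python
def lshr32(value, n):
    """Model of 'Dreg = Dreg >> n': zeros back-fill the vacated bits."""
    return (value & 0xFFFFFFFF) >> n


def ashr32(value, n):
    """Model of 'Dreg = Dreg >>> n': the sign bit back-fills the vacated bits."""
    value &= 0xFFFFFFFF
    signed = value - 0x100000000 if value & 0x80000000 else value
    return (signed >> n) & 0xFFFFFFFF


pattern = 0xFFFFFFC0  # -64 as a 32-bit value
logical = lshr32(pattern, 4)
arithmetic = ashr32(pattern, 4)
```

For values whose most significant bit is clear, the two functions return the same result; they differ only when the sign bit is set.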
Shift (Dsp32Shf )
a0 = lshift a0 by DREG_L Register Type
a1 = lshift a1 by DREG_L Register Type
Shift Immediate (Dsp32ShfImm)
a0 = a0 LSH5
a1 = a1 LSH5
Abstract
This instruction shifts an accumulator left or right by the specified number of bits and returns the shifted value.
LShiftA Description
The logical shift accumulator register destination instruction shifts a register's contents by a specified distance
(shift_magnitude) and direction (based on syntax and/or shift_magnitude sign), discarding any bits shifted
out of the register and backfilling vacated bits with zeros. This instruction provides logical shift right and logical
shift left operations.
NOTE: For information about the difference between arithmetic and logical shift operations, the definitions for
Arithmetic Shift and Logical Shift on The Science Dictionary site are helpful.
The versions of this instruction using lshift syntax support the following shift operations:
• For a positive shift_magnitude, lshift produces a logical shift right, discarding any bits shifted out of
the register and backfilling vacated bits with zeros.
• For a negative shift_magnitude, lshift produces a logical shift left, discarding any bits shifted out of the
register and backfilling vacated bits with zeros.
NOTE: Shift magnitudes that exceed the size of the destination register produce all zeros in the result. For
example, shifting a 16-bit register value by 20 bit places (a valid operation) produces 0x0000.
The versions of this instruction using >> syntax only support logical shift right operations using positive
shift_magnitude values.
The versions of this instruction using << syntax only support logical shift left operations using positive
shift_magnitude values.
The Logical Shift (Accumulator Destination Register) Operations table provides more detailed information about
logical shift operations.
The dest_reg and src_reg must be the same 40-bit accumulator register.
For 32-bit src_reg, valid shift magnitudes are –32 through +31, zero included.
The accumulator versions shift all 40 bits of those registers.
The accumulator versions of this instruction always implicitly modify the src_reg values.
This 32-bit instruction can sometimes save execution time over a 16-bit encoded instruction, because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
Bit   31      30      29      28      27      26      25      24      23      22      21      20      19      18      17      16
Flag  ...     ...     ...     ...     ...     ...     VS      V       ...     ...     ...     ...     AV1S    AV1     AV0S    AV0
Bit   15      14      13      12      11      10      9       8       7       6       5       4       3       2       1       0
Flag  ...     ...     AC1     AC0     ...     ...     ...     RND_MOD ...     AQ      CC      ...     ...     ...     AN      AZ
LShiftA Example
/* LShiftA syntax summary */
/* a0_1 = lshift a0_1 by DSRC0_L ; logical shift */
/* a0_1 = a0_1 << 0 ; logical shift left */
/* a0_1 = a0_1 << UImm5 ; logical shift left */
/* a0_1 = a0_1 >> UImm5N ; logical shift right */
/* LShiftA syntax examples */
a0 = a0 >> 7 ; /* Accumulator right shift */
a1 = a1 >> 25 ; /* Accumulator right shift */
a0 = a0 << 7 ; /* Accumulator left shift */
a1 = a1 << 14 ; /* Accumulator left shift */
a0 = lshift a0 by r7.l ;
a1 = lshift a1 by r7.l ;
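Because the accumulator versions shift all 40 bits, a bit shifted left out of bit 31 moves into the upper byte instead of being lost. The sketch below models that behavior in Python, assuming a 40-bit accumulator pattern; ACC_MASK, lshl_acc, and lshr_acc are hypothetical names used only for this illustration.

```python
ACC_MASK = (1 << 40) - 1  # 40-bit accumulator pattern


def lshl_acc(acc, n):
    """Model of 'a0 = a0 << n': bits shifted past bit 39 are discarded."""
    return ((acc & ACC_MASK) << n) & ACC_MASK


def lshr_acc(acc, n):
    """Model of 'a0 = a0 >> n': zeros back-fill from bit 39 downward."""
    return (acc & ACC_MASK) >> n


# Bit 31 moves into bit 32 (the upper byte) rather than being dropped.
carried = lshl_acc(0x0080000000, 1)
# Bit 39 has nowhere to go and is discarded.
dropped = lshl_acc(0x8000000000, 1)
```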
Sequencer Instructions
The sequencer instructions provide program flow control operations, which execute on the control unit in the
processor core. Users can take advantage of these instructions to force new values into the program counter and change
program flow, branch conditionally, set up loops, and call and return from subroutines.
[Figure: processor core block diagram showing DAG0 and DAG1 with the I, L, B, and M registers; pointer registers P0-P5, FP, and SP; data registers R0-R7 with the barrel shifter and accumulators A0 and A1; ASTAT; and the control unit with the sequencer, align, decode, and loop buffer blocks]
Branch Operations
These operations branch program flow, either unconditionally or based on a condition:
• Conditional Jump Immediate (BrCC)
• Jump (Jump)
• Jump Immediate (JumpAbs)
• Call (Call)
• Return from Branch (Return)
• Hardware Loop Set Up (LoopSetup)
Abstract
The conditional jump instruction forces a new value into the Program Counter (PC) to change program flow, based
on the value of the CC status bit. The BP option helps the processor improve branch instruction performance.
The default is branch predicted-not-taken.
See Also (Jump (Jump), Jump Immediate (JumpAbs))
BrCC Description
The branch CC (conditional jump) instruction forces a new value into the Program Counter (PC) to change the
program flow, based on the value of the CC bit.
• For if cc, CC = 1 causes a branch to an address computed by adding the signed, even relative offset to the
current PC value.
• For if !cc, CC = 0 causes a branch to an address computed by adding the signed, even relative offset to
the current PC value.
The range of valid offset values for the jump is –1024 through 1022.
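The target computation and the offset constraints described above can be sketched as a small model. This Python example is illustrative and not derived from the assembler: brcc_target and its parameters are hypothetical, the PC is assumed to be a 32-bit value, and the function returns None when the branch is not taken rather than modeling the fall-through address.

```python
def brcc_target(pc, offset, cc, branch_if_set=True):
    """Return the branch target if the branch is taken, otherwise None.

    branch_if_set=True models 'if cc jump', False models 'if !cc jump'.
    """
    if not -1024 <= offset <= 1022:
        raise ValueError("offset outside the valid range -1024 through 1022")
    if offset % 2:
        raise ValueError("offset must be even")
    taken = (cc == 1) if branch_if_set else (cc == 0)
    return (pc + offset) & 0xFFFFFFFF if taken else None


# 'if cc jump' with CC = 1 and a backward offset of -504 from PC = 0x1000.
target = brcc_target(0x1000, -504, cc=1)
```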
The branch prediction option, (bp), helps the processor improve branch instruction performance. The default is
branch predicted-not-taken. By appending (bp) to the instruction, the branch becomes predicted-taken.
Typically, code analysis shows that a good default condition is to predict branch-taken for branches to a prior
address (backwards branches), and to predict branch-not-taken for branches to subsequent addresses (forward
branches).
This 16-bit instruction takes up less memory space than a 32-bit encoded instruction, but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
BrCC Example
if cc jump 0xFFFFFE08 (bp) ;
/* offset is negative in 11 bits, so target address is a backwards branch, branch predicted */
if cc jump 0x0B4 ;
/* offset is positive, so target address is a forwards branch, branch not predicted */
if !cc jump 0xFFFFFC22 (bp) ;
/* negative offset in 11 bits, so target address is a backwards branch, branch predicted */
if !cc jump 0x120 ;
/* positive offset, so target address is a forwards branch, branch not predicted */
if cc jump dest_label ;
/* assembler resolved target, abstract offsets */
Jump (Jump)
General Form
Abstract
The Jump instruction forces a new value into the Program Counter (PC) to change program flow.
See Also (Conditional Jump Immediate (BrCC), Jump Immediate (JumpAbs))
Jump Description
The jump pointer instruction forces a new value into the Program Counter (PC) to change program flow.
The new address may be indirect (provided by a pointer register) or may be indexed (PC plus an offset provided by a
pointer register). In the indirect and indexed versions of the instruction, the value in the pointer register (Preg) must
be an even number (bit 0 of the register =0) to maintain 16-bit address alignment. Otherwise, an odd offset in the
pointer register causes the processor to generate an address alignment exception.
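The indirect and indexed forms, together with the even-address requirement, can be summarized in a short model. The Python sketch below is a simplification under stated assumptions: AlignmentError and jump_target are hypothetical names, the PC is assumed to be 32 bits wide, and raising a Python exception merely stands in for the processor's address alignment exception.

```python
class AlignmentError(Exception):
    """Stand-in for the processor's address alignment exception."""


def jump_target(pc, preg, indexed=False):
    """Model of 'jump (Preg)' (indexed=False) and 'jump (pc + Preg)' (indexed=True)."""
    if preg & 1:  # bit 0 of the pointer register must be 0
        raise AlignmentError("pointer register value must be even")
    target = pc + preg if indexed else preg
    return target & 0xFFFFFFFF


indirect = jump_target(0x2000, 0x3000)               # target taken from Preg
indexed = jump_target(0x2000, 0x40, indexed=True)    # target is PC plus Preg
```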
This 16-bit instruction takes up less memory space than a 32-bit encoded instruction, but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
Jump Example
jump (p5) ;
/* P5 contains the absolute address of the target */
jump (pc + p2) ;
/* P2 contains the offset of the target relative to the PC, so the target address is
PC + P2 */
Abstract
The Jump instruction forces a new value into the Program Counter (PC) to change program flow.
See Also (Conditional Jump Immediate (BrCC), Jump (Jump))
JumpAbs Description
The jump absolute instruction forces a new value into the Program Counter (PC) to change program flow.
The new address may be a label (a program label that provides a signed, even, PC-relative offset) or may be an
immediate value (provides a signed, even, PC-relative offset). In the jump label versions of the instruction, the
instruction may be mapped to the smallest of jump.s, jump.l, or jump.a. In the jump immediate versions of
the instruction, the instruction should not be mapped to jump.a due to potential ambiguity of the offset (relative
versus absolute).
This instruction encodes as a 16-bit instruction or 32-bit instruction, depending on the size of the offset value. This
instruction may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
JumpAbs Example
jump get_new_sample ;
/* assembler resolved target, abstract offsets */
jump 0x224 ;
/* offset is positive in 13 bits, so target address is PC + 0x224, a forward jump */
jump.s 0x224 ;
/* same as above with jump “short” syntax */
jump.l 0xFFFACE86 ;
/* offset is negative in 25 bits, so target address is PC + 0x1FA CE86, a backwards jump */
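The comments above describe offsets as positive or negative "in 13 bits" or "in 25 bits". A hedged way to see what that means is to sign-extend the low bits of the written value. The Python helper below, sign_extend, is a hypothetical name for this illustration; the bit widths are taken from the example comments, not from an encoding table.

```python
def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a signed two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


# jump.l 0xFFFACE86: the low 25 bits are 0x1FACE86, which is a negative offset.
long_offset = sign_extend(0xFFFACE86, 25)
# jump.s 0x224: positive in 13 bits, so the offset is unchanged.
short_offset = sign_extend(0x224, 13)
```

In this model the 25-bit pattern 0x1FACE86 corresponds to an offset of -0x5317A, which is why the jump is a backwards jump.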
Call (Call)
General Form
Abstract
The Call instruction branches to the address specified and then updates the RETS register with the address of the
instruction directly following the Call instruction.
See Also (Return from Branch (Return))
Call Description
The CALL instruction calls a subroutine from an address that may be indirect (provided by a pointer register), may
be indexed (PC plus an offset provided by a pointer register), may be a label (a program label that provides a
signed, even, PC-relative offset), or may be an immediate value (provides a signed, even, PC-relative offset). In the
indirect and indexed versions of the instruction, the value in the pointer register (Preg) must be an even number (bit 0 of
the register =0) to maintain 16-bit address alignment. Otherwise, an odd offset in the pointer register causes the
processor to generate an address alignment exception.
After the CALL instruction executes and execution of the subroutine is completed, the program sequencer resumes
program execution at the instruction address pointed to by the RETS register. The address write to the RETS register
occurs when the CALL instruction is committed. Even when used as the last instruction of a loop, the CALL
instruction functions correctly. If the CALL is placed at a loop end, the RETS register contains the loop top address.
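The relationship between CALL and the RETS register can be sketched as follows. This Python model is a simplification with labeled assumptions: the function name call and its parameters are hypothetical, insn_len is assumed to be 2 or 4 bytes (for the 16-bit and 32-bit encodings), and the loop-end case is modeled simply by passing the loop top address, without modeling the loop counter.

```python
def call(pc, insn_len, target, loop_top=None):
    """Return (new_pc, rets) after a CALL located at address pc.

    loop_top is passed only when the CALL is the last instruction of a
    hardware loop; in that case RETS holds the loop top address.
    """
    rets = loop_top if loop_top is not None else pc + insn_len
    return target, rets


# Ordinary call: RETS points at the instruction directly following the CALL.
new_pc, rets = call(pc=0x1000, insn_len=4, target=0x2000)
# CALL at a loop end: RETS holds the loop top address instead.
loop_pc, loop_rets = call(pc=0x1000, insn_len=2, target=0x2000, loop_top=0x0F00)
```

A return from the subroutine would then continue at the address held in rets.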
This instruction encodes as a 16-bit instruction or 32-bit instruction, depending on the size of the offset value. This
instruction may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
Call Example
call ( p5 ) ;
call ( pc + p2 ) ;
call 0x123456 ;
call get_next_sample ;
Abstract
Each of these instructions branches to the address specified in its return register. The interrupt return instructions
also clear their interrupt's corresponding bit in the IPEND register.
See Also (Call (Call))
Return Description
The return instruction forces a return from a subroutine, maskable interrupt or NMI routine, exception routine, or
emulation routine. The Types of Return Instructions table provides a description of the operations provided by each
type of return. Note that the interrupt return instructions also clear their interrupt's corresponding bit in the IPEND
register.
Mnemonic  Description
RTI       Forces a return from an interrupt routine by loading the value of the RETI register into the PC.
          When an interrupt is generated, the processor enters a non-interruptible state. Saving RETI to the
          stack re-enables interrupt detection so that subsequent, higher priority interrupts can be serviced (or
          nested) during the current interrupt service routine. If RETI is not saved to the stack, higher priority
          interrupts are recognized but not serviced until the current interrupt service routine concludes.
          Restoring RETI back off the stack at the conclusion of the interrupt service routine masks subsequent
          interrupts until the RTI instruction executes. In any case, RETI is protected against inadvertent
          corruption by higher priority interrupts.
RTX       Forces a return from an exception routine by loading the value of the RETX register into the PC.
RTN       Forces a return from a non-maskable interrupt (NMI) routine by loading the value of the RETN
          register into the PC.
RTE       Forces a return from an emulation routine and emulation mode by loading the value of the RETE
          register into the PC. Because only one emulation routine can run at a time, nesting is not an issue,
          and saving the value of the RETE register is unnecessary.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
The Required Modes for Return Instructions table identifies the modes required for each return instruction.
Return Example
rts ;
rti ;
rtx ;
rtn ;
rte ;
lsetup (uimm4s2o4 Register Type, uimm10s2o4 Register Type) LC = PREG Register Type *2
lsetup (uimm4s2o4 Register Type, uimm10s2o4 Register Type) LC = PREG Register Type >>1 *3
*1 Provides encoding for: LOOP loop_name LC0 ; LOOP_BEGIN loop_name ; LOOP_END loop_name ;
*2 Provides encoding for: LOOP loop_name LC0 = Preg ; LOOP_BEGIN loop_name ; LOOP_END loop_name ;
*3 Provides encoding for: LOOP loop_name LC0 = Preg >> 1 ; LOOP_BEGIN loop_name ; LOOP_END loop_name ;
Abstract
The zero-overhead loop set up instruction provides a flexible, count-based, hardware loop mechanism, implement-
ing efficient, zero-overhead software loops. The term "zero-overhead" means the software does not incur a perform-
ance or code size penalty by decrementing the loop counter, evaluating a loop condition, calculating the target ad-
dress, and branching to the address.
See Also (none)
LoopSetup Description
The zero-overhead loop setup instruction provides a flexible, counter-based, hardware loop mechanism for
efficient, zero-overhead software loops. In this context, zero-overhead means that the software in the loops does
not incur a performance or code size penalty by decrementing a counter, evaluating a loop condition, then
calculating and branching to a new target address.
NOTE: When the Begin_Loop address is the next sequential address after the LSETUP instruction, the loop has
zero overhead. If the Begin_Loop address is not the next sequential address after the LSETUP instruc-
tion, there is some overhead that is incurred on loop entry only.
The architecture includes two sets of three registers each to support two independent, nestable loops. The registers
are Loop_Top (LTx), Loop_Bottom (LBx) and Loop_Count (LCx). The LT0, LB0, and LC0 registers describe
Loop0, and the LT1, LB1, and LC1 registers describe Loop1.
The LOOP and LSETUP instructions permit initializing all three registers using a single instruction. The size of the
LOOP and LSETUP instructions only supports a finite number of bits, so the loop range is limited. However, LT0
and LT1, LB0 and LB1 and LC0 and LC1 can be initialized manually using move instructions if loop length and
repetition count need to be beyond the limits supported by the LOOP and LSETUP syntax. A single loop (initialized
using this method) can span the entire 4G bytes of memory space.
NOTE: When initializing LT0 and LT1, LB0 and LB1, and LC0 and LC1 manually, make sure that Loop_Top
(LTx) and Loop_Bottom (LBx) are configured before setting Loop_Count (LCx) to the desired loop
count value.
The instruction syntax supports an optional initialization value from a pointer register (Preg) or pointer register div-
ided by 2.
NOTE: The LOOP, LOOP_BEGIN, LOOP_END legacy syntax from previous Blackfin processors is supported by the
Blackfin+ processor assembler. The legacy syntax is encoded as LSETUP syntax, which contains the same
information in a more compact form.
If LCx is nonzero when the fetch address equals LBx, the processor decrements LCx and places the address in LTx
into the PC. The loop always executes once through because Loop_Count is evaluated at the end of the loop.
There are two special cases for small loop count values. A value of 0 in Loop_Count causes the hardware loop
mechanism to neither decrement nor loop back, so the instructions enclosed by the loop pointers execute as
straight-line code. A value of 1 in Loop_Count causes the hardware loop mechanism to decrement only (without
looping back), so the instructions enclosed by the loop pointers also execute as straight-line code.
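The counter behavior described above can be modeled in a few lines. The following Python sketch only illustrates the text and is not a description of the hardware implementation; the function name run_hardware_loop and its return convention are hypothetical.

```python
def run_hardware_loop(loop_count):
    """Model one hardware loop; return (passes through the body, final LCx)."""
    lc = loop_count & 0xFFFFFFFF   # LCx holds a 32-bit unsigned value
    passes = 0
    while True:
        passes += 1                # body executes; fetch address reaches LBx
        if lc == 0:
            break                  # count 0: no decrement, no loop back
        lc -= 1                    # count >= 1: decrement at the loop bottom
        if lc == 0:
            break                  # count was 1: decrement only, fall through
        # otherwise the PC is reloaded from LTx and the body runs again
    return passes, lc
```

In this model, counts of 0 and 1 both produce a single pass through the body, and a count of N (N greater than 1) produces N passes.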
In the instruction syntax, the designation of the loop counter (LC0 or LC1) determines which loop level is
initialized. Consequently, to initialize Loop0, code LC0; to initialize Loop1, code LC1.
In the case of nested loops that end on the same instruction, the processor requires Loop0 to describe the outer loop
and Loop1 to describe the inner loop. The user is responsible for meeting this requirement.
For example, if LB0=LB1, then the processor assumes loop 1 is the inner loop and loop 0 the outer loop.
Just like entries in any other register, loop register entries can be saved and restored. If nesting beyond two loop
levels is required, the user can explicitly save the outermost loop register values, re-use the registers for an inner loop,
and then restore the outermost loop values before terminating the inner loop. In such a case, remember that loop 0
must always be outside of loop 1. Alternatively, the user can implement the outermost loop in software with the
Conditional Jump structure.
Begin_Loop, the value loaded into LTx, is a 5-bit, PC-relative, even offset from the current instruction to the first
instruction in the loop. The user is required to preserve half-word alignment by maintaining even values in this reg-
ister. The offset is interpreted as a one’s-complement, unsigned number, eliminating backwards loops.
End_Loop, the value loaded into LBx, is an 11-bit, unsigned, even, PC-relative offset from the current instruction
to the last instruction of the loop. When using the LSETUP instruction, Begin_Loop and End_Loop are typically
address labels. The linker replaces the labels with offset values.
A loop counter register (LC0 or LC1) counts the trips through the loop. The register contains a 32-bit unsigned
value, supporting as many as 4,294,967,294 trips through the loop. The loop is disabled (subsequent executions of
the loop code pass through without reiterating) when the loop counter equals 0.
If no LoopStartLabel is specified then the loop start is implied to be the instruction following the LSETUP
instruction.
The Z suffix (LSETUPZ) means that the entire loop is skipped if the count starts at zero. The LEZ suffix
(LSETUPLEZ) means that the entire loop is skipped if the starting count is less than or equal to zero. When an
LSETUPZ instruction with a loop count of zero commits, the least significant bit of its associated LT register is
set. This bit marks the loop as an LSETUPZ loop in case the loop is interrupted. The end of the loop clears the bit.
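The difference between the three variants on loop entry can be sketched as follows. This Python model is an illustration only; the helper names and the string tags used for the variants are hypothetical, and the signed interpretation for LSETUPLEZ follows the statement in this section that its count is treated as a signed value.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def loop_is_skipped(variant, count):
    """Return True if the whole loop body is skipped on entry."""
    if variant == "lsetupz":
        return (count & 0xFFFFFFFF) == 0      # unsigned count of zero
    if variant == "lsetuplez":
        return to_signed32(count) <= 0        # signed count, zero or negative
    return False  # plain LSETUP always runs the body at least once
```

For example, a count register holding 0xFFFFFFFF skips the loop under LSETUPLEZ (it reads as -1) but not under LSETUPZ.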
It is important to understand the following LSETUP operations and how these affect loop operations:
• If a start address is specified in the LSETUP instruction, the address is a 5-bit, PC-relative, unsigned, even offset
(4 to 30) address. If a start address is not specified in the LSETUP instruction, the address used is the address of
the instruction following the LSETUP instruction. The absolute start address is computed and stored in LT0 or
LT1 register on LSETUP instruction commit.
• The end address is an 11-bit, PC-relative, unsigned, even offset (4 through 2046) address. The absolute end
address is computed and stored in the LB0 or LB1 register on LSETUP instruction commit.
• The values in the loop counter 0 (LC0) and loop counter 1 (LC1) registers are treated as 32-bit unsigned values,
except in the LSETUPLEZ version of the instruction. For LSETUPLEZ, the loop count value is treated as a sign-
ed value. The value in the LC0 or LC1 register decrements each time a loop bottom instruction is executed,
until the count reaches 0. When executing a loop end instruction, when the value in LC0 or LC1 is not 0 or 1,
a loop back operation occurs.
• A loop is disabled when its loop count (LC0 or LC1) equals 0.
• The sequencer treats a constant loop count of -1 as special and loads the counter with the value 0xffffffff.
LoopSetup Example
/* examples for three-part loop setup ... */
/* LOOP loop_name loop_counter */
/* LOOP_BEGIN loop_name */
/* LOOP_END loop_name */
lsetup ( 4, 4 ) lc0 ;
lsetup ( poll_bit, end_poll_bit ) lc0 ;
lsetup ( 4, 6 ) lc1 ;
lsetup ( FIR_filter, bottom_of_FIR_filter ) lc1 ;
lsetup ( 4, 8 ) lc0 = p1 ;
lsetup ( 4, 8 ) lc0 = p1>>1 ;
Abstract
This instruction moves CC to a 32-bit D Register. The register will be either 1 or 0.
CCToDreg Description
The move CC to data register instruction moves either the status of the control code (CC) bit or the negated
status of the CC bit to a data register.
When copying the CC bit into a 32-bit register, the operation moves the CC bit into the least significant bit of the
register, zero-extended to 32 bits. The two cases are as follows.
• If CC = 0, the data register becomes 0x00000000.
• If CC = 1, the data register becomes 0x00000001.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
CCToDreg Example
r0 = cc ;
r1 =! cc ;
Abstract
This instruction moves CC to another ASTAT bit. It is illegal to use the CC bit as source and destination in the
same instruction, i.e., CC=CC or CC&=CC.
See Also (Move Status to CC (MvToCC), Move Status to CC (MvToCC_STAT))
CCToStat16 Description
The move CC to arithmetic status register instruction sets or clears status bits based on the logic operations and the
status of the control code (CC) bit.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
CCToStat16 Example
az = cc ; /* status bit equals cc */
an |= cc ; /* status bit equals status bit OR cc */
ac0 &= cc ; /* status bit equals status bit AND cc */
av0 ^= cc ; /* status bit equals status bit XOR cc */
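The four forms in the example can be modeled as bit operations on a copy of ASTAT. The Python sketch below is illustrative only: the helper name cc_to_stat is hypothetical, the bit positions are read from the ASTAT layout shown in this reference, and the sketch does not check which flags the instruction actually accepts as a destination.

```python
# Bit positions taken from the ASTAT layout shown in this reference.
ASTAT_BITS = {"az": 0, "an": 1, "cc": 5, "aq": 6, "ac0": 12, "ac1": 13,
              "av0": 16, "av0s": 17, "av1": 18, "av1s": 19, "v": 24, "vs": 25}

def cc_to_stat(astat, flag, op):
    """Return the new ASTAT value after `flag op cc` (hypothetical helper)."""
    if flag == "cc":
        raise ValueError("CC cannot be both source and destination")
    cc = (astat >> ASTAT_BITS["cc"]) & 1
    pos = ASTAT_BITS[flag]
    old = (astat >> pos) & 1
    new = {"=": cc, "|=": old | cc, "&=": old & cc, "^=": old ^ cc}[op]
    return (astat & ~(1 << pos)) | (new << pos)
```

With CC set (ASTAT = 0x20), `cc_to_stat(0x20, "az", "=")` sets AZ and leaves the other bits unchanged.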
Abstract
This instruction moves a status bit or LSB of a register to CC. It is illegal to use the CC bit as source and destination
in the same instruction (for example, CC=CC or CC&=CC are illegal).
See Also (Move CC To/From ASTAT (CCToStat16), Move Status to CC (MvToCC_STAT))
MvToCC Description
The move data register to CC instruction either moves an OR of all bits in the data register or moves the negated
state of the control code (CC) bit to the CC bit. When copying a data register to the CC bit, the operation sets the
CC bit to 1 if any bit in the source data register is set; that is, if the register is nonzero. Otherwise, the operation
clears the CC bit.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvToCC Example
cc = r4 ;
cc = !cc ;
Abstract
This instruction moves a status bit or LSB of a register to CC. It is illegal to use the CC bit as source and destination
in the same instruction (for example, CC=CC or CC&=CC are illegal).
See Also (Move CC To/From ASTAT (CCToStat16), Move Status to CC (MvToCC))
MvToCC_STAT Description
The move status bit to CC instruction sets or clears the control code (CC) bit based on the logic operations and the
status of the status bits.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
MvToCC_STAT Example
cc = av1 ; /* cc equals status bit */
cc |= aq ; /* cc equals cc OR status bit */
cc &= an ; /* cc equals cc AND status bit */
cc ^= ac1 ; /* cc equals cc XOR status bit */
Abstract
This instruction compares two pointer registers.
See Also (32-Bit Register Compare and Set CC (CompRegisters), Accumulator Compare and Set CC (CompAccu-
mulators))
CCFlagP Description
The compare pointer and move CC instruction sets or clears the control code (CC) bit based on a comparison of
two values. The input operands are pointer registers (Preg).
The compare operations are nondestructive on the input operands and affect only the CC bit and the status bits.
The value of the CC bit determines all subsequent conditional branching.
The various forms of the compare pointer instruction perform 32-bit signed compare operations on the input oper-
ands or an unsigned compare operation (if the (IU) optional mode is appended). The compare operations perform
a subtraction and discard the result of the subtraction without affecting user registers. The compare operation that
you specify determines the value of the CC bit.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
CCFlagP Example
cc = p3 == p2 ; /* equal, register, signed */
cc = p0 == 1 ; /* equal, immediate, signed */
cc = p0 < p3 ; /* less than, register, signed */
cc = p2 < -4 ; /* less than, immediate, signed */
cc = p1 <= p0 ; /* less than or equal, register, signed */
cc = p4 <= 3 ; /* less than or equal, immediate, signed */
cc = p5 < p3 (iu) ; /* less than, register, unsigned */
cc = p1 < 0x7 (iu) ; /* less than, immediate, unsigned */
cc = p2 <= p0 (iu) ; /* less than or equal, register, unsigned */
cc = p3 <= 2 (iu) ; /* less than or equal, immediate, unsigned */
Abstract
This instruction compares the two accumulators and sets CC.
See Also (32-Bit Register Compare and Set CC (CompRegisters), 32-Bit Pointer Register Compare and Set CC
(CCFlagP))
CompAccumulators Description
The Compare Accumulator instruction sets the Control Code (CC) bit based on a comparison of two values. The
input operands are Accumulators.
These instructions perform 40-bit signed compare operations on the Accumulators. The compare operations per-
form the subtraction A0–A1 and discard the result of the subtraction without affecting user registers. The compare
operation that you specify determines the value of the CC bit.
No unsigned compare operations or immediate compare operations are performed for the Accumulators. The com-
pare operations are nondestructive on the input operands, and affect only the CC bit and the status bits. All
subsequent conditional branching is based on the value of the CC bit.
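The 40-bit signed comparison can be modeled by sign-extending each accumulator value before comparing. This Python sketch is an illustration under the assumption that each accumulator is passed as a raw 40-bit pattern; the helper names are hypothetical and the status bits other than CC are not modeled.

```python
def to_signed40(value):
    """Interpret a 40-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFFFF
    return value - (1 << 40) if value & (1 << 39) else value

def compare_accumulators(a0, a1):
    """Return the CC result for each supported comparison of A0 against A1."""
    s0, s1 = to_signed40(a0), to_signed40(a1)
    return {"==": int(s0 == s1), "<": int(s0 < s1), "<=": int(s0 <= s1)}
```

For instance, A0 = 0xFFFFFFFFFF reads as -1, so it compares as less than A1 = 1.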
The Compare Accumulator instruction uses the values shown in the Compare Accumulator Instruction Values table
in compare operations after the A0–A1 subtraction is performed.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
CompAccumulators Example
cc = a0 == a1 ; /* equal, signed */
cc = a0 < a1 ; /* less than, accumulator, signed */
cc = a0 <= a1 ; /* less than or equal, accumulator, signed */
Abstract
This instruction compares two 32-bit registers and sets CC or sets CC if a register is non-zero.
See Also (Accumulator Compare and Set CC (CompAccumulators), 32-Bit Pointer Register Compare and Set CC
(CCFlagP))
CompRegisters Description
The Compare Data Register instruction sets the Control Code (CC) bit based on a comparison of two values. The
input operands are D-registers.
The compare operations are nondestructive on the input operands and affect only the CC bit and the status bits.
The value of the CC bit determines all subsequent conditional branching.
The various forms of the Compare Data Register instruction perform 32-bit signed compare operations on the input
operands or an unsigned compare operation, if the (IU) optional mode is appended. The compare operations per-
form a subtraction and discard the result of the subtraction without affecting user registers. The compare operation
that you specify determines the value of the CC bit.
The Compare Data Register instruction uses the values shown in the Compare Data Register Values table in signed
and unsigned compare operations.
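The effect of the (IU) option can be seen by comparing the same bit patterns both ways. The following Python sketch is a model for illustration only; the function names are hypothetical, and only the less-than form is shown.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def compare_lt(a, b, unsigned=False):
    """Return the CC result of `cc = a < b`, optionally with (iu)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    if not unsigned:
        a, b = to_signed32(a), to_signed32(b)
    return 1 if a < b else 0
```

With 0x8FFFFFFF and 0x00000001 as operands, the signed form gives CC = 1 because the first operand is negative, while the unsigned form gives CC = 0.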
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
CompRegisters Example
cc = r3 == r2 ; /* equal, register, signed */
cc = r7 == 1 ; /* equal, immediate, signed */
/* If r0 = 0x8FFF FFFF and r3 = 0x0000 0001, then the signed operation . . . */
cc = r0 < r3 ; /* less than, register, signed */
/* . . . produces cc = 1, because r0 is treated as a negative value */
cc = r2 < -4 ; /* less than, immediate, signed */
cc = r6 <= r1 ; /* less than or equal, register, signed */
cc = r4 <= 3 ; /* less than or equal, immediate, signed */
/* If r0 = 0x8FFF FFFF and r3 = 0x0000 0001,then the unsigned operation . . . */
cc = r0 < r3 (iu) ; /* less than, register, unsigned */
/* . . . produces CC = 0, because r0 is treated as a large unsigned value */
cc = r1 < 0x7 (iu) ; /* less than, immediate, unsigned */
cc = r2 <= r0 (iu) ; /* less than or equal, register, unsigned */
cc = r3 <= 2 (iu) ; /* less than or equal, immediate, unsigned */
Abstract
The CLI instruction disables or clears general interrupts, and the STI instruction enables interrupts.
IMaskMv Description
The enable interrupts instruction (sti) globally enables interrupts by restoring the previous state of the interrupt
system from a data register into the IMASK register.
The disable interrupts instruction (cli) globally disables general interrupts by clearing the IMASK register to all ze-
ros. In addition, the instruction copies the previous contents of IMASK into a user-specified register in order to save
the state of the interrupt system. The disable interrupts instruction does not mask NMI, reset, exceptions, and emu-
lation.
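The save-and-restore pattern formed by a cli/sti pair can be sketched as follows. This Python model is illustrative only; the class name and the IMASK value used in the usage note are hypothetical, and events that cli does not mask (NMI, reset, exceptions, emulation) are outside the model.

```python
class InterruptMask:
    """Minimal model of IMASK as seen by cli and sti."""

    def __init__(self, imask):
        self.imask = imask & 0xFFFFFFFF

    def cli(self):
        """Clear IMASK and return its previous contents (the Dreg value)."""
        saved = self.imask
        self.imask = 0
        return saved

    def sti(self, saved):
        """Restore IMASK from a previously saved value."""
        self.imask = saved & 0xFFFFFFFF
```

A critical section would call `saved = m.cli()` on entry and `m.sti(saved)` on exit, so that the interrupt state is returned to whatever it was before the section began.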
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
The enable interrupts and disable interrupts instructions execute only in Supervisor mode. If execution is attempted
in User mode, the instruction produces an Illegal Use of Protected Resource exception.
These instructions have some special applications. The clear interrupts instruction is often issued immediately before
an idle instruction, so it stores the interrupt state before entering the idle state. The enable interrupts instruction is
often located after an idle instruction, so it executes after a wake-up event from the idle state.
IMaskMv Example
sti r3 ; /* previous state of IMASK restored from Dreg */
cli r3 ; /* previous state of IMASK moved to Dreg */
Abstract
The SEI instruction vectors to a fixed location in security firmware. The TRAP instruction raises interrupt 15 to
notify the operating system that the user code needs a system service. The EMUEXCPT instruction allows the
processor to enter emulation mode.
Mode Description
The force emulation instruction forces an emulation exception, allowing the processor to enter emulation mode.
When emulation is enabled, the processor immediately takes an exception into emulation mode. When emulation is
disabled, EMUEXCPT behaves the same as a NOP instruction. The emulation exception is the highest priority event in
the processor.
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
Mode Example
emuexcpt ;
Abstract
The EXCPT instruction forces the specified exception (range 0 through 15).
Raise Description
The force interrupt / reset / exception instruction forces a specified interrupt or reset or exception to occur. Typical-
ly, it is a software method of invoking a hardware event for debug purposes.
When the RAISE instruction is issued, the processor sets a bit in the ILAT register corresponding to the interrupt
vector specified by the uimm4 constant in the instruction. The interrupt executes when its priority is high enough to
be recognized by the processor. The RAISE instruction causes these events to occur given the uimm4 arguments
shown in the uimm4 Arguments and Events table.
When the EXCPT instruction is issued, the sequencer vectors to the exception handler that the user provides. Appli-
cation-level code uses the force exception instruction for operating system calls. The instruction does not set the
EVSW bit (bit 3) of the ILAT register.
uimm4 Event
8 IVG8
9 IVG9
10 IVG10
11 IVG11
12 IVG12
13 IVG13
14 IVG14
15 IVG15
The RAISE instruction cannot invoke exception (EXC) or emulation (EMU) events. Use the EXCPT and
EMUEXCPT instructions, respectively, for those events.
The RAISE instruction does not take effect before the write-back stage in the pipeline.
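The latching step performed by RAISE can be modeled as setting one bit in ILAT. In this Python sketch the function name is hypothetical, and the mapping of the uimm4 value directly to the ILAT bit index is an assumption made for illustration; the sketch does not model priority resolution or the restriction on exception and emulation events.

```python
def raise_event(ilat, uimm4):
    """Return ILAT with the bit for the requested event latched.

    Assumption: the ILAT bit index equals the uimm4 argument.
    """
    if not 0 <= uimm4 <= 15:
        raise ValueError("uimm4 must be in the range 0 through 15")
    return (ilat | (1 << uimm4)) & 0xFFFF
```

Latching is idempotent in this model: raising an event whose bit is already set leaves ILAT unchanged.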
This 16-bit instruction takes up less memory space (over a 32-bit encoded instruction), but may not be issued in
parallel with other instructions.
The force interrupt / reset / exception instruction executes only in Supervisor mode. If execution is attempted in User
mode, the force interrupt / reset instruction produces an Illegal Use of Protected Resource exception.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Raise Example
raise 1 ; /* Invoke RST */
raise 6 ; /* Invoke IVTMR timer interrupt */
excpt 4 ;
Stack Operations
These operations provide memory stack management operations:
• Linkage (Linkage)
Linkage (Linkage)
General Form
Abstract
The linkage instruction controls the stack frame space on the stack and the Frame Pointer (FP) for that space. LINK
allocates the space and UNLINK de-allocates the space.
Linkage Description
The linkage instruction controls the stack frame space on the stack and the frame pointer (FP) for that space. LINK
allocates the space and UNLINK de-allocates the space.
LINK saves the current RETS and FP registers to the stack, loads the FP register with the new frame address, then
decrements the stack pointer (SP) by the user-supplied frame size value.
Typical applications follow the LINK instruction with a push multiple instruction to save pointer and data registers
to the stack.
The user-supplied argument for LINK determines the size of the allocated stack frame. LINK always saves RETS and
FP on the stack, so the minimum frame size is 2 words when the argument is zero. The maximum stack frame size is
2^18 + 8 = 262152 bytes in 4-byte increments.
UNLINK performs the reciprocal of LINK, de-allocating the frame space by moving the current value of FP into SP
and restoring previous values into FP and RETS from the stack.
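The register and stack updates performed by LINK and UNLINK can be traced with a small model. This Python sketch is an illustration only: the class name, the dictionary used as memory, and the addresses in the usage note are hypothetical, and the frame size is assumed to be given in bytes in 4-byte increments.

```python
class StackFrameModel:
    """Minimal model of SP, FP, and RETS across LINK and UNLINK."""

    def __init__(self, sp, fp, rets):
        self.sp, self.fp, self.rets = sp, fp, rets
        self.mem = {}                      # address -> 32-bit word

    def link(self, frame_size):
        if frame_size % 4:
            raise ValueError("frame size must be a multiple of 4 bytes")
        self.sp -= 4
        self.mem[self.sp] = self.rets      # save RETS
        self.sp -= 4
        self.mem[self.sp] = self.fp        # save the prior FP
        self.fp = self.sp                  # FP now marks the new frame
        self.sp -= frame_size              # allocate space for locals

    def unlink(self):
        self.sp = self.fp                  # discard the local space
        self.fp = self.mem[self.sp]        # restore the prior FP
        self.sp += 4
        self.rets = self.mem[self.sp]      # restore RETS
        self.sp += 4
```

Starting from SP = 0x1000, `link(8)` leaves FP = 0xFF8 and SP = 0xFF0, and a following `unlink()` returns SP, FP, and RETS to their original values.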
The UNLINK instruction typically follows a pop multiple instruction that restores pointer and data registers previ-
ously saved to the stack.
The frame values remain on the stack until a subsequent push, push multiple or LINK operation overwrites them.
To preserve stack integrity, the FP must not be modified by user code between LINK and UNLINK execution.
Neither LINK nor UNLINK may be interrupted. Exceptions that occur while either of these instructions is executing
cause the instruction to abort. For example, a load operation or a store operation might cause a protection
violation while LINK is executing. In that case, SP and FP are reset to their original values prior to the execution
of this instruction. This measure ensures that the instruction can be restarted after the exception.
Note that when a LINK operation aborts due to an exception, the stack memory may already be changed due to
stores that have already completed before the exception. Similarly, an aborted UNLINK operation may leave the FP
and RETS registers changed because of a load that has already completed before the interruption.
The series of illustrations show how the stack contents change. After executing a LINK instruction, the stack con-
tains (for example) the contents shown in the Stack After Link Executes figure.
higher memory
...
... AFTER LINK EXECUTES
Saved RETS
Prior FP <- FP
Allocated
words for local
subroutine
variables <- SP = FP – frame_size
...
lower memory
Linkage Example
link 8 ; /* establish frame with 8 words allocated for local variables */
[ -- sp ] = (r7:0, p5:0) ; /* save D- and P-registers */
(r7:0, p5:0) = [ sp ++ ] ; /* restore D- and P-registers */
unlink ; /* close the frame */
Abstract
This instruction loads the contents of the stack indexed by the current stack pointer into a specified register.
See Also (Stack Push (Push), Stack Push/Pop Multiple Registers (PushPopMul16))
Pop Description
The pop instruction loads the contents of the stack—indexed by the current stack pointer (SP)—into a specified
register. The instruction post-increments the stack pointer to the next occupied location in the stack before conclud-
ing.
The stack grows down from high memory to low memory, therefore the decrement operation is used for pushing,
and the increment operation is used for popping values. The stack pointer always points to the last used location.
When a pop operation is issued, the value pointed to by the stack pointer is transferred and the SP is replaced by
SP + 4.
The following series of illustrations show what the stack would look like when a pop such as R3 = [ SP ++ ]
occurs.
higher memory
Word0
Word1 BEGINNING STATE
Word2 <------- SP
...
lower memory
higher memory
Word0
Word1 LOAD REGISTER R3 FROM STACK
Word2 <------ SP ========> R3 = Word2
...
lower memory
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
... ... ... ... ... ... VS V ... ... ... ... AV1S AV1 AV0S AV0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
... ... AC1 AC0 ... ... ... RND_MOD ... AQ CC ... ... ... AN AZ
Pop Example
r0 = [sp++] ; /* Load Data Register instruction */
p4 = [sp++] ; /* Load Pointer Register instruction */
i1 = [sp++] ; /* Pop instruction */
reti = [sp++] ; /* Pop instruction; supervisor mode required */
Abstract
This instruction stores the contents of a specified register in the stack.
See Also (Stack Pop (Pop), Stack Push/Pop Multiple Registers (PushPopMul16))
Push Description
The push instruction stores the contents of a specified register in the stack. The instruction pre-decrements the stack
pointer (SP) to the next available location in the stack first. Push and push multiple are the only instructions that
perform pre-modify functions.
The stack grows down from high memory to low memory. Consequently, the decrement operation is used for push-
ing, and the increment operation is used for popping values. The stack pointer always points to the last used loca-
tion. Therefore, the effective address of the push is SP – 4.
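The pre-decrement on push and the post-increment on pop can be modeled directly. This Python sketch is illustrative only; the class name, the dictionary used as memory, and the starting address in the usage note are hypothetical.

```python
class StackModel:
    """Minimal model of a full-descending stack with 4-byte entries."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}                     # address -> 32-bit word

    def push(self, value):
        self.sp -= 4                      # pre-decrement: store at SP - 4
        self.mem[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.sp]         # SP points to the last used location
        self.sp += 4                      # post-increment
        return value
```

Pushing two values from SP = 0x100 stores them at 0xFC and 0xF8; popping then returns them in the opposite order and restores SP to 0x100.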
The following illustration shows what the stack would look like when a series of pushes occur.
higher memory
P5 [--sp]=p5 ;
P1 [--sp]=p1 ;
R3 <-------- SP [--sp]=r3 ;
...
lower memory
This instruction may be used in either User or Supervisor mode for most cases, but explicit access to USP, SEQSTAT,
SYSCFG, RETI, RETX, RETN, RETE, and EMUDAT requires Supervisor mode. A protection violation exception results
if any of these registers are explicitly accessed from User mode.
Push Example
[ -- sp ] = r0 ;
[ -- sp ] = r1 ;
[ -- sp ] = p0 ;
[ -- sp ] = i0 ;
Abstract
This instruction pushes or pops the contents of multiple data and/or pointer registers to or from the stack.
See Also (Stack Pop (Pop), Stack Push (Push))
PushPopMul16 Description
The push multiple instruction saves the contents of multiple data and/or pointer registers to the stack, and the pop
multiple instruction restores the contents of multiple data and/or pointer registers from the stack. The range of reg-
isters to be pushed (saved) or popped (restored) always includes the highest index data register (R7) and/or highest
index pointer register (P5) plus any contiguous lower index registers specified by the user down to and including R0
and/or P0.
NOTE: Push and Push Multiple are the only instructions that perform pre-modify functions.
The stack grows down from high memory to low memory, therefore the decrement operation is used
for pushing, and the increment operation is used for popping values. The stack pointer always points to the
last used location, making the effective address of the push SP – 4.
The Stack Following a Push Multiple illustration shows what the stack would look like when a push multiple
occurs.
higher memory
P3 [--sp]=(p5:3) ;
P4
P5 <-------- SP
...
lower memory
higher memory
Word0
Word1
Word2 BEGINNING STATE
Word3 <------ SP
...
lower memory
R3
R4
R6 LOAD REGISTER R7 FROM STACK
R7 <------ SP ========> R7 = Word3
...
lower memory
R4
R5 LOAD REGISTER R6 FROM STACK
R6 <------ SP ========> R6 = Word2
R7
...
lower memory
The intended usage for the pop multiple instruction is to recover register values that were previously pushed
onto the stack. The user must exercise programming discipline to restore the stack values back to their intend-
ed registers from the first-in, last-out structure of the stack. Pop exactly the same registers that were pushed
onto the stack, but pop them in the opposite order.
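The mirrored ordering of push multiple and pop multiple can be sketched for a single register range such as (p5:3). This Python model is an illustration only: the function names and the dictionary used as memory are hypothetical, and the order in which combined D-register and P-register ranges are stored is not modeled.

```python
def push_multiple(mem, sp, values):
    """Push register values listed from lowest index to highest index."""
    for value in values:
        sp -= 4                  # pre-decrement for each register
        mem[sp] = value
    return sp                    # highest-index register ends up at SP

def pop_multiple(mem, sp, count):
    """Pop `count` registers; return (new SP, values lowest index first)."""
    values = []
    for _ in range(count):
        values.append(mem[sp])   # highest-index register is loaded first
        sp += 4                  # post-increment for each register
    values.reverse()
    return sp, values
```

Pushing the values of P3, P4, and P5 and then popping three registers returns the same three values to the same registers and leaves SP where it started.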
PushPopMul16 Example
/* push multiple examples */
[ -- sp ] = (r7:5, p5:0) ; /* D-registers R4:0 excluded */
[ -- sp ] = (r7:2) ; /* R1:0 excluded */
[ -- sp ] = (p5:4) ; /* P3:0 excluded */
Synchronization Operations
These operations provide processor synchronization operations:
• Cache Control (CacheCtrl)
• Sync (Sync)
• SyncExcl (SyncExcl)
• NOP (NOP)
• 32-Bit No Operation (NOP32)
• TestSet (TestSet)
Abstract
These instructions provide the ability to manipulate the caches. The prefetch causes the data cache to prefetch the
cache line associated with the effective address provided as the contents of the p-register.
See Also (Sync (Sync))
CacheCtrl Description
These instructions provide the ability to manipulate the caches:
• prefetch causes the data cache to prefetch the cache line associated with the effective address provided as the
contents of the p-register.
• flushinv causes the data cache to invalidate a particular line in the cache.
• flush causes a line of data in the cache to be synchronized with higher levels of memory.
• iflush causes the instruction cache to invalidate a particular line in the cache.
The prefetch instruction fetches the line only if the line is not currently in the data
cache and if the address is cacheable (that is, if bit CPLB_L1_CHBL = 1). If the line is already in the cache or
if the cache is already fetching a line, the prefetch instruction performs no action, like a NOP.
This instruction may generate CPLB exceptions. For example, exception 0x26 can be generated upon execu-
tion of the PREFETCH[P0] instruction if P0 points to an invalid memory location. However, external memo-
ry will not be accessed when any of these exceptions are generated.
The instruction can post-increment the line pointer by the cache line size.
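Because the post-increment form advances the pointer by one cache line, a loop over a buffer needs one cache-control instruction per line touched. The following C helper is a sketch of that arithmetic only. The function name lines_spanned is hypothetical, and the line size is a parameter because this section does not state it; any concrete value (such as 32 bytes) is an assumption to check against the hardware reference.

```c
#include <stdint.h>

/* Number of cache lines touched by the byte range [start, start + len).
   line_size is assumed to be a power of two. */
static uint32_t lines_spanned(uint32_t start, uint32_t len, uint32_t line_size)
{
    if (len == 0)
        return 0;
    uint32_t first = start & ~(line_size - 1);              /* line holding the first byte */
    uint32_t last  = (start + len - 1) & ~(line_size - 1);  /* line holding the last byte */
    return (last - first) / line_size + 1;
}
```

The result is the iteration count for a loop built around, for example, flush [ p0 ++ ] with P0 initialized to the start of the buffer.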
CacheCtrl Example
prefetch [ p2 ] ;
prefetch [ p0 ++ ] ;
flushinv [ p2 ] ;
flushinv [ p0 ++ ] ;
flush [ p2 ] ;
flush [ p0 ++ ] ;
iflush [ p2 ] ;
iflush [ p0 ++ ] ;
Sync (Sync)
General Form
Abstract
The instructions are DSYNC (Data Sync), SSYNC (System Sync), CSYNC (Core Sync), IDLE, and STI IDLE.
See Also (Cache Control (CacheCtrl))
Sync Description
The sync instructions (CSYNC, SSYNC, DSYNC, IDLE, and STI IDLE) provide the means to synchronize core, sys-
tem, and data operations across all clock domains of the processor.
NOTE: On Blackfin+ processors (unlike previous Blackfin processors), an IDLE instruction is
not required immediately following an SSYNC instruction.
An SSYNC instruction should be used when ordering is required between a memory write and a memory read.
For more information about these operations, see the memory or pointer instructions.
By design, the Blackfin+ processor architecture allows reads to take priority over writes when there are no
dependencies between the addresses that are accessed. In general, this execution order allows for increased
performance. However, when an asynchronous memory device is mapped to a Blackfin+ processor, it is sometimes
necessary to ensure that the write occurs before the read. Because the Blackfin+ processor re-orders loads over
stores when there is no data dependency, an SSYNC between the write and the read ensures that proper ordering
is preserved.
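The effect of a load passing an earlier store can be illustrated with a toy C model. This is a simplified sketch under stated assumptions, not a description of the actual Blackfin+ store buffer: the buffer holds a single write, the device map (CMD_REG, STATUS_REG) is hypothetical, and the device is assumed to copy each command it receives into its status word.

```c
#include <stdint.h>

enum { CMD_REG = 0, STATUS_REG = 1, DEV_WORDS = 2 };  /* hypothetical device map */

static uint32_t dev[DEV_WORDS];   /* state as seen at the device */
static int      pending;          /* 1 while a write waits in the buffer */
static uint32_t pend_addr, pend_val;

/* Complete the buffered write; the toy device echoes a command to its status word. */
static void drain(void)
{
    if (!pending)
        return;
    dev[pend_addr] = pend_val;
    if (pend_addr == CMD_REG)
        dev[STATUS_REG] = pend_val;
    pending = 0;
}

static void store(uint32_t addr, uint32_t val)
{
    drain();                      /* single-entry buffer: retire the older write first */
    pend_addr = addr;
    pend_val = val;
    pending = 1;
}

/* A load with no address dependency on the buffered write goes ahead of it. */
static uint32_t load(uint32_t addr)
{
    if (pending && pend_addr == addr)
        drain();                  /* dependency: the write completes first */
    return dev[addr];
}

/* Stand-in for SSYNC: all buffered writes complete before execution continues. */
static void ssync(void)
{
    drain();
}
```

In this model, store(CMD_REG, 5) followed directly by load(STATUS_REG) returns the stale status, because the two addresses differ and the read overtakes the write. Placing ssync() between them makes the read observe the device's response.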
Sync Example
Example code sequence for IDLE
idle ;
Example code sequence for CSYNC: In this example, the CSYNC instruction ensures that the load instruction is not
executed speculatively. CSYNC ensures that the conditional branch is resolved and any entries in the processor store
buffer have been flushed. In addition, all speculative states or exceptions complete processing before CSYNC com-
pletes.
if cc jump away_from_here ;
/* produces speculative branch prediction */
csync ;
r0 = [p0] ; /* load */
Example code sequence for SSYNC: In this example, SSYNC ensures that the load instruction will not be executed
speculatively. The instruction ensures that the conditional branch is resolved and any entries in the processor store
buffer and write buffer have been flushed. In addition, all exceptions complete processing before SSYNC completes.
if cc jump away_from_here ;
/* produces speculative branch prediction */
ssync ;
r0 = [p0] ; /* load */
Example code sequence for DSYNC: In this example, DSYNC ensures that the load instruction will not be executed
speculatively. The instruction ensures that the conditional branch is resolved and any entries in the processor store
buffer and write buffer have been flushed. In addition, all exceptions complete processing before DSYNC completes.
if cc jump away_from_here ;
/* produces speculative branch prediction */
dsync ;
r0 = [p0] ; /* load */
SyncExcl (SyncExcl)
General Form
Abstract
This instruction synchronizes the processor state with the exclusive state, capturing any pending write response and
releasing exclusive memory access to a memory location.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit   31  30  29   28   27  26  25  24       23  22  21  20  19    18   17    16
Flag  ... ... ...  ...  ... ... VS  V        ... ... ... ... AV1S  AV1  AV0S  AV0
Bit   15  14  13   12   11  10  9   8        7   6   5   4   3     2    1     0
Flag  ... ... AC1  AC0  ... ... ... RND_MOD  ... AQ  CC  ... ...   ...  AN    AZ
NOP (NOP)
General Form
Abstract
This instruction increments the PC (and does nothing else).
See Also (32-Bit No Operation (NOP32))
NOP Description
The No Op instruction increments the PC and does nothing else.
Typically, the No Op instruction allows previous instructions time to complete before continuing with subsequent
instructions. Other uses are to produce specific delays in timing loops or to act as hardware event timers and rate
generators when no timers and rate generators are available.
This 16-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
NOP Example
nop ;
Abstract
This instruction increments the PC (and does nothing else).
See Also (NOP (NOP))
NOP32 Description
The No Op instruction increments the PC and does nothing else.
Typically, the No Op instruction allows previous instructions time to complete before continuing with subsequent
instructions. Other uses are to produce specific delays in timing loops or to act as hardware event timers and rate
generators when no timers and rate generators are available.
MNOP can be used to issue load or store instructions in parallel without invoking a 32-bit MAC or ALU
operation.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
NOP32 Example
mnop ;
mnop || /* a 16-bit instr. */ || /* a 16-bit instr. */ ;
TestSet (TestSet)
General Form
Abstract
This instruction loads a byte, tests whether it is zero, then sets the most significant bit of the byte in memory. CC is
set if the byte is originally zero, and cleared if the byte is originally nonzero. The sequence of memory transactions
is atomic.
TestSet Description
The testset instruction is an atomic operation. (This sequence may be aborted by an interrupt, but will restart
from the beginning upon return from interrupt. A byte protected in this manner may be used as a semaphore.) This
instruction is provided primarily for backward compatibility; where possible, use the exclusive load and store
instructions instead, which make more efficient use of system resources. The testset instruction reads
an indirectly addressed memory byte, tests whether it is zero, and then writes the byte back to memory with the
most significant bit (MSB) set, all as one indivisible operation. If the byte is originally zero, the instruction sets the
CC bit. If the byte is originally nonzero, the instruction clears the CC bit.
The TESTSET instruction is never executed speculatively. It is supported by bus-locked memory transactions on the
system bus, so no other user of the bus, such as another core, can access memory between the test and set portions of
this instruction. The TESTSET instruction can be interrupted by the core. If this happens the system bus is released
and the TESTSET instruction is executed again upon return from the interrupt.
TESTSET should not be used in L1 SRAM or cacheable memory as its behavior in these regions varies between
different derivatives. TESTSET must not be used with MMRs, I/O Device space or extended data access addresses.
This 16-bit instruction may not be issued in parallel with certain other 16-bit instructions.
This instruction may be used in either User or Supervisor mode.
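The read, test, and set steps described above can be written out as a C model. This sketch shows only the data transformation and the resulting CC value; unlike the instruction, the C code is not atomic and is not bus-locked. The names testset_model, try_acquire, and release are hypothetical, and the semaphore convention (zero means free) is an assumption based on the description of CC.

```c
#include <stdint.h>

/* Models the effect of testset on one byte and returns the CC value.
   NOTE: not atomic; the real instruction performs this as one indivisible operation. */
static int testset_model(volatile uint8_t *byte)
{
    uint8_t old = *byte;               /* read the byte */
    *byte = (uint8_t)(old | 0x80);     /* write it back with the MSB set */
    return old == 0;                   /* CC = 1 only if the byte was originally zero */
}

/* Hypothetical semaphore helpers built on the model: a zero byte means "free". */
static int try_acquire(volatile uint8_t *sem)
{
    return testset_model(sem);         /* 1 = acquired, 0 = already held */
}

static void release(volatile uint8_t *sem)
{
    *sem = 0;
}
```

A caller would retry try_acquire() until it returns 1 and call release() when done. On the processor, the retry loop would test CC after testset rather than a return value.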
TestSet Example
testset (P3) ; /* test and set byte addressed by P3 */
[Figure: processor core block diagram. Labels shown: I0 to I3, L0 to L3, B0 to B3, M0 to M3, DAG0, DAG1, P0 to P5, FP, SP; R0 to R7 (.H and .L halves), barrel shifter, accumulators A0 and A1; control unit with sequencer, align, decode, and loop buffer; ASTAT; buses RAB, PREG, DA0, DA1, SD, LD0, LD1 to memory.]
Abstract
This instruction loads the accumulator register with the immediate value 0 (initializes the result register).
See Also (32-Bit Accumulator Register (.x) Initialization (LdImmToAxX), 32-Bit Accumulator Register (.w) Initiali-
zation (LdImmToAxW))
LdImmToAx Description
The load immediate to accumulator instruction loads an immediate value (0) into an accumulator register. This op-
eration initializes (or clears) the accumulator register.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
LdImmToAx Example
a0 = 0 ;
a1 = 0 ;
Abstract
This instruction initializes the lower 32 bits (.w) of the accumulator register from a 32-bit immediate value.
See Also (32-Bit Accumulator Register (.x) Initialization (LdImmToAxX), Accumulator Register Initialization
(LdImmToAx))
LdImmToAxW Description
The load immediate to accumulator 32-bit section instruction loads an immediate value, or explicit constant, into the
A0.w or A1.w register.
The instruction loads the 32-bit accumulator section from a 32-bit quantity, depending on the size of the immediate
data.
The load operation uses the 32 bits of the input immediate value and leaves the unspecified portion of the accumu-
lator register intact.
This 64-bit instruction may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdImmToAxW Example
a0.w = 0x7FFFFFFF ;
a1.w = 0x80000000 ;
a0.w = MyResult ;
a1.w = MyOtherResult ;
Abstract
This instruction initializes the upper 8 bits (.x) of the accumulator register from a 32-bit immediate value.
See Also (32-Bit Accumulator Register (.w) Initialization (LdImmToAxW), Accumulator Register Initialization
(LdImmToAx))
LdImmToAxX Description
The load immediate to accumulator 8-bit section instruction loads an immediate value, or explicit constant, into the
A0.x or A1.x register.
The instruction loads the 8-bit accumulator section from a 32-bit quantity, depending on the size of the immediate
data.
The load operation uses the least significant 8 bits of the input immediate value and leaves the unspecified portion
of the accumulator register intact.
This 64-bit instruction may not be issued in parallel with other instructions.
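The way the .w and .x loads each update one section of the accumulator and leave the other intact can be sketched in C. This is an illustrative model only; the type acc40 and the helper names are hypothetical, and the accumulator is represented as a 32-bit section plus an 8-bit section as described above.

```c
#include <stdint.h>

typedef struct {
    uint32_t w;   /* lower 32 bits (the .w section) */
    uint8_t  x;   /* upper 8 bits (the .x section)  */
} acc40;

/* Models Ax.w = imm32: replaces the .w section, leaves .x intact. */
static void load_w(acc40 *a, uint32_t imm)
{
    a->w = imm;
}

/* Models Ax.x = imm32: uses only the least significant 8 bits, leaves .w intact. */
static void load_x(acc40 *a, uint32_t imm)
{
    a->x = (uint8_t)(imm & 0xFFu);
}

/* Assembles the two sections into one 40-bit value for inspection. */
static uint64_t acc_value(const acc40 *a)
{
    return ((uint64_t)a->x << 32) | a->w;
}
```

In this model, load_x with 0x7FFFFFFF stores 0xFF in the .x section, while load_x with 0x80000000 stores 0x00, since only the low byte of the immediate is used.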
LdImmToAxX Example
a0.x = 0x7FFFFFFF ;
a1.x = 0x80000000 ;
a0.x = MyResult ;
a1.x = MyOtherResult ;
Abstract
This instruction loads a low-half register or a high-half register with a 16-bit immediate value.
LdImmToDregHL Description
The load immediate to high/low half register instruction loads an immediate value, or explicit constant, into a high
or low half register. The instruction loads a 16-bit quantity, depending on the size of the immediate data.
The 16-bit half-words are loaded into either the high half or low half of a register. The load operation leaves the
unspecified half of the register intact.
Loading a 32-bit value into a register using this load immediate instruction requires two separate instructions—one
for the high and one for the low half. For example, to load the address foo into register P3, write:
p3.h = foo ;
p3.l = foo ;
The assembler automatically selects the correct half-word portion of the 32-bit literal for inclusion in the instruction
word.
This 32-bit instruction may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
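The two-instruction sequence for building a 32-bit value from half-register loads can be modeled in C as follows. This is a sketch of the data movement only. The helper names are hypothetical, and hi_half and lo_half stand in for the half-word selection that the assembler performs on the 32-bit literal.

```c
#include <stdint.h>

/* Models reg.h = imm16: replaces the high half, leaves the low half intact. */
static uint32_t load_hi(uint32_t reg, uint16_t imm)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm << 16);
}

/* Models reg.l = imm16: replaces the low half, leaves the high half intact. */
static uint32_t load_lo(uint32_t reg, uint16_t imm)
{
    return (reg & 0xFFFF0000u) | imm;
}

/* Half-word selection performed on a 32-bit literal such as an address. */
static uint16_t hi_half(uint32_t literal) { return (uint16_t)(literal >> 16); }
static uint16_t lo_half(uint32_t literal) { return (uint16_t)(literal & 0xFFFFu); }
```

Applying load_hi(p3, hi_half(foo)) and then load_lo(p3, lo_half(foo)) yields the full 32-bit value of foo regardless of what P3 held before. After only the first step, the low half still holds the old contents.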
LdImmToDregHL Example
r7.h = 63 ;
p3.l = 12 ;
i0.l = 4 ;
m2.h = 8 ;
l3.h = 0xbcde ;
Abstract
This instruction initializes a 32-bit register to an immediate value. For the smaller instructions, where the immediate
is narrower than 32 bits, you can specify whether the immediate value is sign- or zero-extended to fill the register.
LdImmToReg Description
The load immediate to register instruction loads an immediate value, or explicit constant, into a register.
The instruction loads a 7-, 16-, or 32-bit quantity, depending on the size of the immediate data.
The zero-extended (z) versions of this instruction fill the upper bits of the destination register with zeros. The sign-
extended (x) versions of this instruction fill the upper bits with the sign of the constant value.
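The difference between the (z) and (x) versions can be sketched in C. This model is an illustration only; the function names are hypothetical, and the bits argument (assumed to be between 1 and 32) stands for the width of the immediate field.

```c
#include <stdint.h>

/* Zero-extends the low `bits` bits of imm to 32 bits, as with (z). */
static uint32_t extend_z(uint32_t imm, unsigned bits)
{
    uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return imm & mask;
}

/* Sign-extends the low `bits` bits of imm to 32 bits, as with (x). */
static uint32_t extend_x(uint32_t imm, unsigned bits)
{
    uint32_t v = extend_z(imm, bits);
    if (bits < 32 && (v & (1u << (bits - 1))))
        v |= ~((1u << bits) - 1u);    /* copy the sign bit into the upper bits */
    return v;
}
```

With a 16-bit field, the constant 0x89ab gives 0x000089AB under (z) but would give 0xFFFF89AB under (x), while 0x3456 gives the same result either way because its bit 15 is clear.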
The instruction opcode size varies with the immediate value size as follows:
LdImmToReg Example
r7 = 63 (z) ;
p3 = 12 (z) ;
r0 = -344 (x) ;
r7 = 436 (z) ;
m2 = 0x89ab (z) ;
p1 = 0x1234 (z) ;
m3 = 0x3456 (x) ;
Abstract
This instruction loads the accumulator 0 and 1 registers (A0, A1) with the immediate value 0 (initializes both result
registers).
LdImmToAxDual Description
The dual load immediate to accumulator instruction loads an immediate value (0) into both accumulator registers.
This operation initializes (or clears) both of the accumulator registers.
This 32-bit instruction can sometimes save execution time (over a 16-bit encoded instruction) because it can be
issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
LdImmToAxDual Example
a1 = a0 = 0 ;
Load/Store (LdSt)
DREG Register Type = b[PREG Register Type++] (z)
DREG Register Type = b[PREG Register Type--] (z)
DREG Register Type = b[PREG Register Type] (z)
DREG Register Type = b[PREG Register Type++] (x)
DREG Register Type = b[PREG Register Type--] (x)
DREG Register Type = b[PREG Register Type] (x)
Long Load/Store with indexed addressing (LdStIdxI)
DREG Register Type = b[PREG Register Type + imm16reloc Register Type] (z)
DREG Register Type = b[PREG Register Type + imm16reloc Register Type] (x)
Load/Store 32-bit Absolute Address (LdStAbs)
DREG Register Type = b[uimm32 Register Type] (z)
DREG Register Type = b[uimm32 Register Type] (x)
Abstract
This instruction loads a register with an 8-bit value from memory. The value is sign or zero extended in the register.
See Also (32-Bit Load from Memory (LdM32bitToDreg), 16-Bit Load from Memory (LdM16bitToDregH), 16-Bit
Load from Memory (LdM16bitToDregL), 16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg))
LdM08bitToDreg Description
The load byte to data register instruction loads an 8-bit byte value from a memory location into a 32-bit data regis-
ter. The address of the memory location is identified with a pointer register, a pointer plus an offset, or a 32-bit
absolute address. The byte value is sign-extended (x) or zero-extended (z) to 32 bits in the destination data regis-
ter. The address used in this instruction has no restrictions for memory address alignment. This instruction supports
the following options.
• Post-increment the source pointer by 1 byte [Preg ++]
• Post-decrement the source pointer by 1 byte [Preg --]
• Offset the source pointer with a 16-bit signed constant [Preg + Offset]
The instruction opcode size varies with the address type as follows:
• Load byte to register using a pointer register for the address encodes as a 16-bit instruction.
• Load byte to register using a pointer register with 16-bit offset for the address encodes as a 32-bit instruction.
• Load byte to register using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit load byte instructions may be issued in parallel with certain other instructions. The 32- and 64-bit load
byte instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
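The byte load with post-increment can be modeled in C as shown below. This is a sketch, not the hardware behavior in full: memory is a plain byte array, the pointer register is passed by address, and the function names are hypothetical. No alignment check appears because byte loads have no alignment restriction.

```c
#include <stdint.h>

/* Models Dreg = b[Preg++] (z): upper 24 bits of the result are zero. */
static uint32_t load_b_z_postinc(const uint8_t *mem, uint32_t *preg)
{
    uint32_t v = mem[*preg];
    *preg += 1;                        /* post-increment by 1 byte */
    return v;
}

/* Models Dreg = b[Preg++] (x): bit 7 of the byte fills the upper 24 bits. */
static uint32_t load_b_x_postinc(const uint8_t *mem, uint32_t *preg)
{
    uint32_t v = mem[*preg];
    if (v & 0x80u)
        v |= 0xFFFFFF00u;
    *preg += 1;                        /* post-increment by 1 byte */
    return v;
}
```

A byte value of 0x80 therefore loads as 0x00000080 with (z) and as 0xFFFFFF80 with (x).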
LdM08bitToDreg Example
r3 = b [ p0 ] (z) ;
r7 = b [ p1 ++ ] (z) ;
r2 = b [ sp -- ] (z) ;
r0 = b [ p4 + 0xFFFF800F ] (z) ;
r3 = b [ p0 ] (x) ;
r7 = b [ p1 ++ ](x) ;
r2 = b [ sp -- ] (x) ;
r0 = b [ p4 + 0xFFFF800F ](x) ;
Abstract
This instruction loads a register with a 16-bit value from memory. The value is sign or zero extended in the register.
See Also (8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg), 32-Bit Load from Memory (LdM32bit-
ToDreg), 16-Bit Load from Memory (LdM16bitToDregH), 16-Bit Load from Memory (LdM16bitToDregL))
LdM16bitToDreg Description
The load word to data register instruction loads a 16-bit value from a memory location into a 32-bit data register.
The address of the memory location is identified with a pointer register, a pointer plus an offset, or a 32-bit absolute
address. The word value is sign-extended (x) or zero-extended (z) to 32 bits in the destination data register. The
address used in this instruction is restricted to even memory address alignment (2-byte half-word address align-
ment). Failure to maintain proper alignment causes a misaligned memory access exception. This instruction sup-
ports the following options.
• Post-increment the source pointer by 2 bytes [Preg ++]
• Post-decrement the source pointer by 2 bytes [Preg --]
• Offset the source pointer with a 5-bit signed constant [Preg + SmallOffset]
• Offset the source pointer with a 16-bit signed constant [Preg + LargeOffset]
• Offset the source pointer with second pointer [Preg ++ Preg]
The syntax of the form:
Dest = w [ Src_1 ++ Src_2 ] ;
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
Dest = w [Src_1] ; /* load the 32-bit destination, indirect*/
Src_1 += Src_2 ; /* post-increment Src_1 by a quantity indexed by Src_2 */
where:
• Dest is the destination register. (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing load. For example,
r0 = w [p2 ++ p2] (z) ;
functions as:
r0 = w [p2] (z) ;
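The post-increment index form, including the common-pointer case, can be sketched as a C model. The function name and the register-file array are hypothetical, the -1 return value merely stands in for the misaligned memory access exception, and the byte order is written as little-endian, which is how Blackfin processors store data.

```c
#include <stdint.h>

/* Models Dest = w[Src_1 ++ Src_2] (z) on a byte-addressed memory image.
   src1 and src2 are indices into preg[]. Returns 0 on success, or -1 to
   stand in for the misaligned memory access exception. */
static int load_w_z_postidx(const uint8_t *mem, uint32_t preg[6],
                            int src1, int src2, uint32_t *dest)
{
    uint32_t addr = preg[src1];
    if (addr & 1u)
        return -1;                                 /* not 2-byte aligned */
    *dest = (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
    if (src1 != src2)
        preg[src1] += preg[src2];                  /* Src_1 += Src_2 */
    /* With a common pointer the load is simple and non-incrementing. */
    return 0;
}
```

Using the same index for src1 and src2 leaves the pointer unchanged, matching the r0 = w [p2 ++ p2] (z) case above.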
The instruction opcode size varies with the address type as follows:
• Load word to register using a pointer register for the address or using a pointer register with small offset for the
address encodes as a 16-bit instruction.
• Load word to register using a pointer register with 16-bit offset for the address or using a pointer register offset
by a second pointer for the address encodes as a 32-bit instruction.
• Load word to register using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit load word instructions may be issued in parallel with certain other instructions. The 32- and 64-bit load
word instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdM16bitToDreg Example
r3 = w [ p0 ] (z) ;
r7 = w [ p1 ++ ] (z) ;
r2 = w [ sp -- ] (z) ;
r6 = w [ p2 + 12 ] (z) ;
r0 = w [ p4 + 0x8004 ] (z) ;
r1 = w [ p0 ++ p1 ] (z) ;
r3 = w [ p0 ] (x) ;
r7 = w [ p1 ++ ] (x) ;
r2 = w [ sp -- ] (x) ;
r6 = w [ p2 + 12 ] (x) ;
r0 = w [ p4 + 0x800E ] (x) ;
r1 = w [ p0 ++ p1 ] (x) ;
Abstract
This instruction loads a high-half register with a 16-bit value from memory.
See Also (8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg), 32-Bit Load from Memory (LdM32bit-
ToDreg), 16-Bit Load from Memory (LdM16bitToDregL), 16-Bit Load from Memory to 32-Bit Register
(LdM16bitToDreg))
LdM16bitToDregH Description
The load word to high-half data register instruction loads a 16-bit value from a memory location into a 16-bit high-
half data register. The operation does not affect the related low-half register. The address of the memory location is
identified with an index register, a pointer register, a pointer plus an offset, or a 32-bit absolute address. The address
used in this instruction is restricted to even memory address alignment (2-byte half-word address alignment). Failure
to maintain proper alignment causes a misaligned memory access exception. This instruction supports the following
options.
• Post-increment the source index by 2 bytes [Ireg ++]
• Post-decrement the source index by 2 bytes [Ireg --]
• Offset the source pointer with second pointer [Preg ++ Preg]
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. See the
description of Automatic Circular Addressing in the Address Arithmetic Unit chapter for more information.
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The syntax of the form:
Dest_hi = w [ Src_1 ++ Src_2 ] ;
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
Dest_hi = w [Src_1] ; /* load the 16-bit destination, indirect*/
Src_1 += Src_2 ; /* post-increment Src_1 by a quantity indexed by Src_2 */
where:
• Dest_hi is the destination high-half register. (Dreg_hi in the syntax example).
• Src_1 is the first source register on the right-hand side of the equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing load. For example,
r0.h = w [p2 ++ p2] ;
functions as:
r0.h = w [p2] ;
The instruction opcode size varies with the address type as follows:
• Load word to high-half register using an index register or a pointer register for the address encodes as a 16-bit
instruction.
• Load word to high-half register using a pointer register offset by a second pointer for the address encodes as a
16-bit instruction.
• Load word to high-half register using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit load word instructions may be issued in parallel with certain other instructions. The 64-bit load word
instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdM16bitToDregH Example
r3.h = w [ i1 ] ;
r7.h = w [ i3 ++ ] ;
r1.h = w [ i0 -- ] ;
r2.h = w [ p4 ] ;
r5.h = w [ p2 ++ p0 ] ;
Abstract
This instruction loads a low-half register with a 16-bit value from memory.
See Also (8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg), 32-Bit Load from Memory (LdM32bit-
ToDreg), 16-Bit Load from Memory (LdM16bitToDregH), 16-Bit Load from Memory to 32-Bit Register
(LdM16bitToDreg))
LdM16bitToDregL Description
The load word to low-half data register instruction loads a 16-bit value from a memory location into a 16-bit low-
half data register. The operation does not affect the related high-half register. The address of the memory location is
identified with an index register, a pointer register, a pointer plus an offset, or a 32-bit absolute address. The address
used in this instruction is restricted to even memory address alignment (2-byte half-word address alignment). Failure
to maintain proper alignment causes a misaligned memory access exception. This instruction supports the following
options.
• Post-increment the source index by 2 bytes [Ireg ++]
• Post-decrement the source index by 2 bytes [Ireg --]
• Offset the source pointer with second pointer [Preg ++ Preg]
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. See the
description of Automatic Circular Addressing in the Address Arithmetic Unit chapter for more information.
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The syntax of the form:
Dest_lo = w [ Src_1 ++ Src_2 ] ;
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
Dest_lo = w [Src_1] ; /* load the 16-bit destination, indirect*/
Src_1 += Src_2 ; /* post-increment Src_1 by a quantity indexed by Src_2 */
where:
• Dest_lo is the destination low-half register. (Dreg_lo in the syntax example).
• Src_1 is the first source register on the right-hand side of the equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing load. For example,
r0.l = w [p2 ++ p2] ;
functions as:
r0.l = w [p2] ;
The instruction opcode size varies with the address type as follows:
• Load word to low-half register using an index register or a pointer register for the address encodes as a 16-bit
instruction.
• Load word to low-half register using a pointer register offset by a second pointer for the address encodes as a
16-bit instruction.
• Load word to low-half register using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit load word instructions may be issued in parallel with certain other instructions. The 64-bit load word
instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdM16bitToDregL Example
r3.l = w[ i1 ] ;
r7.l = w[ i3 ++ ] ;
r1.l = w[ i0 -- ] ;
r2.l = w[ p4 ] ;
r5.l = w[ p2 ++ p0 ] ;
Abstract
This instruction loads a register with a 32-bit value from memory.
See Also (8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg), 16-Bit Load from Memory (LdM16bit-
ToDregH), 16-Bit Load from Memory (LdM16bitToDregL), 16-Bit Load from Memory to 32-Bit Register
(LdM16bitToDreg))
LdM32bitToDreg Description
The load 32-bit data to data register instruction loads a 32-bit value from a memory location into a data register.
The address of the memory location is identified with an index register, an index register plus an offset, a pointer
register, a pointer plus an offset, or a 32-bit absolute address. The address used in this instruction is restricted to
even memory address alignment (4-byte word address alignment). Failure to maintain proper alignment causes a
misaligned memory access exception. This instruction supports the following options.
• Post-increment the source index by 4 bytes [Ireg ++]
• Post-decrement the source index by 4 bytes [Ireg --]
• Offset the source index with a modifier [Ireg ++ Mreg]
• Post-increment the source pointer by 4 bytes [Preg ++]
• Post-decrement the source pointer by 4 bytes [Preg --]
• Offset the source frame pointer with a 5-bit signed constant [FP - SmallOffset]
• Offset the source pointer with a 5-bit signed constant [Preg + SmallOffset]
• Offset the source pointer with a 16-bit signed constant [Preg + LargeOffset]
• Offset the source pointer with second pointer [Preg ++ Preg]
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. See the
description of Automatic Circular Addressing in the Address Arithmetic Unit chapter for more information.
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The syntax of the form:
Dest = [ Src_1 ++ Src_2 ] ;
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
Dest = [Src_1] ; /* load the 32-bit destination, indirect*/
Src_1 += Src_2 ; /* post-increment Src_1 by a quantity indexed by Src_2 */
where:
• Dest is the destination register. (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing load. For example,
r0 = [p2 ++ p2] ;
functions as:
r0 = [p2] ;
The instruction opcode size varies with the address type as follows:
• Load 32-bit data to register using an index register or a pointer register for the address encodes as a 16-bit
instruction.
• Load 32-bit data to register using an index register offset by a modifier register or a pointer register offset by a
second pointer for the address encodes as a 16-bit instruction.
• Load 32-bit data to register using a pointer or frame pointer register with a small offset for the address encodes
as a 16-bit instruction.
• Load 32-bit data to register using a pointer register with a large offset for the address encodes as a 32-bit in-
struction.
• Load 32-bit data to register using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit load 32-bit data to register instructions may be issued in parallel with certain other instructions. The 32-
and 64-bit load 32-bit data to register instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdM32bitToDreg Example
r3 = [ p0 ] ;
r7 = [ p1 ++ ] ;
r2 = [ sp -- ] ;
r6 = [ p2 + 12 ] ;
r0 = [ p4 + 0x800C ] ;
r1 = [ p0 ++ p1 ] ;
r5 = [ fp -12 ] ;
r2 = [ i2 ] ;
r0 = [ i0 ++ ] ;
r0 = [ i0 -- ] ;
/* Before indirect post-increment indexed addressing*/
r7 = 0 ;
i3 = 0x4000 ; /* Memory location contains 15, for example.*/
m0 = 4 ;
r7 = [i3 ++ m0] ;
/* Afterwards . . .*/
/* r7 = 15 from memory location 0x4000*/
/* i3 = i3 + m0 = 0x4004*/
/* m0 still equals 4*/
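The reason a stale Lreg can produce unexpected Ireg values can be seen in a C model of the index update. The wrap rule used here is an assumption based on the usual description of automatic circular addressing (wrap by the buffer length when the index leaves the range from base to base plus length); consult the Address Arithmetic Unit chapter for the authoritative definition. The type dag_regs and the function post_modify are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t i;   /* Ireg: current index        */
    uint32_t l;   /* Lreg: buffer length, 0 = circular buffering disabled */
    uint32_t b;   /* Breg: buffer base address  */
} dag_regs;

/* Post-modifies Ireg by m bytes (m may be negative). */
static void post_modify(dag_regs *r, int32_t m)
{
    uint32_t next = r->i + (uint32_t)m;
    if (r->l != 0) {
        if (m >= 0 && next >= r->b + r->l)
            next -= r->l;             /* stepped past the end: wrap back   */
        else if (m < 0 && next < r->b)
            next += r->l;             /* stepped below the base: wrap up   */
    }
    r->i = next;
}
```

With the length cleared, an index of 0x4000 modified by 4 becomes 0x4004, as in the example above. With a leftover nonzero length and an unrelated base, the same update can wrap and return a different value, which is the situation the note on clearing Lreg warns about.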
Load/Store (Ldp)
PREG Register Type = [PREG Register Type++]
PREG Register Type = [PREG Register Type--]
PREG Register Type = [PREG Register Type]
Load/Store indexed with small immediate offset (LdpII)
PREG Register Type = [PREG Register Type + uimm4s4 Register Type]
Long Load/Store with indexed addressing (LdStIdxI)
PREG Register Type = [PREG Register Type + imm16s4 Register Type]
Abstract
This instruction loads a pointer register with a 32-bit value from memory.
LdM32bitToPreg Description
The load 32-bit data to pointer register instruction loads a 32-bit value from a memory location into a pointer regis-
ter. The address of the memory location is identified with a pointer register or a pointer plus an offset. The address
used in this instruction is restricted to even memory address alignment (4-byte word address alignment). Failure to
maintain proper alignment causes a misaligned memory access exception. This instruction supports the following
options.
• Post-increment the source pointer by 4 bytes [Preg ++]
• Post-decrement the source pointer by 4 bytes [Preg --]
• Offset the source pointer with a 5-bit signed constant [Preg + SmallOffset]
• Offset the source pointer with a 16-bit signed constant [Preg + LargeOffset]
The instruction opcode size varies with the address type as follows:
• Load 32-bit data to register using a pointer register for the address encodes as a 16-bit instruction.
• Load 32-bit data to register using a pointer register with a small offset for the address encodes as a 16-bit in-
struction.
• Load 32-bit data to register using a pointer register with a large offset for the address encodes as a 32-bit in-
struction.
The 16-bit load 32-bit data to pointer instructions may be issued in parallel with certain other instructions. The 32-
bit load 32-bit data to pointer instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
LdM32bitToPreg Example
p3 = [ p2 ] ;
p5 = [ p0 ++ ] ;
p2 = [ sp -- ] ;
p3 = [ p2 + 8 ] ;
p0 = [ p2 + 0x4008 ] ;
p1 = [ fp - 16 ] ;
When the memory management unit cannot complete a memory load (exclusive), a number of exceptions and errors
may be issued, in addition to those that may be caused by a regular memory load. The Exceptions/Errors from
Unsuccessful Memory Load (Exclusive) Operations table lists these exceptions and errors.
Abstract
This instruction loads a register with an 8-bit value from memory, using an exclusive memory read. The value is sign
or zero extended in the register.
See Also (32-Bit Load from Memory (LdX32bitToDreg), 16-Bit Load from Memory (LdX16bitToDregH), 16-Bit
Load from Memory (LdX16bitToDregL), 16-Bit Load from Memory to 32-Bit Register (LdX16bitToDreg))
LdX08bitToDreg Description
The load data register from memory (8-bit transfer) checks for exclusive access to a memory location and reads a
data value (loading it into the least significant byte of the register with zero- or sign-extension) from the location
only if exclusive access is still held.
For more information about memory load (exclusive) operations, see Memory Load (Exclusive) Operations.
LdX08bitToDreg Example
r1 = b[p4] (x,excl); /* load exclusive 8-bits sign extend to D-register */
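The difference between the sign-extending and zero-extending forms of the 8-bit load can be illustrated in C. This sketch models only the extension of the loaded byte into the 32-bit register value; the exclusive access check is not modeled, and the function names are hypothetical.

```c
#include <stdint.h>

/* Zero extension: the upper 24 bits of the register are cleared. */
uint32_t extend8_zero(uint8_t b)
{
    return (uint32_t)b;
}

/* Sign extension: bit 7 of the byte is copied into the upper 24 bits. */
uint32_t extend8_sign(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : (uint32_t)b;
}
```

For example, a memory byte of 0x80 yields 0x00000080 when zero-extended and 0xFFFFFF80 when sign-extended.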
Abstract
This instruction loads a register with a 16-bit value from memory, using an exclusive memory read. The value is sign
or zero extended in the register.
See Also (8-Bit Load from Memory to 32-bit Register (LdX08bitToDreg), 32-Bit Load from Memory (LdX32bitTo-
Dreg), 16-Bit Load from Memory (LdX16bitToDregH), 16-Bit Load from Memory (LdX16bitToDregL))
LdX16bitToDreg Description
The load data register from memory (16-bit transfer) checks for exclusive access to a memory location and reads a
data value (loading it into the least significant 16-bits of the register with zero- or sign-extension) from the location
only if exclusive access is still held.
For more information about memory load (exclusive) operations, see Memory Load (Exclusive) Operations.
LdX16bitToDreg Example
r2 = w[p4] (z,excl); /* load exclusive 16-bits zero extend to D-register */
r3 = w[p4] (x,excl); /* load exclusive 16-bits sign extend to D-register */
Abstract
This instruction loads a high-half register with a 16-bit value from memory, using an exclusive memory read.
See Also (8-Bit Load from Memory to 32-bit Register (LdX08bitToDreg), 32-Bit Load from Memory (LdX32bitTo-
Dreg), 16-Bit Load from Memory (LdX16bitToDregL), 16-Bit Load from Memory to 32-Bit Register (LdX16bit-
ToDreg))
LdX16bitToDregH Description
The load high half data register from memory (16-bit transfer) checks for exclusive access to a memory location and
reads a data value (loading it into the half register) from the location only if exclusive access is still held.
For more information about memory load (exclusive) operations, see Memory Load (Exclusive) Operations.
LdX16bitToDregH Example
r1.h = w[p4] (excl); /* load exclusive 16-bits to high D-register half */
Abstract
This instruction loads a low-half register with a 16-bit value from memory, using an exclusive memory read.
See Also (8-Bit Load from Memory to 32-bit Register (LdX08bitToDreg), 32-Bit Load from Memory (LdX32bitTo-
Dreg), 16-Bit Load from Memory (LdX16bitToDregH), 16-Bit Load from Memory to 32-Bit Register (LdX16bit-
ToDreg))
LdX16bitToDregL Description
The load low half data register from memory (16-bit transfer) checks for exclusive access to a memory location and
reads a data value (loading it into the half register) from the location only if exclusive access is still held.
For more information about memory load (exclusive) operations, see Memory Load (Exclusive) Operations.
LdX16bitToDregL Example
r1.l = w[p4] (excl); /* load exclusive 16-bits to low D-register half */
Abstract
This instruction loads a register with a 32-bit value from memory, using an exclusive memory read.
See Also (8-Bit Load from Memory to 32-bit Register (LdX08bitToDreg), 16-Bit Load from Memory (LdX16bitTo-
DregH), 16-Bit Load from Memory (LdX16bitToDregL), 16-Bit Load from Memory to 32-Bit Register (LdX16bit-
ToDreg))
LdX32bitToDreg Description
The load data register from memory (32-bit transfer) checks for exclusive access to a memory location and reads a
data value (loading it into the register) from the location only if exclusive access is still held.
For more information about memory load (exclusive) operations, see Memory Load (Exclusive) Operations.
LdX32bitToDreg Example
r0 = [p4] (excl); /* load exclusive 32-bits to D-register */
Pack Operations
These operations provide byte packing and unpacking operations on register and register pair operands:
• Pack 8-Bit to 32-Bit (BytePack)
• Spread 8-Bit to 16-Bit (ByteUnPack)
• Pack 16-Bit to 32-Bit (Pack16Vec)
Abstract
This instruction takes the low bytes from each 16-bit register half of two registers and combines them to create a
single 32-bit register. Used to re-order data.
See Also (Spread 8-Bit to 16-Bit (ByteUnPack), Pack 16-Bit to 32-Bit (Pack16Vec))
BytePack Description
The Quad 8-Bit Pack instruction packs four 8-bit values, half-word aligned, contained in two source registers into
one register, byte aligned. The Source Registers Contain figure and Destination Register Receives figure show the
packing pattern.
            31..24   23..16   15..8   7..0
src_reg_0:           byte1            byte0
src_reg_1:           byte3            byte2
dest_reg:   byte3    byte2    byte1   byte0
BytePack Example
r2 = bytepack (r4,r5) ;
/* Assume the following: ... */
/* R4 = 0xFEED FACE */
/* R5 = 0xBEEF BADD */
/* Then, this instruction returns: ... */
/* R2 = 0xEFDD EDCE */
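The packing pattern can be checked with a short C model. This is a sketch of the data movement only; the function name bytepack and its argument order (src_reg_0 first, src_reg_1 second) are chosen here to mirror the assembly syntax and are not taken from any vendor header.

```c
#include <stdint.h>

/* Model of  dest = bytepack (src0, src1):
   takes the low byte of each 16-bit half of both sources. */
uint32_t bytepack(uint32_t src0, uint32_t src1)
{
    uint32_t byte0 = src0 & 0xFFu;         /* low byte of src0 low half  */
    uint32_t byte1 = (src0 >> 16) & 0xFFu; /* low byte of src0 high half */
    uint32_t byte2 = src1 & 0xFFu;         /* low byte of src1 low half  */
    uint32_t byte3 = (src1 >> 16) & 0xFFu; /* low byte of src1 high half */

    return (byte3 << 24) | (byte2 << 16) | (byte1 << 8) | byte0;
}
```

Calling bytepack(0xFEEDFACE, 0xBEEFBADD) returns 0xEFDDEDCE, the value shown for R2 in the example above.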
Abstract
This instruction spreads four bytes to four zero extended 16-Bit values. The lower two bits of I0 are used to
extract four contiguous bytes from the input register pair.
See Also (Pack 8-Bit to 32-Bit (BytePack), Pack 16-Bit to 32-Bit (Pack16Vec))
ByteUnPack Description
The Quad 8-Bit Unpack instruction copies four contiguous bytes from a pair of source registers, adjusting for byte
alignment. The instruction loads the selected bytes into two arbitrary data registers on half-word alignment. The
two LSBs of the I0 register determine the source byte alignment, as shown in the I-register Bits and the Byte Align-
ment, no (r) option figure. This figure shows the default source order case---not the (r) syntax---and the data con-
tained in the source register pair. This instruction prevents exceptions that would otherwise be caused by misaligned
32-bit memory loads issued in parallel.
Figure 8-25: I-register Bits and the Byte Alignment, no (r) option
The (r) syntax reverses the order of the source registers within the pair. Typical high performance applications can-
not afford the overhead of reloading both register pair operands to maintain byte order for every calculation. Instead,
they alternate and load only one register pair operand each time and alternate between the forward and reverse byte
order versions of this instruction. By default, the low order bytes come from the low register in the register pair. The
(r) option causes the low order bytes to come from the high register. In the optional reverse source order case (that
is, using the (r) syntax), the only difference is that the source registers swap places in their byte ordering. Assume
the source register pair contains the data shown in the I-register Bits and the Byte Alignment, with (r) option figure.
Source register pair contains:
  src_reg_pair_LO:  byte 7  byte 6  byte 5  byte 4
  src_reg_pair_HI:  byte 3  byte 2  byte 1  byte 0
Two LSBs of I0 or I1    The bytes selected are
00b:                    byte 3  byte 2  byte 1  byte 0
01b:                    byte 4  byte 3  byte 2  byte 1
10b:                    byte 5  byte 4  byte 3  byte 2
11b:                    byte 6  byte 5  byte 4  byte 3
Figure 8-26: I-register Bits and the Byte Alignment, with (r) option
The four bytes, now byte aligned, are copied into the destination registers on half-word alignment, as shown in the
Source Register Contains figure and the Destination Registers Receive figure.
                31..24   23..16   15..8    7..0
Aligned bytes:  byte_D   byte_C   byte_B   byte_A
ByteUnPack Example
(r6,r5) = byteunpack r1:0 ; /* non-reversing sources */
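The byte selection and spreading can be sketched in C. Several points here are assumptions rather than statements from the text above: the function name and parameters are hypothetical, the first-listed destination register is assumed to receive byte_D and byte_C, and the second is assumed to receive byte_B and byte_A, each zero-extended to 16 bits. In the default order the low order bytes come from the low register of the pair; the reverse flag models the (r) option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of  (dest_1, dest_0) = byteunpack pair_hi:pair_lo  [ (r) ].
   i0 is the value of the I0 register; only its two LSBs are used. */
void byteunpack(uint32_t pair_hi, uint32_t pair_lo, uint32_t i0,
                bool reverse, uint32_t *dest_1, uint32_t *dest_0)
{
    /* Choose which register supplies bytes 3..0 and which bytes 7..4. */
    uint32_t low_word  = reverse ? pair_hi : pair_lo;
    uint32_t high_word = reverse ? pair_lo : pair_hi;
    uint64_t bytes = ((uint64_t)high_word << 32) | low_word;

    /* Select four contiguous bytes starting at the offset given by I0. */
    uint32_t aligned = (uint32_t)(bytes >> (8u * (i0 & 3u)));

    /* Spread each byte into a zero-extended 16-bit half. */
    *dest_1 = (((aligned >> 24) & 0xFFu) << 16) | ((aligned >> 16) & 0xFFu);
    *dest_0 = (((aligned >> 8) & 0xFFu) << 16) | (aligned & 0xFFu);
}
```

Under these assumptions, with R1 = 0xFEEDFACE, R0 = 0xBEEFBADD, and the two LSBs of I0 equal to 00b, the model gives R6 = 0x00BE00EF and R5 = 0x00BA00DD for the non-reversing case.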
Shift (Dsp32Shf )
DREG Register Type = pack (DREG_L Register Type, DREG_L Register Type)
DREG Register Type = pack (DREG_L Register Type, DREG_H Register Type)
DREG Register Type = pack (DREG_H Register Type, DREG_L Register Type)
DREG Register Type = pack (DREG_H Register Type, DREG_H Register Type)
Abstract
This instruction packs two 16-bit half registers into one 32-bit register.
See Also (Pack 8-Bit to 32-Bit (BytePack), Spread 8-Bit to 16-Bit (ByteUnPack))
Pack16Vec Description
The vector pack instruction packs two 16-bit half-word numbers into the halves of a 32-bit data register as shown in
the Source Registers Contain figure and the Destination Register Contains figure.
31 24 23 16 15 8 7 0
src_half_0 half_word_0
src_half_1 half_word_1
Pack16Vec Example
r3=pack(r4.l, r5.l) ; /* pack low / low half-words */
/* Special Applications */
/* If r4.l = 0xDEAD and r5.l = 0xBEEF, then . . . */
r3 = pack (r4.l, r5.l) ;
/* . . . produces r3 = 0xDEAD BEEF */
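The same operation can be written as a one-line C model. The function name pack16 is hypothetical; the first argument corresponds to the first half register in the assembly syntax and lands in the upper 16 bits of the result.

```c
#include <stdint.h>

/* Model of  dest = pack (first_half, second_half). */
uint32_t pack16(uint16_t first_half, uint16_t second_half)
{
    return ((uint32_t)first_half << 16) | second_half;
}
```

Calling pack16(0xDEAD, 0xBEEF) returns 0xDEADBEEF, as in the example above.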
Abstract
This instruction stores the most significant 16-bit value from a register to memory.
See Also (8-Bit Store to Memory (StDregToM08bit), 32-Bit Store to Memory (StDregToM32bit), 16-Bit Store to
Memory (StDregLToM16bit))
StDregHToM16bit Description
The store word from high-half data register instruction stores a 16-bit value from a high-half data register to a mem-
ory location. The operation does not affect the related low-half register. The address of the memory location is iden-
tified with an index register, a pointer register, a pointer plus an offset, or a 32-bit absolute address. The address used
in this instruction is restricted to even memory address alignment (2-byte half-word address alignment). Failure to
maintain proper alignment causes a misaligned memory access exception. This instruction supports the following
options.
• Post-increment the source index by 2 bytes [Ireg ++]
• Post-decrement the source index by 2 bytes [Ireg --]
• Offset the source pointer with second pointer [Preg ++ Preg]
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. See the
description of Automatic Circular Addressing in the Address Arithmetic Unit chapter for more information.
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The form w [Dest_1 ++ Dest_2] = Src_hi is indirect, post-increment index addressing. The form is shorthand for the following sequence.
w [Dest_1] = Src_hi ; /* store to the 16-bit destination, indirect*/
Dest_1 += Dest_2 ; /* post-increment Dest_1 by a quantity indexed by Dest_2 */
where:
• Src_hi is the source high-half register. (Dreg_hi in the syntax example).
• Dest_1 is the first destination register on the left-hand side of the equation.
• Dest_2 is the second destination register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing store. For example,
w [p2 ++ p2] = r0.h ;
functions as:
w [p2] = r0.h ;
The instruction opcode size varies with the address type as follows:
• Store word to memory using an index register or a pointer register for the address encodes as a 16-bit instruc-
tion.
• Store word to memory using a pointer register offset by a second pointer for the address encodes as a 16-bit
instruction.
• Store word to memory using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit store word instructions may be issued in parallel with certain other instructions. The 64-bit store word
instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
StDregHToM16bit Example
w[ i1 ] = r3.h ;
w[ i3 ++ ] = r7.h ;
w[ i0 -- ] = r1.h ;
w[ p4 ] = r2.h ;
w[ p2 ++ p0 ] = r5.h ;
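A store from the high half of a data register can be modeled in C as shown below. This is a sketch under stated assumptions: memory is a flat little-endian byte array, the function name store16_high is hypothetical, and the assertion stands in for the misaligned memory access exception that the hardware raises.

```c
#include <assert.h>
#include <stdint.h>

/* Model of  w [addr] = dreg.h :
   writes bits 31..16 of the register to a 2-byte aligned address. */
void store16_high(uint8_t *mem, uint32_t addr, uint32_t dreg)
{
    uint16_t half = (uint16_t)(dreg >> 16);

    assert((addr & 1u) == 0); /* 2-byte alignment is required */

    mem[addr]     = (uint8_t)(half & 0xFFu); /* low byte first */
    mem[addr + 1] = (uint8_t)(half >> 8);
}
```

Only two bytes of memory change, and the register value itself is passed by value, which mirrors the statement that the low-half register is not affected.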
Abstract
This instruction stores the least significant 16-bit value from a register to memory.
See Also (8-Bit Store to Memory (StDregToM08bit), 32-Bit Store to Memory (StDregToM32bit), 16-Bit Store to
Memory (StDregHToM16bit))
StDregLToM16bit Description
The store word from low-half data register instruction stores a 16-bit value either from a low-half data register or
from the least significant 16 bits of a data register to a memory location. The operation does not affect the related
high-half register. The address of the memory location is identified with an index register, a pointer register, a point-
er plus an offset, or a 32-bit absolute address. The address used in this instruction is restricted to even memory ad-
dress alignment (2-byte half-word address alignment). Failure to maintain proper alignment causes a misaligned
memory access exception. This instruction supports the following options.
• Post-increment the source index by 2 bytes [Ireg ++]
• Post-decrement the source index by 2 bytes [Ireg --]
• Offset the source pointer with second pointer [Preg ++ Preg]
The instruction versions that explicitly modify an index register (Ireg) support optional circular buffering. See the
description of Automatic Circular Addressing in the Address Arithmetic Unit chapter for more information.
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The form w [Dest_1 ++ Dest_2] = Src_lo is indirect, post-increment index addressing. The form is shorthand for the following sequence.
w [Dest_1] = Src_lo ; /* store to the 16-bit destination, indirect*/
Dest_1 += Dest_2 ; /* post-increment Dest_1 by a quantity indexed by Dest_2 */
where:
• Src_lo is the source low-half register. (Dreg_lo in the syntax example).
• Dest_1 is the first destination register on the left-hand side of the equation.
• Dest_2 is the second destination register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing store. For example,
w [p2 ++ p2] = r0.l ;
functions as:
w [p2] = r0.l ;
The instruction opcode size varies with the address type as follows:
• Store word to memory using an index register or a pointer register for the address encodes as a 16-bit instruc-
tion.
• Store word to memory using a pointer register offset by a second pointer for the address encodes as a 16-bit
instruction.
• Store word to memory using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit store word instructions may be issued in parallel with certain other instructions. The 64-bit store word
instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
StDregLToM16bit Example
w [ i1 ] = r3.l ;
w [ p0 ] = r3 ;
w [ i3 ++ ] = r7.l ;
w [ i0 -- ] = r1.l ;
w [ p4 ] = r2.l ;
w [ p1 ++ ] = r7 ;
w [ sp -- ] = r2 ;
w [ p2 + 12 ] = r6 ;
w [ p4 - 0x200C ] = r0 ;
w [ p2 ++ p0 ] = r5.l ;
Load/Store (LdSt)
b[PREG Register Type++] = DREG Register Type
b[PREG Register Type--] = DREG Register Type
b[PREG Register Type] = DREG Register Type
Long Load/Store with indexed addressing (LdStIdxI)
b[PREG Register Type + imm16reloc Register Type] = DREG Register Type
Abstract
This instruction stores the least significant 8-bit value from a register to memory.
See Also (32-Bit Store to Memory (StDregToM32bit), 16-Bit Store to Memory (StDregLToM16bit), 16-Bit Store to
Memory (StDregHToM16bit))
StDregToM08bit Description
The store byte from data register instruction stores the least significant 8 bits of a 32-bit data register to a
memory location. The address of the memory location is identified with a pointer register, a pointer plus an offset,
or a 32-bit absolute address. The address used in this instruction has no restrictions for memory address alignment.
This instruction supports the following options.
• Post-increment the source pointer by 1 byte [Preg ++]
• Post-decrement the source pointer by 1 byte [Preg --]
• Offset the source pointer with a 16-bit signed constant [Preg + Offset]
The instruction opcode size varies with the address type as follows:
• Store byte to memory using a pointer register for the address encodes as a 16-bit instruction.
• Store byte to memory using a pointer register with 16-bit offset for the address encodes as a 32-bit instruction.
• Store byte to memory using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit store byte instructions may be issued in parallel with certain other instructions. The 32- and 64-bit store
byte instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
StDregToM08bit Example
b [ p0 ] = r3 ;
b [ p1 ++ ] = r7 ;
b [ sp -- ] = r2 ;
b [ p4 + 0x100F ] = r0 ;
b [ p4 - 0x53F ] = r0 ;
Abstract
This instruction stores the 32-bit value from a register to memory.
See Also (8-Bit Store to Memory (StDregToM08bit), 16-Bit Store to Memory (StDregLToM16bit), 16-Bit Store to
Memory (StDregHToM16bit))
StDregToM32bit Description
The store 32-bit data from data register instruction stores a 32-bit value from a data register into a memory location.
The address of the memory location is identified with an index register, an index register plus an offset, a pointer
register, a pointer plus an offset, or a 32-bit absolute address. The address used in this instruction is restricted to
even memory address alignment (4-byte word address alignment). Failure to maintain proper alignment causes a
misaligned memory access exception. This instruction supports the following options.
• Post-increment the source index by 4 bytes [Ireg ++]
• Post-decrement the source index by 4 bytes [Ireg --]
• Offset the source index with a modifier [Ireg ++ Mreg]
NOTE: Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the length regis-
ter (Lreg) corresponding to the Ireg used in this instruction. For example, if you use I2 to increment
your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand
can result in unexpected Ireg values. The circular address buffer registers (index, length, and base) are
not initialized automatically by reset. Typically, user software clears all the circular address buffer registers
during boot-up to disable circular buffering, then initializes them later, if needed.
The form [Dest_1 ++ Dest_2] = Src is indirect, post-increment index addressing. The form is shorthand for the following sequence.
[Dest_1] = Src ; /* store to the 32-bit destination, indirect*/
Dest_1 += Dest_2 ; /* post-increment Dest_1 by a quantity indexed by Dest_2 */
where:
• Src is the source register. (Dreg in the syntax example).
• Dest_1 is the first destination register on the left-hand side of the equation.
• Dest_2 is the second destination register.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-in-
crement index version must have separate pointer registers for the input operands. If a common pointer is used for
the inputs, the instruction functions as a simple, non-incrementing store. For example,
[p2 ++ p2] = r0 ;
functions as:
[p2] = r0 ;
The instruction opcode size varies with the address type as follows:
• Store 32-bit data to memory using an index register or a pointer register for the address encodes as a 16-bit
instruction.
• Store 32-bit data to memory using an index register offset by a modifier register or a pointer register offset by a
second pointer for the address encodes as a 16-bit instruction.
• Store 32-bit data to memory using a pointer or frame pointer register with a small offset for the address enco-
des as a 16-bit instruction.
• Store 32-bit data to memory using a pointer register with a large offset for the address encodes as a 32-bit in-
struction.
• Store 32-bit data to memory using a 32-bit absolute address encodes as a 64-bit instruction.
The 16-bit store 32-bit data to memory instructions may be issued in parallel with certain other instructions. The 32-
and 64-bit store 32-bit data to memory instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
StDregToM32bit Example
[ p0 ] = r3 ;
[ p1 ++ ] = r7 ;
[ sp -- ] = r2 ;
[ p2 + 12 ] = r6 ;
[ p4 - 0x1004 ] = r0 ;
[ p0 ++ p1 ] = r1 ;
[ fp - 28 ] = r5 ;
[ i2 ] = r2 ;
[ i0 ++ ] = r0 ;
[ i0 -- ] = r0 ;
[ i3 ++ m0 ] = r7 ;
Load/Store (LdSt)
[PREG Register Type++] = PREG Register Type
[PREG Register Type--] = PREG Register Type
[PREG Register Type] = PREG Register Type
Load/Store indexed with small immediate offset (LdStII)
[PREG Register Type + uimm4s4 Register Type] = PREG Register Type
Long Load/Store with indexed addressing (LdStIdxI)
[PREG Register Type + imm16s4 Register Type] = PREG Register Type
Abstract
This instruction stores the 32-bit value from a pointer register to memory.
StPregToM32bit Description
The store 32-bit data from pointer register instruction stores a 32-bit value from a pointer register into a memory
location. The address of the memory location is identified with a pointer register or a pointer plus an offset. The
address used in this instruction is restricted to even memory address alignment (4-byte word address alignment).
Failure to maintain proper alignment causes a misaligned memory access exception. This instruction supports the
following options.
• Post-increment the source pointer by 4 bytes [Preg ++]
• Post-decrement the source pointer by 4 bytes [Preg --]
• Offset the source pointer with a 5-bit signed constant [Preg + SmallOffset]
• Offset the source pointer with a 16-bit signed constant [Preg + LargeOffset]
The instruction opcode size varies with the address type as follows:
• Store 32-bit data to memory using a pointer register for the address encodes as a 16-bit instruction.
• Store 32-bit data to memory using a pointer register with a small offset for the address encodes as a 16-bit
instruction.
• Store 32-bit data to memory using a pointer register with a large offset for the address encodes as a 32-bit in-
struction.
The 16-bit store 32-bit data from pointer instructions may be issued in parallel with certain other instructions. The
32-bit store 32-bit data from pointer instructions may not be issued in parallel with other instructions.
This instruction may be used in either User or Supervisor mode.
StPregToM32bit Example
[ p2 ] = p3 ;
[ sp ++ ] = p5 ;
[ p0 -- ] = p2 ;
[ p2 + 8 ] = p3 ;
[ p2 + 0x4444 ] = p0 ;
[ fp -12 ] = p1 ;
to write data to the memory location. When the memory location is in non-shareable memory, the instruction is
considered to have exclusive access to the location simply based on the SEQSTAT.XMONITOR bit, and the store ex-
clusive instruction performs the write in exactly the same manner as a regular memory store to the same memory
location. When the memory location is in shareable memory, the store exclusive instruction performs the write with
an exclusive transaction on the memory bus. This transaction may itself fail to update the location if another core
has established exclusive access or written to the location since the current task executed the prior load exclusive
instruction. For more information about illegal, non-shareable, and shareable memory regions, see the Exclusive
Loads and Stores section of the Memory chapter.
The memory store (exclusive) operations include the following instructions:
• 8-Bit Store to Memory (StDregToX08bit)
• 32-Bit Store to Memory (StDregToX32bit)
• 16-Bit Store to Memory (StDregLToX16bit)
• 16-Bit Store to Memory (StDregHToX16bit)
The store exclusive instruction terminates before the success or failure of the write transaction is known. The state of
the write transaction is tracked in the SEQSTAT.XWACTIVE and SEQSTAT.XWAVAIL bits, which are updated
asynchronously to the core pipeline once the write response has been received from the system. These bits should not
be tested directly; instead, use the SYNCEXCL (Synchronize Exclusive State) instruction to wait for the write to
complete and set ASTAT.CC according to whether the store exclusive instruction successfully wrote to the memory
location. The Exclusive Related Bits in Status Registers (SEQSTAT and ASTAT) table shows how exclusive access
status changes during an exclusive memory store operation.
Table 8-25: Exclusive Related Bits in Status Registers (SEQSTAT and ASTAT)
Name Description Condition
SEQSTAT.XMONITOR Exclusive monitor (0=open, 1=exclusive) Not updated
=0 on start of instruction, CC=0
=1 on start of instruction, attempt update (Preg,val), ASTAT.CC=1,
SEQSTAT.XWACTIVE=1
=0 on completion of instruction
=1 while active
ASTAT.CC Condition Code (0=no write attempted, 1=write attempted) Always updated
SEQSTAT.XWAVAIL Exclusive write resp. (0=no status, 1=available) Always updated
=1 on completion of write transaction
*1 If XWACTIVE is not 0 when the instruction starts, the instruction throws an exception.
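The status-bit behavior described above can be sketched as a small C state model. It is a simplified illustration, not a definition of the hardware: it covers only a location in non-shareable memory on a single core, where the write is performed like a regular store. The structure, the write_ok field, and all function names are hypothetical. XMONITOR is assumed to have been set by a prior load exclusive, and bit updates not stated in the table (for example, when XMONITOR or XWAVAIL is cleared) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct excl_state {
    bool xmonitor; /* SEQSTAT.XMONITOR: 1 = exclusive */
    bool xwactive; /* SEQSTAT.XWACTIVE: 1 while a write is active */
    bool xwavail;  /* SEQSTAT.XWAVAIL: 1 = write response available */
    bool cc;       /* ASTAT.CC */
    bool write_ok; /* hypothetical: outcome reported by the system */
};

/* Model of  CC = ([location] = value)(EXCL). */
void store32_excl(struct excl_state *s, uint32_t *location, uint32_t value)
{
    assert(!s->xwactive);  /* the hardware throws an exception here */
    if (!s->xmonitor) {
        s->cc = false;     /* no write attempted */
        s->write_ok = false;
        return;
    }
    *location = value;     /* non-shareable: same as a regular store */
    s->write_ok = true;
    s->cc = true;          /* write attempted */
    s->xwactive = true;
}

/* Model of SYNCEXCL: waits for the write response, then reports the
   outcome in CC. The response is applied immediately in this sketch. */
void syncexcl(struct excl_state *s)
{
    if (s->xwactive) {
        s->xwactive = false;
        s->xwavail = true;
    }
    s->cc = s->write_ok;
}
```

In this model CC after the store only says whether a write was attempted; the result that software should branch on is the CC value left by syncexcl.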
When the memory management unit cannot complete a memory store (exclusive), a number of exceptions and errors
may be issued, in addition to those that may be caused by a regular memory store. The Exceptions/Errors from
Unsuccessful Memory Store (Exclusive) Operations table lists these exceptions and errors.
Abstract
This instruction stores the most significant 16-bit value from a register to memory, using an exclusive memory write.
See Also (8-Bit Store to Memory (StDregToX08bit), 32-Bit Store to Memory (StDregToX32bit), 16-Bit Store to
Memory (StDregLToX16bit))
StDregHToX16bit Description
The store high half data register to memory (16-bit transfer) checks for exclusive access to a memory location and
writes a data value (contents of half register) to the location only if exclusive access is still held.
For more information about memory store (exclusive) operations, see Memory Store (Exclusive) Operations.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
StDregHToX16bit Example
CC = (W[P4] = R3.H)(EXCL); /* store exclusive from high 16-bits of a D-register */
Abstract
This instruction stores the least significant 16-bit value from a register to memory, using an exclusive memory write.
See Also (8-Bit Store to Memory (StDregToX08bit), 32-Bit Store to Memory (StDregToX32bit), 16-Bit Store to
Memory (StDregHToX16bit))
StDregLToX16bit Description
The store low half data register to memory (16-bit transfer) checks for exclusive access to a memory location and
writes a data value (contents of half register) to the location only if exclusive access is still held.
For more information about memory store (exclusive) operations, see Memory Store (Exclusive) Operations.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
StDregLToX16bit Example
CC = (W[P4] = R3)(EXCL); /* store exclusive from low 16-bits of a D-register */
CC = (W[P4] = R3.L)(EXCL); /* alternate syntax for same instruction */
Abstract
This instruction stores the least significant 8-bit value from a register to memory, using an exclusive memory write.
See Also (32-Bit Store to Memory (StDregToX32bit), 16-Bit Store to Memory (StDregLToX16bit), 16-Bit Store to
Memory (StDregHToX16bit))
StDregToX08bit Description
The store data register to memory (8-bit transfer) checks for exclusive access to a memory location and writes a data
value (least significant byte of data register contents) to the location only if exclusive access is still held.
For more information about memory store (exclusive) operations, see Memory Store (Exclusive) Operations.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register .
31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
StDregToX08bit Example
CC = (B[P4] = R6)(EXCL); /* store exclusive from low 8-bits of a D-register */
Abstract
This instruction stores the 32-bit value from a register to memory, using an exclusive memory write.
See Also (8-Bit Store to Memory (StDregToX08bit), 16-Bit Store to Memory (StDregLToX16bit), 16-Bit Store to
Memory (StDregHToX16bit))
StDregToX32bit Description
The store data register to memory (32-bit transfer) checks for exclusive access to a memory location and writes a
data value (contents of register) to the location only if exclusive access is still held.
For more information about memory store (exclusive) operations, see Memory Store (Exclusive) Operations.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
StDregToX32bit Example
CC = ([P2] = R0)(EXCL); /* store exclusive all 32-bits of a D-register */
Shift (Dsp32Shf )
DREG_L Register Type = expadj (DREG Register Type, DREG_L Register Type)
DREG_L Register Type = expadj (DREG Register Type, DREG_L Register Type) (v)
DREG_L Register Type = expadj (DREG_L Register Type, DREG_L Register Type)
DREG_L Register Type = expadj (DREG_H Register Type, DREG_L Register Type)
Abstract
This instruction identifies the largest magnitude of a fractional number (YOP) and a reference exponent and returns
the smaller of the two exponents. The exponent is the number of sign bits minus one. Exponents are unsigned inte-
gers. The input values can be a 32-bit register, a 16-bit half register, or a 16-bit vector.
Shift_ExpAdj32 Description
The exponent detection instruction identifies the largest magnitude of two or three fractional numbers based on
their exponents. It compares the magnitude of one or two sample values to a reference exponent and returns the
smallest of the exponents.
The exponent is the number of sign bits minus one. In other words, the exponent is the number of redundant sign
bits in a signed number. Exponents are unsigned integers. The exponent detection instruction accommodates the
two special cases (0 and –1) and always returns the smallest exponent for each case.
The reference exponent and destination exponent are 16-bit half-word unsigned values. The sample number can be
either a word or half-word. The exponent detection instruction does not implicitly modify input values. The
dest_reg and exponent_register can be the same data register. Doing this explicitly modifies the expo-
nent_register.
The valid range of exponents is 0 through 31, with 31 representing the smallest 32-bit number magnitude and 15
representing the smallest 16-bit number magnitude.
Exponent detection supports three types of samples—one 32-bit sample, one 16-bit sample (either upper-half or
lower-half word), and two 16-bit samples that occupy the upper-half and lower-half words of a single 32-bit register.
One special application of EXPADJ is to use this instruction to detect the exponent of the largest magnitude number
in an array. The detected value may then be used to normalize the array on a subsequent pass with a shift operation.
Typically, use this feature to implement block floating-point capabilities.
This 32-bit instruction may be issued in parallel with certain other 16-bit instructions.
This instruction may be used in either User or Supervisor mode.
Shift_ExpAdj32 Example
r5.l = expadj (r4, r2.l) ;
/* ... Assume R4 = 0x0000 0052 and R2.L = 12. Then R5.L becomes 12. */
/* ... Assume R4 = 0xFFFF 0052 and R2.L = 12. Then R5.L becomes 12. */
/* ... Assume R4 = 0x0000 0052 and R2.L = 27. Then R5.L becomes 24. */
/* ... Assume R4 = 0xF000 0052 and R2.L = 27. Then R5.L becomes 3. */
r5.l = expadj (r4.l, r2.l) ;
/* ... Assume R4.L = 0x0765 and R2.L = 12. Then R5.L becomes 4. */
/* ... Assume R4.L = 0xC765 and R2.L = 12. Then R5.L becomes 1. */
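The exponent rule described above (number of sign bits minus one, limited by the reference exponent) can be modeled in a few lines of Python. This is a behavioral sketch for checking values by hand, not a description of the hardware; the function names `exponent`, `expadj`, and `expadj_v` are hypothetical.

```python
def exponent(value, width):
    """Count the redundant sign bits (sign bits minus one) of a two's-complement value."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    # Walk down from the bit just below the MSB while it still repeats the sign.
    for bit in range(width - 2, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count

def expadj(sample, ref_exp, width=32):
    """Return the smaller of the sample's exponent and the reference exponent."""
    return min(exponent(sample, width), ref_exp)

def expadj_v(sample_pair, ref_exp):
    """Vector form: treat a 32-bit value as two independent 16-bit samples."""
    hi = (sample_pair >> 16) & 0xFFFF
    lo = sample_pair & 0xFFFF
    return min(exponent(hi, 16), exponent(lo, 16), ref_exp)

print(expadj(0x00000052, 27))            # 24
print(expadj(0xC765, 12, width=16))      # 1
```

Running `expadj` over every element of an array, feeding each result back in as the next reference exponent, yields the block exponent used for block floating-point normalization.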
DCT Operations
These operations provide addition and/or subtract operations with prescale and rounding on register operands:
• 32-Bit Prescale Up Add/Sub to 16-bit (AddSubRnd12)
• 32-Bit Prescale Down Add/Sub to 16-Bit (AddSubRnd20)
Abstract
This instruction shifts then adds or subtracts two 32-bit numbers, then it extracts sixteen bits of result. The
instruction pre-shifts the operands four bits to the left, then it does the add/sub, round, and extract. The instruction
supports only biased rounding, which adds a half LSB (bit 15) before truncating bits 15-0. The RND_MOD bit in the
ASTAT register has no bearing on the rounding behavior of this instruction.
See Also (32-Bit Prescale Down Add/Sub to 16-Bit (AddSubRnd20))
AddSubRnd12 Description
The add/subtract prescale up instruction combines two 32-bit values to produce a 16-bit result as follows:
• Prescale up both input operand values by shifting them four places to the left
• Add or subtract the operands, depending on the instruction version used
• Round and saturate the upper 16 bits of the result
• Extract the upper 16 bits to the dest_reg
The instruction supports only biased rounding. The RND_MOD bit in the ASTAT register has no bearing on the
rounding behavior of this instruction. See the Saturation topic in the Introduction chapter for a description of satura-
tion behavior.
This 32-bit instruction can be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
A special application of the add/subtract prescale up instruction is to provide an IEEE 1180–compliant 2D 8x8
inverse discrete cosine transform.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
AddSubRnd12 Example
r1.l = r6+r7(rnd12) ;
r1.l = r6-r7(rnd12) ;
r1.h = r6+r7(rnd12) ;
r1.h = r6-r7(rnd12) ;
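The prescale-up sequence (shift left by four, add or subtract, round, saturate, extract) can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that saturation is applied once, to the final 16-bit result; the exact point at which the hardware saturates is not stated in this section.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def add_sub_rnd12(a, b, subtract=False):
    a, b = to_s32(a) << 4, to_s32(b) << 4      # prescale up by four bits
    total = a - b if subtract else a + b
    total += 0x8000                            # biased rounding: add half an LSB (bit 15)
    result = total >> 16                       # keep the upper 16 bits
    return max(-0x8000, min(0x7FFF, result))   # saturate to the signed 16-bit range

print(add_sub_rnd12(0x1000, 0x1000))           # 2
print(add_sub_rnd12(0x7FFFFFFF, 0x7FFFFFFF))   # 32767 (saturated)
```

Under these assumptions the net effect is (a ± b) rounded at bit 11 and shifted right by 12, which is where the rnd12 name comes from.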
Abstract
This instruction shifts then adds or subtracts two 32-bit numbers, then it extracts sixteen bits of result. The
instruction arithmetically pre-shifts the operands four bits to the right. It adds or subtracts them, rounds the upper
16 bits of the result, then extracts the upper 16 bits to the destination. The instruction supports only biased
rounding, which adds a half LSB (bit 15) before truncating bits 15-0. The RND_MOD bit in the ASTAT register has no
bearing on the rounding behavior of this instruction.
See Also (32-Bit Prescale Up Add/Sub to 16-bit (AddSubRnd12))
AddSubRnd20 Description
The add/subtract prescale down instruction combines two 32-bit values to produce a 16-bit result as follows:
• Prescale down both input operand values by arithmetically shifting them four places to the right
• Add or subtract the operands, depending on the instruction version used
• Round the upper 16 bits of the result
• Extract the upper 16 bits to the dest_reg
The instruction supports only biased rounding. The RND_MOD bit in the ASTAT register has no bearing on the
rounding behavior of this instruction. See the Saturation topic in the Introduction chapter for a description of satura-
tion behavior.
This 32-bit instruction can be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
A special application of the add/subtract prescale down instruction is to provide an IEEE 1180–compliant 2D 8x8
inverse discrete cosine transform.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
AddSubRnd20 Example
r1.l = r6+r7(rnd20) ;
r1.l = r6-r7(rnd20) ;
r1.h = r6+r7(rnd20) ;
r1.h = r6-r7(rnd20) ;
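The prescale-down sequence can be sketched the same way. The helper names are hypothetical. Because each operand is first shifted right by four, the sum always fits in the 16-bit result after extraction, so this sketch contains no saturation step.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def add_sub_rnd20(a, b, subtract=False):
    a, b = to_s32(a) >> 4, to_s32(b) >> 4   # arithmetic prescale down by four bits
    total = a - b if subtract else a + b
    return (total + 0x8000) >> 16           # biased rounding, then keep the upper 16 bits

print(add_sub_rnd20(0x00100000, 0x00100000))   # 2
print(add_sub_rnd20(0x80000000, 0x80000000))   # -4096
```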
Divide Operations
These operations provide division primitive operations on register operands:
• DIVS and DIVQ Divide Primitives (Divide)
Abstract
The DIVQ instruction is a simple non-restoring divide primitive. It takes two operands, src and dst, where src is
the divisor (denominator) and dst is the dividend (numerator). The divisor is repeatedly added to or subtracted from
the dividend, which holds the partial remainder. The algorithm uses the AQ (quotient) status bit to determine how
the ALU computes the next bit of the quotient: if AQ is 1, the divisor is added to the partial remainder; otherwise,
the divisor is subtracted from it. The DIVS instruction is the initializing instruction for DIVQ. It sets the AQ flag
based on the signs of the 32-bit dividend and the 16-bit divisor, left shifts the dividend one bit, then copies AQ
into the dividend LSB.
Divide Description
The Divide Primitive instruction versions are the foundation elements of a nonrestoring conditional add-subtract
division algorithm. See Divide Example for such a routine.
The dividend (numerator) is a 32-bit value. The divisor (denominator) is a 16-bit value in the lower half of divi-
sor_register. The high-order half-word of divisor_register is ignored entirely.
The division can either be signed or unsigned, but the dividend and divisor must both be of the same type. The
divisor cannot be negative. A signed division operation, where the dividend may be negative, begins the sequence
with the DIVS (“divide-sign”) instruction, followed by repeated execution of the DIVQ (“divide-quotient”) instruc-
tion. An unsigned division omits the DIVS instruction. In that case, the user must manually clear the AQ status bit
of the ASTAT register before issuing the DIVQ instructions.
Up to 16 bits of signed quotient resolution can be calculated by issuing DIVS once, then repeating the DIVQ in-
struction 15 times. A 16-bit unsigned quotient is calculated by omitting DIVS, clearing the AQ status bit, then
issuing 16 DIVQ instructions.
Less quotient resolution is produced by executing fewer DIVQ iterations.
The result of each successive addition or subtraction appears in dividend_register, aligned and ready for the next
addition or subtraction step. The contents of divisor_register are not modified by this instruction.
The final quotient appears in the low-order half-word of dividend_register at the end of the successive add/subtract
sequence.
DIVS computes the sign bit of the quotient based on the signs of the dividend and divisor. DIVS initializes the AQ
status bit based on that sign, and initializes the dividend for the first addition or subtraction. DIVS performs no
addition or subtraction.
DIVQ either adds (dividend + divisor) or subtracts (dividend – divisor) based on the AQ status bit, then reinitializes
the AQ status bit and dividend for the next iteration. If AQ is 1, addition is performed; if AQ is 0, subtraction is
performed.
See ASTAT Flags for the AQ status bit affected by these instructions.
Both instruction versions align the dividend for the next iteration by shifting the dividend one bit to the left
(without carry). This left shift accomplishes the same function as aligning the divisor one bit to the right, such as
one would do in manual binary division.
The format of the quotient for any numeric representation can be determined by the format of the dividend and
divisor. Let:
• NL represent the number of bits to the left of the binal point of the dividend, and
• NR represent the number of bits to the right of the binal point of the dividend (numerator);
• DL represent the number of bits to the left of the binal point of the divisor, and
• DR represent the number of bits to the right of the binal point of the divisor (denominator).
Then the quotient has NL – DL + 1 bits to the left of the binal point and NR – DR – 1 bits to the right of the binal
point. See the following example.
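Dividend (numerator):    BBBB B . BBB BBBB BBBB BBBB BBBB BBBB BBBB   (NL = 5 bits, NR = 27 bits)
Divisor (denominator):   DL = 2 bits, DR = 14 bits (values implied by the quotient calculation below)
Quotient:                NL – DL + 1 = 5 – 2 + 1 = 4 bits to the left of the binal point
                         NR – DR – 1 = 27 – 14 – 1 = 12 bits to the right of the binal point
                         (4.12 format)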
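The format rule is simple enough to check mechanically. The following Python sketch (the function name `quotient_format` is hypothetical) applies the two formulas to the 5.27 / 2.14 case above and to a plain integer division.

```python
def quotient_format(nl, nr, dl, dr):
    """Return (bits left, bits right) of the binal point in the quotient."""
    return nl - dl + 1, nr - dr - 1

print(quotient_format(5, 27, 2, 14))   # (4, 12): the 4.12 format above
print(quotient_format(32, 0, 16, 0))   # (17, -1): integer operands without adjustment
print(quotient_format(31, 1, 16, 0))   # (16, 0): dividend pre-shifted left by one bit
```

The last two lines suggest why an integer division pre-shifts the dividend left by one bit: the shift turns a 32.0 dividend into a 31.1 value, so that the quotient comes out in 16.0 format.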
The algorithm overflows if the result cannot be represented in the format of the quotient as calculated above, or
when the divisor is zero or less than the upper 16 bits of the dividend in magnitude (which is tantamount to multi-
plication).
It is important to understand error conditions related to this instruction. Two special cases can produce invalid or
inaccurate results. Software can trap and correct both cases.
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
Divide Example
/* Evaluate given a signed integer dividend and divisor */
p0 = 15 ; /* Evaluate the quotient to 16 bits. */
r0 = 70 ; /* Dividend, or numerator */
r1 = 5 ; /* Divisor, or denominator */
r0 <<= 1 ; /* Left shift dividend by 1 needed for integer division */
divs (r0, r1) ; /* Evaluate quotient MSB. Initialize AQ status bit and dividend for the
DIVQ loop. */
loop .div_prim lc0=p0 ; /* Evaluate DIVQ p0=15 times. */
loop_begin .div_prim ;
divq (r0, r1) ;
loop_end .div_prim ;
r0 = r0.l (x) ; /* Sign extend the 16-bit quotient to 32bits. */
/* r0 contains the quotient (70/5 = 14). */
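For readers who want to step through the routine without hardware, the following Python sketch models DIVS and DIVQ and replays the example above. It follows the description in this section (AQ selects add or subtract, the dividend shifts left each step), but the bit-level details, in particular how the new quotient bit and AQ are derived from the partial remainder, are assumptions made for illustration. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def divs(dividend, divisor):
    """DIVS model: derive AQ from the operand signs, shift left, copy AQ into the LSB."""
    aq = ((dividend >> 31) ^ (divisor >> 15)) & 1
    return ((dividend << 1) | aq) & MASK32, aq

def divq(dividend, divisor, aq):
    """DIVQ model: add (AQ = 1) or subtract (AQ = 0) the divisor in the upper half-word."""
    partial = (dividend >> 16) & 0xFFFF
    partial = (partial + divisor if aq else partial - divisor) & 0xFFFF
    aq = ((partial ^ divisor) >> 15) & 1          # assumed rule for the next AQ value
    low = ((dividend << 1) | (aq ^ 1)) & 0x1FFFF  # shift in the new quotient bit
    return ((partial << 17) | low) & MASK32, aq

def divide_example(dividend, divisor):
    """Replay the assembly example: pre-shift, DIVS once, DIVQ 15 times, sign-extend."""
    r0 = (dividend << 1) & MASK32
    r1 = divisor & 0xFFFF
    r0, aq = divs(r0, r1)
    for _ in range(15):
        r0, aq = divq(r0, r1, aq)
    quotient = r0 & 0xFFFF
    return quotient - 0x10000 if quotient & 0x8000 else quotient

print(divide_example(70, 5))   # 14
```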
Shift (Dsp32Shf )
DREG_L Register Type = cc = bxor (a0, a1, cc)
Abstract
This instruction (linear feedback shift register, LFSR) provides a bit-wise XOR reduction of A0 logically AND'ed
with A1 and the feedback bit (CC). The result is placed into both the CC flag and the least significant bit of the
destination register. The Accumulator is not modified by this operation.
See Also (40-Bit BXORShift LSFR with Feedback to the Accumulator (BXORShift_NF), 32-Bit BXOR or BXOR-
Shift LSFR without Feedback (BXOR_NF))
BXOR Description
Four Bit-Wise Exclusive-OR (BXOR) instructions support two different types of linear feedback shift register
(LFSR) implementations. The Type I LFSR (no feedback) applies a 32-bit registered mask to a 40-bit state residing
in Accumulator A0, followed by a bit-wise XOR reduction operation. The result is placed in CC and a destination
register half. The Type I LFSR (with feedback) applies a 40-bit mask in Accumulator A1 to a 40-bit state residing
in A0. The result is shifted into A0. In the following circuits describing the BXOR instruction group, a bit-wise
XOR reduction is defined as:
$\mathrm{Out} = (((((B_0 \oplus B_1) \oplus B_2) \oplus B_3) \oplus \cdots) \oplus B_{n-1})$
[Circuit diagrams not reproduced. In each circuit, the masked bits of A0 feed an XOR reduction whose output (IN) is
written to CC and dreg_lo[15:0]; in the BXORSHIFT circuit, A0[39:0] is left-shifted by 1 following the XOR reduction.]
Figure 8-39: XOR of A0 AND A1, to CC Bit and LSB of Dest Register
The Accumulator A0 is not modified by this operation. The upper 15 bits of dreg_lo are overwritten with zero, and
dr[0] = IN.
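The XOR reduction with feedback can be sketched in Python as the parity of the masked state combined with CC. This is a minimal behavioral model; the names `xor_reduce` and `bxor_feedback` are hypothetical, and the 40-bit accumulators are represented as plain integers.

```python
MASK40 = (1 << 40) - 1

def xor_reduce(bits):
    """XOR all bits together (parity): 1 if an odd number of bits are set."""
    return bin(bits).count("1") & 1

def bxor_feedback(a0, a1, cc):
    """Model of dreg_lo = cc = bxor(a0, a1, cc): returns the new CC; A0 is unchanged."""
    return xor_reduce(a0 & a1 & MASK40) ^ (cc & 1)

print(bxor_feedback(0b1011, 0b0110, 0))   # 1
print(bxor_feedback(0b1011, 0b1111, 1))   # 0
```

The returned bit is what lands in CC and in bit 0 of the destination half-register, whose upper 15 bits are cleared.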
Special Applications
Linear feedback shift registers (LFSRs) can multiply and divide polynomials and are often used to implement
cyclical encoders and decoders.
LFSRs use the set of Bit-Wise XOR instructions to compute bit XOR reduction from a state masked by a polynomial.
When implementing a CRC algorithm, note the equivalence between polynomial division and LFSR circuits. For
example, the CRC is defined as the remainder of the division of a message polynomial appended with n zeros by the
code generator polynomial:
$C_n(x) = \{M_k(x)\,x^n\} \bmod G_n(x)$
Where:
• $M_{k-1}(x)$ is the message polynomial of length k:
$M_{k-1}(x) = m_{k-1}x^{k-1} + m_{k-2}x^{k-2} + \dots + m_0x^0$
• $G_n(x)$ is the CRC generating polynomial of degree n, and n is also the CRC field length in bits:
$G_n(x) = x^n + g_{n-1}x^{n-1} + \dots + g_0x^0$
• $C_n(x)$ is the calculated CRC polynomial of degree n:
$C_n(x) = x^n + c_{n-1}x^{n-1} + \dots + c_0x^0$
The division is performed modulo 2 over the Galois field GF(2). In the above equation, the message stream Mk is
postfixed by n zeros before the actual division. This equation can be implemented by one of two types of n-tap LFSRs.
The more familiar type of LFSR is called Type II (or internal) LFSR of the form:
[LFSR diagrams not reproduced. The diagrams show n-tap shift registers with taps g0 ... gn-1 and state bits
S0 ... Sn-1. The message Mk followed by n zeros is clocked in (i <= k), and Cn(x) is produced from clock (k+1) to
(k+n) (i > k).]
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
BXOR Example
The BXOR and BXORSHIFT instructions let you calculate a Type I CRC at a rate of two cycles per input bit, as in
the following example program.
// _CRC_BXOR - calculate CRC value of a message polynomial
// for a given generator polynomial.
#define MSG_LEN 32 // bits
#define CRC_LEN 16 // bits
_CRC_BXOR:
a1 = a0 = 0;
r1 = 0x8408 (z); // LFSR polynomial, reversed:
// x^16 + x^12 + x^5 + 1
a1.w = r1; // initialize LFSR mask
r2.h = 0xd065; // r2 = message
r2.l = 0x86c9;
p1 = MSG_LEN (z);
loop _MSG_loop lc0 = p1;
loop_begin _MSG_loop;
r2 = rot r2 by 1;
a0 = bxorshift(a0, a1, cc);
loop_end _MSG_loop;
r0 = 0; // initialize CRC
r2.l = cc = bxor(a0, r1);
r0 = rot r0 by 1;
p1 = CRC_LEN-1 (z);
loop _CRC_loop lc0 = p1;
loop_begin _CRC_loop;
r2.l = cc = bxorshift(a0, r1);
r0 = rot r0 by 1;
loop_end _CRC_loop;
// r0.l now contains the CRC
_CRC_BXOR.end:
Shift (Dsp32Shf )
a0 = bxorshift (a0, a1, cc)
Abstract
This instruction (linear feedback shift register, LFSR) provides a bit-wise XOR reduction of A0 logically AND'ed
with A1 and the feedback bit (CC). The result is left-shifted into the least significant bit of A0. The CC bit is not
modified.
See Also (40-Bit BXOR LSFR with Feedback to a Register (BXOR), 32-Bit BXOR or BXORShift LSFR without
Feedback (BXOR_NF))
BXORShift_NF Description
Linear feedback shift register (LFSR) instruction. Provides a bit-wise XOR reduction of A0 logically AND'ed with
A1 and the feedback bit (CC). The result is left-shifted into the least significant bit of A0. The CC bit is not modi-
fied.
The bit-wise XOR reduction is defined as:
$\mathrm{out} = (((((CC \oplus B_0) \oplus B_1) \oplus B_2) \oplus \cdots) \oplus B_{n-1})$
For more information about this instruction, see 40-Bit BXOR LSFR with Feedback to a Register (BXOR).
BXORShift_NF Example
For examples using this instruction, see 40-Bit BXOR LSFR with Feedback to a Register (BXOR).
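As a quick illustration of the shift-with-feedback behavior, the following Python sketch computes the reduction bit from the current state and then shifts it into the least significant bit of a 40-bit A0. The function names are hypothetical, and the ordering (reduce first, then shift) is taken from the description above.

```python
MASK40 = (1 << 40) - 1

def xor_reduce(bits):
    """XOR all bits together (parity)."""
    return bin(bits).count("1") & 1

def bxorshift_feedback(a0, a1, cc):
    """Model of a0 = bxorshift(a0, a1, cc): returns the new A0; CC is unchanged."""
    bit = xor_reduce(a0 & a1 & MASK40) ^ (cc & 1)
    return ((a0 << 1) | bit) & MASK40   # bit 39 falls off the top

print(bin(bxorshift_feedback(0b1011, 0b0110, 0)))   # 0b10111
```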
Shift (Dsp32Shf )
DREG_L Register Type = cc = bxorshift (a0, DREG Register Type)
DREG_L Register Type = cc = bxor (a0, DREG Register Type)
Abstract
This instruction (linear feedback shift register, LFSR) provides a bit-wise XOR reduction of A0 or A0 shifted left
one, logically AND'ed with a 32-bit data register. The result is placed into both the CC flag and the least significant
bit of the destination register.
See Also (40-Bit BXORShift LSFR with Feedback to the Accumulator (BXORShift_NF), 40-Bit BXOR LSFR with
Feedback to a Register (BXOR))
BXOR_NF Description
Linear feedback shift register (LFSR) instruction. Provides a bit-wise XOR reduction of A0 or A0 shifted left one,
logically AND'ed with a 32-bit data register. The result is placed into both the CC flag and the least significant bit
of the destination register.
A bit-wise XOR reduction is defined as:
$\mathrm{out} = (((((B_0 \oplus B_1) \oplus B_2) \oplus B_3) \oplus \cdots) \oplus B_{n-1})$
For more information about this instruction, see 40-Bit BXOR LSFR with Feedback to a Register (BXOR).
ASTAT Flags
The table shows the affected ASTAT flags. For more information, see Arithmetic Status Register.
Bit:   31   30   29   28   27   26   25   24   23   22   21   20   19    18    17    16
Flag:  ...  ...  ...  ...  ...  ...  VS   V    ...  ...  ...  ...  AV1S  AV1   AV0S  AV0
Bit:   15   14   13   12   11   10   9    8        7    6    5    4    3    2    1    0
Flag:  ...  ...  AC1  AC0  ...  ...  ...  RND_MOD  ...  AQ   CC   ...  ...  ...  AN   AZ
BXOR_NF Example
For examples using this instruction, see 40-Bit BXOR LSFR with Feedback to a Register (BXOR).
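The no-feedback forms can be sketched in Python as well. The function name `bxor_no_feedback` is hypothetical. Two points are assumptions rather than statements from this section: that the 32-bit mask lines up with the low 32 bits of A0, and that the BXORSHIFT form leaves the shifted value in A0 (the CRC example program relies on the state advancing on every iteration).

```python
MASK40 = (1 << 40) - 1

def xor_reduce(bits):
    """XOR all bits together (parity)."""
    return bin(bits).count("1") & 1

def bxor_no_feedback(a0, mask32, shift_first=False):
    """Return (cc, a0). shift_first=True models bxorshift; False models bxor."""
    if shift_first:
        a0 = (a0 << 1) & MASK40            # assumed: A0 keeps the shifted value
    cc = xor_reduce(a0 & mask32 & 0xFFFFFFFF)
    return cc, a0

print(bxor_no_feedback(0b0101, 0b0100))                    # (1, 5)
print(bxor_no_feedback(0b0101, 0b0100, shift_first=True))  # (0, 10)
```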
Video Operations
These operations provide video application specific operations on register operands:
Abstract
This instruction adds two 8-bit unsigned values to two 16-bit signed values, then it clips the result back to the 8-bit
unsigned range. The instruction either adds Y[3] & Y[1] or Y[2] & Y[0] to the two 16-bit X values. The lower two
bits of I0 and I1 are used to extract four contiguous bytes from the input register pair. The results are written
back to either the lower or higher bytes of each 16-bit result half. The unused bytes are filled with zeros.
See Also (Vectored 8-Bit Add or Subtract to 16-Bit (Byteop16P/M) (AddSub4x8))
AddClip Description
The dual 16-bit add/clip instruction adds two 8-bit unsigned values to two 16-bit signed values, then limits (or
“clips”) the result to the 8-bit unsigned range 0 through 255, inclusive. The instruction loads the results as bytes on
half-word boundaries in one 32-bit destination register. Some syntax options load the upper byte in the half-word
and others load the lower byte, as shown in the next few figures.
Bits:               31–24   23–16   15–8   7–0
aligned_src_reg_0:  y1 (bits 31–16)  y0 (bits 15–0)
aligned_src_reg_1:  z3      z2      z1     z0
Figure 8-44: The versions that load the result into the lower byte (LO) produce:
Figure 8-45: The versions that load the result into the higher byte (HI) produce:
In either case, the unused bytes in the destination register are filled with 0x00. The 8-bit and 16-bit addition is
performed as a signed operation. The 16-bit operand is sign-extended to 32 bits before adding.
The only valid input source register pairs are R1:0 and R3:2.
The dual 16-bit add/clip instruction provides byte alignment directly in the source register pairs src_reg_0 and
src_reg_1 based on index registers I0 and I1.
• The two LSBs of the I0 register determine the byte alignment for source register pair src_reg_0 (typically
R1:0).
• The two LSBs of the I1 register determine the byte alignment for source register pair src_reg_1 (typically
R3:2).
The relationship between the I-register bits and the byte alignment is illustrated in the I-register Bits and the Byte
Alignment (no reverse) figure.
In the default source order case (that is, without the ( – , R) syntax), assume a source register pair contains the
data shown in the I-register Bits and the Byte Alignment (no reverse) figure.
This instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in
parallel.
Figure 8-46: I-register Bits and the Byte Alignment (no reverse)
Options
The ( – , R) syntax reverses the order of the source registers within each register pair. Typical high performance
applications cannot afford the overhead of reloading both register pair operands to maintain byte order for every
calculation. Instead, they alternate and load only one register pair operand each time and alternate between the
forward and reverse byte order versions of this instruction. By default, the low order bytes come from the low
register in the register pair. The ( – , R) option causes the low order bytes to come from the high register.
In the optional reverse source order case (for example, using the ( – , R) syntax), the only difference is the source
registers swap places within the register pair in their byte ordering. Assume a source register pair contains the data
shown in the I-register Bits and the Byte Alignment (with reverse) figure.
Figure 8-47: I-register Bits and the Byte Alignment (with reverse)
A special application of this instruction is support for video motion compensation algorithms. The instruction sup-
ports the addition of the residual to a video pixel value, followed by unsigned byte saturation.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
AddClip Example
r3 = byteop3p (r1:0, r3:2) (lo) ;
r3 = byteop3p (r1:0, r3:2) (hi) ;
r3 = byteop3p (r1:0, r3:2) (lo, r) ;
r3 = byteop3p (r1:0, r3:2) (hi, r) ;
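The add-then-clip arithmetic and the LO/HI result placement can be sketched in Python. This model takes the two 16-bit values and the two bytes as already-selected arguments; it does not model byte alignment, the reverse option, or which source bytes each syntax option pairs with each 16-bit value. All names are hypothetical.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def clip_u8(value):
    """Limit a signed result to the unsigned 8-bit range 0 through 255."""
    return max(0, min(255, value))

def add_clip_pack(x1, x0, b1, b0, hi=False):
    """Add byte b1 to 16-bit x1 and b0 to x0, clip, and pack one byte per half-word."""
    r1 = clip_u8(to_s16(x1) + (b1 & 0xFF))
    r0 = clip_u8(to_s16(x0) + (b0 & 0xFF))
    shift = 8 if hi else 0                 # HI places results in the upper byte
    return (r1 << (16 + shift)) | (r0 << shift)

print(hex(add_clip_pack(0x0010, 0xFFF0, 0x20, 0x05)))            # 0x300000
print(hex(add_clip_pack(0x00F0, 0x0001, 0x20, 0x02, hi=True)))   # 0xff000300
```

In the first call the second sum is negative (-16 + 5) and clips to 0; in the second call 240 + 32 exceeds 255 and clips to 0xFF.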
Abstract
This instruction (Byteop16M and ByteOp16P) adds or subtracts two unsigned quad byte vectors, adjusting for byte
alignment. The lower two bits of I0 and I1 are used to extract four contiguous bytes from the input register pair.
See Also (Vectored 8-Bit to 16-Bit Add then Clip to 8-Bit (Byteop3P) (AddClip))
AddSub4x8 Description
The quad 8-bit add instruction adds two unsigned quad byte number sets byte-wise, adjusting for byte alignment. It
then loads the byte-wise results as 16-bit, zero-extended, half-words in two destination registers, as shown in the
next figures.
Bits:               31–24   23–16   15–8   7–0
aligned_src_reg_0:  y3      y2      y1     y0
aligned_src_reg_1:  z3      z2      z1     z0
dest_reg_0:         y1 + z1         y0 + z0
dest_reg_1:         y3 + z3         y2 + z2
Figure 8-50: I-register Bits and the Byte Alignment (No Reverse)
This instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in
parallel.
Options
The (R) syntax reverses the order of the source registers within each register pair. Typical high performance applica-
tions cannot afford the overhead of reloading both register pair operands to maintain byte order for every calcula-
tion. Instead, they alternate and load only one register pair operand each time and alternate between the forward
and reverse byte order versions of this instruction. By default, the low order bytes come from the low register in the
register pair. The (R) option causes the low order bytes to come from the high register.
In the optional reverse source order case (for example, using the (R) syntax), the only difference is the source regis-
ters swap places within the register pair in their byte ordering. Assume a source register pair contains the data shown
in the I-register Bits and the Byte Alignment (With Reverse) figure.
Figure 8-51: I-register Bits and the Byte Alignment (With Reverse)
The mnemonic derives its name from the fact that the operands are bytes, the result is 16 bits, and the arithmetic
operation is “plus” for addition.
A special application of this instruction provides packed data arithmetic typical of video and image processing appli-
cations.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
AddSub4x8 Example
(r1,r2)= byteop16p (r3:2,r1:0) ;
(r1,r2)= byteop16p (r3:2,r1:0) (r) ;
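The byte-wise addition and the packing of the four sums into two destination registers can be sketched in Python as follows. The function name `quad_add` is hypothetical; the inputs are the already-aligned 32-bit source words, so byte alignment and the (R) option are not modeled.

```python
def quad_add(y, z):
    """Add four unsigned byte pairs; return (dest_reg_0, dest_reg_1)."""
    yb = [(y >> (8 * i)) & 0xFF for i in range(4)]   # yb[0] is the lowest byte
    zb = [(z >> (8 * i)) & 0xFF for i in range(4)]
    sums = [a + b for a, b in zip(yb, zb)]           # each sum fits in 9 bits
    dest_reg_0 = (sums[1] << 16) | sums[0]           # y1 + z1 | y0 + z0
    dest_reg_1 = (sums[3] << 16) | sums[2]           # y3 + z3 | y2 + z2
    return dest_reg_0, dest_reg_1

d0, d1 = quad_add(0xFF020304, 0xFF010101)
print(hex(d0), hex(d1))   # 0x40005 0x1fe0003
```

Because each sum is zero-extended into its own 16-bit half-word, 0xFF + 0xFF = 0x1FE is preserved without clipping.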
Abstract
This instruction disables alignment exceptions. This instruction only affects misaligned loads that use I registers.
The address is forced to be 32-bit aligned.
See Also (Byte Align (Shift_Align))
DisAlignExcept Description
The disable alignment exception for load (DISALGNEXCPT) instruction prevents exceptions that would otherwise
be caused by misaligned 32-bit memory loads issued in parallel. This instruction only affects misaligned 32-bit load
instructions that use I-register indirect addressing.
In order to force address alignment to a 32-bit boundary, the two LSBs of the address are cleared before being sent
to the memory system. The I-register is not modified by the DISALGNEXCPT instruction. Also, any modifications
performed to the I-register by a parallel instruction are not affected by the DISALGNEXCPT instruction.
A special application of the DISALGNEXCPT instruction is priming data registers for Quad 8-Bit single-instruction,
multiple-data (SIMD) instructions.
Quad 8-Bit SIMD instructions require as many as sixteen 8-bit operands, four D-registers worth, to be preloaded
with operand data. The operand data is 8 bits and not necessarily word aligned in memory. Thus, use DISALG-
NEXCPT to prevent spurious exceptions for these potentially misaligned accesses.
During execution, when Quad 8-Bit SIMD instructions perform 8-bit boundary accesses, they automatically pre-
vent exceptions for misaligned accesses. No user intervention is required.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
DisAlignExcept Example
disalgnexcpt || r1 = [i0++] || r3 = [i1++] ;
/* three instructions in parallel */
disalgnexcpt || [p0 ++ p1] = r5 || r3 = [i1++] ;
/* alignment exception is prevented only for the load */
disalgnexcpt || r0 = [p2++] || r3 = [i1++] ;
/* alignment exception is prevented only for the I-reg load */
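The address forcing described above amounts to clearing the two LSBs of the I-register value before the access. The following Python sketch models that with a byte buffer standing in for memory. The function names are hypothetical, and little-endian byte order is assumed for the load.

```python
def forced_aligned_address(i_reg):
    """Clear the two LSBs so that the access falls on a 32-bit boundary."""
    return i_reg & ~0x3 & 0xFFFFFFFF

def load32(memory, i_reg):
    """Load the aligned 32-bit word that contains the byte addressed by i_reg."""
    addr = forced_aligned_address(i_reg)
    return int.from_bytes(memory[addr:addr + 4], "little")

memory = bytes(range(16))
print(hex(forced_aligned_address(0x1003)))   # 0x1000
print(hex(load32(memory, 6)))                # 0x7060504
```

The I-register value itself is untouched; only the address presented to memory changes.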
Shift (Dsp32Shf )
DREG Register Type = align8 (DREG Register Type, DREG Register Type)
DREG Register Type = align16 (DREG Register Type, DREG Register Type)
DREG Register Type = align24 (DREG Register Type, DREG Register Type)
Abstract
This instruction copies four contiguous bytes from a register pair. The bytes are offset by 8, 16, or 24 bits in the
register pair.
See Also (Disable Alignment Exception (DisAlignExcept))
Shift_Align Description
The Byte Align instruction copies a contiguous four-byte unaligned word from a combination of two data registers.
The instruction version determines the bytes that are copied; in other words, the byte alignment of the copied word.
Alignment options are shown in the following table.
                        src_reg_1                          src_reg_0
                        byte 7  byte 6  byte 5  byte 4     byte 3  byte 2  byte 1  byte 0
dest_reg for ALIGN8:                            byte 4     byte 3  byte 2  byte 1
dest_reg for ALIGN16:                   byte 5  byte 4     byte 3  byte 2
dest_reg for ALIGN24:           byte 6  byte 5  byte 4     byte 3
Use the Byte Align instruction to align data bytes for subsequent single-instruction, multiple-data (SIMD)
instructions.
The input values are not implicitly modified by this instruction. The destination register can be the same D-register
as one of the source registers. Doing this explicitly modifies that source register.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Shift_Align Example
// If r3 = 0x0011 2233 and r4 = 0x4455 6677, then . . .
r0 = align8 (r3, r4) ; /* produces r0 = 0x3344 5566, */
r0 = align16 (r3, r4) ; /* produces r0 = 0x2233 4455, and */
r0 = align24 (r3, r4) ; /* produces r0 = 0x1122 3344, */
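The three variants can be reproduced by treating the two source registers as one 64-bit value and shifting right by the alignment amount. This Python sketch uses the hypothetical function name `align`, with the first argument as the upper word, as in the example above.

```python
def align(src_reg_1, src_reg_0, offset_bits):
    """Copy four contiguous bytes starting offset_bits into src_reg_1:src_reg_0."""
    combined = ((src_reg_1 & 0xFFFFFFFF) << 32) | (src_reg_0 & 0xFFFFFFFF)
    return (combined >> offset_bits) & 0xFFFFFFFF

r3, r4 = 0x00112233, 0x44556677
print(hex(align(r3, r4, 8)))    # 0x33445566
print(hex(align(r3, r4, 16)))   # 0x22334455
print(hex(align(r3, r4, 24)))   # 0x11223344
```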
Abstract
This instruction averages the upper two bytes and the lower two bytes of two unsigned quad byte vectors, adjusting
for byte alignment. The lower two bits of I0 are used to extract four contiguous bytes from the input register pair.
Note that this operation uses only I0, unlike all the other byteop instructions. If you specify round (RND), a round
bit is added prior to the shift. The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of
this instruction. It returns the two averages in either Dest[3] & Dest[1] or Dest[2] & Dest[0].
See Also (Vector Byte Average (Byteop1P) (Avg8Vec))
Avg4x8Vec Description
The quad 8-bit average half-word instruction finds the arithmetic average of two unsigned quad byte number sets
byte wise, adjusting for byte alignment. This instruction averages four bytes together. The instruction loads the re-
sults as bytes on half-word boundaries in one 32-bit destination register. Some syntax options load the upper byte in
the half-word and others load the lower byte, as shown in the next few figures.
Bits:               31–24   23–16   15–8   7–0
aligned_src_reg_0:  y3      y2      y1     y0
aligned_src_reg_1:  z3      z2      z1     z0
Figure 8-54: The versions that load the result into the lower byte – RNDL and TL – produce:
Bits:      31–24                 23–16    15–8                  7–0
dest_reg:  avg(y3, y2, z3, z2)   0...0    avg(y1, y0, z1, z0)   0...0
Figure 8-55: The versions that load the result into the higher byte – RNDH and TH – produce:
In either case, the unused bytes in the destination register are filled with 0x00.
Arithmetic average (or mean) is calculated by summing the four byte operands, then shifting right two places to
divide by four.
When the intermediate sum is not evenly divisible by 4, precision may be lost.
The user has two options to bias the result–truncation or biased rounding.
The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of this instruction.
The only valid input source register pairs are R1:0 and R3:2.
The quad 8-bit average half-word instruction provides byte alignment directly in the source register pairs src_reg_0
(typically R1:0) and src_reg_1 (typically R3:2) based only on the I0 register. The byte alignment in both source
registers must be identical since only one register specifies the byte alignment for them both.
The relationship between the I-register bits and the byte alignment is shown in the I-register Bits and the Byte
Alignment (no reverse) figure. In the default source order case (for example, not the (R) syntax), assume a source
register pair contains the data shown in the figure.
Figure 8-56: I-register Bits and the Byte Alignment (no reverse)
This instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in
parallel.
The quad 8-bit average half-word instruction supports the options shown in the Options for Quad 8-Bit Average --
Half-Word table.
When used together, the order of the options in the syntax makes no difference. In the optional reverse source order
case (for example, using the (R) syntax), the only difference is the source registers swap places within the register pair
in their byte ordering. Assume a source register pair contains the data shown in the I-register Bits and the Byte
Alignment (with reverse) figure.
Figure 8-57: I-register Bits and the Byte Alignment (with reverse)
The mnemonic derives its name from the fact that the operands are bytes, the result is two half-words, and the basic
arithmetic operation is “plus” for addition. The single destination register indicates that averaging is performed.
A special application of this instruction is support for binary interpolation used in fractional motion search and
motion compensation algorithms.
This 16-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Avg4x8Vec Example
r3 = byteop2p (r1:0, r3:2) (rndl) ;
r3 = byteop2p (r1:0, r3:2) (rndh) ;
r3 = byteop2p (r1:0, r3:2) (tl) ;
r3 = byteop2p (r1:0, r3:2) (th) ;
r3 = byteop2p (r1:0, r3:2) (rndl, r) ;
r3 = byteop2p (r1:0, r3:2) (rndh, r) ;
r3 = byteop2p (r1:0, r3:2) (tl, r) ;
r3 = byteop2p (r1:0, r3:2) (th, r) ;
Abstract
This instruction computes the vector average of two unsigned quad byte vectors adjusting for byte alignment. The
lower two bits of I0 and I1 are used to extract four contiguous bytes from the input register pair.
By default, this instruction rounds by adding a one prior to shifting. If you specify truncate, the round bit is not
added. The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of this instruction.
See Also (Quad Byte Average (Byteop2P) (Avg4x8Vec))
Avg8Vec Description
The quad 8-bit average byte instruction computes the arithmetic average of two unsigned quad byte number sets
byte wise, adjusting for byte alignment. This instruction loads the byte-wise results as concatenated bytes in one 32-
bit destination register, as shown in the next figures.
                   31..24  23..16  15..8  7..0
aligned_src_reg_0    y3      y2      y1    y0
aligned_src_reg_1    z3      z2      z1    z0
Figure 8-60: I-register Bits and the Byte Alignment (no reverse)
This instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in
parallel.
Options
The quad 8-bit average byte instruction supports the options shown in the Options for Quad 8-Bit Average – Byte
table.
In the optional reverse source order case (for example, using the (R) syntax), the only difference is the source regis-
ters swap places within the register pair in their byte ordering. Assume a source register pair contains the data shown
in the I-register Bits and the Byte Alignment (with reverse) figure.
Two LSBs of I0 or I1   The bytes selected are
                       src_reg_pair_LO                  src_reg_pair_HI
                       byte 7  byte 6  byte 5  byte 4   byte 3  byte 2  byte 1  byte 0
00b:                                                    byte 3  byte 2  byte 1  byte 0
01b:                                           byte 4   byte 3  byte 2  byte 1
10b:                                   byte 5  byte 4   byte 3  byte 2
11b:                           byte 6  byte 5  byte 4   byte 3
Figure 8-61: I-register Bits and the Byte Alignment (with reverse)
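The byte selection in the table can be modeled as a shift over the 64-bit register pair. The sketch below is a hedged illustration: `align_bytes` and its parameter names are hypothetical, and the default (no reverse) branch assumes the same I-register offsets as the reverse table, with the low-order bytes coming from the low register of the pair.

```python
def align_bytes(lo_reg: int, hi_reg: int, i_reg: int, reverse: bool = False) -> int:
    """Select four contiguous bytes from a register pair.

    lo_reg and hi_reg are the low and high registers of the pair (for
    example R0 and R1 of R1:0). Only the two LSBs of i_reg are used.
    """
    if reverse:
        # (R): the low register supplies bytes 7..4, the high register 3..0.
        pair = ((lo_reg & 0xFFFFFFFF) << 32) | (hi_reg & 0xFFFFFFFF)
    else:
        # Default: the low-order bytes come from the low register.
        pair = ((hi_reg & 0xFFFFFFFF) << 32) | (lo_reg & 0xFFFFFFFF)
    offset = i_reg & 0x3          # 00b..11b selects the starting byte
    return (pair >> (8 * offset)) & 0xFFFFFFFF
```

For example, with the low register holding 0x33221100 and the high register holding 0x77665544, an offset of 01b selects bytes 4 through 1 of the pair.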
The mnemonic derives its name from the fact that the operands are bytes, the result is one word, and the basic
arithmetic operation is “plus” for addition. The single destination register indicates that averaging is performed.
A special application of this instruction is support for binary interpolation used in fractional motion search and mo-
tion compensation algorithms.
This 16-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Avg8Vec Example
r3 = byteop1p (r1:0, r3:2) ;
r3 = byteop1p (r1:0, r3:2) (r) ;
r3 = byteop1p (r1:0, r3:2) (t) ;
r3 = byteop1p (r1:0, r3:2) (t,r) ;
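As a rough behavioral sketch of the byte-wise average, assuming the two source words have already been byte-aligned, the following model adds a round bit of one before the shift unless truncation is requested. The function name `byteop1p` and the `truncate` parameter are illustrative and stand in for the default and (t) forms.

```python
def byteop1p(y: int, z: int, truncate: bool = False) -> int:
    """Average two aligned quad-byte words byte by byte."""
    round_bit = 0 if truncate else 1      # default form rounds, (t) truncates
    result = 0
    for n in range(4):
        a = (y >> (8 * n)) & 0xFF
        b = (z >> (8 * n)) & 0xFF
        # The 9-bit sum is halved, so each result always fits in one byte.
        result |= ((a + b + round_bit) >> 1) << (8 * n)
    return result
```

The results are concatenated in one 32-bit value, matching the description of the destination register.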
Abstract
This instruction adds the accumulator half words together, then it extracts the result to a register. Each half word is
sign extended to 32-bits before being added. This operation is used to sum the results of the video Sum
of Absolute Differences instruction.
See Also (Vectored 8-Bit Sum of Absolute Differences (SAD8Vec))
AddAccHalf Description
The dual 16-bit accumulator extraction with addition instruction adds together the upper half-words (bits
31 through 16) and lower half-words (bits 15 through 0) of each Accumulator and loads each result into a 32-bit
destination register.
Each 16-bit half-word in each Accumulator is sign extended before being added together.
A special application of this instruction is to use the dual 16-bit accumulator extraction with addition instruction for
motion estimation algorithms in conjunction with the quad 8-bit subtract-absolute-accumulate instruction.
This 16-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
AddAccHalf Example
r4=a1.l+a1.h, r7=a0.l+a0.h ;
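The extraction can be pictured with a short model, given here as a sketch rather than a definitive description. The helper names `sign_extend16` and `add_acc_halves` are hypothetical, and the argument is assumed to be the 32-bit A0.W or A1.W portion of the accumulator.

```python
def sign_extend16(half: int) -> int:
    """Interpret the low 16 bits of half as a signed value."""
    half &= 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def add_acc_halves(acc_w: int) -> int:
    """Return acc.l + acc.h as a 32-bit register value."""
    low = sign_extend16(acc_w)            # bits 15 through 0
    high = sign_extend16(acc_w >> 16)     # bits 31 through 16
    return (low + high) & 0xFFFFFFFF
```

Calling `add_acc_halves` once per accumulator corresponds to the two halves of the example line, one result per destination register.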
Abstract
This instruction performs a vector 8-bit subtract, takes the absolute value of the differences, and accumulates
them, adjusting for byte alignment. The lower two bits of I0 and I1 are used to extract four contiguous
bytes from the input register pair. The four 16-bit results are stored in A1.H, A1.L, A0.H, and A0.L. These
saturate if the unsigned 16-bit sections of the accumulator overflow.
See Also (Dual Accumulator Extraction with Addition (AddAccHalf ))
SAD8Vec Description
The quad 8-bit subtract-absolute-accumulate instruction subtracts four pairs of values, takes the absolute value of
each difference, and accumulates each result into a 16-bit Accumulator half. The results are placed in the upper- and
lower-half Accumulators A0.H, A0.L, A1.H, and A1.L.
Saturation is performed if an operation overflows a 16-bit Accumulator half.
This instruction supports the following byte-wise Sum of Absolute Difference (SAD) calculations.
\[ \mathrm{SAD} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| a(i,j) - b(i,j) \right| \]
A1.H += | a(i, j+3) - b(i, j+3) |
A1.L += | a(i, j+2) - b(i, j+2) |
A0.H += | a(i, j+1) - b(i, j+1) |
A0.L += | a(i, j) - b(i, j) |
• The two LSBs of the I0 register determine the byte alignment for source register pair src_reg_0 (typically
R1:0).
• The two LSBs of the I1 register determine the byte alignment for source register pair src_reg_1 (typically
R3:2).
The relationship between the I-register bits and the byte alignment is shown in the I-register Bits and the Byte
Alignment (no reverse) figure.
In the default source order case (for example, not the (R) syntax), assume a source register pair contains the data
shown in the figure.
Figure 8-64: I-register Bits and the Byte Alignment (no reverse)
This instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in
parallel.
Options
The (R) syntax reverses the order of the source registers within each pair. Typical high performance applications can-
not afford the overhead of reloading both register pair operands to maintain byte order for every calculation. Instead,
they alternate and load only one register pair operand each time and alternate between the forward and reverse byte
order versions of this instruction. By default, the low order bytes come from the low register in the register pair. The
(R) option causes the low order bytes to come from the high register.
When reversing source order by using the (R) syntax, the source registers swap places within the register pair in their
byte ordering. If a source register pair contains the data shown in the I-register Bits and the Byte Alignment (with
reverse) figure, then the SAA instruction computes 12 pixel operations simultaneously–the three-operation subtract-
absolute-accumulate on four pairs of operand bytes in parallel.
Figure 8-65: I-register Bits and the Byte Alignment (with reverse)
A special application of this instruction is to use the quad 8-bit subtract-absolute-accumulate instruction for block-
based video motion estimation algorithms using block sum of absolute difference (SAD) calculations to measure dis-
tortion.
This 16-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
SAD8Vec Example
saa (r1:0, r3:2) || r0 = [i0++] || r2 = [i1++] ; /* parallel fill instructions */
saa (r1:0, r3:2) (R) || r1 = [i0++] || r3 = [i1++] ; /* reverse, parallel fill instructions */
saa (r1:0, r3:2) ; /* last SAA in a loop, no more fill required */
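A minimal sketch of one subtract-absolute-accumulate step follows, assuming both operand words are already byte-aligned. The function name `saa`, the list layout of the accumulator halves, and the mapping of column j to the least significant byte are illustrative assumptions.

```python
def saa(acc, a_word: int, b_word: int):
    """One subtract-absolute-accumulate step on two aligned 32-bit words.

    acc is a list of four unsigned 16-bit values in the order
    [A0.L, A0.H, A1.L, A1.H]; byte n of each word feeds acc[n]
    (assumed mapping). A new list is returned.
    """
    updated = []
    for n in range(4):
        a = (a_word >> (8 * n)) & 0xFF
        b = (b_word >> (8 * n)) & 0xFF
        # Saturate at the unsigned 16-bit limit instead of wrapping.
        updated.append(min(acc[n] + abs(a - b), 0xFFFF))
    return updated
```

Repeating this step over every row of a block and then summing the four halves gives the block SAD used as a distortion measure.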
Viterbi Operations
These operations provide Viterbi application specific operations on register operands:
• 16-Bit Add on Sign (AddOnSign)
• 16-Bit Modulo Maximum with History (Shift_VitMax)
• Dual 16-Bit Modulo Maximum with History (Shift_DualVitMax)
Abstract
This instruction multiplies the signs of SRC0.h and SRC0.l by the values of SRC1.h and SRC1.l, respectively. It
then adds the two results and stores the sum in both destination half registers. This instruction does not saturate
if the result exceeds 16 bits.
See Also (16-Bit Modulo Maximum with History (Shift_VitMax), Dual 16-Bit Modulo Maximum with History
(Shift_DualVitMax))
AddOnSign Description
The Add on Sign instruction performs a two step function, as follows.
Step 1
Multiply the arithmetic sign of a 16-bit half-word number in src0 by the corresponding half-word number in
src1. The arithmetic sign of src0 is either (+1) or (–1), depending on the sign bit of src0. The instruction
performs this operation on the upper and lower half-words of the same data registers.
The results of this step obey the signed multiplication rules summarized in the Signed Multiplication Rules
table. Y is the number in src0, and Z is the number in src1. The numbers in src0 and src1 may be positive or
negative. Note the result always bears the magnitude of Z with only the sign affected.
Step 2
Add the sign-adjusted src1 upper and lower half-word results together and store the same 16-bit sum in the
upper and lower halves of the destination register, as shown in the next figures.
31 24 23 16 15 8 7 0
src0: a1 a0
src1: b1 b0
AddOnSign Example
r7.h = r7.l = sign(r2.h) * r3.h + sign(r2.l) * r3.l ;
• If
R2.H = 2
R3.H = 23
R2.L = 2001
R3.L = 1234
then
R7.H = 1257 (or 1234 + 23)
R7.L = 1257
• If
R2.H = –2
R3.H = 23
R2.L = 2001
R3.L = 1234
then
R7.H = 1211 (or 1234 – 23)
R7.L = 1211
• If
R2.H = 2
R3.H = 23
R2.L = –2001
R3.L = 1234
then
R7.H = –1211 (or (–1234) + 23)
R7.L = –1211
• If
R2.H = –2
R3.H = 23
R2.L = –2001
R3.L = 1234
then
R7.H = –1257 (or (–1234) – 23)
R7.L = –1257
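The two steps can be checked against the cases above with a short model. This is a sketch under stated assumptions: `pack`, `to_signed16`, and `add_on_sign` are hypothetical helper names, a zero in src0 is treated as positive because only the sign bit is examined, and the sum wraps to 16 bits since the instruction does not saturate.

```python
def pack(high: int, low: int) -> int:
    """Build a 32-bit register value from two signed 16-bit halves."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def to_signed16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_on_sign(src0: int, src1: int) -> int:
    """Model dst.h = dst.l = sign(src0.h) * src1.h + sign(src0.l) * src1.l."""
    total = 0
    for shift in (16, 0):                 # upper half-word, then lower
        y = to_signed16(src0 >> shift)
        z = to_signed16(src1 >> shift)
        total += -z if y < 0 else z       # only the sign bit of y matters
    total &= 0xFFFF                       # no saturation: the sum wraps
    return (total << 16) | total


# The second case above: R2.H = -2, R3.H = 23, R2.L = 2001, R3.L = 1234
result = add_on_sign(pack(-2, 2001), pack(23, 1234))
```

Here `result` holds 1211 in both halves, matching the listed value.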
Shift (Dsp32Shf )
DREG Register Type = vit_max (DREG Register Type, DREG Register Type) (asl)
DREG Register Type = vit_max (DREG Register Type, DREG Register Type) (asr)
Abstract
This instruction performs maximum value selection and history update. It is used to implement the selection func-
tion of a Viterbi decoder. It performs a dual maximum value selection storing the two results in one destination
register.
See Also (16-Bit Add on Sign (AddOnSign), 16-Bit Modulo Maximum with History (Shift_VitMax))
Shift_DualVitMax Description
This instruction performs maximum value selection and history update. It is used to implement the selection
function of a Viterbi decoder. It performs a dual maximum value selection, storing the two results in one destination
register. In addition, it shifts A0 left by two bit positions and stores two bits, representing the results of the two
maximum value selections, in bit 1 and bit 0 of A0. No attempt to correct the selection on overflow should be made.
This ensures that overflowed path metrics compare correctly, as long as they are close to each other in magnitude.
Reg1: PM3 PM2 Reg0: PM1 PM0
| | | |
MAX----+ +--------MAX
| |
v v
RegOut: H L
If the user specifies ASR or ASL, the instruction shifts two bits into the accumulator specifying which 16-bit half
register was the max. For ASR, it shifts the history bits right; for ASL, it shifts them left. To compute the maximum,
this instruction uses a form of modulo arithmetic where 0x8000 > 0x7fff > 0 > 0xffff > 0x8000.
For more information, see the 16-Bit Modulo Maximum with History (Shift_VitMax) instruction.
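One common way to realize a circular ordering such as 0x8000 > 0x7fff > 0 > 0xffff > 0x8000 is to test the sign of the 16-bit wrapped difference. The sketch below uses that approach as an assumption. It is not taken from the instruction definition, the names `modulo_greater` and `modulo_max` are hypothetical, and the handling of equal operands or operands exactly half the circle apart is a modeling choice. The history bits are not modeled.

```python
def modulo_greater(a: int, b: int) -> bool:
    """True if a is ahead of b on the 16-bit two's-complement circle."""
    diff = (a - b) & 0xFFFF           # wrapped 16-bit difference
    return 0 < diff < 0x8000          # positive when read as signed


def modulo_max(a: int, b: int) -> int:
    """Pick the larger operand under the circular ordering."""
    return a if modulo_greater(a, b) else b
```

Under this test an overflowed path metric such as 0x8000 still compares as larger than 0x7fff, which is the property the description relies on when the metrics are close in magnitude.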
Shift_DualVitMax Example
For examples using this instruction, see the 16-Bit Modulo Maximum with History (Shift_VitMax) instruction.
Shift (Dsp32Shf )
DREG_L Register Type = vit_max (DREG Register Type) (asl)
DREG_L Register Type = vit_max (DREG Register Type) (asr)
Abstract
If the user specifies ASR or ASL, this instruction shifts a bit into the accumulator specifying which 16-bit half
register was the max. For ASR, it shifts the history bits right. For ASL, it shifts them left. To compute the
maximum, this instruction uses a form of modulo arithmetic where 0x8000 > 0x7fff > 0
> 0xffff > 0x8000.
See Also (16-Bit Add on Sign (AddOnSign), Dual 16-Bit Modulo Maximum with History (Shift_DualVitMax))
Shift_VitMax Description
The Compare-Select (VIT_MAX) instruction selects the maximum values of pairs of 16-bit operands, returns the
largest values to the destination register, and serially records in A0.W the source of the maximum. This instruction
performs signed operations. The operands are compared as two’s-complement numbers.
The Accumulator extension bits (bits 39–32) must be cleared before executing this instruction.
Conversely, the ASR version shifts A0 right two bit positions and appends two MSBs to indicate the source of
each maximum as shown in Table 19-8 and Table 19-9.
A0.X A0.W
A0 00000000 BBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Notice that the history bit code depends on the A0 shift direction. The bit for src_reg_1 is always shifted onto
A0 first, followed by the bit for src_reg_0. The single operand versions behave similarly.
The path metrics are allowed to overflow, and maximum comparison is done on the two’s-complement circle. Such
comparison gives a better indication of the relative magnitude of two large numbers when a small number is added/
subtracted to both.
A special application of this instruction is to use the Compare-Select (VIT_MAX) instruction as a key element of
the Add-Compare-Select (ACS) function for Viterbi decoders. Combine it with a Vector Add instruction to calcu-
late a trellis butterfly used in ACS functions.
This 32-bit instruction may be issued in parallel with certain other instructions.
This instruction may be used in either User or Supervisor mode.
Shift_VitMax Example
• For:
r5 = vit_max(r3, r2)(asl) ; /* shift left, dual operation */
Assume:
R3 = 0xFFFF 0000
R2 = 0x0000 FFFF
A0 = 0x00 0000 0000
• For:
r7 = vit_max (r1, r0) (asr) ; /* shift right, dual operation */
Assume:
R1 = 0xFEED BEEF
R0 = 0xDEAF 0000
A0 = 0x00 0000 0000
• For:
r3.l = vit_max (r1)(asl) ; /* shift left, single operation */
Assume:
R1 = 0xFFFF 0000
A0 = 0x00 0000 0000
• For:
r3.l = vit_max (r1)(asr) ; /* shift right, single operation */
Assume:
R1 = 0x1234 FADE
A0 = 0x00 FFFF FFFF
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
OPC[3:0] DST[2:0]
SRC[2:0]
OPC[3:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
T OFF[9:0]
OFF[9:0]
The following table provides the opcode field values (T, B), the instruction syntax overview (Syntax), and a link to
the corresponding instruction reference page (Instruction)
T  B  Syntax                                  Instruction
0  0  if !cc jump imm10s2 Register Type       Conditional Jump Immediate (BrCC)
0  1  if !cc jump imm10s2 Register Type (bp)  Conditional Jump Immediate (BrCC)
1  0  if cc jump imm10s2 Register Type        Conditional Jump Immediate (BrCC)
1  1  if cc jump imm10s2 Register Type (bp)   Conditional Jump Immediate (BrCC)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
REG[2:0]
OPC[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
CBIT[4:0]
OP[1:0]
D  OP  Syntax      Instruction
0  00  cc = CBIT   Move Status to CC (MvToCC_STAT)
0  01  cc |= CBIT  Move Status to CC (MvToCC_STAT)
0  10  cc &= CBIT  Move Status to CC (MvToCC_STAT)
0  11  cc ^= CBIT  Move Status to CC (MvToCC_STAT)
1  00  CBIT = cc   Move CC To/From ASTAT (CCToStat16)
1  01  CBIT |= cc  Move CC To/From ASTAT (CCToStat16)
1  10  CBIT &= cc  Move CC To/From ASTAT (CCToStat16)
1  11  CBIT ^= cc  Move CC To/From ASTAT (CCToStat16)
CBIT
CBIT Encode Table
CBIT   Syntax
00000  az
00001  an
00110  aq
01000  rnd_mod
01100  ac0
01101  ac1
10000  av0
10001  av0s
10010  av1
10011  av1s
11000  v
11001  vs
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
I X[2:0]
OPC[2:0] Y[2:0]
OPC[2:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
T SRC[2:0]
DST[2:0]
T  Syntax               Instruction
1  if cc GDST = GSRC    Conditional Move Register to Register (MvRegToRegCond)
0  if !cc GDST = GSRC   Conditional Move Register to Register (MvRegToRegCond)
GDST
GDST Encode Table
D DST Syntax
0 --- DREG Register Type
1 --- PREG Register Type
GSRC
GSRC Encode Table
S SRC Syntax
0 --- DREG Register Type
1 --- PREG Register Type
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
REG[2:0]
OPC[1:0]
PREGA
PREGA Encode Table
A Syntax
0 PREG Register Type
1 PREG Register Type++
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
S SW[23:16]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SW[15:0] SW[15:0]
S Syntax Instruction
0 jump.l imm24s2 Register Type Jump Immediate (JumpAbs)
1 call imm24nxs2 Register Type Call (Call)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
OPC[2:0] SRC0[2:0]
DST[2:0] SRC1[2:0]
DST[2:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
OPC DST[2:0]
SRC[6:0] SRC[6:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
OPC DST[2:0]
SRC[6:0] SRC[6:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 1 1 1 0 1 1 0 0 0 0 0
I[1:0]
OPC[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0
I[1:0]
M[1:0]
OP
BR
OP  BR  Syntax                                            Instruction
0   0   IREG Register Type += MREG Register Type          32-bit Add or Subtract (DagAdd32)
0   1   IREG Register Type += MREG Register Type (brev)   32-bit Add or Subtract (DagAdd32)
1   0   IREG Register Type -= MREG Register Type          32-bit Add or Subtract (DagAdd32)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
DEAD[2:0] AOPC[4:0]
HL
DEAD[2:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
AOP[1:0] SRC1[2:0]
S SRC0[2:0]
X DST1[2:0]
DST0[2:0]
DST1[2:0]
A0_HL
A0_HL Encode Table
HL Syntax
0 a0.l
1 a0.h
A1_HL
A1_HL Encode Table
HL Syntax
0 a1.l
1 a1.h
AOPL
AOPL Encode Table
AOP Syntax
00 +|+
01 +|-
10 -|+
11 -|-
DDST0_HL
DDST0_HL Encode Table
HL Syntax
0 DREG_L Register Type
1 DREG_H Register Type
DSRC0_HL
DSRC0_HL Encode Table
HL Syntax
0 DREG_L Register Type
1 DREG_H Register Type
NSAT
NSAT Encode Table
S Syntax Description
0 (ns) The (ns) option directs the ALU not to saturate the result.
1 (s) The (s) option directs the ALU to saturate the result at 16 or 32 bits,
depending on the operand size.
PAIR0
PAIR0 Encode Table
SRC0 Syntax
00- r1:0
01- r3:2
PAIR1
PAIR1 Encode Table
SRC1 Syntax
00- r1:0
01- r3:2
RS
RS Encode Table
S Syntax Description
0 The default (no option select) operation directs the ALU to execute the
operation with no optional modification to the result.
1 (r) The (r) option directs the ALU to reverse the order of the source regis-
ters within each register pair.
RSC
RSC Encode Table
S Syntax Description
0 The default (no option select) operation directs the ALU to execute the
operation with no optional modification to the result.
1 ,r The ,r option directs the ALU to reverse the order of the source regis-
ters within each register pair.
SAT
SAT Encode Table
S Syntax Description
0 (ns) The (ns) option directs the ALU not to saturate the result.
1 (s) The (s) option directs the ALU to saturate the result at 16 or 32 bits, depending on the
operand size.
SAT2
SAT2 Encode Table
S Syntax Description
0 (ns) The (ns) option directs the ALU not to saturate the result.
1 (s) The (s) option directs the ALU to saturate the result at 16 or 32 bits,
depending on the operand size.
SMODE
SMODE Encode Table
SX
SX Encode Table
S X Syntax Description
0 0 The default (no option select) operation directs the ALU to execute the operation with no
optional modification to the result.
0 1 (co) The (co) option directs the ALU to swap (cross order) the order of the results in the desti-
nation register.
1 0 (s) The (s) option directs the ALU to saturate the result at 16 or 32 bits, depending on the
operand size.
1 1 (sco) The (sco) option directs the ALU to apply the combination of the (co) and (s) op-
tions.
SXA
SXA Encode Table
10 0 0 (asr) The (asr) option directs the ALU to arithmetic shift right, halving the
result (divide by 2) before storing in the destination register.
10 0 1 (co, asr) The (co, asr) option directs the ALU to apply the combination of
the (asr) and (co) options.
10 1 0 (s, asr) The (s, asr) option directs the ALU to arithmetic shift right (halving
the result; divide by 2) then saturate before storing in the destination
register.
10 1 1 (sco, asr) The (sco, asr) option directs the ALU to apply the combination of
the (asr) and (sco) options
11 0 0 (asl) The (asl) option directs the ALU to arithmetic shift left, doubling the
result (multiply by 2, truncated) before storing in the destination regis-
ter.
11 0 1 (co, asl) The (co, asl) option directs the ALU to apply the combination of
the (asl) and (co) options.
XMODE
XMODE Encode Table
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MMOD[3:0] OP1[1:0]
W1
MM
MMOD[3:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H01 SRC1[2:0]
H11 SRC0[2:0]
W0 DST[2:0]
OP0[1:0]
H00
H10
DST[2:0]
The following table provides the opcode field values (MMOD), the instruction syntax overview (Syntax), and a link
to the corresponding instruction reference page (Instruction)
MMOD Syntax
0--- TRADMAC
10-- TRADMAC
1100 TRADMAC
1101 CMPLXMAC
111- CMPLXMAC
CMODE
CMODE Encode Table
CMPLXMAC
CMPLXMAC Encode Table
CMPLXOP
CMPLXOP Encode Table
OP1  Syntax
00   cmul(DREG Register Type, DREG Register Type)
01   cmul(DREG Register Type, DREG Register Type*)
10   cmul(DREG Register Type*, DREG Register Type*)
MAC0
MAC0 Encode Table
OP0 Syntax
00 a0 = MAC0S
01 a0 += MAC0S
10 a0 -= MAC0S
MAC0S
MAC0S Encode Table
MAC1
MAC1 Encode Table
OP1 Syntax
00 a1 = MAC1S
01 a1 += MAC1S
10 a1 -= MAC1S
MAC1S
MAC1S Encode Table
MML
MML Encode Table
MM Syntax Description
0 The default (no option) operation directs the MAC to use signed fraction format. Multiply
1.15 * 1.15 formats to produce 1.31 results after shift correction. The special case of
0x8000 * 0x8000 is saturated to 0x7FFF FFFF to fit the 1.31 result. Sign extend 1.31 re-
sult to 9.31 format before copying or accumulating to Accumulator. Then, saturate Accu-
mulator to maintain 9.31 precision; Accumulator result is between minimum 0x80 0000
0000 and maximum 0x7F FFFF FFFF. To extract to half register, round Accumulator 9.31
format value at bit 16. (The ASTAT.RND_MOD bit controls the rounding.) Saturate the
result to 1.15 precision and copy it to the destination register half. Result is between
minimum -1 and maximum 1-2^-15 (or, expressed in hex, between minimum 0x8000 and
maximum 0x7FFF). To extract to full register, saturate the result to 1.31 precision and copy it to
the destination register. Result is between minimum -1 and maximum 1-2^-31 (or, expressed
in hex, between minimum 0x8000 0000 and maximum 0x7FFF FFFF).
1 (m) The (m) option directs the MAC to use mixed mode multiply format (valid only for
MAC1). When issued in a fraction mode instruction (with default, FU, T, TFU, or
S2RND mode), multiply 1.15 * 0.16 to produce 1.31 results. When issued in an integer
mode instruction (with IS, ISS2, or IH mode), multiply 16.0 * 16.0 (signed * unsigned) to
produce 32.0 results. No shift correction in either case. Src_reg_0 is the signed operand
and Src_reg_1 is the unsigned operand. Accumulation and extraction proceed according
to the other mode selection or default.
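The mixed mode product can be illustrated with a short model, offered as a sketch rather than a definitive description. The function name `mixed_mode_product` is hypothetical, and only the multiply step is shown; accumulation and extraction are left out.

```python
def mixed_mode_product(src0_half: int, src1_half: int) -> int:
    """(m) option: signed src0 half times unsigned src1 half.

    No shift correction is applied, so in a fraction mode instruction a
    1.15 operand times a 0.16 operand lands directly in 1.31 format.
    """
    signed_operand = src0_half & 0xFFFF
    if signed_operand & 0x8000:
        signed_operand -= 0x10000         # reinterpret as two's-complement
    unsigned_operand = src1_half & 0xFFFF
    return signed_operand * unsigned_operand
```

For instance, 0x4000 (0.5 in 1.15) times 0x8000 (0.5 in 0.16) gives 0x2000 0000, which reads as 0.25 in 1.31 without any further shift.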
MMLMMOD0
MMLMMOD0 Encode Table
MMLMMOD1
MMLMMOD1 Encode Table
MMLMMODE
MMLMMODE Encode Table
MMOD0
MMOD0 Encode Table
MMOD Syntax Description
0000 The default (no option) operation directs the MAC to use signed fraction format. Multiply
1.15 * 1.15 formats to produce 1.31 results after shift correction. The special case of
0x8000 * 0x8000 is saturated to 0x7FFF FFFF to fit the 1.31 result. Sign extend 1.31 re-
sult to 9.31 format before copying or accumulating to Accumulator. Then, saturate Accu-
mulator to maintain 9.31 precision; Accumulator result is between minimum 0x80 0000
0000 and maximum 0x7F FFFF FFFF. To extract to half register, round Accumulator 9.31
format value at bit 16. (The ASTAT.RND_MOD bit controls the rounding.) Saturate the
result to 1.15 precision and copy it to the destination register half. Result is between
minimum -1 and maximum 1-2^-15 (or, expressed in hex, between minimum 0x8000 and
maximum 0x7FFF). To extract to full register, saturate the result to 1.31 precision and copy it to
the destination register. Result is between minimum -1 and maximum 1-2^-31 (or, expressed
in hex, between minimum 0x8000 0000 and maximum 0x7FFF FFFF).
0011 (w32) The (w32) option directs the MAC to use signed fraction with 32-bit saturation. Multiply
1.15 x 1.15 to produce 1.31 format data after shift correction. Sign extend the result to
9.31 format before passing it to the accumulator. Saturate the accumulator after copying or
accumulating at bit 31 to maintain 1.31 precision. Result is between minimum -1 and
maximum 1-2^-31 (or, expressed in hex, between minimum 0xFF 8000 0000 and maximum
0x00 7FFF FFFF).
0100 (fu) The (fu) option directs the MAC to use unsigned fraction format. Multiply 0.16* 0.16
formats to produce 0.32 results. No shift correction. The special case of 0x8000 * 0x8000
yields 0x4000 0000. No saturation is necessary since no shift correction occurs. Zero ex-
tend 0.32 result to 8.32 format before copying or accumulating to accumulator. Then, sat-
urate accumulator to maintain 8.32 precision; accumulator result is between minimum
0x00 0000 0000 and maximum 0xFF FFFF FFFF. To extract to half register, round accu-
mulator 8.32 format value at bit 16. (The ASTAT.RND_MOD bit controls the rounding.)
Saturate the result to 0.16 precision and copy it to the destination register half. Result is
between minimum 0 and maximum 1-2^-16 (or, expressed in hex, between minimum
0x0000 and maximum 0xFFFF). To extract to full register, saturate the result to 0.32
precision and copy it to the destination register. Result is between minimum 0 and maximum
1-2^-32 (or, expressed in hex, between minimum 0x0000 0000 and maximum 0xFFFF
FFFF).
1000 (is) The (is) option directs the MAC to use signed integer format. Multiply 16.0 * 16.0 for-
mats to produce 32.0 results. No shift correction. Sign extend 32.0 result to 40.0 format
before copying or accumulating to accumulator. Then, saturate accumulator to maintain
40.0 precision; accumulator result is between minimum 0x80 0000 0000 and maximum
0x7F FFFF FFFF. To extract to half register, extract the lower 16 bits of the accumulator.
Saturate for 16.0 precision and copy to the destination register half. Result is between
minimum -2^15 and maximum 2^15-1 (or, expressed in hex, between minimum 0x8000 and
maximum 0x7FFF). To extract to full register, saturate for 32.0 precision and copy to the
destination register. Result is between minimum -2^31 and maximum 2^31-1 (or, expressed in hex,
between minimum 0x8000 0000 and maximum 0x7FFF FFFF).
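The product step of the default, (fu), and (is) formats can be compared side by side in a small model. This is a sketch only: the names `s16` and `mac_product` and the `mode` strings are assumptions, the left shift by one stands in for the shift correction, and accumulation, rounding, and extraction are not modeled.

```python
def s16(value: int) -> int:
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mac_product(x: int, y: int, mode: str = "") -> int:
    """Form the 32-bit product of two 16-bit operands for one format."""
    if mode == "fu":
        # Unsigned fraction: 0.16 * 0.16 -> 0.32, no shift correction.
        return (x & 0xFFFF) * (y & 0xFFFF)
    if mode == "is":
        # Signed integer: 16.0 * 16.0 -> 32.0, no shift correction.
        return s16(x) * s16(y)
    if mode == "":
        # Signed fraction: 1.15 * 1.15 -> 1.31 after shift correction.
        product = (s16(x) * s16(y)) << 1
        # Only 0x8000 * 0x8000 exceeds 1.31 and is saturated.
        return min(product, 0x7FFFFFFF)
    raise ValueError("unsupported mode: " + mode)
```

The special case shows the difference between the formats: 0x8000 times 0x8000 saturates to 0x7FFF FFFF in the default format but yields 0x4000 0000 under (fu).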
MMOD1
MMOD1 Encode Table
MMODE
MMODE Encode Table
NARROWING_CMODE
NARROWING_CMODE Encode Table
TRADMAC
TRADMAC Encode Table
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
MMOD[3:0] OP1[1:0]
W1
MM
MMOD[3:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H01 SRC1[2:0]
H11 SRC0[2:0]
W0 DST[2:0]
OP0[1:0]
H00
H10
DST[2:0]
M32MMOD
M32MMOD Encode Table
M32MMOD1
M32MMOD1 Encode Table
M32MMOD2
M32MMOD2 Encode Table
MML
MML Encode Table
MM Syntax Description
0 The default (no option selected) operation directs the multiplier to use signed fraction for-
mat. Multiply 1.15 * 1.15 formats to produce 1.31 results after shift correction. The special
case of 0x8000 * 0x8000 is saturated to 0x7FFF FFFF to fit the 1.31 result. Sign extend
1.31 result to 9.31 format before copying or accumulating to accumulator. Then, saturate
accumulator to maintain 9.31 precision; accumulator result is between minimum 0x80
0000 0000 and maximum 0x7F FFFF FFFF. To extract to half register, round accumulator
9.31 format value at bit 16. (The ASTAT.RND_MOD bit controls the rounding.) Saturate
the result to 1.15 precision and copy it to the destination register half. Result is between
minimum -1 and maximum 1-2^-15 (or, expressed in hex, between minimum 0x8000 and
maximum 0x7FFF). To extract to full register, saturate the result to 1.31 precision and
copy it to the destination register. Result is between minimum -1 and maximum 1-2^-31 (or,
expressed in hex, between minimum 0x8000 0000 and maximum 0x7FFF FFFF).
1 (m) The (m) option directs the multiplier to use mixed mode multiply format (valid only for
MAC1). When issued in a fraction mode instruction (with default, FU, T, TFU, or
S2RND mode), multiply 1.15 * 0.16 to produce 1.31 results. When issued in an integer
mode instruction (with IS, ISS2, or IH mode), multiply 16.0 * 16.0 (signed * unsigned) to
produce 32.0 results. No shift correction in either case. Src_reg_0 is the signed operand
and Src_reg_1 is the unsigned operand. Accumulation and extraction proceed according
to the other mode selection or default.
MMLMMOD1
MMLMMOD1 Encode Table
MMLMMODE
MMLMMODE Encode Table
MMOD1
MMOD1 Encode Table
MMODE
MMODE Encode Table
MUL0
MUL0 Encode Table
MUL1
MUL1 Encode Table
Shift (Dsp32Shf)
Dsp32Shf Instruction Syntax
Shift (Dsp32Shf)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0
SOPC[4:0]
DEAD[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SOP[1:0] SRC1[2:0]
HLS[1:0] SRC0[2:0]
DST[2:0]
The following table provides the opcode field values (SOPC, SOP, HLS), the instruction syntax overview (Syntax),
and a link to the corresponding instruction reference page (Instruction)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0
SOPC[4:0]
DEAD[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SOP[1:0] SRC[2:0]
HLS[1:0] IMM[5:0]
DST[2:0]
IMM[5:0]
AHSH4
AHSH4 Encode Table
AHSH4S
AHSH4S Encode Table
AHSH4VS
AHSH4VS Encode Table
ASH5
ASH5 Encode Table
ASH5S
ASH5S Encode Table
LHSH4
LHSH4 Encode Table
IMM Syntax
000000 << 0
00---- << uimm4nz Register Type
11---- >> uimm4nznegpos Register Type
LSH5
LSH5 Encode Table
IMM Syntax
000000 << 0
0----- << uimm5nz Register Type
1----- >> uimm5nznegpos Register Type
Load/Store (DspLdSt)
DspLdSt Instruction Syntax
Load/Store (DspLdSt)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
W REG[2:0]
AOP[1:0] I[1:0]
M[1:0]
AOP[1:0]
The following table provides the opcode field values (W, M, AOP), the instruction syntax overview (Syntax), and a
link to the corresponding instruction reference page (Instruction)
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48
1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0
C REL
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[31:16] IMM[31:16]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[15:0] IMM[15:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48
1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0
REG[2:0]
GRP[2:0]
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[31:16] IMM[31:16]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[15:0] IMM[15:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DUMMY[15:0] DUMMY[15:0]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0
REG[2:0]
GRP[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
HWORD[15:0] HWORD[15:0]
H  Z  S  Syntax                           Instruction
0  0  0  DST_L = imm16 Register Type      16-Bit Register Initialization (LdImmToDregHL)
0  0  1  DST = imm16 Register Type (x)    32-Bit Register Initialization (LdImmToReg)
0  1  0  DST = rimm16 Register Type (z)   32-Bit Register Initialization (LdImmToReg)
1  0  0  DST_H = imm16 Register Type      16-Bit Register Initialization (LdImmToDregHL)
DST
DST Encode Table
DST_H
DST_H Encode Table
DST_L
DST_L Encode Table
Load/Store (LdSt)
LdSt Instruction Syntax
Load/Store (LdSt)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
SZ[1:0] REG[2:0]
W PTR[2:0]
AOP[1:0] Z
AOP[1:0]
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48
1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[31:16] IMM[31:16]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM[15:0] IMM[15:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SZ[1:0] REG[2:0]
W Z
W  SZ  Z  Syntax                                            Instruction
0  00  0  DREG Register Type = [uimm32 Register Type]       32-Bit Load from Memory (LdM32bitToDreg)
0  00  1  PREG Register Type = [uimm32 Register Type]       32-Bit Load from Memory (LdM32bitToDreg)
0  01  0  DREG Register Type = w[uimm32 Register Type] (z)  16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
0  01  1  DREG Register Type = w[uimm32 Register Type] (x)  16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
0  10  0  DREG Register Type = b[uimm32 Register Type] (z)  8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg)
0  10  1  DREG Register Type = b[uimm32 Register Type] (x)  8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg)
0  11  0  DREG_L Register Type = w[uimm32 Register Type]    16-Bit Load from Memory (LdM16bitToDregL)
0  11  1  DREG_H Register Type = w[uimm32 Register Type]    16-Bit Load from Memory (LdM16bitToDregH)
1  00  0  [uimm32 Register Type] = DREG Register Type       32-Bit Store to Memory (StDregToM32bit)
1  00  1  [uimm32 Register Type] = PREG Register Type       32-Bit Store to Memory (StDregToM32bit)
1  01  0  w[uimm32 Register Type] = DREG Register Type      16-Bit Store to Memory (StDregLToM16bit)
1  10  0  b[uimm32 Register Type] = DREG Register Type      8-Bit Store to Memory (StDregToM08bit)
1  11  1  w[uimm32 Register Type] = DREG_H Register Type    16-Bit Store to Memory (StDregHToM16bit)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0
W REG[2:0]
Z PTR[2:0]
SZ[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
W REG[2:0]
OP[1:0] PTR[2:0]
OFF[3:0] OFF[3:0]
W  OP  Syntax                                                                  Instruction
0  00  DREG Register Type = [PREG Register Type + uimm4s4 Register Type]       32-Bit Load from Memory (LdM32bitToDreg)
0  01  DREG Register Type = w[PREG Register Type + uimm4s2 Register Type] (z)  16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
0  10  DREG Register Type = w[PREG Register Type + uimm4s2 Register Type] (x)  16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
1  00  [PREG Register Type + uimm4s4 Register Type] = DREG Register Type       32-Bit Store to Memory (StDregToM32bit)
1  01  w[PREG Register Type + uimm4s2 Register Type] = DREG Register Type      16-Bit Store to Memory (StDregLToM16bit)
1  11  [PREG Register Type + uimm4s4 Register Type] = PREG Register Type       Store Pointer (StPregToM32bit)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
W REG[2:0]
OFF[4:0] G
OFF[4:0]
W  G  Syntax                                                    Instruction
0  0  DREG Register Type = [fp - imm5nzs4negpos Register Type]  32-Bit Load from Memory (LdM32bitToDreg)
1  0  [fp - imm5nzs4negpos Register Type] = DREG Register Type  32-Bit Store to Memory (StDregToM32bit)
1  1  [fp - imm5nzs4negpos Register Type] = PREG Register Type  32-Bit Store to Memory (StDregToM32bit)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0
W REG[2:0]
Z PTR[2:0]
SZ[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
OFF[15:0] OFF[15:0]
W  SZ  Z  Syntax                                                                     Instruction
0  00  0  DREG Register Type = [PREG Register Type + imm16s4 Register Type]          32-Bit Load from Memory (LdM32bitToDreg)
0  00  1  PREG Register Type = [PREG Register Type + imm16s4 Register Type]          32-Bit Pointer Load from Memory (LdM32bitToPreg)
0  01  0  DREG Register Type = w[PREG Register Type + imm16s2 Register Type] (z)     16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
0  01  1  DREG Register Type = w[PREG Register Type + imm16s2 Register Type] (x)     16-Bit Load from Memory to 32-Bit Register (LdM16bitToDreg)
0  10  0  DREG Register Type = b[PREG Register Type + imm16reloc Register Type] (z)  8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg)
0  10  1  DREG Register Type = b[PREG Register Type + imm16reloc Register Type] (x)  8-Bit Load from Memory to 32-bit Register (LdM08bitToDreg)
1  00  0  [PREG Register Type + imm16s4 Register Type] = DREG Register Type          32-Bit Store to Memory (StDregToM32bit)
1  00  1  [PREG Register Type + imm16s4 Register Type] = PREG Register Type          Store Pointer (StPregToM32bit)
1  01  0  w[PREG Register Type + imm16s2 Register Type] = DREG Register Type         16-Bit Store to Memory (StDregLToM16bit)
1  10  0  b[PREG Register Type + imm16reloc Register Type] = DREG Register Type      8-Bit Store to Memory (StDregToM08bit)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
W PTR[2:0]
AOP[1:0] IDX[2:0]
REG[2:0] REG[2:0]
Load/Store (Ldp)
Ldp Instruction Syntax
Load/Store (Ldp)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
AOP[1:0] REG[2:0]
PTR[2:0]
AOP[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0
OFF[3:0] REG[2:0]
PTR[2:0]
OFF[3:0]
Syntax                                                             Instruction
PREG Register Type = [PREG Register Type + uimm4s4 Register Type]  32-Bit Pointer Load from Memory (LdM32bitToPreg)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0
OFF[4:0] REG[2:0]
OFF[4:0]
Syntax                                                    Instruction
PREG Register Type = [fp - imm5nzs4negpos Register Type]  32-Bit Load from Memory (LdM32bitToDreg)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FRM[15:0] FRM[15:0]
R Syntax Instruction
0 link uimm16s4 Register Type Linkage (Linkage)
1 unlink Linkage (Linkage)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
OPC[2:0] DST[2:0]
SRC[4:0]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0
SOFF[3:0]
ROP[1:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IMM EOFF[9:0]
REG[2:0]
LOP[1:0]
EOFF[9:0]
LC
LC Encode Table
C Syntax
0 lc0
1 lc1
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IBUS[63:48] IBUS[63:48]
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IBUS[47:32] IBUS[47:32]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IBUS[31:16] IBUS[31:16]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
IBUS[15:0] IBUS[15:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Syntax Instruction
nop NOP (NOP)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
Syntax Instruction
mnop 32-Bit No Operation (NOP32)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
REG[3:0]
OPC[3:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
OPC[2:0] DST[2:0]
SRC[2:0]
OPC[2:0]
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
D PR[2:0]
DR[2:0]
W  D  P  Syntax                                                         Instruction
0  0  1  (PREG_RANGE Register Type) = [sp++]                            Stack Push/Pop Multiple Registers (PushPopMul16)
0  1  0  (DREG_RANGE Register Type) = [sp++]                            Stack Push/Pop Multiple Registers (PushPopMul16)
0  1  1  (DREG_RANGE Register Type, PREG_RANGE Register Type) = [sp++]  Stack Push/Pop Multiple Registers (PushPopMul16)
1  0  1  [--sp] = (PREG_RANGE Register Type)                            Stack Push/Pop Multiple Registers (PushPopMul16)
1  1  0  [--sp] = (DREG_RANGE Register Type)                            Stack Push/Pop Multiple Registers (PushPopMul16)
1  1  1  [--sp] = (DREG_RANGE Register Type, PREG_RANGE Register Type)  Stack Push/Pop Multiple Registers (PushPopMul16)
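The stack footprint of a multiple-register push or pop follows from the range codes listed later in this reference (r7:0 through r7:7 and p5:0 through p5:5). The sketch below is a host-side illustration, not part of the instruction set definition: the function names are hypothetical, and it assumes that each register in a range occupies one 32-bit (4-byte) stack slot.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of registers moved by a push/pop multiple.
 *   use_dregs / use_pregs mirror the D and P opcode bits.
 *   dr is the N in "r7:N" (0..7); pr is the N in "p5:N" (0..5).
 * Returns -1 for a combination the table does not list.
 */
static int pushpop_reg_count(bool use_dregs, unsigned dr, bool use_pregs, unsigned pr)
{
    int count = 0;

    if (!use_dregs && !use_pregs) {
        return -1;                /* D = 0, P = 0 is not in the table */
    }
    if (use_dregs) {
        if (dr > 7u) return -1;
        count += 8 - (int)dr;     /* r7 down to rN inclusive */
    }
    if (use_pregs) {
        if (pr > 5u) return -1;
        count += 6 - (int)pr;     /* p5 down to pN inclusive */
    }
    return count;
}

/* Assumed stack usage: one 4-byte slot per register. */
static int pushpop_stack_bytes(bool use_dregs, unsigned dr, bool use_pregs, unsigned pr)
{
    int count = pushpop_reg_count(use_dregs, dr, use_pregs, pr);
    return (count < 0) ? -1 : 4 * count;
}
```

For example, under these assumptions `[--sp] = (r7:4, p5:3)` moves seven registers and would use 28 bytes of stack.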
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
REG[2:0]
GRP[2:0]
W Syntax Instruction
0 POPREG = [sp++] Stack Pop (Pop)
1 [--sp] = PUSHREG Stack Push (Push)
POPREG
POPREG Encode Table
PUSHREG
PUSHREG Encode Table
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
GD[2:0] SRC[2:0]
GS[2:0] DST[2:0]
GS[2:0]
Syntax Instruction
GDST = GSRC Move Register to Register (MvRegToReg)
GDST
GDST Encode Table
GD DST Syntax
000 --- DREG Register Type
001 --- PREG Register Type
010 0-- IREG Register Type
010 1-- MREG Register Type
011 0-- BREG Register Type
011 1-- LREG Register Type
100 000 a0.x
100 001 a0.w
100 010 a1.x
100 011 a1.w
100 110 astat
100 111 rets
110 --- SYSREG2 Register Type
111 --- SYSREG3 Register Type
GSRC
GSRC Encode Table
GS SRC Syntax
000 --- DREG Register Type
001 --- PREG Register Type
010 0-- IREG Register Type
010 1-- MREG Register Type
011 0-- BREG Register Type
011 1-- LREG Register Type
100 000 a0.x
100 001 a0.w
100 010 a1.x
100 011 a1.w
100 110 astat
100 111 rets
110 --- SYSREG2 Register Type
111 --- SYSREG3 Register Type
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
OFF[11:0] OFF[11:0]
Syntax Instruction
jump.s imm12nxs2 Register Type Jump Immediate (JumpAbs)
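The short jump encoding can be sketched on a host machine from the bit diagram above (leading bits 0010, followed by OFF[11:0]). The function name is hypothetical, and the sketch assumes that OFF holds the byte offset divided by two as a 12-bit two's complement value, which gives an even offset range of -0x1000 to 0xFFE.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the 16-bit jump.s opcode for a byte offset.
 * Assumption: OFF[11:0] = offset / 2, stored in 12-bit two's complement.
 * Returns -1 when the offset is odd or outside -0x1000..0xFFE.
 */
static int32_t encode_jump_s(int32_t byte_offset)
{
    if (byte_offset < -0x1000 || byte_offset > 0xFFE) {
        return -1;                          /* does not fit in 12 bits */
    }
    if ((byte_offset % 2) != 0) {
        return -1;                          /* offsets are 2-byte scaled */
    }
    /* Divide instead of shifting so negative offsets stay portable. */
    uint32_t field = (uint32_t)(byte_offset / 2) & 0xFFFu;
    return (int32_t)(0x2000u | field);      /* 0010 in bits 15:12 */
}
```

An assembler would normally perform this range check itself; the sketch only makes the scaling and sign handling explicit.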
range allow_label
-0x80000000:0x7fffffff true
range allow_label
0x0:0xffffffff true
range
0x0:0xffff
range allow_label
-0x400:0x3fe:2 true
range allow_label
-0x1000:0xffe:2 true
range allow_label
-0x1000:0xffe:2 true
range allow_label
-0x1000:0xffe:2 true
range
-0x8000:0x7fff
range negated
0x1:0x8000 true
range
-0x8000:0x7fff
range
-0x10000:0xfffe:2
range negated
0x2:0x10000:2 true
range
-0x20000:0x1fffc:4
range negated
0x4:0x20000:4 true
range
-0x1000000:0xfffffe:2
range
-0x1000000:0xfffffe:2
range
-0x1000000:0xfffffe:2
range
-0x4:0x3
range
-0x80000000:0x7fffffff
range negated
0x4:0x80:4 true
range
-0x20:0x1f
range
-0x40:0x3f
range
0x0:0xffff
range
-0x80:-0x4:4
range
0x0:0xffff
range iencode
0x1:0x3ff,0xffffffff 0xffffffff:0
range allow_label
0x4:0x7fe:2 true
range
0x0:0x3fffc:4
range
0x0:0x7
range allow_label
0x0:0xffffffff true
range
0x0:0xf
range
0x1:0xf
range negated
0x1:0xf true
range
0x0:0x1e:2
range allow_label
0x4:0x1e:2 true
range
0x0:0x3c:4
range
0x0:0x1f
range
0x1:0x1f
range negated
0x1:0x1f true
Code Syntax
00 b0
01 b1
10 b2
11 b3
Code Syntax
00 b0.h
01 b1.h
10 b2.h
11 b3.h
Code Syntax
00 b0.l
01 b1.l
10 b2.l
11 b3.l
Code Syntax
000 r0
001 r1
010 r2
011 r3
100 r4
101 r5
110 r6
111 r7
Code Syntax
000 r0.b
001 r1.b
010 r2.b
011 r3.b
100 r4.b
101 r5.b
110 r6.b
111 r7.b
Code Syntax
000 r0
010 r2
100 r4
110 r6
Code Syntax
000 r0.h
001 r1.h
010 r2.h
011 r3.h
100 r4.h
101 r5.h
110 r6.h
111 r7.h
Code Syntax
000 r0.l
001 r1.l
010 r2.l
011 r3.l
100 r4.l
101 r5.l
110 r6.l
111 r7.l
Code Syntax
000 r1
010 r3
100 r5
110 r7
Code Syntax
000 r1:0
010 r3:2
100 r5:4
110 r7:6
Code Syntax
000 r7:0
001 r7:1
010 r7:2
011 r7:3
100 r7:4
101 r7:5
110 r7:6
111 r7:7
Code Syntax
00 i0
01 i1
10 i2
11 i3
Code Syntax
00 i0.h
01 i1.h
10 i2.h
11 i3.h
Code Syntax
00 i0.l
01 i1.l
10 i2.l
11 i3.l
Code Syntax
00 l0
01 l1
10 l2
11 l3
Code Syntax
00 l0.h
01 l1.h
10 l2.h
11 l3.h
Code Syntax
00 l0.l
01 l1.l
10 l2.l
11 l3.l
Code Syntax
00 m0
01 m1
10 m2
11 m3
Code Syntax
00 m0.h
01 m1.h
10 m2.h
11 m3.h
Code Syntax
00 m0.l
01 m1.l
10 m2.l
11 m3.l
Code Syntax
000 p0
001 p1
010 p2
011 p3
100 p4
101 p5
110 sp
111 fp
Code Syntax
000 p0
001 p1
010 p2
011 p3
100 p4
101 p5
Code Syntax
000 p0.h
001 p1.h
010 p2.h
011 p3.h
100 p4.h
101 p5.h
110 sp.h
111 fp.h
Code Syntax
000 p0.l
001 p1.l
010 p2.l
011 p3.l
100 p4.l
101 p5.l
110 sp.l
111 fp.l
Code Syntax
000 p5:0
001 p5:1
010 p5:2
011 p5:3
100 p5:4
101 p5:5
Code Syntax
000 lc0
001 lt0
010 lb0
011 lc1
100 lt1
101 lb1
110 cycles
111 cycles2
Code Syntax
000 usp
001 seqstat
010 syscfg
011 reti
100 retx
101 retn
110 rete
111 emudat
It is possible to issue a 32-bit ALU/MAC instruction in parallel with only one 16-bit instruction using the
following syntax. The result is still a 64-bit instruction, with a 16-bit NOP automatically inserted into the
unused 16-bit slot.
• A 32-bit ALU/MAC instruction || A 16-bit instruction ;
Alternatively, two 16-bit instructions can be issued in parallel with one another without an active 32-bit
ALU/MAC instruction by using the MNOP instruction, as shown below. Again, the result is still a 64-bit
instruction.
• MNOP || A 16-bit instruction || A 16-bit instruction ;
See the MNOP (32-bit NOP) instruction description in NOP (NOP). The programmer does not have to include the
MNOP instruction explicitly; the software tools prepend it automatically. The MNOP instruction does appear
in the disassembly of parallel 16-bit instructions.
*1 Multi-issue may not combine SHIFT/ROTATE with STORE using Preg + Offset operation.
16-Bit Instructions
The two 16-bit instructions in a multi-issue instruction must each be from the instructions shown in the
Compatible 16-Bit Instructions table.
The following additional restrictions also apply to the 16-bit instructions of the multi-issue instruction.
• Only one of the 16-bit instructions can be a store instruction.
• Only one of the 16-bit instructions may load a pointer register. This load must be encoded in DAG slot 0.
9 Debug
The Blackfin+ processor's debug functionality is used for software debugging. It also complements some services
often found in an operating system (OS) kernel. The functionality is implemented in the processor hardware and is
grouped into multiple levels.
A summary of available debug features is shown in the Blackfin+ Debug Features table.
Watchpoint Unit
By monitoring the addresses on both the instruction bus and the data bus, the Watchpoint Unit provides several
mechanisms for examining program behavior. After counting the number of times a particular address is matched,
the unit schedules an event based on this count.
In addition, information that the Watchpoint Unit provides helps in the optimization of code. The unit also makes
it easier to maintain executables through code patching.
The Watchpoint Unit contains these memory-mapped registers (MMRs), which are accessible in Supervisor and
Emulator modes:
• Watchpoint Status register (WPSTAT)
• Six Instruction Watchpoint Address registers (WPIA[5:0])
• Six Instruction Watchpoint Address Count registers (WPIACNT[5:0])
• Instruction Watchpoint Address Control register (WPIACTL)
• Two Data Watchpoint Address registers (WPDA[1:0])
• Two Data Watchpoint Address Count registers (WPDACNT[1:0])
• Data Watchpoint Address Control register (WPDACTL)
Instruction Watchpoints
Each instruction watchpoint is controlled by three bits in the WPIACTL register, as shown in the WPIACTL Control
Bits table.
When two watchpoints are associated to form a range, two additional bits are used, as shown in the WPIACTL
Watchpoint Range Control Bits table.
Code patching allows software to replace sections of existing code with new code. The watchpoint registers are used
to trigger an exception at the start addresses of the earlier code. The exception routine then vectors to the location in
memory that contains the new code.
On the processor, code patching can be achieved by writing the start address of the earlier code to one of the WPIAx
registers and setting the corresponding EMUSWx bit to trigger an exception. In the exception service routine, the
WPSTAT register is read to determine which watchpoint triggered the exception. Next, the code writes the start
address of the new code in the RETX register and then returns from the exception to the new code. Because the
exception mechanism is used for code patching, event service routines of the same or higher priority (exception,
NMI, and reset routines) cannot be patched.
A write to the WPSTAT MMR clears all the sticky status bits, though the data value written is ignored.
WPIAx Registers
When the Watchpoint Unit is enabled, the values in the Instruction Watchpoint Address registers (WPIAx) are
compared to the address on the instruction bus. Corresponding count values in the Instruction Watchpoint Address
Count registers (WPIACNTx) are decremented each time a match is identified. For more information, see
Watchpoint Instruction Address Register.
WPIACNTx Registers
When the Watchpoint Unit is enabled, the count values in the Instruction Watchpoint Address Count registers
(WPIACNT[5:0]) are decremented each time the address on the address bus matches a value in the WPIAx registers.
Load the WPIACNTx register with a value that is one less than the number of times the watchpoint must match
before triggering an event. The WPIACNTx register will decrement to 0x0000 when the programmed count expires.
For more information, see the Watchpoint Instruction Address Count Register.
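The "one less than the number of matches" rule can be illustrated with a small host-side model. This is a sketch of one plausible reading of the description, not a definition of the hardware: the type and function names are hypothetical, and the 16-bit count width is assumed from the 0x0000 expiry value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a single watchpoint count register. */
typedef struct {
    uint16_t count;   /* assumed 16-bit count value */
} wp_counter;

/* Program the counter so that the event fires on the Nth match (N >= 1). */
static void wp_counter_load(wp_counter *c, uint16_t matches_before_event)
{
    c->count = (uint16_t)(matches_before_event - 1u);
}

/*
 * Report one address match.
 * Returns true when the programmed count has expired and an event
 * would be scheduled; otherwise decrements the count and returns false.
 */
static bool wp_counter_match(wp_counter *c)
{
    if (c->count == 0u) {
        return true;
    }
    c->count--;
    return false;
}
```

In this model, loading the counter for three matches stores 2; the first two matches decrement the count to 0x0000, and the third match raises the event.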
WPIACTL Register
Three bits in the Instruction Watchpoint Address Control register (WPIACTL) control each instruction watchpoint.
For more information about the bits in this register, see Watchpoint Unit and Watchpoint Instruction Address
Control Register.
When two watchpoints are associated to form a range, two additional bits are used. See the WPIACTL Watchpoint
Range Control Bits table.
WPDAx Registers
When the Watchpoint Unit is enabled, the values in the Data Watchpoint Address registers (WPDAx) are compared
to the address on the data buses. Corresponding count values in the Data Watchpoint Address Count registers
(WPDACNTx) are decremented each time a match is identified. For more information, see the Watchpoint Data
Address Register.
WPDACNTx Registers
When the Watchpoint Unit is enabled, the count values in the Data Watchpoint Address Count Value registers
(WPDACNTx) are decremented each time the address on the address bus matches a value in the WPDAx registers. Load
the WPDACNTx register with a value that is one less than the number of times the watchpoint must match the
address bus before triggering an event. The WPDACNTx register will decrement to 0x0000 when the programmed count
expires. For more information, see the Watchpoint Data Address Count Value Register.
WPDACTL Register
For more information about the bits in the Data Watchpoint Address Control register (WPDACTL), see Data Address
Watchpoints and Watchpoint Data Address Control Register.
WPSTAT Register
The Watchpoint Status register (WPSTAT) monitors the status of the watchpoints. It may be read and written in
Supervisor or Emulator modes only. When a watchpoint or watchpoint range matches, this register reflects the
source of the watchpoint. The status bits in the WPSTAT register are sticky, and all of them are cleared when the
register is written (with any value). For more information, see the Watchpoint Status Register.
Developers can use the Performance Monitor Unit (PMU) to count pipeline and memory stalls. The stall information
can be used iteratively to locate the areas to focus on during software optimization. The PMU is most effective
when the application runs directly on hardware, rather than when these events are predicted in a simulation
environment.
For example, the PMU can help to detect whether a performance bottleneck is due to L1 data memory access
latencies. A second PMU event can then show that the memory stall results from the core and the DMA controller
accessing the same region of L1 memory simultaneously. The architecture does not allow this, so one of the two
accesses stalls. The processor core and the DMA controller can, however, access different sub-banks of memory in
the same cycle (refer to Overview of On-Chip Level 1 (L1) Memory in the Memory chapter for more details on L1
memory arbitration stalls). After identifying an issue like this using the PMU, one of the buffers can be
moved to a non-conflicting bank of L1 memory to minimize core versus DMA access conflicts.
Functional Description
The PMU provides two sets of registers (PFCNTRx and PFCTL), which permit non-intrusive monitoring of the
processor's internal resources during program execution.
The 32-bit Performance Monitor Counter (PFCNTR1-0) registers hold the number of occurrences of a selected
event from within a processor core. Each of the counters must be enabled prior to use.
The Performance Control (PFCTL) register provides:
• enable/disable capabilities for the PMU,
• selection of the event mode,
• configuration of the event type to be monitored, and
• selection of interrupt handling type for a counter overflow condition.
Together, these registers provide feedback indicating the measure of load-balancing between the various resources on
the chip. This feedback permits comparison and analysis of expected versus actual resource usage.
PFCNTRx Registers
The Performance Monitor Counter Registers figure shows the Performance Monitor Counter registers,
PFCNTR[1:0]. The PFCNTR0 register contains the count value of Performance Monitor Counter 0, while the
PFCNTR1 register contains the count value of Performance Monitor Counter 1. For more information, see Counter 0
Register and Counter 1 Register.
The counter retains its value even after the module is disabled, so the programmer has to clear the counter before
using it again. The counter can also be programmed with a non-zero 32-bit value.
PFCTL Register
To enable the PMU, set the PFPWR bit in the Performance Monitor Control (PFCTL) register. After the unit is
enabled, individual Count Enable bits (PFCENx) take effect. Use the PFCENx bits to enable or disable the performance
monitors in User mode, Supervisor mode, or both. Use the EVENTx bits to select the type of event triggered. For
more information, see the Control Register.
Programming Example
The following code example demonstrates a possible use case of the PMU to track stalls in a particular application.
/* L1 data memory address */
I0.L = LO(0xFF801004);
I0.H = HI(0xFF801004);
/* L1 data memory address in same 4K sub-bank */
I1.L = LO(0xFF801244);
I1.H = HI(0xFF801244);
/* reset performance control register */
P0.L = LO(PFCTL);
P0.H = HI(PFCTL);
R0 = 0;
[P0] = R0;
/* clear performance monitor counter 0 */
P0.L = LO(PFCNTR0);
P0.H = HI(PFCNTR0);
R0 = 0;
[P0] = R0;
This results in the counter being incremented by one, as there is a one-cycle stall incurred due to a collision in the
data bank A sub-bank 1. A simultaneous access will only result in a stall if the accesses are to the same 32-bit word
alignment (address bit 2 matches), the same 4 KB sub-bank (address bits 13 and 12 match), the same 16 KB half-
bank (address bit 16 matches), and the same bank (address bits 21 and 20 match).
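The stall condition described above can be written as a single mask comparison. The sketch below is a host-side illustration only; the macro and function names are hypothetical, and the mask uses exactly the address bits named in the text (bit 2, bits 13:12, bit 16, and bits 21:20).

```c
#include <stdbool.h>
#include <stdint.h>

/* Address bits that must all match for two simultaneous accesses to stall. */
#define L1_COLLISION_MASK  ((1u << 2) | (3u << 12) | (1u << 16) | (3u << 20))

/*
 * Hypothetical predicate: true when two L1 data addresses agree in every
 * bit covered by the mask, which is the collision condition in the text.
 */
static bool l1_accesses_collide(uint32_t addr_a, uint32_t addr_b)
{
    return ((addr_a ^ addr_b) & L1_COLLISION_MASK) == 0u;
}
```

Applied to the two addresses loaded into I0 and I1 in the example (0xFF801004 and 0xFF801244), the predicate returns true, which agrees with the one-cycle stall reported by the counter.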
The Hardware Error interrupt can be used in cases where the application needs to be notified of a specific PMU
event. To support this, the EVENTx bit has to be cleared, and the counter has to be pre-loaded with a value of
0xFFFFFFFF, as follows:
P0.L = LO(PFCNTR0);
P0.H = HI(PFCNTR0);
R0.L = 0xFFFF;
R0.H = 0xFFFF;
[P0] = R0;
Because the EVENT0 bit is cleared, the counter overflow that occurs the first time the programmed event occurs
results in a hardware error interrupt being generated. The Hardware Error interrupt service routine could be set up
and populated as follows to enable custom handling of any PMU event:
/* LOAD IMASK ADDRESS */
P0.L = LO(IMASK);
P0.H = HI(IMASK);
R0 = [P0];
R1 = IVHW; /* ENABLE HARDWARE ERROR INTERRUPT */
R0 = R0 | R1;
[P0] = R0;
/* STORE ISR HANDLER ADDRESS */
P0.L = LO(EVT5);
P0.H = HI(EVT5);
R0.L = LO(IVHW_ISR);
R0.H = HI(IVHW_ISR);
[P0] = R0;
/* HARDWARE ERROR INTERRUPT SERVICE ROUTINE */
IVHW_ISR:
R0 = SEQSTAT; /* READ SEQUENCER STATUS REGISTER (SYSTEM REGISTER, NOT AN MMR) */
R1 = 0x1F; /* ISOLATE HWERRCAUSE (ASSUMED 5-BIT FIELD STARTING AT BIT 14) */
R1 <<= 14;
R0 = R0 & R1;
R1 = 0x12; /* CHECK FOR PMU EVENT (HWERRCAUSE 0x12) */
R1 <<= 14;
CC = R1 == R0;
IF !CC JUMP HWERR_EXIT;
PFMON_OVERFLOW:
/* PERFORMANCE MONITOR OVERFLOW HAS BEEN DETECTED */
/* Add handling code here */
HWERR_EXIT:
RTI;
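The HWERRCAUSE test performed by the service routine can also be expressed as a field extraction. The following host-side sketch is illustrative only: the names are hypothetical, the shift of 14 and the cause value 0x12 come from the example, and the 5-bit field width is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQSTAT_HWERRCAUSE_SHIFT 14u    /* from the example: R1 <<= 14 */
#define SEQSTAT_HWERRCAUSE_MASK  0x1Fu  /* assumed 5-bit field width */
#define HWERRCAUSE_PMU_OVERFLOW  0x12u  /* from the example */

/* Extract the hardware error cause from a SEQSTAT value. */
static uint32_t seqstat_hwerrcause(uint32_t seqstat)
{
    return (seqstat >> SEQSTAT_HWERRCAUSE_SHIFT) & SEQSTAT_HWERRCAUSE_MASK;
}

/* True when the hardware error was raised by a performance monitor overflow. */
static bool is_pmu_overflow(uint32_t seqstat)
{
    return seqstat_hwerrcause(seqstat) == HWERRCAUSE_PMU_OVERFLOW;
}
```

Extracting the field before comparing keeps the test valid when other SEQSTAT bits are non-zero, for instance when the low bits hold a value left by an earlier exception.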
Cycle Counters
The cycle counter counts CCLK cycles while the program is executing. All cycles (including execution, wait state,
interrupts, and events) are counted while the processor is in User or Supervisor mode, but the cycle counter stops
counting in Emulator mode.
The 64-bit cycle counter increments every core clock cycle and is tracked in two 32-bit registers, CYCLES and
CYCLES2. The least significant 32 bits (LSBs) are stored in CYCLES, and the most significant 32 bits (MSBs) are
stored in CYCLES2.
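Once both halves have been read, benchmark code typically recombines them into one 64-bit value. The sketch below shows that arithmetic on a host machine; the function names are hypothetical, and the sketch takes the two register values as plain arguments without modeling how they are read from the core.

```c
#include <stdint.h>

/* Combine the CYCLES (low) and CYCLES2 (high) values into one 64-bit count. */
static uint64_t cycles_combine(uint32_t cycles_lo, uint32_t cycles2_hi)
{
    return ((uint64_t)cycles2_hi << 32) | cycles_lo;
}

/* Elapsed cycles between two samples; unsigned subtraction is modulo 2^64. */
static uint64_t cycles_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

Working on the combined value avoids having to handle a carry out of the low 32 bits between the start and end samples.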
The CYCLES and CYCLES2 registers are read/write in all modes (User, Supervisor, and Emulator) for all Blackfin+
processors. For more information, see Cycle Count (32 LSBs) Register and Cycle Count (32 MSBs) Register .
To enable the cycle counters, set the SYSCFG.CCEN bit. The following example shows how to use the cycle counter
to benchmark a piece of code:
R2 = 0; /* Clear the cycle counters */
CYCLES = R2;
CYCLES2 = R2;
R2 = SYSCFG;
BITSET(R2,1);
SYSCFG = R2; /* Enable the cycle counters */
NOTE: When single-stepping through instructions in a debug environment, the CYCLES register is incremented
in a non-uniform fashion due to interaction with the debugger over JTAG.
SYSCFG Register
The System Configuration register (SYSCFG) controls the configuration of the processor. This register is accessible
only from Supervisor mode. For more information, see the System Configuration Register .
DSPID Register
The Product Identification register (DSPID) is a read-only register and is part of the processor core. This register
format differs depending on whether the processor is a single or dual-core processor. For more information, see the
DSP Identification Register .
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
COREID (R)
Core ID
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 0 0 1 0 1 0 0 0 0 0 1 0 1
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT (R/W)
Count Value
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADDR[15:0] (R/W)
Data Address
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADDR[31:16] (R/W)
Data Address
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT (R/W)
Count Value
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 Disabled
1 Enabled
19 ENIA4 WPIA4 Enable.
(R/W) The WPIACTL.ENIA4 bit determines whether or not the WPIA4 watchpoint is enabled
to monitor an individual address. As such, this bit is only valid when
WPIACTL.ENIR45 is not set.
0 Disabled
1 Enabled
18 INVIR45 Instruction Range 45 Invert Enable.
(R/W) The WPIACTL.INVIR45 bit determines whether the watchpoint event occurs when
the instruction address is within or outside of the range defined by the WPIA4/WPIA5
register pair.
0 Event generated when WPIA4 < ADDRESS <= WPIA5
1 Event generated when ADDRESS <= WPIA4 or ADDRESS > WPIA5
17 ENIR45 Instruction Range 45 Enable.
(R/W) When the WPIACTL.ENIR45 bit is set, the WPIA4/WPIA5 instruction watchpoint
pair defines a range of addresses for comparisons, and the individual watchpoint enable
bits WPIACTL.ENIA4 and WPIACTL.ENIA5 become invalid. When defined to be a
range, the start address of the range is in WPIA4 and the end address is in WPIA5.
0 Disable Range
1 Enable Range
16 ACT3 WPIA3 Action.
(R/W) The WPIACTL.ACT3 bit determines whether an exception or emulation event occurs
upon an instruction watchpoint 3 match.
0 Exception event
1 Emulation event
0 Disabled
1 Enabled
11 ENIA2 WPIA2 Enable.
(R/W) The WPIACTL.ENIA2 bit determines whether or not the WPIA2 watchpoint is enabled
to monitor an individual address. As such, this bit is only valid when
WPIACTL.ENIR23 is not set.
0 Disabled
1 Enabled
0 Disabled
1 Enabled
3 ENIA0 WPIA0 Enable.
(R/W) The WPIACTL.ENIA0 bit determines whether or not the WPIA0 watchpoint is enabled
to monitor an individual address. As such, this bit is only valid when
WPIACTL.ENIR01 is not set.
0 Disabled
1 Enabled
2 INVIR01 Instruction Range 01 Invert Enable.
(R/W) The WPIACTL.INVIR01 bit determines whether the watchpoint event occurs when
the instruction address is within or outside of the range defined by the WPIA0/WPIA1
register pair.
0 Event generated when WPIA0 < ADDRESS <= WPIA1
1 Event generated when ADDRESS <= WPIA0 or ADDRESS > WPIA1
1 ENIR01 Instruction Range 01 Enable.
(R/W) When the WPIACTL.ENIR01 bit is set, the WPIA0/WPIA1 instruction watchpoint
pair defines a range of addresses for comparisons, and the individual watchpoint enable
bits WPIACTL.ENIA0 and WPIACTL.ENIA1 become invalid. When defined to be a
range, the start address of the range is in WPIA0 and the end address is in WPIA1.
0 Disable Range
1 Enable Range
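The range and invert conditions listed for the INVIRxx bits can be captured in one predicate. The sketch below is a host-side illustration with a hypothetical function name; it models only the comparison stated in the bit descriptions, where the start address is excluded from the range and the end address is included.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a range watchpoint comparison.
 *   start, end : values of the lower and upper WPIA registers of the pair
 *   addr       : instruction address being compared
 *   invert     : state of the corresponding INVIRxx bit
 * invert == false : event when start <  addr <= end
 * invert == true  : event when addr <= start or addr > end
 */
static bool wp_range_event(uint32_t start, uint32_t end, uint32_t addr, bool invert)
{
    bool inside = (start < addr) && (addr <= end);
    return invert ? !inside : inside;
}
```

Note that, as described, an address equal to the start register does not match the non-inverted range, so a range meant to include its first instruction would need a start value below that instruction's address.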
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADDR[15:0] (R/W)
Instruction Address
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADDR[31:16] (R/W)
Instruction Address
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Counter 0 Register
The PFCNTR0 register holds the count value for performance monitor counter 0. Depending on the configuration
of the PFCTL register, this count decrements based on monitored occurrences of events or stall cycles related to
events. When this count decrements to zero (expires), the PF issues an exception or an emulation event.
The PFCNTR0 counter retains its value even after the PF is disabled, so the counter must be cleared before it may be
used again.
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT[15:0] (R/W)
Event Count 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT[31:16] (R/W)
Event Count 0
Counter 1 Register
The PFCNTR1 register holds the count value for performance monitor counter 1. Depending on the configuration
of the PFCTL register, this count decrements based on monitored occurrences of events or stall cycles related to
events. When this count decrements to zero (expires), the PF issues an exception or an emulation event.
The PFCNTR1 counter retains its value even after the PF is disabled, so the counter must be cleared before it may be
used again.
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT[15:0] (R/W)
Event Count 1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CNT[31:16] (R/W)
Event Count 1
Control Register
The PFCTL register enables the performance monitor unit (PF), selects whether event count expirations generate
emulator or exception events, selects the processor modes in which monitoring is enabled, and selects the event
type occurrences or stalls that the monitor counts.
[Figure: PFCTL register bit diagram. Reset value: 0x0000 0000.]
Field         Access   Reset     Description
CNT[63:48]    R/W      X         32 MSBs of Cycle Count
CNT[47:32]    R/W      X         32 MSBs of Cycle Count
CNT[31:16]    R/W      X         32 MSBs of Cycle Count
CNT[15:0]     R/W      X         32 MSBs of Cycle Count
Field         Access   Description
DCACHE1       R        Data Cache 1
Register reset value: 0x0000 0F01
The processor core implements a Program Trace Macrocell (PTM), which implements a subset of the ARM CoreSight
Program Flow Trace Architecture (CSPFT) specification and provides instruction trace capability. For Cortex A5
trace unit features, refer to the Embedded Trace Macrocell (ETM) chapter of the hardware reference manual.
Features
The trace module has the following features:
• Address comparators and Context ID comparators for filtering trace data and for use as event resources.
• External inputs and outputs for use as event resources.
• Events can be created using address comparators, Context ID comparators, and external inputs.
• Counters to count event occurrences.
Functional Description
The following section describes the features available in the trace module.
Address Comparators
The trace module provides four address comparators. Program the Address Comparator Value register with the address
to be matched, and program the corresponding Address Comparator Access Type register with additional information
about the required comparison:
• Include or exclude range
• Linking the address comparison with the Context ID comparator
Address comparators can be used:
• Individually, as single address comparators (SACs)
• In pairs, as address range comparators (ARCs), in which case two adjacent address comparators form an ARC
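The difference between the two comparator uses can be sketched in a few lines. The function names below are hypothetical, and two details are assumptions rather than statements from this manual: that an ARC treats its lower address as inclusive and its upper address as exclusive, and that the exclude option simply inverts the range match. Consult the CoreSight Program Flow Trace specification for the authoritative rules.

```python
def sac_match(address, comparator_value):
    """Single address comparator: matches one exact address."""
    return address == comparator_value


def arc_match(address, low, high, exclude=False):
    """Address range comparator built from two adjacent comparators.

    Assumption: the range is [low, high), that is, low inclusive and
    high exclusive. With exclude=True the match is inverted.
    """
    in_range = low <= address < high
    return not in_range if exclude else in_range


# Trace only a hypothetical function occupying 0x1000 up to (not including) 0x2000.
inside = arc_match(0x1800, 0x1000, 0x2000)                   # True
outside = arc_match(0x2000, 0x1000, 0x2000)                  # False
excluded = arc_match(0x1800, 0x1000, 0x2000, exclude=True)   # False
```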
Context ID Comparators
The trace module provides one Context ID comparator.
The Context ID comparator consists of a Context ID Comparator Value register, which holds a Context ID value
for comparison with the current Context ID, and a Context ID Comparator Mask register, which holds a mask
value that is applied to all Context ID comparisons. If the Context ID Comparator Mask register is programmed
to zero, no mask is applied to the Context ID comparisons.
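A masked comparison of this kind can be sketched as follows. The function name is hypothetical, and the sketch assumes a 32-bit Context ID in which each mask bit set to 1 removes the corresponding bit from the comparison. That polarity is an assumption chosen to be consistent with the statement that a mask of zero applies no masking.

```python
def context_id_match(current_id, value, mask):
    """Compare a Context ID against a comparator value under a mask.

    Assumption: mask bits set to 1 are ignored in the comparison, so a
    mask of 0 compares all 32 bits.
    """
    care_bits = ~mask & 0xFFFFFFFF
    return (current_id & care_bits) == (value & care_bits)


exact = context_id_match(0x00001234, 0x00001234, 0x00000000)    # True
differs = context_id_match(0x00001235, 0x00001234, 0x00000000)  # False
masked = context_id_match(0x000012FF, 0x00001234, 0x000000FF)   # True: low byte ignored
```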
Events
The trace module includes a number of event resources, including address comparators, Context ID comparators, and
external inputs.
Event resources can be used to define events. An event register can be programmed to define the corresponding event
as the result of a logical operation involving one or two event resources.
Each event resource is either active or inactive. An active event resource generates a logical TRUE signal, and an
inactive event resource generates a logical FALSE signal. An event is a logical combination of event resources;
therefore, at any given time each event is either TRUE or FALSE.
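The idea of an event as a logical function of one or two resources can be modeled as below. This is a sketch only: the function table, its string keys, and the helper name evaluate_event are hypothetical, and the set of operations shown is illustrative. The actual encodings of the event registers are defined in the CoreSight Program Flow Trace specification.

```python
# Illustrative subset of one- and two-input logical operations.
EVENT_FUNCTIONS = {
    "A": lambda a, b: a,
    "NOT A": lambda a, b: not a,
    "A AND B": lambda a, b: a and b,
    "A OR B": lambda a, b: a or b,
}


def evaluate_event(function, resource_a, resource_b=False):
    """Return the TRUE/FALSE state of an event from its resource states."""
    return bool(EVENT_FUNCTIONS[function](resource_a, resource_b))


# Example: the event is TRUE only while an address comparator and an
# external input are both active.
comparator_active = True
external_input_active = False
event_state = evaluate_event("A AND B", comparator_active, external_input_active)  # False
```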
Counters
The trace module provides 2 counters that are controlled using events. Each 16-bit counter can count from 0 to
65535. Counter behavior is controlled by the following registers.
Trace Security
The trace module supports trace security, which is controlled by the Debug Enable input signal. This signal controls
whether the trace module is allowed to trace instructions. If the signal is de-asserted, all tracing stops, all
internal resources are disabled, and the trace module's state is held.
Programming Model
The trace module registers are memory-mapped in a 4 KB region, as per the CoreSight programmers model.
References
• CoreSight™ Program Flow Trace™ Architecture Specification, ARM IHI 0035B. Available at http://infocenter.arm.com
• CoreSight™ Architecture Specification, ARM IHI 0029B. Available at http://infocenter.arm.com
11 Numeric Formats
The Blackfin+ family processors support 8-, 16-, 32-, and 40-bit fixed-point data in hardware. Special features in the
computation units allow support of other formats in software. This appendix describes various aspects of these data
formats. It also describes how to implement a block floating-point format in software.
[Figure: 16-bit integer formats (signed integer and unsigned integer), bits 15 through 0, showing the sign bit and
the radix point]
The native formats for the Blackfin processor family are a signed fractional 1.M format and an unsigned fractional
0.N format, where N is the number of bits in the data word and M = N - 1.
The notation used to describe a format consists of two numbers separated by a period (.); the first number is the
number of bits to the left of the radix point, the second is the number of bits to the right of the radix point. For
example, 16.0 format is an integer format; all bits lie to the left of the radix point. The format in the Example of
Fractional Format figure is 13.3.
[Figure: Example of Fractional Format. Signed Fractional (13.3), bits 15 through 0, showing the sign bit and the
radix point between bits 3 and 2]
Format   # of Integer   # of Fractional   Max Positive Value      Max Negative Value    Value of 1 LSB
         Bits           Bits              (0x7FFF) in Decimal     (0x8000) in Decimal   (0x0001) in Decimal
15.1     15             1                 16383.500000000000000   -16384.0              0.500000000000000
16.0     16             0                 32767.000000000000000   -32768.0              1.000000000000000
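The 15.1 and 16.0 values above follow directly from the format notation: a 16-bit word is read as a two's complement integer and then scaled by 2 raised to the negative number of fractional bits. The helper names below are hypothetical and the code is a host-side sketch, not processor code.

```python
def to_signed16(raw):
    """Interpret a 16-bit pattern as a two's complement integer."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def fixed_value(raw, frac_bits, signed=True):
    """Value of a 16-bit word with frac_bits bits to the right of the radix point."""
    integer = to_signed16(raw) if signed else (raw & 0xFFFF)
    return integer / (1 << frac_bits)


# 15.1 format: one fractional bit
max_15_1 = fixed_value(0x7FFF, 1)   # 16383.5
min_15_1 = fixed_value(0x8000, 1)   # -16384.0
lsb_15_1 = fixed_value(0x0001, 1)   # 0.5

# 16.0 format: no fractional bits (integer)
max_16_0 = fixed_value(0x7FFF, 0)   # 32767.0
min_16_0 = fixed_value(0x8000, 0)   # -32768.0
```

The same function covers the native signed fractional 1.15 format by passing frac_bits=15.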
Binary Multiplication
In addition and subtraction, both operands must be in the same format (signed or unsigned, radix point in the same
location), and the result format is the same as the input format. Addition and subtraction are performed the same
way whether the inputs are signed or unsigned.
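A short host-side sketch illustrates why one adder serves both cases: the 16-bit result pattern is identical, and only its interpretation differs. The helper names are hypothetical, and the sketch models plain wraparound addition only, without saturation or arithmetic status bits.

```python
def add16(a, b):
    """Add two 16-bit words, discarding the carry out of bit 15."""
    return (a + b) & 0xFFFF


def as_signed16(raw):
    """Interpret a 16-bit pattern as a two's complement integer."""
    return raw - 0x10000 if raw & 0x8000 else raw


raw = add16(0xFFFF, 0x0002)      # result pattern is 0x0001 in either view
unsigned_view = raw              # 65535 + 2 wraps modulo 65536 to 1
signed_view = as_signed16(raw)   # -1 + 2 = 1
```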
In multiplication, however, the inputs can have different formats, and the result depends on their formats. The
Blackfin+ family assembly language allows you to specify whether the inputs are both signed, both unsigned, or one
of each (mixed-mode). The location of the radix point in the result can be derived from its location in each of the
inputs. This is shown in the Format of Multiplier Result figure. The product of two 16-bit numbers is a 32-bit
number. If the inputs' formats are M.N and P.Q, the product has the format (M + P).(N + Q). For example, the
product of two 13.3 numbers is a 26.6 number. The product of two 1.15 numbers is a 2.30 number.
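The format rule can be checked numerically. The sketch below uses hypothetical helper names and models only a signed-by-signed multiply on the host; it does not model any particular Blackfin+ multiplier option or result shift.

```python
def product_format(fmt_a, fmt_b):
    """Format of a product: M.N times P.Q gives (M + P).(N + Q)."""
    (m, n), (p, q) = fmt_a, fmt_b
    return (m + p, n + q)


def to_signed(raw, bits):
    """Interpret a bit pattern of the given width as two's complement."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def multiply_signed16(raw_a, raw_b):
    """32-bit two's complement product of two signed 16-bit words."""
    return (to_signed(raw_a, 16) * to_signed(raw_b, 16)) & 0xFFFFFFFF


fmt = product_format((1, 15), (1, 15))       # (2, 30)
raw = multiply_signed16(0x4000, 0x4000)      # 0.5 * 0.5 in 1.15 gives 0x10000000
value = to_signed(raw, 32) / (1 << fmt[1])   # 0.25 when read as 2.30
```

Reading the 32-bit product with 30 fractional bits recovers 0.25, which matches 0.5 multiplied by 0.5.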
[Figure: two sets of 16-bit example values with their sign bits marked]
0x0FFF = 0000 1111 1111 1111
0x1FFF = 0001 1111 1111 1111
0x07FF = 0000 0111 1111 1111

0x0FFF = 0000 1111 1111 1111
0x1FFF = 0001 1111 1111 1111
0x03FF = 0000 0011 1111 1111