cortex-a9-processor
[Figure: ARM processor families plotted by performance and functionality against capability. Application processors (Cortex-A series): Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15. Real-time control (Cortex-R series): Cortex-R4, Cortex-R5, Cortex-R7. Microcontroller: Cortex-M0, Cortex-M1, Cortex-M3, Cortex-M4. Earlier cores: ARM7, ARM9, ARM11.]
© 2012 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS,
QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark
Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their
respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor
products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any
products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use
of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are
advised to obtain the latest version of device specifications before relying on any published information and before placing orders
for products or services.
ISO 9001:2008 Registered
101 Innovation Drive, San Jose, CA 95134
www.altera.com
ARM Cortex-A9 MPCore Processor Architecture
The dual-core ARM Cortex-A9 MPCore processor in Altera SoC FPGAs is designed
for maximum performance and power efficiency, implementing the widely-supported
ARMv7 instruction set architecture to address a broad range of industrial,
automotive, and wireless applications. The Cortex-A9 MPCore processor architecture
includes the following features:
■ Dual-core multiprocessing, supporting both symmetric multiprocessing (SMP)
and asymmetric multiprocessing (AMP)
■ Multi-issue superscalar, out-of-order, speculative execution 8-stage pipeline
delivering 2.5 DMIPS/MHz per CPU
■ Advanced branch prediction
■ Single- and double-precision IEEE standard 754-1985 floating-point mathematical
operations
■ ARM NEON™ 128-bit single instruction multiple data (SIMD) media processing
engine
■ Jazelle® byte-code dynamic compiler support
■ TrustZone® architecture for enhanced system security
For more details about ARM Cortex-A9 processors, refer to the ARM Cortex-A9
Processors white paper.
SoC FPGA ARM Cortex-A9 MPCore Processor Advance Information Brief February 2012 Altera Corporation
[Figure: ARM processor architecture features by increasing capability (ARM1156T2, ARM11MP, Cortex-A8, Cortex-A9, Cortex-R7, Cortex-A15). Architectural features shown: Thumb 16-bit instruction set, Thumb-2 memory-optimized instruction set, nested vectored interrupts, wake-up interrupt controller, IEEE-754 floating-point arithmetic, Jazelle bytecode dynamic compiler support, TrustZone system security, and virtualization.]
Figure 3 provides a detailed block diagram of the MPU subsystem. The MPU
subsystem includes two Cortex-A9 processor cores, the level 2 (L2) cache and memory
subsystem, Snoop Control Unit (SCU), Accelerator Coherency Port (ACP), and debug
functions.
[Figure 3. MPU subsystem block diagram: two processor cores, each with a Jazelle bytecode dynamic compiler, IEEE 754 floating point (single- and double-precision), and a NEON DSP/media SIMD processing engine, together with the interrupt controller, ACP, SCU, and L2 cache.]
Cache Memory
Cache memory improves the performance of a processor-based system and helps
reduce power consumption. Cache memory that is tightly integrated with an
associated processor core is called level 1 (L1) cache. Each Cortex-A9 CPU has two
independent 32-kilobyte (KB) L1 caches—one for instructions and one for data—
allowing simultaneous instruction fetches and data access. Each L1 cache is 4-way set
associative and has an eight-word line length.
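As a quick check of the geometry described above, the following sketch derives the number of sets in one L1 cache. It assumes a 32-bit (4-byte) word, so that an eight-word line is 32 bytes; the function and constant names are illustrative only.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache."""
    lines = size_bytes // line_bytes   # total number of cache lines
    return lines // ways               # each set holds `ways` lines

WORD_BYTES = 4               # assumption: 32-bit word
L1_SIZE = 32 * 1024          # 32 KB per L1 cache (from the text)
L1_WAYS = 4                  # 4-way set associative (from the text)
L1_LINE = 8 * WORD_BYTES     # eight-word line length = 32 bytes

l1_sets = cache_sets(L1_SIZE, L1_WAYS, L1_LINE)
print(l1_sets)  # 256
```

Under that assumption, each L1 cache holds 1,024 lines arranged as 256 sets of four ways.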
The HPS also includes a 512-KB L2 shared, unified cache memory (instruction and
data for both Cortex-A9 cores). The L2 cache is 8-way set-associative with
programmable locking by line, way, and master. The L2 cache includes error
correction code (ECC) reporting.
[Figure: Cache memory hierarchy: an L1 data cache and an L1 instruction cache for each CPU, the shared L2 unified cache, and the connection to the FPGA fabric.]
The SCU maintains bidirectional coherency among the L1 data caches, ensuring that both
CPUs have access to the most recent data. When a CPU writes to any coherent memory
location, the SCU ensures that the relevant data is coherent
(updated/tagged/invalidated). Similarly, the SCU monitors read operations from a
coherent memory location. If the required data is already stored in the L1 caches, the
data is returned directly to the requesting CPU. If the data is not in the L1 caches, the
L2 cache is checked before the data is finally retrieved from the main memory.
The SCU also manages accesses from the ACP and arbitrates between the Cortex-A9
CPUs if both attempt simultaneous access to the L2 cache.
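The read path described above can be sketched as a lookup that tries the nearest level first. This is a deliberately simplified model, not the SCU's actual protocol: caches are plain dictionaries keyed by address, there are no line states or cache fills, and the name `coherent_read` is hypothetical.

```python
def coherent_read(addr, l1_caches, l2_cache, main_memory):
    """Return (data, source) for a coherent read, nearest level first."""
    # 1. If either CPU's L1 data cache holds the address, return it directly.
    for cpu, l1 in enumerate(l1_caches):
        if addr in l1:
            return l1[addr], f"L1 (CPU{cpu})"
    # 2. Otherwise check the shared L2 cache.
    if addr in l2_cache:
        return l2_cache[addr], "L2"
    # 3. Finally fall back to main memory.
    return main_memory[addr], "main memory"

# Toy contents: CPU1's L1 holds a newer value for 0x200 than main memory.
l1_caches = [{0x100: "a-new"}, {0x200: "b-new"}]
l2_cache = {0x300: "c"}
main_memory = {0x100: "a-old", 0x200: "b-old", 0x300: "c", 0x400: "d"}

print(coherent_read(0x200, l1_caches, l2_cache, main_memory))
print(coherent_read(0x400, l1_caches, l2_cache, main_memory))
```

The first read is served from CPU1's L1 cache even though main memory holds a stale copy, which is the behavior the coherency mechanism is there to guarantee.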
The ACP ID mapper is located between the L3 interconnect and the ACP. The
ACP is designed to support up to eight unique transactions concurrently (eight
unique transaction IDs are supported). However, the FPGA fabric can have any
number of masters requesting coherent transactions. The ACP mapper dynamically
allocates the available eight transaction IDs to the requesting masters to ensure that all
masters have access to coherent memory regions.
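One way to picture the dynamic allocation is a small pool of eight IDs that are handed out on request and returned when a transaction completes. The sketch below is only an illustration under that assumption: the brief does not describe the mapper's real allocation policy, and the class, method, and master names are hypothetical.

```python
class AcpIdMapper:
    """Toy model: share eight ACP transaction IDs among any number of masters."""

    NUM_IDS = 8  # eight unique transaction IDs (from the text)

    def __init__(self):
        self.free_ids = list(range(self.NUM_IDS))
        self.in_use = {}  # acp_id -> name of the master currently using it

    def allocate(self, master):
        """Return a free ID for `master`, or None if all eight are busy."""
        if not self.free_ids:
            return None  # the master has to wait for an ID to be released
        acp_id = self.free_ids.pop(0)
        self.in_use[acp_id] = master
        return acp_id

    def release(self, acp_id):
        """Return an ID to the pool once its transaction has completed."""
        del self.in_use[acp_id]
        self.free_ids.append(acp_id)

mapper = AcpIdMapper()
ids = [mapper.allocate(f"fpga_master_{n}") for n in range(10)]
print(ids)  # the ninth and tenth requests get None until an ID is released
mapper.release(ids[0])
print(mapper.allocate("fpga_master_8"))
```

Because IDs are recycled rather than bound to a fixed master, the number of requesting masters is not limited to eight; only the number of concurrent transactions is.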
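[Figure: SIMD operation: the same operation (Op) is applied to corresponding elements of two source registers, and the results are written to a destination register.]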
The Cortex-A9 NEON MPE performs operations on the following data types:
■ SIMD and scalar single-precision floating-point computations
■ Scalar double-precision floating-point computation
■ SIMD and scalar half-precision floating-point conversion
■ 8-, 16-, 32-, and 64-bit signed and unsigned integer SIMD computation
■ 8- or 16-bit polynomial computation for single-bit coefficients
The available operations include the following functionality:
■ Addition and subtraction
■ Multiplication with optional accumulation
■ Maximum or minimum value-driven lane selection operations
■ Inverse square-root approximation
■ Comprehensive data-structure load instructions, including
register-bank-resident table lookup
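To illustrate what lane-wise SIMD computation means, the following sketch models a 128-bit register as four 32-bit unsigned lanes and applies one operation to every lane. It is plain Python rather than NEON code, and the helper name `simd_op` is hypothetical.

```python
def simd_op(src_a, src_b, op, lane_bits=32):
    """Apply `op` to each pair of lanes, wrapping results to the lane width."""
    if len(src_a) != len(src_b):
        raise ValueError("source registers must have the same number of lanes")
    mask = (1 << lane_bits) - 1
    return [op(a, b) & mask for a, b in zip(src_a, src_b)]

# Four 32-bit lanes fill one 128-bit register.
a = [1, 2, 3, 0xFFFFFFFF]
b = [10, 20, 30, 1]

print(simd_op(a, b, lambda x, y: x + y))  # [11, 22, 33, 0], last lane wraps
print(simd_op(a, b, max))                 # lane-wise maximum
```

A single instruction therefore does the work of four scalar additions; narrower lanes (for example, sixteen 8-bit lanes) raise the parallelism further.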
For more details on the ARM NEON processing engine, including application
benchmarks, refer to the NEON Technology Introduction presentation.
Generic Interrupt Controller
[Figure: Generic Interrupt Controller interrupt sources and destinations. Interrupt sources: ARM Cortex-A9 CPU0/caches, ARM Cortex-A9 CPU1/caches, SCU, L2 cache, DDR SDRAM, DMA controller, Ethernet 1:0, USB 1:0, CAN 1:0, MMC/SD, NAND flash, quad SPI flash controller, SPI 3:0, I2C 3:0, UART 1:0, GPIO 2:0, timer 3:0, watchdog 1:0, PLL lock, FPGA manager, FPGA-based IP, FPGA2SoC bridge timeout, and SoC2FPGA bridge timeout. Destinations: ARM Cortex-A9 CPU 0 and ARM Cortex-A9 CPU 1.]
For some peripherals, such as the USB controllers, multiple interrupt sources are
combined into a single interrupt to the processors. For example, each USB controller
supports up to 32 interrupts from individual USB endpoints, all of which are presented
to the processor as one interrupt.
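Combining sources in this way amounts to a logical OR over the individual interrupt bits. The sketch below models 32 endpoint interrupts as bits of an integer; the per-source enable mask and the function names are assumptions added for illustration and are not described in this brief.

```python
NUM_SOURCES = 32                      # up to 32 endpoint interrupts (from the text)
SOURCE_MASK = (1 << NUM_SOURCES) - 1

def combined_irq(pending: int, enabled: int = SOURCE_MASK) -> bool:
    """Single interrupt line: asserted if any enabled source is pending."""
    return (pending & enabled & SOURCE_MASK) != 0

def active_sources(pending: int, enabled: int = SOURCE_MASK) -> list:
    """What a handler would do next: find which sources raised the interrupt."""
    active = pending & enabled & SOURCE_MASK
    return [bit for bit in range(NUM_SOURCES) if (active >> bit) & 1]

pending = (1 << 3) | (1 << 17)        # endpoints 3 and 17 are pending
print(combined_irq(pending))          # True
print(active_sources(pending))        # [3, 17]
```

The processor sees one interrupt, so the handler typically has to read the peripheral's status to learn which endpoint needs service.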
System Interconnect
The remainder of the HPS is located outside of the MPU subsystem, as shown in
Figure 7. The processor accesses the rest of the HPS through a pair of 64-bit Advanced
Microcontroller Bus Architecture (AMBA®) Advanced eXtensible Interface (AXI™)
masters. The high-bandwidth peripherals, including the FPGA data ports, connect to
the L3 interconnect structure. The L3 interconnect is further partitioned into three
major sub-switches. The L3 interconnect uses a multilayer, non-blocking architecture
that supports multiple, simultaneous transactions between peripherals, sub-switches,
SDRAM, and the MPU subsystem. Each L3 bus master has programmable priority.
SoC FPGAs use the 32-bit AMBA Advanced High-performance Bus (AHB™) as a low-power bus
for low-power peripherals, and the high-performance 64-bit AXI for high-performance
peripherals. The lower-bandwidth peripherals reside on the level 4 (L4) bus,
implemented as a 32-bit ARM Advanced Peripheral Bus (APB™). Some peripherals,
like the DMA controller, have low-bandwidth control connections on the L4
interconnect and high-bandwidth data transfer connections on the L3 interconnect.
[Figure 7. HPS system interconnect. Blocks shown: the L3 main switch, L3 master peripheral switch, and L3 slave peripheral switch; the ACP ID mapper, ACP, SCU, and L2 cache; ETR (trace), STM, boot ROM, on-chip RAM, and the SDRAM controller subsystem; SD/MMC, EMAC (2), USB OTG (2), NAND, QSPI, and DMA. Links are labeled AXI-64, AXI-32, or AHB-32. The L4 APB-32 bus serves UART (2), timer (4), I2C (4), watchdog (2), CAN (2), GPIO (3), system manager, clock manager, reset manager, scan manager, and SPI (4).]
Table 1 lists the connections between the various system bus masters and slaves. Not
every bus master communicates with every bus slave, although the MPU subsystem,
the FPGA-to-HPS bridge, lightweight HPS-to-FPGA bridge, and the Debug Access
Port (DAP) have universal access.
Table 1. Master/Slave Connection Matrix
LWHPS2FPGAREGS
SDRAM Subsystem
FPGA2HPSREGS
FPGA2HPSREGS
HPS2FPGAREGS
FPGAMGRDATA
NAND REGS
NAND DATA
ACPIDMAP
QSPIDATA
OCRAM
Slaves
USB0
USB1
ROM
L4
Masters
MPU subsystem v v v v v v v v v v — v v v v
FPGA-to-HPS v v v v v v v v v v v v v v v
DMA v — — — — — v v v v v — v v v
EMAC0 — — — — — — — — — v v — v v v
EMAC1 — — — — — — — — — v v — v v v
USB0 — — — — — — — — — v v — v v v
USB1 — — — — — — — — — v v — v v v
NAND — — — — — — — — — v v — v v v
SD/MMC — — — — — — — — — v v — v v v
Embedded Trace Router (ETR) — — — — — — — — — v — — v v v
Debug Access Port v v v v v v v v v v v — v v v
SoC FPGA System Address Map
Figure 8 shows the relationships between the HPS address spaces. Some address
spaces have windows into other address spaces. The thin colored arrows indicate
windows mapped to other address ranges (the arrows point to mapped address
spaces). For example, the ACP window in the L3 address space provides access into
the bottom 1 GB of the CPU address space. The vertical arrows in the SDRAM
window of the CPU address space show that the boundaries of the SDRAM window
can be moved in the direction of the arrows.
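[Figure 8. HPS address space relationships over the 0 GB to 4 GB range, with markers at 1 GB, 2 GB, and 3 GB. Regions shown: peripheral regions at the top, FPGA windows, the LWFPGA window, the ACP window, SDRAM windows, and the boot region at the bottom.]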
The top 64 megabytes (MB) in the CPU and L3 address spaces is always allocated to
the HPS dedicated peripherals. The CPU and L3 peripherals support up to 1 GB of
address space to communicate with slave peripherals in FPGA fabric, minus the
64 MB at the top allocated to the HPS dedicated peripherals.
You can address architecture on the FPGA fabric without limits. A 1 MB window in
the address map allows sharing between the FPGA fabric and the HPS (soft logic in
the FPGA fabric performs address decoding). The L3 and CPU address spaces
provide a window of nearly 1 GB (1 GB minus 64 MB for peripherals) into the FPGA
address space.
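The sizes quoted above follow from simple arithmetic on a 4 GB address space. The sketch below works them out; the 3 GB base address for the FPGA window is an assumption read from the layout of Figure 8, and the constant names are illustrative.

```python
GB = 1 << 30
MB = 1 << 20

ADDRESS_SPACE = 4 * GB
PERIPHERAL_SIZE = 64 * MB                    # top 64 MB (from the text)

peripheral_base = ADDRESS_SPACE - PERIPHERAL_SIZE
fpga_window_size = 1 * GB - PERIPHERAL_SIZE  # "nearly 1 GB"
fpga_window_base = 3 * GB                    # assumption: window starts at 3 GB
fpga_window_end = fpga_window_base + fpga_window_size - 1

print(hex(peripheral_base))       # 0xfc000000
print(fpga_window_size // MB)     # 960
print(hex(fpga_window_end))       # 0xfbffffff
```

Under that assumption, the FPGA window ends exactly where the dedicated peripheral region begins, so the two regions together fill the top 1 GB.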
After power-on or a cold or warm reset, the bottom 1 MB is allocated to the boot ROM
for the MPU subsystem and L3 masters. Under software control, the lower 1 MB can
be remapped to the 64 KB on-chip RAM, or the bottom 1 MB of external SDRAM.
The MPU subsystem directly addresses a minimum of nearly 3 GB. The L3 masters
support up to 2 GB of SDRAM. Access to coherent memory from the L3 masters is
performed via the 1 GB of address space allocated to the ACP, as described in the
“Accelerator Coherency Port” section.
Besides transactions over the system bus, IP implemented in the FPGA fabric has a
separate private data path to the full 4 GB of SDRAM.
System-Level Management
The HPS and the FPGA fabric are both stand-alone entities that can operate
independently from one another. However, the HPS can directly configure and
monitor the FPGA fabric via the FPGA configuration manager.
Beyond the SoC FPGA's on-chip FPGA fabric, the processor can also configure
additional external FPGAs from the same configuration memory source.
FPGA Manager
The FPGA manager is a 32-bit peripheral, as shown in Figure 9. The configuration
interface is familiar to existing Altera FPGA designers and is similar to the passive-
parallel configuration mode used to configure other Altera FPGA devices from an
external processor. In addition to the ability to program the FPGA fabric from the
HPS, the FPGA fabric also supports configuration from external active or passive
sources.
The FPGA manager provides various configuration options:
■ Configure the full FPGA
■ Configure just the I/O and have the remainder of the FPGA configured over
the PCI Express® (PCIe®) port
■ Partially reconfigure portions of the FPGA fabric
■ Provide decompression on a compressed bitstream image
■ Decrypt an FPGA bitstream previously encrypted using the Advanced Encryption
Standard (AES), thereby providing an additional layer of system security
■ In high-reliability applications, perform bitstream scrubbing to further protect
against single event upsets (SEUs)
[Figure 9. FPGA manager: the FPGA manager in the HPS connects to the FPGA configuration block. Labels shown: nCONFIG, nSTATUS, CONF_DONE, INIT_DONE, MSEL, nCE, DCLK, DATA, PR_REQUEST, CONFIG_IO mode, and 10 kΩ alongside nSTATUS and CONF_DONE.]
Besides configuring the internal FPGA fabric, the HPS can optionally program other
FPGAs in the system. The CPU controls the CONF_DONE pin and can delay
completion of multi-FPGA configurations.
Clock Manager
The clock manager generates and manages all the system clocks in the HPS.
The clock manager controls the three primary phase-locked loops (PLLs), including
the output frequency generated, the phase, and the delay from the selected clock
input. Similarly, the CPU can monitor the lock status for each PLL.
Table 3 shows the clock sources for each of the PLLs.
Table 3. Clock Sources for Clock Manager PLLs
PLL              Primary Oscillator Input   Secondary Oscillator Input   From FPGA Fabric Reference Clock
Main PLL         v                          —                            —
SDRAM PLL        v                          v                            SDRAM reference clock
Peripheral PLL   v                          v                            Peripheral reference clock
The HPS architecture automatically handles signals that cross between clock domains.
Reducing the clock frequency helps reduce overall system power consumption.
Under processor control, the clock manager can change the main system clock
without affecting the clocks controlling the peripherals or SDRAM interface. Directly
changing the PLL's operating frequency using the multiply and divide values
provides fine-grained frequency control but requires long settling times between changes.
Changing the post-scale counters associated with the PLL results in quick changes,
but with reduced frequency resolution.
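The trade-off between the two methods can be seen with a generic PLL model in which the output frequency is the reference multiplied and divided inside the loop, then divided again by a post-scale counter. The formula, the function name, and all numeric values below are assumptions for illustration; the brief does not give the device's actual PLL equation or ranges.

```python
from fractions import Fraction

def pll_output_hz(ref_hz, multiply, divide, post_scale):
    """Generic PLL model: (ref * multiply / divide) / post_scale."""
    vco_hz = Fraction(ref_hz) * multiply / divide  # changing this needs re-lock
    return vco_hz / post_scale                     # changing this is quick

REF_HZ = 25_000_000  # hypothetical 25 MHz reference

base = pll_output_hz(REF_HZ, multiply=64, divide=1, post_scale=2)
fine = pll_output_hz(REF_HZ, multiply=65, divide=1, post_scale=2)
coarse = pll_output_hz(REF_HZ, multiply=64, divide=1, post_scale=4)

print(int(base))    # 800000000
print(int(fine))    # 812500000 (small step, via the multiply value)
print(int(coarse))  # 400000000 (large step, via the post-scale counter)
```

In this model, stepping the multiply value moves the output by 12.5 MHz, whereas the smallest post-scale change shown halves it, which mirrors the fine-versus-quick distinction in the text.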
Reset Manager
The reset manager determines the source and type of reset to the system and performs
any necessary prequalification of a reset signal before forwarding it to the HPS or
FPGA fabric. Based on the type of reset, a reset signal might reset only a portion of the
entire device.
In the HPS, there are three reset request types:
■ Cold reset—forces the entire HPS, including debugging functions, into a
known default state to begin/restart the boot process.
■ Warm reset—recovers a non-responsive system. This reset only affects a
portion of the system and not the debugging infrastructure described in the
“Debugging Infrastructure” section. A warm reset does not clear the
system configuration options set during a cold reset or power-on condition.
■ Debug reset—recovers only debug and trace functions should they become
nonresponsive.
Global cold reset requests reset the entire device. The following sources can initiate a
cold reset request:
■ The power-on reset (POR) monitor
■ The nPOR input pin
■ Cold reset request from FPGA fabric
■ Software-based cold reset request
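The scope of each reset type can be summarized as a small lookup table. The sketch below encodes only what the descriptions above state; the domain names (`system`, `debug`, `system_config`) and source identifiers are hypothetical labels chosen for this example.

```python
from enum import Enum

class ResetType(Enum):
    COLD = "cold"
    WARM = "warm"
    DEBUG = "debug"

# Which (hypothetically named) domains each reset type returns to default.
RESET_SCOPE = {
    ResetType.COLD: {"system", "debug", "system_config"},  # entire HPS
    ResetType.WARM: {"system"},   # keeps debug logic and configuration options
    ResetType.DEBUG: {"debug"},   # debug and trace functions only
}

# Sources that can initiate a cold reset request (from the list above).
COLD_RESET_SOURCES = {
    "por_monitor", "npor_pin", "fpga_fabric_request", "software_request",
}

def is_reset(domain: str, reset_type: ResetType) -> bool:
    """Return True if `domain` is reset by `reset_type`."""
    return domain in RESET_SCOPE[reset_type]

print(is_reset("debug", ResetType.WARM))          # False
print(is_reset("system_config", ResetType.COLD))  # True
```

Keeping the debug domain out of the warm reset scope is what allows a debugger to stay attached while a non-responsive system is recovered.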
Debugging Infrastructure
Developing and debugging a highly-integrated SoC device can be challenging,
especially because so much of the application is deeply embedded in the device. To
speed up development and verification, the Altera SoC FPGA includes extensive
debugging resources that provide system visibility via ARM's CoreSight™ on-chip
debug and trace IP, which includes the following features:
■ Individual interactive debug for each CPU
■ Individual program trace for each CPU
■ Cross triggering between CPUs
■ Cross triggering between CPU and debugging functions
Interactive Debug
Each Cortex-A9 processor has built-in debug capabilities, including the following
items:
■ Six hardware breakpoints, including two with context ID comparison
capability to differentiate between different code streams
■ Four watchpoints
■ Three control sources: external JTAG or Serial Wire Debug (SWD) tools,
FPGA soft IP, and CPU-based monitor code
Program Trace
Each processor has an independent Program Trace Macrocell (PTM) that provides
real-time instruction flow trace capability. The PTM is compatible with a number of
third-party debugging tools. To maximize throughput, the PTM highly compresses the trace
information by recording only specific points in the program execution flow, called
waypoints. Waypoints are changes to the program flow or specific events, such as
exceptions, branches, changes in context, and changes in processor instruction set
state or security state.
The PTM optionally provides additional information for waypoints, including the
following items:
■ The number of CPU cycles between waypoints
■ The global time stamp value
■ Target addresses for direct branches
External debugger software decodes the trace information into a form that more
closely resembles the original program flow.
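The idea behind waypoint-based decoding can be shown with a toy model: the decoder walks the program image and consults the trace only where the flow could change. This is not the PTM packet format; the instruction encoding, the one-outcome-per-branch trace, and the function name are all assumptions made for illustration.

```python
def decode_trace(program, start, branch_outcomes):
    """Rebuild the executed address sequence from a toy waypoint stream.

    program maps address -> ("op",), ("branch", target), or ("end",).
    branch_outcomes yields True (taken) or False for each branch reached.
    """
    outcomes = iter(branch_outcomes)
    flow, pc = [], start
    while True:
        kind, *args = program[pc]
        flow.append(pc)
        if kind == "end":
            return flow
        if kind == "branch" and next(outcomes):
            pc = args[0]   # branch taken: follow the target in the image
        else:
            pc += 4        # sequential execution needs no trace data

# Hypothetical four-instruction program image.
program = {
    0x00: ("op",),
    0x04: ("branch", 0x0C),
    0x08: ("op",),
    0x0C: ("end",),
}

print([hex(a) for a in decode_trace(program, 0x00, [True])])
print([hex(a) for a in decode_trace(program, 0x00, [False])])
```

In this model a single recorded outcome is enough to recover either complete path, which is why tracing only waypoints keeps the trace stream small.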
Performance Monitoring
Each Cortex-A9 processor has a Performance Monitoring Unit (PMU). The PMU
provides 58 events that gather statistics on the operation of the processor and memory
system. Six counters in the PMU accumulate the events in real time. The PMU
counters are accessible from either the processor itself or the external debugger. The
events are also supplied to the PTM and can be used as trigger or trace conditions.
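Because there are more events than counters, software chooses which events to count at any given time. The following sketch models that selection; the event numbering (0 to 57), class name, and method names are hypothetical and do not correspond to the PMU's register interface.

```python
class PmuModel:
    """Toy model: six counters, each programmed to count one of 58 events."""

    NUM_EVENTS = 58    # from the text
    NUM_COUNTERS = 6   # from the text

    def __init__(self):
        self.selected = [None] * self.NUM_COUNTERS  # event ID per counter
        self.counts = [0] * self.NUM_COUNTERS

    def select(self, counter: int, event: int) -> None:
        """Program `counter` to count `event` and clear its value."""
        if not 0 <= counter < self.NUM_COUNTERS:
            raise ValueError("no such counter")
        if not 0 <= event < self.NUM_EVENTS:
            raise ValueError("no such event")
        self.selected[counter] = event
        self.counts[counter] = 0

    def record(self, event: int) -> None:
        """An event occurred: increment every counter that selects it."""
        for i, sel in enumerate(self.selected):
            if sel == event:
                self.counts[i] += 1

pmu = PmuModel()
pmu.select(0, 3)        # hypothetical event ID 3 on counter 0
pmu.select(1, 17)       # hypothetical event ID 17 on counter 1
for event in (3, 3, 17, 40):
    pmu.record(event)   # event 40 is not selected, so it is not counted
print(pmu.counts[:2])   # [2, 1]
```

Profiling more than six event types therefore requires reprogramming the counters between measurement runs.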
Further Information
■ SoC FPGA Overview:
www.altera.com/devices/processor/soc-fpga/proc-soc-fpga.html
■ SoC FPGA Product Overview Advance Information Brief (AIB):
www.altera.com/literature/hb/soc-fpga/aib-01017-soc-fpga-overview.pdf
■ Arria V FPGAs: Balance of Cost, Performance, and Power:
www.altera.com/devices/fpga/arria-fpgas/arria-v/arrv-index.jsp
■ Cyclone V FPGAs: Lowest System Cost and Power:
www.altera.com/devices/fpga/cyclone-v-fpgas/cyv-index.jsp
■ Dual-Core ARM Cortex-A9 MPCore Processor:
www.altera.com/devices/processor/arm/cortex-a9/m-arm-cortex-a9.html
■ Using the Virtual Target with the ARM Cortex-A9 MPCore Processor:
www.altera.com/devices/processor/arm/cortex-a9/virtual-target/proc-a9-virtual-target.html