Unit 4 Introduction To ARM CORTEX M4
– High reliability
– Applications include automotive braking systems, powertrains, etc.
Cortex-M series (Microcontroller)
– Cost-sensitive solutions for deterministic microcontroller applications
– Applications include microcontrollers, mixed-signal devices, smart sensors, automotive body electronics, and airbags
SecurCore series
– High-security applications
Previous classic processors
– Include ARM7, ARM9, ARM11 families
[Figure: ARM processor families. Cortex-R: Cortex-R4, Cortex-R5. Cortex-M: Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4. SecurCore: SC000, SC100, SC300. Classic: ARM7, ARM9, ARM11.]
As of Dec 2013
Source: ARM
Design an ARM-based SoC
Select a set of IP cores from ARM and/or other third-party IP
vendors
Integrate IP cores into a single chip design
Give design to semiconductor foundries for chip fabrication
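[Figure: IP libraries (ARM processors: Cortex-A9, Cortex-R5, Cortex-M3, ARM7, ARM9, ARM11; memory controllers: DRAM ctrl, FLASH ctrl, SRAM ctrl; buses: AXI bus, AHB bus, APB bus) are integrated into an ARM-based SoC containing an ARM processor, ROM, RAM, a system bus, and peripherals.]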
ARM Cortex-M Series
Cortex-M series: Cortex-M0, M0+, M1, M3, M4, M7
Energy-efficiency
Lower energy cost, longer battery life
Smaller code
Lower silicon costs
Ease of use
Faster software development and reuse
Embedded applications
Smart metering, human interface devices, automotive and industrial control systems,
white goods, consumer products and medical instrumentation
ARM Processors vs. ARM Architectures
ARM architecture
Describes the details of instruction set, programmer’s model, exception model, and memory
map
Documented in the Architecture Reference Manual
ARM processor
Developed using one of the ARM architectures
More implementation details, such as timing information
Documented in processor’s Technical Reference Manual
ARM Cortex-M Series Family
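Processor  | ARM Architecture | Core Architecture | Thumb® | Thumb®-2 | Hardware Multiply | Hardware Divide | Saturated Math | DSP Extensions | Floating Point
-----------|------------------|-------------------|--------|----------|-------------------|-----------------|----------------|----------------|---------------
Cortex-M0  | ARMv6-M          | Von Neumann       | Most   | Subset   | 1 or 32 cycle     | No              | No             | No             | No
Cortex-M0+ | ARMv6-M          | Von Neumann       | Most   | Subset   | 1 or 32 cycle     | No              | No             | No             | No
Cortex-M1  | ARMv6-M          | Von Neumann       | Most   | Subset   | 3 or 33 cycle     | No              | No             | No             | No
Cortex-M4  | ARMv7E-M         | Harvard           | Entire | Entire   | 1 cycle           | Yes             | Yes            | Yes            | Optional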
ARM Cortex-M Series Family
Latest generation Cortex-M processor is the Cortex-M55.
The Cortex-M55 is the first processor built on the Armv8.1-M
architecture with Arm Helium technology, a vector processing
extension. It brings enhanced levels of machine learning and signal
processing performance to the next wave of small embedded devices,
including wearables, smart speakers, and more.
Cortex-M55
The Arm Cortex-M55 processor is Arm’s most AI-capable Cortex-M
processor and the first to feature Arm Helium vector processing
technology.
ARM Cortex-M Series Family
Cortex-M35P
A tamper-resistant Cortex-M processor with optional software
isolation using TrustZone for Armv8-M.
Cortex-M33
The Arm Cortex-M33 processor has an ideal blend of real-time
determinism, efficiency and security.
Cortex-M23
The Arm Cortex-M23 is the smallest and lowest power
microcontroller with TrustZone security.
Cortex-M7
The Arm Cortex-M7 processor is the highest performance member of
the energy-efficient Cortex-M processor family.
Introduction to ARM Cortex-M Processors
The Cortex-M3 processor was the first of the Cortex generation
of processors, released by ARM in 2005.
The Cortex-M4 processor was released in 2010.
The Cortex-M4 processors use a 32-bit architecture.
Internal registers in the register bank, the data path, and the bus
interfaces are all 32 bits wide.
The Instruction Set Architecture (ISA) in the Cortex-M
processors is called the Thumb ISA and is based on Thumb-2
Technology which supports a mixture of 16-bit and 32-bit
instructions.
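The mix of 16-bit and 32-bit instructions can be illustrated with a small host-side sketch. It relies on the Thumb encoding rule that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction. The function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when `halfword` is the first half of a 32-bit Thumb instruction.
 * Any other value is a complete 16-bit instruction. */
bool thumb_is_32bit(uint16_t halfword)
{
    unsigned top5 = (unsigned)halfword >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Instruction length in bytes: 2 or 4. */
unsigned thumb_instruction_length(uint16_t halfword)
{
    return thumb_is_32bit(halfword) ? 4u : 2u;
}

/* Counts the instructions in a stream of halfwords (hypothetical helper). */
size_t thumb_count_instructions(const uint16_t *code, size_t halfwords)
{
    assert(code != NULL || halfwords == 0u);
    size_t count = 0u;
    size_t i = 0u;
    while (i < halfwords) {
        i += thumb_is_32bit(code[i]) ? 2u : 1u;  /* skip the second half */
        count++;
    }
    return count;
}
```

For instance, 0xBF00 (a 16-bit NOP) is reported as 2 bytes, while 0xF000 (the first halfword of a BL) is reported as 4 bytes.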
Introduction to ARM Cortex-M4 Architecture
Cortex-M4 processors have:
Three-stage pipeline design
Harvard bus architecture with unified memory space: instructions and
data use the same address space
32-bit addressing, supporting 4GB of memory space
On-chip bus interfaces based on ARM AMBA (Advanced Microcontroller
Bus Architecture) Technology, which allow pipelined bus operations for
higher throughput
An interrupt controller called NVIC (Nested Vectored Interrupt
Controller) supporting up to 240 interrupt requests and from 8 to 256
interrupt priority levels (dependent on the actual device implementation)
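The "8 to 256 priority levels" figure follows from how many bits of each 8-bit priority field a device implements (3 to 8). The sketch below is a host-side model, not a driver: the function names are hypothetical, and the bit count corresponds to what CMSIS device headers expose as `__NVIC_PRIO_BITS`. Implemented bits sit at the most significant end of the field.

```c
#include <assert.h>
#include <stdint.h>

/* Number of priority levels when `prio_bits` bits are implemented. */
unsigned nvic_priority_levels(unsigned prio_bits)
{
    assert(prio_bits >= 3u && prio_bits <= 8u);
    return 1u << prio_bits;
}

/* Value of the 8-bit priority field for a given level.
 * Unimplemented low-order bits stay zero. */
uint8_t nvic_encode_priority(unsigned level, unsigned prio_bits)
{
    assert(prio_bits >= 3u && prio_bits <= 8u);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}
```

With 4 implemented bits (16 levels), level 5 is stored as 0x50 in the priority field.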
Introduction to ARM Cortex-M4 Architecture
Cortex-M4 processors have:
Support for various features for OS (Operating System)
implementation such as a system tick timer, shadowed stack pointer
Sleep mode support and various low power features
Support for an optional MPU (Memory Protection Unit) to provide
memory protection features like programmable memory, or access
permission control
Support for bit-data accesses in two specific memory regions using a
feature called Bit Band
The option of being used in single processor or multi-processor
designs
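As an illustration of the Bit Band feature listed above, the following host-side sketch computes the alias address of a single bit. It assumes the standard Cortex-M3/M4 layout: bit-band regions of 1 MB at 0x20000000 (SRAM) and 0x40000000 (peripherals), with alias regions starting 0x02000000 above each base. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BITBAND_SRAM_BASE    0x20000000u
#define BITBAND_PERIPH_BASE  0x40000000u
#define BITBAND_REGION_SIZE  0x00100000u  /* 1 MB bit-band region  */
#define BITBAND_ALIAS_OFFSET 0x02000000u  /* alias base minus base */

/* Alias word address for bit `bit` of the data at `addr`.
 * Each bit maps to one 32-bit word in the alias region:
 * alias = alias_base + byte_offset * 32 + bit * 4. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t base = addr & 0xF0000000u;
    uint32_t offset = addr - base;

    assert(base == BITBAND_SRAM_BASE || base == BITBAND_PERIPH_BASE);
    assert(bit < 32u);
    assert(offset + bit / 8u < BITBAND_REGION_SIZE);

    return base + BITBAND_ALIAS_OFFSET + offset * 32u + bit * 4u;
}
```

On a real device, writing 1 or 0 to the returned alias address sets or clears that one bit without a read-modify-write sequence in software.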
Introduction to ARM Cortex-M4 Architecture
The ISA used in Cortex-M4 processors provides a wide range of
instructions:
General data processing, including hardware divide instructions
Memory access instructions supporting 8-bit, 16-bit, 32-bit, and
64-bit data, as well as instructions for transferring multiple 32-bit
data
Instructions for bit field processing
Multiply Accumulate (MAC) and saturate instructions
Instructions for branches, conditional branches and function calls
Instructions for system control, OS support, etc.
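The bit field processing instructions in this list can be modelled in portable C. The sketch below mirrors what an unsigned bit-field extract (UBFX) and a bit-field insert (BFI) compute. The function names are hypothetical, and a compiler targeting Cortex-M4 may or may not map such code to those instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (width 1..32). */
static uint32_t low_mask(unsigned width)
{
    return (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extracts `width` bits of `value` starting at bit `lsb`, zero-extended. */
uint32_t bitfield_extract(uint32_t value, unsigned lsb, unsigned width)
{
    assert(width >= 1u && lsb < 32u && lsb + width <= 32u);
    return (value >> lsb) & low_mask(width);
}

/* Replaces `width` bits of `dest` at bit `lsb` with the low bits of `src`. */
uint32_t bitfield_insert(uint32_t dest, uint32_t src,
                         unsigned lsb, unsigned width)
{
    assert(width >= 1u && lsb < 32u && lsb + width <= 32u);
    uint32_t mask = low_mask(width) << lsb;
    return (dest & ~mask) | ((src << lsb) & mask);
}
```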
Introduction to ARM Cortex-M4 Architecture
In addition, the Cortex-M4 processor also supports:
Single Instruction Multiple Data (SIMD) operations
Additional fast MAC and multiply instructions
Saturating arithmetic instructions
Optional floating point instructions (single precision)
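To make the saturating and SIMD items above concrete, here is a host-side C model of what a saturating 32-bit add and a dual 16-bit saturating add compute (in the spirit of QADD and QADD16). This is a sketch of the arithmetic only. The function names are hypothetical and it does not use the actual instructions or intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Clamps `value` to the range of a signed `bits`-bit integer (1..32). */
int32_t saturate_signed(int64_t value, unsigned bits)
{
    assert(bits >= 1u && bits <= 32u);
    int64_t max = ((int64_t)1 << (bits - 1u)) - 1;
    int64_t min = -max - 1;
    if (value > max) return (int32_t)max;
    if (value < min) return (int32_t)min;
    return (int32_t)value;
}

/* 32-bit add that clamps instead of wrapping on overflow. */
int32_t saturating_add32(int32_t a, int32_t b)
{
    return saturate_signed((int64_t)a + (int64_t)b, 32u);
}

/* Sign-extends the low 16 bits of `lane`. */
static int32_t lane_to_signed(uint32_t lane)
{
    return (int32_t)((lane & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Two independent 16-bit saturating adds packed in one 32-bit word. */
uint32_t saturating_add16x2(uint32_t x, uint32_t y)
{
    int32_t lo = saturate_signed(
        (int64_t)lane_to_signed(x) + lane_to_signed(y), 16u);
    int32_t hi = saturate_signed(
        (int64_t)lane_to_signed(x >> 16) + lane_to_signed(y >> 16), 16u);
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}
```

Saturation matters in signal processing: an overflowing sample clips at full scale rather than wrapping to the opposite sign.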
Firmware development using CMSIS Standard
(Cortex Microcontroller Software Interface Standard)
The CMSIS is a set of tools, APIs, frameworks, and workflows that help to
simplify software reuse, reduce the learning curve for microcontroller
developers, speed up project build and debug, and thus reduce the time to
market for new applications.
CMSIS started as a vendor-independent hardware abstraction layer for Arm®
Cortex®-M based processors and was later extended to support entry-level
Arm Cortex-A based processors. To simplify access, CMSIS defines generic
tool interfaces and enables consistent device support by providing simple
software interfaces to the processor and the peripherals.
CMSIS is defined in close cooperation with various silicon and software
vendors and provides a common approach to interface to peripherals, real-
time operating systems, and middleware components. It is intended to enable
the combination of software components from multiple vendors.
CMSIS is open-source and collaboratively developed on GitHub.
Firmware development using CMSIS Standard
(Cortex Microcontroller Software Interface Standard)
CMSIS has been created to help the industry in standardization. It enables consistent
software layers and device support across a wide range of development tools and
microcontrollers. CMSIS is not a huge software layer that introduces overhead and
does not define standard peripherals. The silicon industry can therefore support the
wide variations of Arm Cortex processor-based devices with this common standard.
Firmware development using CMSIS Standard
(Cortex Microcontroller Software Interface Standard)
The aims of CMSIS include:
Enhanced software reusability - makes it easier to reuse software code in different
Cortex-M projects, reducing time to market and verification efforts.
Enhanced software compatibility - by having a consistent software infrastructure
(e.g., API for processor core access functions, system initialization method, common
style for defining peripherals), software from various sources can work together,
reducing the risk in integration.
Easy to learn - the CMSIS allows easy access to processor core features from the C
language. In addition, once you learn to use one Cortex-M microcontroller product,
starting to use another Cortex-M product is much easier because of the consistency
in software setup.
Toolchain independent - CMSIS-compliant device drivers can be used with various
compilation tools, providing much greater freedom.
Openness - the source code for CMSIS core files can be downloaded and accessed
by everyone, and everyone can develop software products with CMSIS.
Operation modes
Handler mode: When executing an exception handler such
as an Interrupt Service Routine (ISR). When in handler
mode, the processor always has privileged access level.
Exceptions and interrupts
In Cortex-M processors, there are a number of exception sources:
Exceptions are processed by the NVIC. The NVIC can handle a
number of Interrupt Requests (IRQs) and a Non-Maskable
Interrupt (NMI) request.
Usually IRQs are generated by on-chip peripherals or from
external interrupt inputs through I/O ports.
The NMI could be used by a watchdog timer or brownout
detector (a voltage monitoring unit that warns the processor when
the supply voltage drops below a certain level).
Inside the processor there is also a timer called SysTick, which
can generate a periodic timer interrupt request, which can be used
by embedded OSs for timekeeping, or for simple timing control
in applications that don’t require an OS.
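A periodic SysTick interrupt is set up by loading the timer with a reload value. The sketch below works through that arithmetic on the host. It assumes the SysTick counter is 24 bits wide and counts down from the reload value to zero, so one period lasts reload + 1 clock cycles. The function name is hypothetical. In CMSIS-based firmware the equivalent setup is normally done through `SysTick_Config()`.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* 24-bit counter */

/* Reload value for a tick rate of `tick_hz` with a timer clock of
 * `clock_hz`. Returns 0 when the period cannot be represented. */
uint32_t systick_reload(uint32_t clock_hz, uint32_t tick_hz)
{
    assert(tick_hz > 0u);
    uint32_t cycles = clock_hz / tick_hz;   /* cycles per tick */
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return 0u;
    }
    return cycles - 1u;
}
```

At 168 MHz a 1 kHz tick needs a reload value of 167999, while a 10 Hz tick does not fit in 24 bits and would need a slower clock source or a software divider.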
Exceptions and interrupts
The processor itself is also a source of exception events.
These could be fault events that indicate system error conditions, or
exceptions generated by software to support embedded OS operations.
As opposed to classic ARM processors such as the ARM7TDMI, there
is no FIQ (Fast Interrupt) in the Cortex-M processor. However, the
interrupt latency of the Cortex-M4 is very low, only 12 clock cycles, so this
does not cause problems.
Reset is a special kind of exception. When the processor exits from a
reset, it executes the reset handler in Thread mode (rather than Handler
mode as in other exceptions). Also the exception number in IPSR is
read as zero.
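The exception number read from IPSR can be interpreted with a few lines of C. This host-side sketch assumes the ARMv7-M numbering, in which the exception number occupies the low 9 bits of IPSR, numbers 1 to 15 are system exceptions, and external interrupts start at 16. The function names are hypothetical. The "exception number minus 16" convention matches the IRQ numbering used by CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPSR_EXCEPTION_MASK 0x1FFu  /* exception number field */

/* Thread mode is indicated by an exception number of zero. */
bool ipsr_is_thread_mode(uint32_t ipsr)
{
    return (ipsr & IPSR_EXCEPTION_MASK) == 0u;
}

/* Converts an exception number to a CMSIS-style IRQ number.
 * System exceptions give negative values, external interrupts
 * give 0 and up. */
int exception_to_irq_number(unsigned exception_number)
{
    assert(exception_number >= 1u && exception_number <= 255u);
    return (int)exception_number - 16;
}
```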
Exceptions and interrupts
Nested vectored interrupt controller (NVIC)
The NVIC is a part of the Cortex-M processor. It is
programmable and its registers are located in the System Control
Space (SCS) of the memory map.
The NVIC handles the exceptions and interrupt configurations,
prioritization, and interrupt masking.
The NVIC has the following features:
Flexible exception and interrupt management
Nested exception/interrupt support
Vectored exception/interrupt entry
Interrupt masking
Nested vectored interrupt controller (NVIC)
Nested exception/interrupt support
Each exception has a priority level. Some exceptions, such as interrupts, have
programmable priority levels and some others (e.g., NMI) have a fixed priority
level.
When an exception occurs, the NVIC will compare the priority level of this
exception to the current level.
If the new exception has a higher priority, the current running task will be
suspended.
Some of the registers will be stored on the stack memory, and the processor
will start executing the exception handler of the new exception.
This process is called “preemption.” When the higher priority exception
handler is complete, it is terminated with an exception return operation and the
processor automatically restores the registers from stack and resumes the task
that was running previously.
This mechanism allows nesting of exception services without any software
overhead.
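The priority comparison behind preemption can be sketched as below. This is a simplified model: on Cortex-M a lower numeric value means a higher priority, and the fixed-priority exceptions use negative values (Reset -3, NMI -2, HardFault -1). The model ignores priority grouping and the interrupt masking registers. The function name and the `PRIORITY_THREAD` marker are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for "no exception active" (thread level). */
#define PRIORITY_THREAD 256

/* True when a pending exception of `new_priority` would preempt code
 * running at `current_priority`. Equal priorities do not preempt. */
bool exception_preempts(int new_priority, int current_priority)
{
    assert(new_priority >= -3 && new_priority <= 255);
    assert(current_priority >= -3 && current_priority <= PRIORITY_THREAD);
    return new_priority < current_priority;
}
```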
Vector table
System control block (SCB)
One part of the processor that is merged into the NVIC unit is
the SCB.
The SCB contains various registers for:
Controlling processor configurations (e.g., low power
modes)
Providing fault status information (fault status registers)
Vector table relocation (VTOR)
The SCB is memory-mapped. Similar to the NVIC registers, the
SCB registers are accessible from the System Control Space
(SCS).
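Vector table relocation can be illustrated with the address arithmetic the processor performs on exception entry. The sketch assumes the ARMv7-M layout: word 0 of the table holds the initial MSP value, and the handler address for exception number N is stored at VTOR + 4 × N. The function name is hypothetical, and the base addresses in the usage note are example values only.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the vector table entry for `exception_number`,
 * given the table base held in VTOR. */
uint32_t vector_entry_address(uint32_t vtor, unsigned exception_number)
{
    assert(exception_number >= 1u && exception_number <= 255u);
    return vtor + 4u * exception_number;
}
```

With an example table base of 0x08000000, the reset vector (exception 1) is read from 0x08000004 and the SysTick vector (exception 15) from 0x0800003C. After relocating the table to 0x20000000, the first external interrupt (exception 16) is read from 0x20000040.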
Debug
There are two types of interfaces provided in the Cortex-M
processors: debug and trace.
The debug interface allows a debug adaptor to connect to a
Cortex-M microcontroller to control the debug features and access
the memory space on the chip.
The Cortex-M processor supports the traditional JTAG protocol, which
uses either 4 or 5 pins, or a newer 2-pin protocol called Serial Wire
Debug (SWD).
The SWD protocol was developed by ARM, and can handle the same
debug features as in JTAG in just two pins, without any loss of debug
performance.
The two protocols can use the same connector, with JTAG TCK shared
with the Serial Wire clock, and JTAG TMS shared with the Serial Wire
Data, which is bidirectional.
Debug
The trace interface is used to collect information from the processor
during runtime such as data, event, profiling information, or even
complete details of program execution.
Two types of trace interface are supported: a single-pin protocol called
Serial Wire Viewer (SWV) and a multi-pin protocol called Trace Port.
SWV is a low-cost solution that has a lower trace data bandwidth limit.
However, the bandwidth is still large enough to handle capturing of
selective data trace, event trace, and basic profiling.
The output signal, which is called Serial Wire Output (SWO), can be
shared with the JTAG TDO pin so that you only need one standard
JTAG/SWD connector for both debug and trace.
The Trace Port mode requires one clock pin and several data pins.
Reset and reset sequence
In typical Cortex-M microcontrollers, there can be three types of
reset:
Power on reset - reset everything in the microcontroller. This
includes the processor and its debug support component and
peripherals.
System reset - reset just the processor and peripherals, but not the
debug support component of the processor.
Processor reset - reset the processor only.
The duration of Power on reset and System reset depends on the
microcontroller design.
In some cases the reset lasts a number of milliseconds as the reset
controller needs to wait for a clock source such as a crystal
oscillator to stabilize.
Reset and reset sequence
The setup of the MSP is necessary because some exceptions such as the NMI or
HardFault handler could potentially occur shortly after the reset, and the stack
memory and hence the MSP will then be needed to push some of the processor
status to the stack before exception handling.
Because the stack operations in the Cortex-M3 and Cortex-M4 processors are
based on a full descending stack (SP is decremented before each store), the initial SP value
should be set to the first memory address after the top of the stack region. For example,
if you have a stack memory range from 0x20007C00 to 0x20007FFF
(1 Kbyte), the initial stack value should be set to 0x20008000.
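The stack example above can be checked with a short host-side sketch. It models only the address arithmetic of a full descending stack and does not touch real memory. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initial SP for a stack occupying [stack_base, stack_base + size). */
uint32_t initial_stack_pointer(uint32_t stack_base, uint32_t stack_size)
{
    assert(stack_base % 4u == 0u && stack_size % 4u == 0u);
    return stack_base + stack_size;
}

/* Models one 32-bit push: SP is decremented first, then the word is
 * stored at the new SP. Returns the address the word is stored at. */
uint32_t push_word_address(uint32_t *sp)
{
    assert(sp != NULL);
    *sp -= 4u;
    return *sp;
}
```

For the range 0x20007C00 to 0x20007FFF (size 0x400) the initial SP is 0x20008000, and the first pushed word lands at 0x20007FFC, the last word inside the stack region.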
Introduction to STM32F40XX
The STM32F407xx family is based on the high-performance
ARM® Cortex®-M4 32-bit RISC core operating at a
frequency of up to 168 MHz. The Cortex-M4 core features a
single-precision floating-point unit (FPU) which supports all
ARM single-precision data-processing instructions and data
types. It also implements a full set of DSP instructions and a
memory protection unit (MPU) which enhances application
security.
The STM32F407xx family incorporates high-speed embedded
memories (Flash memory up to 1 Mbyte, up to 192 Kbytes of
SRAM), up to 4 Kbytes of backup SRAM, and an extensive
range of enhanced I/Os and peripherals connected to two APB
buses, three AHB buses and a 32-bit multi-AHB bus matrix.
Introduction to STM32F40XX
All devices offer three 12-bit ADCs, two DACs, a low-power
RTC, twelve general-purpose 16-bit timers including two
PWM timers for motor control, two general-purpose 32-bit
timers, and a true random number generator (RNG). They also
feature standard and advanced communication interfaces:
Up to three I2Cs
Three SPIs, two I2Ss full duplex. To achieve audio class accuracy, the I2S peripherals
can be clocked via a dedicated internal audio PLL or via an external clock to allow
synchronization.
Four USARTs plus two UARTs
A USB OTG full-speed and a USB OTG high-speed with full-speed capability (with the
ULPI)
Two CANs
An SDIO/MMC interface
Ethernet and a camera interface
STM32F40XX Applications
The STM32F407xx microcontroller family is suitable for a wide
range of applications:
Motor drive and application control
Medical equipment
Industrial applications: PLC, inverters, circuit breakers
Printers and scanners
Alarm systems, video intercom, and HVAC
Home audio appliances
STM32F40XX Block diagram
Adaptive real-time memory accelerator (ART Accelerator™)
The ART Accelerator™ is a memory accelerator which is
optimized for STM32 industry-standard ARM® Cortex®-M4
with FPU processors. It balances the inherent performance
advantage of the ARM Cortex-M4 with FPU over Flash
memory technologies, which normally requires the processor
to wait for the Flash memory at higher frequencies.
To release the processor’s full 210 DMIPS performance at this
frequency, the accelerator implements an instruction prefetch
queue and branch cache, which increases program execution
speed from the 128-bit Flash memory. Based on CoreMark
benchmark, the performance achieved thanks to the ART
accelerator is equivalent to 0 wait state program execution
from Flash memory at a CPU frequency up to 168 MHz.
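The need for the accelerator becomes clearer when the Flash wait states are worked out. The sketch below assumes the wait-state table that the STM32F4 reference manual gives for a 2.7 V to 3.6 V supply (one extra wait state per 30 MHz of HCLK, up to 5 wait states at 168 MHz). Treat these thresholds as an assumption to check against the reference manual for the actual device and voltage range. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HCLK_MAX_MHZ      168u
#define MHZ_PER_WAITSTATE 30u   /* assumed: 2.7 V to 3.6 V supply range */

/* Flash wait states needed at `hclk_mhz` under the assumed table:
 * 0 WS up to 30 MHz, 1 WS up to 60 MHz, ... 5 WS up to 168 MHz. */
unsigned flash_wait_states(uint32_t hclk_mhz)
{
    assert(hclk_mhz >= 1u && hclk_mhz <= HCLK_MAX_MHZ);
    return (unsigned)((hclk_mhz - 1u) / MHZ_PER_WAITSTATE);
}
```

Under this assumption a raw Flash access at 168 MHz costs 5 wait states, which is the latency that the prefetch queue and branch cache are meant to hide.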
Memory protection unit
The memory protection unit (MPU) is used to manage the CPU
accesses to memory to prevent one task from accidentally corrupting the
memory or resources used by any other active task. This memory
area is organized into up to 8 protected areas that can in turn be
divided up into 8 subareas. The protection area sizes are between
32 bytes and the whole 4 gigabytes of addressable memory.
The MPU is especially helpful for applications where some critical
or certified code has to be protected against the misbehavior of
other tasks. It is usually managed by an RTOS (real-time operating
system). If a program accesses a memory location that is prohibited
by the MPU, the RTOS can detect it and take action. In an RTOS
environment, the kernel can dynamically update the MPU area
setting, based on the process to be executed.
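The region sizes quoted above (32 bytes up to 4 gigabytes) correspond to the way the ARMv7-M MPU encodes a region size: a 5-bit SIZE field, where the region covers 2^(SIZE+1) bytes. The host-side sketch below computes that field and the size of one of the 8 subregions. The function names are hypothetical, and the sketch assumes, as in the ARMv7-M MPU, that subregions are only available for regions of 256 bytes or more.

```c
#include <assert.h>
#include <stdint.h>

/* SIZE field value for a region of `region_bytes` bytes.
 * Valid sizes are powers of two from 32 bytes to 4 GB. */
unsigned mpu_size_field(uint64_t region_bytes)
{
    assert(region_bytes >= 32u && region_bytes <= (UINT64_C(1) << 32));
    assert((region_bytes & (region_bytes - 1u)) == 0u);  /* power of two */

    unsigned exponent = 0u;
    while ((UINT64_C(1) << exponent) < region_bytes) {
        exponent++;
    }
    return exponent - 1u;   /* region size = 2^(SIZE + 1) */
}

/* Size of each of the 8 equally sized subregions. */
uint64_t mpu_subregion_bytes(uint64_t region_bytes)
{
    assert(region_bytes >= 256u);
    (void)mpu_size_field(region_bytes);  /* validates the region size */
    return region_bytes / 8u;
}
```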
Embedded Flash memory
The STM32F40xxx devices embed a Flash memory of 512
Kbytes or 1 Mbyte available for storing programs and data.
CRC (cyclic redundancy check) calculation unit
The CRC (cyclic redundancy check) calculation unit is used to
get a CRC code from a 32-bit data word and a fixed generator
polynomial.
Among other applications, CRC-based techniques are used to
verify data transmission or storage integrity. In the scope of the
EN/IEC 60335-1 standard, they offer a means of verifying the
Flash memory integrity. The CRC calculation unit helps
compute a software signature during runtime, to be compared
with a reference signature generated at link-time and stored at
a given memory location.
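A software model makes it easier to see what the CRC unit computes for each 32-bit word. The sketch below assumes the parameters documented for the STM32F4 CRC peripheral: the CRC-32 (Ethernet) polynomial 0x04C11DB7, an initial value of 0xFFFFFFFF, and words processed most significant bit first with no final inversion. The text above only says "fixed generator polynomial", so these parameters are an assumption to verify against the reference manual. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04C11DB7u  /* assumed generator polynomial */
#define CRC32_INIT 0xFFFFFFFFu  /* assumed reset value          */

/* Feeds one 32-bit word into the running CRC, MSB first. */
uint32_t crc32_word_update(uint32_t crc, uint32_t data)
{
    crc ^= data;
    for (int i = 0; i < 32; i++) {
        crc = (crc & 0x80000000u) ? ((crc << 1) ^ CRC32_POLY) : (crc << 1);
    }
    return crc;
}

/* CRC over `count` words, starting from the reset value. */
uint32_t crc32_words(const uint32_t *words, size_t count)
{
    assert(words != NULL || count == 0u);
    uint32_t crc = CRC32_INIT;
    for (size_t i = 0u; i < count; i++) {
        crc = crc32_word_update(crc, words[i]);
    }
    return crc;
}
```

A runtime integrity check along the lines described above would compute `crc32_words()` over the Flash image and compare the result with the reference signature stored at link time.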
Embedded SRAM