Module-1 MC Part-2

Uploaded by

veeresh biradar
Copyright
© All Rights Reserved
CHAPTER 2 ARM PROCESSOR FUNDAMENTALS

Chapter 1 covered embedded systems with an ARM processor. In this chapter we will focus on the actual processor itself. First, we will provide an overview of the processor core and describe how data moves between its different parts. We will describe the programmer's model from a software developer's view of the ARM processor, which will show you the functions of the processor core and how different parts interact. We will also take a look at the core extensions that form an ARM processor. Core extensions speed up and organize main memory as well as extend the instruction set. We will then cover the revisions to the ARM core architecture by describing the ARM core naming conventions used to identify them and the chronological changes to the ARM instruction set architecture. The final section introduces the architecture implementations by subdividing them into specific ARM processor core families.

A programmer can think of an ARM core as functional units connected by data buses, as shown in Figure 2.1, where the arrows represent the flow of data, the lines represent the buses, and the boxes represent either an operation unit or a storage area. The figure shows not only the flow of data but also the abstract components that make up an ARM core.

Data enters the processor core through the Data bus. The data may be an instruction to execute or a data item. Figure 2.1 shows a Von Neumann implementation of the ARM: data items and instructions share the same bus. In contrast, Harvard implementations of the ARM use two different buses.

The instruction decoder translates instructions before they are executed.
Each instruction executed belongs to a particular instruction set.

The ARM processor, like all RISC processors, uses a load-store architecture. This means it has two instruction types for transferring data in and out of the processor: load instructions copy data from memory to registers in the core, and conversely the store instructions copy data from registers to memory. There are no data processing instructions that directly manipulate data in memory. Thus, data processing is carried out solely in registers.

Figure 2.1 ARM core dataflow model.

Data items are placed in the register file, a storage bank made up of 32-bit registers. Since the ARM core is a 32-bit processor, most instructions treat the registers as holding signed or unsigned 32-bit values. The sign extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory and placed in a register.

ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd. Source operands are read from the register file using the internal buses A and B, respectively.

The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result. Data processing instructions write the result in Rd directly to the register file. Load and store instructions use the ALU to generate an address to be held in the address register and broadcast on the Address bus.

One important feature of the ARM is that register Rm alternatively can be preprocessed in the barrel shifter before it enters the ALU. Together the barrel shifter and ALU can calculate a wide range of expressions and addresses.

After passing through the functional units, the result in Rd is written back to the register file using the Result bus.
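The dataflow described above can be made concrete with a small model. The following is only an illustrative Python sketch, not ARM code: the function names, the dictionary used as memory, and the addresses 0x100 and 0x104 are assumptions chosen for the example. It shows a load with sign extension, one data processing step of the form Rd = Rn + (Rm shifted left), and a store.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values


def sign_extend(value, bits):
    """Widen a signed `bits`-wide value to a 32-bit two's-complement pattern."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def barrel_shift_lsl(rm, amount):
    """Preprocess Rm with a logical shift left, as the barrel shifter can."""
    return (rm << amount) & MASK32


def alu_add(rn, operand2):
    """Add two 32-bit operands and wrap the result to 32 bits."""
    return (rn + operand2) & MASK32


def load_store_demo():
    memory = {0x100: 0xFF}  # hypothetical memory holding one signed byte (-1)
    regs = [0] * 16          # r0 to r15

    # Load: memory -> register, sign-extended from 8 to 32 bits.
    regs[1] = sign_extend(memory[0x100], 8)
    regs[2] = 5

    # Data processing happens only in registers: r0 = r1 + (r2 << 2).
    regs[0] = alu_add(regs[1], barrel_shift_lsl(regs[2], 2))

    # Store: register -> memory.
    memory[0x104] = regs[0]
    return regs, memory
```

Calling load_store_demo() leaves r1 holding 0xFFFFFFFF (the byte 0xFF read as -1) and r0 holding 19, which is then stored back to memory; no arithmetic is ever applied to a memory location directly.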
For load and store instructions the incrementer updates the address register before the core reads or writes the next register value from or to the next sequential memory location. The processor continues executing instructions until an exception or interrupt changes the normal execution flow.

Now that you have an overview of the processor core we'll take a more detailed look at some of the key components of the processor: the registers, the current program status register (cpsr), and the pipeline.

2.1 Registers

General-purpose registers hold either data or an address. They are identified with the letter r prefixed to the register number. For example, register 4 is given the label r4.

Figure 2.2 shows the active registers available in user mode, a protected mode normally used when executing applications. The processor can operate in seven different modes, which we will introduce shortly. All the registers shown are 32 bits in size.

Figure 2.2 Registers available in user mode.

There are up to 18 active registers: 16 data registers and 2 processor status registers. The data registers are visible to the programmer as r0 to r15.

The ARM processor has three registers assigned to a particular task or special function: r13, r14, and r15. They are frequently given different labels to differentiate them from the other registers.

In Figure 2.2, the shaded registers identify the assigned special-purpose registers:

■ Register r13 is traditionally used as the stack pointer (sp) and stores the head of the stack in the current processor mode.

■ Register r14 is called the link register (lr) and is where the core puts the return address whenever it calls a subroutine.

■ Register r15 is the program counter (pc) and contains the address of the next instruction to be fetched by the processor.

Depending upon the context, registers r13 and r14 can also be used as general-purpose registers, which can be particularly useful since these registers are banked during a processor mode change. However, it is dangerous to use r13 as a general register when the processor is running any form of operating system because operating systems often assume that r13 always points to a valid stack frame.

In ARM state the registers r0 to r13 are orthogonal: any instruction that you can apply to r0 you can equally well apply to any of the other registers. However, there are instructions that treat r14 and r15 in a special way.

In addition to the 16 data registers, there are two program status registers: cpsr and spsr (the current and saved program status registers, respectively).

The register file contains all the registers available to a programmer. Which registers are visible to the programmer depends upon the current mode of the processor.

2.2 Current Program Status Register

The ARM core uses the cpsr to monitor and control internal operations. The cpsr is a dedicated 32-bit register and resides in the register file. Figure 2.3 shows the basic layout of a generic program status register. Note that the shaded parts are reserved for future expansion.

Figure 2.3 A generic program status register (psr). Fields: flags (condition flags), status, extension, and control (interrupt masks, Thumb state, processor mode).

The cpsr is divided into four fields, each 8 bits wide: flags, status, extension, and control. In current designs the extension and status fields are reserved for future use. The control field contains the processor mode, state, and interrupt mask bits. The flags field contains the condition flags.

Some ARM processor cores have extra bits allocated. For example, the J bit, which can be found in the flags field, is only available on Jazelle-enabled processors, which execute 8-bit instructions. We will discuss Jazelle more in Section 2.2.3.
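As a rough illustration of the four 8-bit fields, the sketch below splits a 32-bit status word into flags, status, extension, and control, and pulls out the four condition flags. The helper names are hypothetical, and the bit positions used for N, Z, C, and V (bits 31 to 28) are taken from the standard ARM layout rather than from the text above, so treat them as an assumption of this example.

```python
def split_psr(psr):
    """Split a 32-bit program status word into its four 8-bit fields."""
    return {
        "flags": (psr >> 24) & 0xFF,      # bits 31-24
        "status": (psr >> 16) & 0xFF,     # bits 23-16
        "extension": (psr >> 8) & 0xFF,   # bits 15-8
        "control": psr & 0xFF,            # bits 7-0
    }


def condition_flags(psr):
    """Return the N, Z, C, and V condition flags (assumed at bits 31-28)."""
    positions = (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
    return {name: (psr >> bit) & 1 for name, bit in positions}
```

For example, split_psr(0x600000D3) gives a flags field of 0x60 and a control field of 0xD3, with the two reserved fields reading as zero; condition_flags on the same value reports Z and C set and N and V clear.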
It is highly probable that future designs will assign extra bits for the monitoring and control of new features. For a full description of the cpsr, refer to Appendix B.

2.2.1 PROCESSOR MODES

The processor mode determines which registers are active and the access rights to the cpsr register itself. Each processor mode is either privileged or nonprivileged: a privileged mode allows full read-write access to the cpsr. Conversely, a nonprivileged mode only allows read access to the control field in the cpsr but still allows read-write access to the condition flags.

There are seven processor modes in total: six privileged modes (abort, fast interrupt request, interrupt request, supervisor, system, and undefined) and one nonprivileged mode (user).

The processor enters abort mode when there is a failed attempt to access memory. Fast interrupt request and interrupt request modes correspond to the two interrupt levels available on the ARM processor. Supervisor mode is the mode that the processor is in after reset and is generally the mode that an operating system kernel operates in. System mode is a special version of user mode that allows full read-write access to the cpsr. Undefined mode is used when the processor encounters an instruction that is undefined or not supported by the implementation. User mode is used for programs and applications.

Table 2.1 Processor mode.

Mode                      Abbreviation   Privileged   Mode[4:0]
Abort                     abt            yes          10111
Fast interrupt request    fiq            yes          10001
Interrupt request         irq            yes          10010
Supervisor                svc            yes          10011
System                    sys            yes          11111
Undefined                 und            yes          11011
User                      usr            no           10000

Note that the cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr. The cpsr is saved only when an exception or interrupt is raised.

The current active processor mode occupies the five least significant bits of the cpsr. When power is applied to the core, it starts in supervisor mode, which is privileged.
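The mode encodings in Table 2.1 lend themselves to a simple lookup. The sketch below is a minimal Python illustration, with hypothetical names (MODES, current_mode, can_write_control_field), of reading the mode from the five least significant bits of a cpsr value and checking whether that mode is privileged.

```python
# Mode[4:0] bit pattern -> (mode name, abbreviation, privileged), as in Table 2.1.
MODES = {
    0b10111: ("abort", "abt", True),
    0b10001: ("fast interrupt request", "fiq", True),
    0b10010: ("interrupt request", "irq", True),
    0b10011: ("supervisor", "svc", True),
    0b11111: ("system", "sys", True),
    0b11011: ("undefined", "und", True),
    0b10000: ("user", "usr", False),
}


def current_mode(cpsr):
    """Look up the mode held in the five least significant bits of the cpsr."""
    return MODES[cpsr & 0x1F]


def can_write_control_field(cpsr):
    """Only privileged modes have read-write access to the control field."""
    return current_mode(cpsr)[2]
```

With a cpsr value whose low bits are 10011, such as 0xD3, current_mode reports supervisor mode; a value ending in 10000 reports user mode, for which can_write_control_field returns False.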
Starting in a privileged mode is useful since initialization code can use full access to the cpsr to set up the stacks for each of the other modes.

Table 2.1 lists the various modes and the associated binary patterns. The last column of the table gives the bit patterns that represent each of the processor modes in the cpsr.

2.2.3 STATE AND INSTRUCTION SETS

The state of the core determines which instruction set is being executed. There are three instruction sets: ARM, Thumb, and Jazelle. The ARM instruction set is only active when the processor is in ARM state. Similarly the Thumb instruction set is only active when the processor is in Thumb state. Once in Thumb state the processor is executing purely Thumb 16-bit instructions. You cannot intermingle sequential ARM, Thumb, and Jazelle instructions.

The Jazelle J and Thumb T bits in the cpsr reflect the state of the processor. When both J and T bits are 0, the processor is in ARM state and executes ARM instructions. This is the case when power is applied to the processor. When the T bit is 1, then the processor is in Thumb state. To change states the core executes a specialized branch instruction. Table 2.2 compares the ARM and Thumb instruction set features.

The ARM designers introduced a third instruction set called Jazelle. Jazelle executes 8-bit instructions and is a hybrid mix of software and hardware designed to speed up the execution of Java bytecodes.

To execute Java bytecodes, you require the Jazelle technology plus a specially modified version of the Java virtual machine. It is important to note that the hardware portion of Jazelle only supports a subset of the Java bytecodes; the rest are emulated in software.

Table 2.2 ARM and Thumb instruction set features. Column headings: ARM (cpsr T = 0), Thumb (cpsr T = 1).

■ Fetch loads an instruction from memory.

■ Decode identifies the instruction to be executed.

■ Execute processes the instruction and writes the result back to a register.

Figure 2.8 illustrates the pipeline using a simple example.
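The three stages can also be sketched as a small simulation. The Python below is an illustrative model only, with a hypothetical function name run_pipeline: it treats each instruction as a label that moves one stage per clock cycle and ignores stalls, branches, and data dependencies.

```python
def run_pipeline(program):
    """Return one (fetch, decode, execute) snapshot per clock cycle."""
    fetch = decode = execute = None
    pending = list(program)
    timeline = []
    # Keep clocking until every instruction has passed through execute.
    while pending or fetch or decode:
        execute, decode = decode, fetch           # advance the later stages
        fetch = pending.pop(0) if pending else None
        timeline.append((fetch, decode, execute))
    return timeline
```

Running run_pipeline(["ADD", "SUB", "CMP"]) gives ("ADD", None, None) in the first cycle, ("SUB", "ADD", None) in the second, and ("CMP", "SUB", "ADD") in the third, at which point the pipeline is full and one instruction completes per cycle. Two further cycles drain SUB and CMP.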
The example in Figure 2.8 shows a sequence of three instructions being fetched, decoded, and executed by the processor. Each instruction takes a single cycle to complete after the pipeline is filled.

The three instructions are placed into the pipeline sequentially. In the first cycle the core fetches the ADD instruction from memory. In the second cycle the core fetches the SUB instruction and decodes the ADD instruction. In the third cycle, both the SUB and ADD instructions are moved along the pipeline. The ADD instruction is executed, the SUB instruction is decoded, and the CMP instruction is fetched. This procedure is called filling the pipeline. The pipeline allows the core to execute an instruction every cycle.

Figure 2.8 Pipelined instruction sequence.

As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to attain a higher operating frequency. This in turn increases the performance. The system latency also increases because it takes more cycles to fill the pipeline before the core can execute an instruction. The increased pipeline length also means there can be data dependency between certain stages. You can write code to reduce this dependency by using instruction scheduling (for more information on instruction scheduling take a look at Chapter 6).

2.4 Exceptions, Interrupts, and the Vector Table

When an exception or interrupt occurs, the processor sets the pc to a specific memory address. The address is within a special address range called the vector table. The entries in the vector table are instructions that branch to specific routines designed to handle a particular exception or interrupt.

The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words. On some processors the vector table can be optionally located at a higher address in memory (starting at the offset 0xffff0000). Operating systems such as Linux and Microsoft's embedded products can take advantage of this feature.
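Because every entry is one 32-bit word, the address of a vector is just the table base plus four bytes per entry. The sketch below illustrates that arithmetic in Python; the names (VECTORS, vector_address, high_vectors) are hypothetical, and the entry order is the one given in Table 2.6.

```python
LOW_BASE = 0x00000000    # default vector table location
HIGH_BASE = 0xFFFF0000   # optional high location on some processors

# Entry order follows Table 2.6; each entry is one 32-bit word.
VECTORS = [
    "reset",
    "undefined instruction",
    "software interrupt",
    "prefetch abort",
    "data abort",
    "reserved",
    "interrupt request",
    "fast interrupt request",
]


def vector_address(name, high_vectors=False):
    """Return the address the pc is set to for the named exception."""
    base = HIGH_BASE if high_vectors else LOW_BASE
    return base + 4 * VECTORS.index(name)
```

For instance, vector_address("interrupt request") evaluates to 0x18, and the same entry with high_vectors=True evaluates to 0xFFFF0018.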
When an exception or interrupt occurs, the processor suspends normal execution and starts loading instructions from the exception vector table (see Table 2.6). Each vector table entry contains a form of branch instruction pointing to the start of a specific routine:

■ Reset vector is the location of the first instruction executed by the processor when power is applied. This instruction branches to the initialization code.

■ Undefined instruction vector is used when the processor cannot decode an instruction.

■ Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is frequently used as the mechanism to invoke an operating system routine.

■ Prefetch abort vector occurs when the processor attempts to fetch an instruction from an address without the correct access permissions. The actual abort occurs in the decode stage.

■ Data abort vector is similar to a prefetch abort but is raised when an instruction attempts to access data memory without the correct access permissions.

■ Interrupt request vector is used by external hardware to interrupt the normal execution flow of the processor. It can only be raised if IRQs are not masked in the cpsr.

Table 2.6 The vector table.

Exception/interrupt       Shorthand   Address      High address
Reset                     RESET       0x00000000   0xffff0000
Undefined instruction     UNDEF       0x00000004   0xffff0004
Software interrupt        SWI         0x00000008   0xffff0008
Prefetch abort            PABT        0x0000000c   0xffff000c
Data abort                DABT        0x00000010   0xffff0010
Reserved                  -           0x00000014   0xffff0014
Interrupt request         IRQ         0x00000018   0xffff0018
Fast interrupt request    FIQ         0x0000001c   0xffff001c

2.5 Core Extensions

The hardware extensions covered in this section are standard components placed next to the ARM core. They improve performance, manage resources, and provide extra functionality and are designed to provide flexibility in handling particular applications. Each ARM family has different extensions available.
There are three hardware extensions ARM wraps around the core: cache and tightly coupled memory, memory management, and the coprocessor interface.

2.5.1 CACHE AND TIGHTLY COUPLED MEMORY

The cache is a block of fast memory placed between main memory and the core. It allows for more efficient fetches from some memory types. With a cache the processor core can run for the majority of the time without having to wait for data from slow external memory. Most ARM-based embedded systems use a single-level cache internal to the processor. Of course, many small embedded systems do not require the performance gains that a cache brings.

ARM has two forms of cache. The first is found attached to the Von Neumann-style cores. It combines both data and instruction into a single unified cache, as shown in Figure 2.13. For simplicity, we have called the glue logic that connects the memory system to the AMBA bus "logic and control."

Figure 2.13 A simplified Von Neumann architecture with cache.

By contrast, the second form, attached to the Harvard-style cores, has separate caches for data and instruction.

Figure 2.14 A simplified Harvard architecture with TCMs.

A cache provides an overall increase in performance but at the expense of predictable execution. But for real-time systems it is paramount that code execution is deterministic: the time taken for loading and storing instructions or data must be predictable. This is achieved using a form of memory called tightly coupled memory (TCM). TCM is fast SRAM located close to the core and guarantees the clock cycles required to fetch instructions or data, which is critical for real-time algorithms requiring deterministic behavior. TCMs appear as memory in the address map and can be accessed as fast memory. An example of a processor with TCMs is shown in Figure 2.14.

By combining both technologies, ARM processors can have both improved performance and predictable real-time response.
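The trade-off between a cache and a TCM can be illustrated with a toy timing model. Everything in the Python sketch below is an assumption made for the example: the cycle counts, the direct-mapped organization, the 4-line by 16-byte geometry, and the names DirectMappedCache and tcm_access do not describe any particular ARM core. It only shows that the cost of a cached access depends on earlier accesses, while the cost of a TCM access does not.

```python
# Hypothetical cycle counts, chosen only to make the contrast visible.
HIT_CYCLES, MISS_CYCLES, TCM_CYCLES = 1, 10, 1


class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only, not the data."""

    def __init__(self, lines=4, line_size=16):
        self.lines = lines
        self.line_size = line_size
        self.tags = [None] * lines

    def access(self, address):
        """Return the cycles for this access; the cost depends on history."""
        block = address // self.line_size
        index = block % self.lines
        tag = block // self.lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag   # fill the line, evicting the old contents
        return MISS_CYCLES


def tcm_access(address):
    """A TCM access costs the same number of cycles every time."""
    return TCM_CYCLES
```

In this model, the first access to address 0x00 costs 10 cycles, a following access to 0x04 in the same line costs 1, and after an access to 0x40 evicts that line, address 0x00 costs 10 again. The same address passed to tcm_access always costs 1 cycle, which is the predictable behavior a real-time routine relies on.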
Figure 2.15 shows an example core with a combination of caches and TCMs.

Figure 2.15 A simplified Harvard architecture with caches and TCMs.

2.5.2 MEMORY MANAGEMENT

Embedded systems often use multiple memory devices. It is usually necessary to have a method to help organize these devices and protect the system from applications trying to make inappropriate accesses to hardware. This is achieved with the assistance of memory management hardware.

ARM cores have three different types of memory management hardware: no extensions, providing no protection; a memory protection unit (MPU), providing limited protection; and a memory management unit (MMU), providing full protection:

■ Nonprotected memory is fixed and provides very little flexibility. It is normally used for small, simple embedded systems that require no protection from rogue applications.

■ MPUs employ a simple system that uses a limited number of memory regions. These regions are controlled with a set of special coprocessor registers, and each region is defined with specific access permissions. This type of memory management is used for systems that require memory protection but don't have a complex memory map. The MPU is explained in Chapter 13.

■ MMUs are the most comprehensive memory management hardware available on the ARM. The MMU uses a set of translation tables to provide fine-grained control over memory. These tables are stored in main memory and provide a virtual-to-physical address map as well as access permissions. MMUs are designed for more sophisticated platform operating systems that support multitasking. The MMU is explained in Chapter 14.

2.5.3 COPROCESSORS

Coprocessors can be attached to the ARM processor. A coprocessor extends the processing features of a core by extending the instruction set or by providing configuration registers.
More than one coprocessor can be added to the ARM core via the coprocessor interface.

The coprocessor can be accessed through a group of dedicated ARM instructions that provide a load-store type interface. Consider, for example, coprocessor 15: the ARM processor uses coprocessor 15 registers to control the cache, TCMs, and memory management.

The coprocessor can also extend the instruction set by providing a specialized group of new instructions. For example, there is a set of specialized instructions that
