
Coa Da1

The document provides information about a student named Aishwarya Tapadiya who selected the ARM Cortex-M1 microprocessor for a digital assignment. It includes block diagrams of the Cortex-M1 processor with and without debug features. It also describes the instruction set, addressing modes, processor modes and operating states of the Cortex-M1. Finally, it explains the architecture of the Cortex-M1, which implements a subset of the Thumb-2 architecture and includes components like an ALU, multiplier, shifter, control unit and register file.

Name: Aishwarya Tapadiya

Reg. No: 17BEC0194


Slot: C2 + TC2
Course: Computer Organisation and Architecture

Digital Assignment - 1

Select a Micro Processing element (Microprocessor/controller/IP core) from any vendor of your choice to answer the following.
ARM Cortex-M1
The Cortex-M1 is a small processor design optimized for FPGA implementation. It provides a Tightly Coupled Memory (TCM) implementation using memory blocks on the FPGA, and it uses the same instruction set as the Cortex-M0.

Question 1.
a. Draw the clear Block Diagram of the Microprocessor / Micro
controller/ IP core selected.
b. Give the internal Diagram of each and every block.
Processor with Debug Block Diagram:
The main blocks of the processor with debug are:
1. Core
2. NVIC
3. Bus master
4. AHB-PPB
5. Debug.
Description:

Core

The core has the following main features:

• 3-stage pipeline
• multiply cycles:
o three cycles for normal multiplier
o 33 cycles for small multiplier.
• Thumb state
• Handler and Thread modes
• ISR entry and exit
o processor state saving and restoration, with no instruction fetch overhead
o tightly-coupled interface to interrupt controller enabling efficient processing of late-arriving interrupts.
• LE and BE-8 data endianness support.

Registers
The processor contains:

• 13 general purpose 32-bit registers.
• Link Register (LR).
• Program Counter (PC).
• Program Status Register, xPSR.
• Two banked SP registers. Without the OS extension option there is only one SP register present.

NVIC

The NVIC is tightly coupled to the processor core. This facilitates low-latency exception processing. The main features include:

• a configurable number of external interrupts: 1, 8, 16, or 32
• a fixed number of bits of priority, 2 bits, providing four levels of configurable priority
• both level and pulse interrupt support
• processor state automatically saved on interrupt entry and restored on interrupt exit, with no instruction overhead.

Bus master

The Bus master provides a maximum of two interfaces. One master interface connects the internal Private Peripheral Bus (PPB) signals to the AHB PPB. The other master interface connects external bus signals to the AHB port.
AHB-PPB

The AHB Private Peripheral Bus (AHB-PPB) is used to access:

• the NVIC
• the debug components, when present.

Debug

There are two configurations for debug:

• The full debug configuration has four breakpoint comparators and two watchpoint comparators. This is the default configuration.
• The reduced debug configuration has two breakpoint comparators and one watchpoint comparator.

Processor without Debug Block Diagram:


The main blocks of the processor without debug are:
1. Core
2. Core memory interface
3. NVIC
4. Bus master
5. AHB-PPB.

Description:

Core memory interface

Core access to Tightly-Coupled Memories (TCMs) is made exclusively through a dedicated core memory interface.

The core memory interface comprises:

• one core Instruction Tightly-Coupled Memory (ITCM) interface to access ITCM
• one core Data Tightly-Coupled Memory (DTCM) interface to access DTCM.

Because reads are speculatively fetched from TCMs, Device and Strongly-Ordered memory types (for example, FIFOs in TCM space) are not supported. You must ensure that any Flash memory in this space is tolerant of extra accesses at all times. The TCM interface does not support wait states.
Question 2.
Provide the Instruction set of the Micro Processing element
(Microprocessor/ Microcontroller/ IP core) with clock cycle
required/Machine cycle required, memory required, flags affected,
type of instruction based on addressing modes, operation etc.
The Cortex-M1 processor is based on the ARMv6-M architecture, which has a small instruction set of just 56 instructions, most of them 16-bit. However, the registers in the processor and the data being operated on are still 32-bit. For most simple I/O control tasks and general data processing, this small instruction set is sufficient.

• Instruction sets:
o Thumb-1 (most), missing CBZ, CBNZ, IT.
o Thumb-2 (some), only BL, DMB, DSB, ISB, MRS, MSR.
o 32-bit hardware integer multiply with 32-bit result.
• 1 to 32 interrupts, plus NMI.

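In the 32-bit ARM instruction set, instructions can be made to execute conditionally by postfixing them with the appropriate condition code field. This improves code density and performance by reducing the number of forward branch instructions. In the Thumb subset implemented by the Cortex-M1 (ARMv6-M), the IT instruction is not available, so only the branch instruction B<cond> is conditional.

By default, ARM data processing instructions do not affect the condition code flags, but the flags can optionally be set by using the “S” suffix. Most 16-bit Thumb data processing instructions always update the flags.

CMP does not need “S”.

Loop: SUBS r1,r1,#1 ; decrement r1 and update the flags
      BNE Loop      ; branch back while r1 is not zero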
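Data processing Instructions:

Consist of:

Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN

Syntax: <operation>{<cond>}{S} Rd, Rn, Operand2

This list and syntax are those of the ARM instruction set. RSC and TEQ are not part of ARMv6-M, so they are not available on the Cortex-M1.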

Symbol   Condition                            Flag
EQ       Equal                                Z set
NE       Not equal                            Z clear
CS/HS    Carry set/unsigned higher or same    C set
CC/LO    Carry clear/unsigned lower           C clear
MI       Minus/negative                       N set
PL       Plus/positive or zero                N clear
VS       Overflow                             V set
VC       No overflow                          V clear
HI       Unsigned higher                      C set and Z clear
LS       Unsigned lower or same               C clear or Z set
GE       Signed greater than or equal         N set and V set, or N clear and V clear (N == V)
LT       Signed less than                     N set and V clear, or N clear and V set (N != V)
GT       Signed greater than                  Z clear, and either N set and V set, or N clear and V clear (Z == 0, N == V)
LE       Signed less than or equal            Z set, or N set and V clear, or N clear and V set (Z == 1 or N != V)
AL       Always (unconditional)               -
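The condition table can be read as a set of boolean tests on the N, Z, C and V flags. The following is a minimal Python sketch of that reading, not processor code; the function name `condition_passed` and the use of 0/1 integers for the flags are assumptions made for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if condition code `cond` holds for flags N, Z, C, V (each 0 or 1)."""
    checks = {
        "EQ": z == 1,
        "NE": z == 0,
        "CS": c == 1, "HS": c == 1,      # carry set / unsigned higher or same
        "CC": c == 0, "LO": c == 0,      # carry clear / unsigned lower
        "MI": n == 1,
        "PL": n == 0,
        "VS": v == 1,
        "VC": v == 0,
        "HI": c == 1 and z == 0,
        "LS": c == 0 or z == 1,
        "GE": n == v,
        "LT": n != v,
        "GT": z == 0 and n == v,
        "LE": z == 1 or n != v,
        "AL": True,                       # always
    }
    return checks[cond]


# After a SUBS that leaves zero, Z is set, so BNE is not taken.
print(condition_passed("NE", n=0, z=1, c=1, v=0))
```

With Z set, the call for "NE" evaluates to False, which is why the loop `SUBS` / `BNE` falls through once the counter reaches zero.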

Processor Modes:

The classic ARM architecture defines the processor modes listed here. The Cortex-M1 does not implement them; it supports only the Thread and Handler modes.

• User - used for executing most application programs
• FIQ - used for handling fast interrupts
• IRQ - used for general-purpose interrupt handling
• Supervisor - a protected mode for the Operating System
• Undefined - entered upon Undefined Instruction exceptions
• Abort - entered after Data or Pre-fetch Aborts
• System - privileged user mode for the Operating System
• Monitor - a secure mode for TrustZone

Processor Operating States

The Cortex-M1 processor has two operating states:

• Thumb state – This is normal execution, running the set of 16-bit, halfword-aligned Thumb and Thumb-2 instructions, as well as the 32-bit BL, MRS, MSR, ISB, DSB, and DMB instructions.

• Debug state – This is the state when in halting debug.

Processor Operating Modes

The Cortex-M1 processor supports two modes of operation:

• Thread mode – Entered on Reset, and can be re-entered as the result of an exception return

• Handler mode – Entered as the result of an exception

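ARM load and store instructions support the following ten addressing modes. The Thumb encodings available on the Cortex-M1 (ARMv6-M) provide only a subset of them, mainly the immediate offset, register and register offset forms, without shifts or base register write-back.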

[Rn, #±imm] Immediate offset
Address accessed is imm more/less than the address found in Rn. Rn does not change.

[Rn] Register
Address accessed is value found in Rn. This is just shorthand for [Rn, #0].

[Rn, ±Rm, shift] Scaled register offset
Address accessed is sum/difference of the value in Rn and the value in Rm shifted as specified. Rn and Rm do not change values.

[Rn, ±Rm] Register offset
Address accessed is sum/difference of the value in Rn and the value in Rm. Rn and Rm do not change values. This is just shorthand for [Rn, ±Rm, LSL #0].

[Rn, #±imm]! Immediate pre-indexed
Address accessed is as with immediate offset mode, but Rn's value updates to become the address accessed.

[Rn, ±Rm, shift]! Scaled register pre-indexed
Address accessed is as with scaled register offset mode, but Rn's value updates to become the address accessed.

[Rn, ±Rm]! Register pre-indexed
Address accessed is as with register offset mode, but Rn's value updates to become the address accessed.

[Rn], #±imm Immediate post-indexed
Address accessed is value found in Rn, and then Rn's value is increased/decreased by imm.

[Rn], ±Rm, shift Scaled register post-indexed
Address accessed is value found in Rn, and then Rn's value is increased/decreased by Rm shifted according to shift.

[Rn], ±Rm Register post-indexed
Address accessed is value found in Rn, and then Rn's value is increased/decreased by Rm. This is just shorthand for [Rn], ±Rm, LSL #0.
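The ten modes reduce to three indexing styles (offset, pre-indexed, post-indexed) applied to an offset that is either an immediate or a possibly shifted register. The Python sketch below models only that address arithmetic; the names `effective_address` and `shifted` are hypothetical, and 32-bit wrap-around is assumed.

```python
MASK32 = 0xFFFFFFFF


def shifted(rm, lsl=0):
    """Offset taken from a register, optionally shifted left (LSL #0 means no shift)."""
    return (rm << lsl) & MASK32


def effective_address(rn, offset, mode):
    """Return (address accessed, value of Rn afterwards)."""
    if mode == "offset":      # [Rn, offset]   Rn does not change
        return (rn + offset) & MASK32, rn
    if mode == "pre":         # [Rn, offset]!  Rn becomes the address accessed
        addr = (rn + offset) & MASK32
        return addr, addr
    if mode == "post":        # [Rn], offset   access Rn, then update Rn
        return rn, (rn + offset) & MASK32
    raise ValueError(f"unknown mode: {mode}")


# [Rn, Rm, LSL #2] with Rn = 0x1000 and Rm = 3 accesses 0x100C and leaves Rn alone.
addr, new_rn = effective_address(0x1000, shifted(3, 2), "offset")
print(hex(addr), hex(new_rn))
```

A negative offset, as in `[Rn], #-4`, is passed as a negative integer; the mask keeps the result within 32 bits.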

Question 3.
Explain Micro processing element (Microprocessor/
Microcontroller/ IP core) Architecture.
The Cortex-M1 processor implements the ARMv6-M architecture, which is a subset of the Thumb-2 (ARMv7) instruction set. It includes most of the 16-bit Thumb instructions and a few of the 32-bit instructions (BL, DMB, DSB, ISB, MRS and MSR).

The ARM Architecture

• Arithmetic Logic Unit
• Booth multiplier
• Barrel shifter
• Control unit
• Register file

• The ARM processor also has other components, such as the program status register, which contains the processor flags (Z, N, V and C). The mode bits also reside in the program status register, together with the interrupt and fast interrupt disable bits. Some special registers are also used, such as the instruction register, the memory data read and write registers, and the memory address register.

• Priority encoder: The encoder is used in the load and store multiple instructions to indicate which register in the register file is to be loaded or stored.

• Multiplexers: several multiplexers are used to manage the operation of the processor buses. Because of the restricted project time, these components are implemented as a behavioural model. Each component is described with an entity. Every entity has its own architecture, which can be optimized for particular requirements depending on its application. This makes the design easier to construct and maintain.

Arithmetic Logic Unit (ALU)

The ALU has two 32-bit inputs. The first comes from the register file, whereas the other comes from the shifter. The ALU outputs modify the status register flags.

The V-bit output goes to the V flag and the carry output (Cout) goes to the C flag. The most significant bit of the result provides the N (sign) flag, and the ALU output bits are NORed together to produce the Z flag.

The ALU has a 4-bit function bus that permits up to 16 opcodes to be implemented.
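As an illustration of how the four flags follow from an ALU result, here is a small Python sketch of a 32-bit add. It is a behavioural model under the assumption that both inputs are already unsigned 32-bit values; the name `alu_add` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def alu_add(a, b, carry_in=0):
    """Add two unsigned 32-bit values; return (result, (N, Z, C, V))."""
    total = a + b + carry_in
    result = total & MASK32
    n = result >> 31                       # most significant bit of the result
    z = int(result == 0)                   # NOR of all result bits
    c = int(total > MASK32)                # carry out of bit 31
    # Signed overflow: both operands share a sign that the result does not.
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, (n, z, c, v)


print(alu_add(0x7FFFFFFF, 1))
```

Adding 1 to 0x7FFFFFFF gives 0x80000000 with N and V set: the result is negative when read as a signed number, and the true sum does not fit in 32 signed bits.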

Booth Multiplier

The multiplier has three 32-bit inputs, all of which come from the register file. The multiplier output is only the 32 least significant bits of the product.

The multiplication starts whenever the start input goes active. The Fin output goes high when the multiplication finishes.

Booth Algorithm

The Booth algorithm is a well-known multiplication algorithm for 2’s complement numbers. It treats positive and negative numbers uniformly.

Moreover, runs of 0’s or 1’s within the multiplier are skipped over without any addition or subtraction being performed, which makes faster multiplication possible. The multiplication therefore finishes in only 16 clock cycles.
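The Booth recoding rule can be sketched in Python as follows. This is the textbook radix-2 form, which examines one multiplier bit per step (32 steps for 32-bit operands); a design that completes in 16 steps would presumably examine two bits per step, which is an assumption and is not modelled here. The function name `booth_multiply` is hypothetical.

```python
def booth_multiply(multiplicand, multiplier, bits=32):
    """Radix-2 Booth multiplication of two signed `bits`-wide integers."""
    mask = (1 << bits) - 1
    a = 0                       # accumulator, kept as a signed Python integer
    q = multiplier & mask       # multiplier register (two's complement pattern)
    q_1 = 0                     # extra bit to the right of q
    for _ in range(bits):
        pair = (q & 1, q_1)
        if pair == (1, 0):      # start of a run of 1s: subtract the multiplicand
            a -= multiplicand
        elif pair == (0, 1):    # end of a run of 1s: add the multiplicand
            a += multiplicand
        # (0, 0) and (1, 1) lie inside a run: no addition or subtraction.
        # Arithmetic shift right of the combined register a : q : q_1.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a >>= 1
    return (a << bits) | q      # full signed product


product = booth_multiply(-7, 5)
print(product, hex(product & 0xFFFFFFFF))
```

Masking the product with 0xFFFFFFFF keeps only the 32 least significant bits, matching the multiplier output described in the text.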

Barrel Shifter
The barrel shifter has a 32-bit input to be shifted. This input comes from the register file or is immediate data. The shifter has several control inputs that come from the instruction register.

The Shift field within the instruction controls the operation of the barrel shifter. This field indicates the kind of shift to be performed (logical left or right, arithmetic right or rotate right).

The amount by which the register is to be shifted is contained in an immediate field within the instruction, or it may be the lower 6 bits of a register within the register file.
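The four shift kinds can be modelled in a few lines of Python. This is a behavioural sketch only; the function name `barrel_shift` and the handling of amounts of 32 or more are assumptions chosen to match the usual meaning of each shift.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value, kind, amount):
    """Shift a 32-bit value by `amount` using LSL, LSR, ASR or ROR."""
    value &= MASK32
    if amount == 0:
        return value
    if kind == "LSL":                                  # logical shift left
        return (value << amount) & MASK32 if amount < 32 else 0
    if kind == "LSR":                                  # logical shift right
        return value >> amount if amount < 32 else 0
    if kind == "ASR":                                  # arithmetic shift right
        sign = value >> 31
        if amount >= 32:
            return MASK32 if sign else 0
        result = value >> amount
        if sign:                                       # copy the sign bit into the vacated bits
            result |= (MASK32 << (32 - amount)) & MASK32
        return result
    if kind == "ROR":                                  # rotate right
        amount %= 32
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift kind: {kind}")


print(hex(barrel_shift(0x80000000, "ASR", 4)))
```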

Control Unit

For any microprocessor, the control unit is the heart of the whole process and is responsible for the system operation, so the control unit design is the most important part of the whole design.

The control unit is sometimes a pure combinational circuit design. Here, the control unit is implemented as a simple state machine. The processor timing is also included within the control unit.

Signals from the control unit are connected to each component within the processor to supervise its operation.
Question 4:
Correlate the Micro processing element
(Microprocessor/controller/IP core) architecture with architectures
discussed in Module 1.

Architecture of 8051 MicroController

Architecture of ARM Cortex M1

• ARM - If the user needs fast computing and a large number of timers and ADCs, then an ARM processor is suitable.

• 8051 - If the user wants a cheap controller with basic functions, then the 8051 will suffice. It is of great use in low-cost college projects.

• The 8051 has a Harvard architecture (separate memory spaces for RAM and program memory). ARM has a von Neumann architecture (program and RAM in the same space).

• ARM has a 16 and/or 32-bit architecture. The 8051 is a byte (8-bit) architecture.

• The 8051 has limited stack space, limited to 128 bytes. Writing a C compiler for this architecture must have been challenging, and compiler choice is limited.

• 8051 and ARM can directly address all available RAM.

• The 8051 needs multiple clock cycles per instruction. ARM executes most instructions in a single clock cycle.

• The 8051 uses a CISC architecture and the ARM Cortex-M1 uses a RISC architecture.

• The 8051 has a bus width of 8 bits and the ARM Cortex-M1 has a bus width of 32 bits.
Question 5:
Elaborate Register organisation of Micro processing element.

The processor has the following 32-bit Registers:

• 13 general purpose registers, R0–R12


• Stack Pointer (SP), R13
• Link Register (LR), R14
• Program Counter (PC), R15
• Program status registers, xPSR

General Purpose Registers


The general-purpose registers, R0–R12, have no architecture-specific
uses.
• Low registers – Registers R0–R7 are accessible by all instructions
that specify a general-purpose register.
• High registers – Registers R8–R12 are accessible by some, but not
all 16-bit instructions.

The R13, R14, and R15 registers have the following special functions:
• Stack pointer – Register R13 is used as the Stack Pointer (SP).
Because the SP ignores writes to bits [1:0], it is auto-aligned to a
word (four-byte) boundary. The stack pointer has banked aliases,
SP_process and SP_main, when the processor has been configured
with OS extensions present. When OS extensions are absent, only a
single stack pointer, SP_main, is present.
• Link register – Register R14 is the subroutine Link Register (LR). The
LR receives the return address from the Program Counter (PC) when
a Branch and Link (BL) instruction is executed. The LR is also used for
exception returns. At all other times, R14 can be treated as a
general-purpose register.
• Program counter – Register R15 is the Program Counter (PC). Bit [0]
is always 0, so instructions are always aligned to 16-bit halfword
(two-byte) boundaries.

Special Purpose Program Status Registers (xPSR)


Processor status at the system level is broken into three categories
and can be accessed as individual registers, a combination of any two
of the three, or a combination of all three using the MRS and MSR
instructions.
• Application PSR (APSR):
Contains the condition code flags. Before entering an exception, the processor saves the condition code flags on the stack. The APSR can be accessed with the MSR and MRS instructions.
• Interrupt PSR (IPSR):
Contains the Interrupt Service Routine (ISR) number of the current exception.
• Execution PSR (EPSR):
Contains the Thumb state bit (T-bit). Unless the processor is in Debug state, the EPSR is not directly accessible and all fields read as zero using an MRS instruction. MSR instruction writes are ignored.

On entering an exception, the processor saves the combined information from the three status registers on the stack.
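The combined xPSR value can be split back into its three parts with simple bit masks. The Python sketch below assumes the ARMv6-M layout as commonly documented (N, Z, C, V in bits 31 to 28, the T-bit in bit 24, the exception number in bits 5 to 0); these positions are not given in the text above and should be checked against the architecture manual. The function name `decode_xpsr` is hypothetical.

```python
def decode_xpsr(xpsr):
    """Split a 32-bit xPSR value into APSR flags, the EPSR T-bit and the IPSR number."""
    return {
        "N": (xpsr >> 31) & 1,       # APSR: negative
        "Z": (xpsr >> 30) & 1,       # APSR: zero
        "C": (xpsr >> 29) & 1,       # APSR: carry
        "V": (xpsr >> 28) & 1,       # APSR: overflow
        "T": (xpsr >> 24) & 1,       # EPSR: Thumb state bit (assumed bit 24)
        "exception": xpsr & 0x3F,    # IPSR: exception number (assumed bits 5:0)
    }


print(decode_xpsr(0x61000003))
```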

Question 6:
Explain the type of algorithm used for signed/unsigned Calculation
(Addition, Multiplication etc.).

Name           Logical Instruction                                  Arithmetic Instruction
N (Negative)   No meaning                                           Bit 31 of the result is set. Indicates a negative number in signed operations
Z (Zero)       Result is all zeroes                                 Result of operation was zero
C (Carry)      After Shift operation, ‘1’ was left in carry flag    Result was greater than 32 bits
V (oVerflow)   No meaning                                           The signed two’s complement result requires more than 32 bits. Indicates a possible corruption of the result
On the ARM Cortex-M1 processor, the operand size is usually assumed to be 32 bits, because that is the natural word size for the ARM processor. Adding 64-bit numbers requires two add instructions, and the carry from the least-significant 32 bits must be added to the sum of the most-significant 32 bits. The ARM processor provides a convenient way to do this with the add with carry (ADC) instruction. Assume we have two 64-bit numbers, x and y. We have x in r0, r1 and y in r2, r3, where the high order words of each number are in the higher-numbered registers, and we want to calculate x = x + y.
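The two-step addition can be modelled in Python to check the carry handling. This is a sketch of the arithmetic, with the corresponding instructions noted in comments; the function name `add64` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add64(r0, r1, r2, r3):
    """Compute x = x + y with x in (r0 low, r1 high) and y in (r2 low, r3 high)."""
    low = r0 + r2                          # ADDS r0, r0, r2  (sets the carry flag)
    carry = low >> 32
    r0 = low & MASK32
    r1 = (r1 + r3 + carry) & MASK32        # ADC  r1, r1, r3  (adds the carry in)
    return r0, r1


# 0x00000000_FFFFFFFF + 1 = 0x00000001_00000000
print(add64(0xFFFFFFFF, 0x0, 0x1, 0x0))
```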

On the ARM processor, the general algorithm to multiply two 32-bit unsigned integers is very efficient: a loop examines the multiplier one bit at a time, adding the shifted multiplicand whenever the bit is set.

If x or y is a constant, then a loop is not necessary. The multiplication can be directly translated into a sequence of shift and add operations. This will result in much more efficient code than the general algorithm.
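Both cases can be sketched in Python: a general shift-and-add loop, and a constant multiplication written out as fixed shifts and adds. The function names and the choice of the constant 10 are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def multiply_shift_add(x, y):
    """General case: unsigned 32-bit multiply, keeping the low 32 bits of the product."""
    result = 0
    while y:
        if y & 1:                       # this multiplier bit is set: add the shifted x
            result = (result + x) & MASK32
        x = (x << 1) & MASK32           # next bit has twice the weight
        y >>= 1
    return result


def times_ten(x):
    """Constant case: 10 * x = 8 * x + 2 * x, so no loop is needed."""
    return ((x << 3) + (x << 1)) & MASK32


print(multiply_shift_add(6, 7), times_ten(25))
```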
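For addition (summing the integers from 10 down to 1):
        MOV R0, #0        ; R0 accumulates total
        MOV R1, #10       ; R1 counts from 10 down to 1
again   ADD R0, R0, R1    ; add the current count to the total
        SUBS R1, R1, #1   ; decrement the count and update the flags
        BNE again         ; repeat until the count reaches zero
halt    B halt            ; infinite loop to stop computation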

Question 7:
Find what type of control logic unit used and correlate its
performance with the type of control unit in Module 3 in COA
syllabus.

ARM stands for Advanced RISC Machine; it is one of the most widely licensed processor cores in the world. Acorn Computers, founded in Cambridge in 1978, produced the first ARM RISC processor in 1985. These processors are widely used in portable devices such as digital cameras, mobile phones, home networking modules, wireless communication equipment and other embedded systems because of benefits such as low power consumption and reasonable performance.

For any microprocessor, the control unit is the heart of the whole process and is responsible for the system operation, so the control unit design is the most important part of the whole design. The control unit is sometimes a pure combinational circuit design. Here, the control unit is implemented as a simple state machine. The processor timing is also included within the control unit. Signals from the control unit are connected to each component within the processor to supervise its operation.

Control Unit (8051)

• The control unit consists of a microcontroller which receives signals from the sensors and decides which operations are to be performed by the system. The microcontroller used in module 4 of COA was an 8051 core.
• We can interface it with a DC motor whose motion can be controlled with the signals received from the sensors.
• It is a low-power, high-performance CMOS 8-bit microcontroller with 4K bytes of In-System Programmable Flash memory, 128 bytes of RAM, 32 I/O lines, a Watchdog timer, two data pointers, two 16-bit timer/counters, a five-vector two-level interrupt architecture, a full duplex serial port, on-chip oscillator and clock circuitry.
Question 8:
Explain the memory management system in the processing element
selected. List its Addressing capabilities, Memory mapping
technique followed. Find the advantage of memory design and
management techniques over the concepts given in Module 4 in
COA syllabus.

On ARM processors that include one, the Memory Management Unit (MMU) performs address translations. The Cortex-M1 itself has no MMU and uses the fixed ARMv6-M memory map, so the description here applies to ARM processors with an MMU.

The MMU contains the following:
• The table walk unit, which contains logic that reads the translation tables from memory.
• Translation Lookaside Buffers (TLBs), which cache recently used translations.

All memory addresses that are issued by software are virtual. These memory addresses are passed to the MMU, which checks the TLBs for a recently used cached translation. If the MMU does not find a recently cached translation, the table walk unit reads the appropriate table entry, or entries, from memory.

A virtual address must be translated to a physical address before a memory access can take place (because we must know which physical memory location we are accessing). This need for translation also applies to cached data, because on Armv6 and later processors, the data caches store data using the physical address (addresses that are physically tagged). Therefore, the address must be translated before a cache lookup can complete.
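The lookup order described here (TLB first, table walk on a miss) can be sketched with a small Python model. It is a simplification under stated assumptions: a single-level page table held in a dictionary, 4 KB pages, and an unbounded TLB. The class name `Mmu` and its fields are hypothetical.

```python
PAGE_BITS = 12                      # assumed page size of 4 KB


class Mmu:
    def __init__(self, page_table):
        self.page_table = page_table    # virtual page number -> physical page number
        self.tlb = {}                   # recently used translations
        self.table_walks = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn not in self.tlb:         # TLB miss: the table walk unit reads memory
            self.table_walks += 1
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << PAGE_BITS) | offset


mmu = Mmu({0x1: 0x80})
print(hex(mmu.translate(0x1234)), hex(mmu.translate(0x1FFF)), mmu.table_walks)
```

The second access to the same page is served from the TLB, so only one table walk is counted.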

8051 Memory Organization


The 8051 microcontroller's memory is divided into Program Memory and Data Memory. Program Memory (ROM) is used for permanently storing the program being executed, while Data Memory (RAM) is used for temporarily storing intermediate results and variables.

Program Memory (ROM)

Program Memory (ROM) is used for permanently storing the program (CODE) being executed. The memory is read only. Depending on the settings made in the compiler, program memory may also be used to store constant variables. The 8051 executes programs stored in program memory only. The code memory type specifier is used to refer to program memory.
The 8051 memory organization allows external program memory to be added.
How the microcontroller handles external memory depends on the logical state of the EA pin.

Internal Data Memory


Up to 256 bytes of internal data memory are available depending on the 8051 derivative. Locations available to the user occupy the address space from 0 to 7Fh, i.e. the first 128 registers, and this part of RAM is divided into several blocks. The first 128 bytes of internal data memory are both directly and indirectly addressable. The upper 128 bytes of data memory (from 0x80 to 0xFF) can be addressed only indirectly.

Since internal data memory is also used for the CALL stack, and there are only 256 bytes split over a few different memory areas, careful use of this memory is crucial for fast and compact code.
The memory block in the range 20h to 2Fh is bit-addressable, which means that each bit in it has its own address from 0 to 7Fh. Since there are 16 such registers, this block contains a total of 128 bits with separate addresses (bit 0 of byte 20h has the bit address 0, and bit 7 of byte 2Fh has the bit address 7Fh).
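The mapping from a bit address to a byte and a bit position follows directly from the two end points given above. A short Python sketch, with the hypothetical function name `bit_location`:

```python
def bit_location(bit_addr):
    """Map an 8051 bit address (00h to 7Fh) to (byte address, bit number)."""
    if not 0x00 <= bit_addr <= 0x7F:
        raise ValueError("bit address must be in the range 00h to 7Fh")
    return 0x20 + bit_addr // 8, bit_addr % 8   # eight bits per byte, starting at 20h


byte_addr, bit = bit_location(0x7F)
print(hex(byte_addr), bit)
```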
Three memory type specifiers can be used to refer to the internal
data memory: data, idata, and bdata.

External Data Memory


Access to external memory is slower than access to internal data memory. There may be up to 64K bytes of external data memory. Several 8051 devices provide on-chip XRAM space that is accessed with the same instructions as the traditional external data space. This XRAM space is typically enabled by setting the appropriate SFR register, and it overlaps the external memory space. That register must be set manually in code before any access to external memory or XRAM space is made.
The mikroC PRO for 8051 compiler has two memory type specifiers that refer to external memory space: xdata and pdata.

Question 9:
Elaborate the pipeline and parallel processing capabilities of the
unit and hazard management technique.

Instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps. The Cortex-M1 core uses a 3-stage pipeline.

As an example, if a processor has a 5-step pipeline, instruction 1 would be fetched at time t1 and its execution would be complete at t5. Instruction 2 would be fetched at t2 and would be complete at t6. The first instruction might deposit the incremented number into R5 as its fifth step (register write back) at t5. But the second instruction might get the number from R5 (to copy to R6) in its second step (instruction decode and register fetch) at time t3, before the new value has been written.

A longer pipeline can be divided into the following stages:

• IF: Instruction Fetch, fetches the instruction into the instruction register.
• ID: Instruction Decode, decodes the instruction for the opcode.
• AG: Address Generator, generates the address.
• DF: Data Fetch, fetches the operands into the data register.
• EX: Execution, executes the specified operation.
• WB: Write back, writes back the result to the register.
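The overlapped timing of a pipeline can be tabulated with a short Python sketch. It assumes an ideal pipeline with one stage per clock cycle and no stalls; the function names and the five stage names used in the example are hypothetical.

```python
def pipeline_schedule(num_instructions, stages):
    """Map (instruction index, stage name) to the 1-based clock cycle it occupies."""
    return {
        (i, stage): i + k + 1
        for i in range(num_instructions)
        for k, stage in enumerate(stages)
    }


def total_cycles(num_instructions, num_stages):
    """Cycles needed to finish all instructions in an ideal pipeline."""
    return num_stages + num_instructions - 1


stages = ["IF", "ID", "EX", "MEM", "WB"]
schedule = pipeline_schedule(2, stages)
# Instruction 0 writes back at t5, while instruction 1 already decodes at t3.
print(schedule[(0, "WB")], schedule[(1, "ID")], total_cycles(10, 3))
```

For a 3-stage pipeline such as the one in the Cortex-M1, ten instructions would ideally finish in 12 cycles instead of 30.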
Pipelining Hazards
In a typical computer program, besides simple instructions, there are branch instructions, interrupt operations, and read and write instructions. Pipelining is not suitable for all kinds of instructions.

In most computer programs, the result from one instruction is used as an operand by another instruction. When such instructions are executed in a pipeline, a stall occurs because the result of the first instruction is not available when the second instruction starts collecting operands. So, instruction two must stall until instruction one has executed and its result has been generated. This type of hazard is called a read-after-write pipelining hazard.
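A read-after-write dependency can be detected by comparing each instruction's destination register with the source registers of the instructions that follow it. The Python sketch below uses a hypothetical program representation, a list of (destination, sources) tuples, and a hypothetical `window` parameter for how many following instructions can still be affected.

```python
def find_raw_hazards(program, window):
    """Return pairs (i, j) where instruction j reads the register written by instruction i."""
    hazards = []
    for i, (dest, _sources) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            next_dest, next_sources = program[j]
            if dest in next_sources:
                hazards.append((i, j))
            if next_dest == dest:       # register overwritten: later reads depend on j instead
                break
    return hazards


program = [
    ("R5", ("R5",)),    # increment R5
    ("R6", ("R5",)),    # copy R5 to R6: needs the new value of R5
    ("R7", ("R1",)),    # independent instruction
]
print(find_raw_hazards(program, window=2))
```

Here only the pair (0, 1) is reported, so instruction 1 is the one that would have to stall.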

Execution of branch instructions also causes a pipelining hazard.

Branch instructions, when executed in a pipeline, affect the fetch stages of the following instructions.
Pipelined Branch Behaviour
Advantages of Pipelining
• Instruction throughput increases.
• An increase in the number of pipeline stages increases the number of instructions executed simultaneously.
• A faster ALU can be designed when pipelining is used.
• Pipelined CPUs work at higher clock frequencies than the RAM.
• Pipelining increases the overall performance of the CPU.

Disadvantages of Pipelining
• Designing a pipelined processor is complex.
• Instruction latency increases in pipelined processors.
• The throughput of a pipelined processor is difficult to predict.
• The longer the pipeline, the worse the hazard problem for branch instructions.

Pipelining benefits all the instructions that follow a similar sequence of steps for execution. Processors that have complex instructions, where every instruction behaves differently from the others, are hard to pipeline. Practical processors commonly use 3 or 5 pipeline stages, because as the depth of the pipeline increases, the hazards related to it increase.
Question 10:
Find whether processor level parallelism is available in the element
selected. Illustrate cache coherence and protocol used for handling
memory HIT or MISS.

Cache coherency is an important concept to understand when sharing data. Disabling caches can impact performance; software coherency adds overheads and complexity; and hardware coherency manages sharing automatically, which can simplify software. The AMBA 4 ACE bus interface extends hardware cache coherency outside of the processor cluster and into the system.

There are three mechanisms to maintain coherency:

• Disabling caching is the simplest mechanism but may cost significant CPU performance. To get the highest performance, processors are pipelined to run at high frequency, and to run from caches which offer a very low latency. Caching of data that is accessed multiple times increases performance significantly and reduces DRAM accesses and power. Marking data as “non-cached” could impact performance and power.
• Software managed coherency is the traditional solution to the data sharing problem. Here the software, usually device drivers, must clean or flush dirty data from caches, and invalidate old data to enable sharing with other processors or masters in the system. This takes processor cycles, bus bandwidth, and power.
• Hardware managed coherency offers an alternative to simplify software. With this solution any cached data marked ‘shared’ will always be up to date, automatically. All processors and bus masters in that sharing domain see the exact same value.

Software managed coherency manages cache contents with two key mechanisms:

• Cache Cleaning (flushing): If any data stored in a cache is modified, it is marked as ‘dirty’ and must be written back to DRAM at some point in the future. The process of cleaning or flushing caches will force dirty data to be written to external memory.
• Cache Invalidation: If a processor has a local copy of data, but an external agent updates main memory, then the cache contents are out of date, or ‘stale’. Before reading this data, the processor must remove the stale data from caches; this is known as ‘invalidation’ (a cache line is marked invalid). An example is a region of memory used as a shared buffer for network traffic which may be updated by a network interface DMA hardware; a processor wishing to access this data must invalidate any old stale copy before reading the new data.

An access to an item which is in the cache is called a hit, and an access to an item which is not in the cache is a miss.
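Hits, misses, cleaning and invalidation can be demonstrated with a toy write-back cache in Python. This is an illustrative model only: it assumes a direct-mapped cache with one word per line and a dictionary standing in for main memory, and the class name `Cache` and its methods are hypothetical.

```python
class Cache:
    def __init__(self, memory, lines=4):
        self.memory = memory        # dict: address -> value (main memory)
        self.lines = lines
        self.store = {}             # line index -> [address, value, dirty]
        self.hits = 0
        self.misses = 0

    def _line(self, addr):
        index = addr % self.lines
        entry = self.store.get(index)
        if entry is not None and entry[0] == addr:
            self.hits += 1                              # hit: item is in the cache
        else:
            self.misses += 1                            # miss: fetch from main memory
            if entry is not None and entry[2]:
                self.memory[entry[0]] = entry[1]        # write back the evicted dirty line
            entry = [addr, self.memory[addr], False]
            self.store[index] = entry
        return entry

    def read(self, addr):
        return self._line(addr)[1]

    def write(self, addr, value):
        entry = self._line(addr)
        entry[1], entry[2] = value, True                # modified data is marked dirty

    def clean(self):
        """Write all dirty lines back to main memory."""
        for entry in self.store.values():
            if entry[2]:
                self.memory[entry[0]] = entry[1]
                entry[2] = False

    def invalidate(self):
        """Discard all cached lines so that the next reads fetch from memory."""
        self.store.clear()


memory = {0: 10, 1: 20}
cache = Cache(memory)
cache.write(0, 99)          # memory[0] is still 10 until the cache is cleaned
cache.clean()
print(memory[0], cache.hits, cache.misses)
```

If another bus master later changes `memory[1]` after the cache has read it, the cache keeps returning the stale value until `invalidate()` is called.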
Separate Data and Instruction Caches

Unified Instruction and Data Cache

A processor can have one of the following two organizations:

• A unified cache: This is a single cache for both instructions and data.
• Separate instruction and data caches: This organization is sometimes called a modified Harvard architecture.
