Module 4 DDCO

Module-4

INPUT/OUTPUT
ORGANIZATION
Syllabus
• Accessing I/O Devices,
• Interrupts – Interrupt Hardware,
• Enabling and Disabling Interrupts,
• Handling Multiple Devices,
• Direct Memory Access: Bus Arbitration,
• Speed, size and Cost of memory systems.
• Cache Memories – Mapping Functions.
Introduction
• One of the basic features of a computer is its ability to
exchange data with other devices.
• This communication capability enables a human operator to interact
with the computer. We make extensive use of computers to communicate
with other computers over the Internet and access information around
the globe.
• Computers are an integral part of home appliances,
manufacturing equipment, transportation systems, banking, and
point-of-sale terminals.
• In such applications, input to a computer may come from a
sensor switch, a digital camera, a microphone, or a fire alarm.
Output may be a sound signal sent to a speaker, or a digitally
coded command that changes the speed of a motor, opens a
valve, or causes a robot to move in a specified manner.
Accessing I/O devices

[Figure: Processor and Memory connected over a single Bus to I/O device 1 … I/O device n]

•Multiple I/O devices may be connected to the processor and the memory via a
single bus. The bus is used to exchange information.
•Bus consists of three sets of lines to carry address, data and control signals.
•Each I/O device is assigned a unique set of addresses.
•To access an I/O device, the processor places the address on the address lines.
•The device recognizes the address, and responds to the commands issued on
the control lines.
Accessing I/O devices (contd..)
 I/O devices and the memory may share the same
address space:
 The arrangement is called memory-mapped I/O.
 Any machine instruction that can access memory can be used to
transfer data to or from an I/O device.
Move DATAIN, R0
Move R0, DATAOUT
 I/O devices and the memory may have different
address spaces:
 Special instructions to transfer data to and from I/O devices.
 I/O devices may have to deal with fewer address lines.
 I/O address lines need not be physically separate from memory
address lines.
 In fact, address lines may be shared between I/O devices and memory,
with a control signal to indicate whether it is a memory address or an
I/O address.
Accessing I/O devices (contd..)

[Figure: I/O interface circuit between the bus (address, data and control lines) and an input device, consisting of an address decoder, control circuits, and data and status registers]

•I/O device is connected to the bus using an I/O interface circuit which has:
- Address decoder, control circuit, and data and status registers.
•Address decoder decodes the address placed on the address lines thus enabling the
device to recognize its address.
•Data register holds the data being transferred to or from the processor.
•Status register holds information necessary for the operation of the I/O device.
•Data and status registers are connected to the data lines, and have unique
addresses.
•I/O interface circuit coordinates I/O transfers.
Accessing I/O devices (contd..)
• When a human operator is entering characters at a keyboard,
the processor is capable of executing millions of instructions
between successive character entries.
• Striking a key stores the corresponding character code in an 8-
bit buffer register DATAIN.
• To inform the processor that a valid character is in DATAIN, a
status control flag, SIN, is set to 1.
• A program monitors SIN, and when SIN is set to 1, the
processor reads the contents of DATAIN.
• When the character is transferred to the processor, SIN is
automatically cleared to 0.
• If a second character is entered at the keyboard, SIN is again
set to 1 and the process repeats.
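The SIN/DATAIN handshake described above can be sketched as a small simulation. The class below is purely illustrative; only the register names SIN and DATAIN come from the text:

```python
# Minimal sketch of the keyboard input protocol described above.
# The class and its methods are hypothetical, not a real device model.
class KeyboardInterface:
    def __init__(self):
        self.SIN = 0       # status flag: 1 when DATAIN holds a valid character
        self.DATAIN = 0    # 8-bit buffer register

    def key_pressed(self, char):
        self.DATAIN = ord(char) & 0xFF   # striking a key stores the code
        self.SIN = 1                     # inform the processor

    def read_char(self):
        while self.SIN == 0:             # program monitors SIN (polling)
            pass
        char = self.DATAIN               # processor reads DATAIN
        self.SIN = 0                     # SIN is cleared by the read
        return char
```

For example, after `key_pressed('A')`, a call to `read_char()` returns the character code 65 and leaves SIN cleared, ready for the next key.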
Accessing I/O devices (contd..)
• For output to a display, a buffer register, DATAOUT, and a status
control flag, SOUT, are used.
• When SOUT equals 1, the display is ready to receive a
character.
• Under program control, the processor monitors SOUT, and
when SOUT is set to 1, the processor transfers a character
code to DATAOUT.
• The transfer of a character to DATAOUT clears SOUT to 0.
• When the display device is ready to receive a second
character, SOUT is again set to 1.
• The buffer registers DATAIN and DATAOUT and the status
flags SIN and SOUT are part of circuitry commonly known as
a device interface.
Registers in keyboard and display interface

[Figure: registers in the keyboard and display interface, and a program that reads one line from the keyboard, stores it in the memory buffer, and echoes it back to the display]
Accessing I/O devices (contd..)
 Recall that the rate of transfer to and from I/O
devices is slower than the speed of the processor.
This creates the need for mechanisms to synchronize
data transfers between them.
 Program-controlled I/O:
 Processor repeatedly monitors a status flag to achieve the
necessary synchronization.
 Processor polls the I/O device.
 Two other mechanisms used for synchronizing data
transfers between the processor and memory:
 Interrupts.
 Direct Memory Access.
Interrupts
• In program-controlled I/O, when the processor
continuously monitors the status of the device, it
does not perform any useful tasks.
• An alternate approach would be for the I/O device to
alert the processor when it becomes ready.
• Do so by sending a hardware signal called an interrupt to
the processor.
• At least one of the bus control lines, called an interrupt-
request line is dedicated for this purpose.
• Processor can perform other useful tasks while it is
waiting for the device to be ready.
Interrupts (contd..)

[Figure: Program 1 (COMPUTE routine) is interrupted at instruction i; control transfers to the interrupt-service routine in Program 2 (PRINT routine), then returns to instruction i+1]

• Processor is executing the instruction located at address i when an interrupt occurs.
• Routine executed in response to an interrupt request is called the interrupt-service
routine.
• When an interrupt occurs, control must be transferred to the interrupt-service
routine.
• But before transferring control, the current contents of the PC (i+1), must be saved in
a known location.
• This will enable the return-from-interrupt instruction to resume execution at i+1.
• Return address, or the contents of the PC, are usually stored on the processor stack.
Interrupts (contd..)
 Treatment of an interrupt-service routine is very
similar to that of a subroutine.
 However, there are significant differences:
 A subroutine performs a task that is required by the calling program.
 Interrupt-service routine may not have anything in common with
the program it interrupts.
 Interrupt-service routine and the program that it interrupts may
belong to different users.
 As a result, before branching to the interrupt-service routine, not
only the PC, but other information such as condition code flags, and
processor registers used by both the interrupted program and the
interrupt service routine must be stored.
 This will enable the interrupted program to resume execution upon
return from interrupt service routine.
Interrupts (contd..)
 Saving and restoring information can be done
automatically by the processor or explicitly by program
instructions.
 Saving and restoring registers involves memory transfers:
 Increases the total execution time.
 Increases the delay between the time an interrupt request is received, and
the start of execution of the interrupt-service routine. This delay is called
interrupt latency.
 In order to reduce the interrupt latency, most processors
save only the minimal amount of information:
 This minimal amount of information includes Program Counter and
processor status registers.
 Any additional information that must be saved, must be
saved explicitly by the program instructions at the
beginning of the interrupt service routine.
Interrupts (contd..)
• When a processor receives an interrupt-request, it
must branch to the interrupt service routine.
• It must also inform the device that it has recognized
the interrupt request.
• This can be accomplished in two ways:
• Some processors have an explicit interrupt-acknowledge control
signal for this purpose.
• In other cases, the data transfer that takes place between the device
and the processor can be used to inform the device.
Interrupts (contd..)
INTERRUPT HARDWARE
• An I/O device requests an interrupt by activating a bus-line called interrupt-
request(IR).
• A single IR line can be used to serve ‘n’ devices. All devices are connected to IR line
via switches to ground.
• To request an interrupt, a device closes its associated switch. Thus, if all IR signals are
inactive(i.e. all switches are open), the voltage on the IR line will be equal to Vdd.
• When a device requests an interrupt by closing its switch, the voltage on the line drops
to 0, causing the INTR received by the processor to go to 1.
• The value of INTR is the logical OR of the requests from individual devices:
INTR = INTR1 + INTR2 + . . . . . + INTRn

[Figure: devices connected to the INTR line through switches to ground, with pull-up resistor R connecting the line to Vdd]

• A special gate known as open-collector (or open-drain) is used to drive
the INTR line.
• Resistor R is called a pull-up resistor because it pulls the line voltage
up to the high-voltage state when the switches are open.
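The wired-OR behavior of the request line can be expressed directly; this is a behavioral sketch, not a circuit model:

```python
# The INTR seen by the processor is the logical OR of the individual
# device requests: closing any one switch activates the line.
def intr(requests):
    """requests: 0/1 signals INTR1..INTRn from the individual devices."""
    line = 0
    for r in requests:
        line |= r          # INTR = INTR1 + INTR2 + ... + INTRn
    return line
```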
Interrupts (contd..)
 Interrupt-requests interrupt the execution of a program,
and may alter the intended sequence of events:
 Sometimes such alterations may be undesirable, and must not be
allowed.
 For example, the processor may not want to be interrupted by the
same device while executing its interrupt-service routine.
 Processors generally provide the ability to enable and
disable such interruptions as desired.
 One simple way is to provide machine instructions such
as Interrupt-enable and Interrupt-disable for this purpose.
 To avoid interruption by the same device during the
execution of an interrupt service routine:
 First instruction of an interrupt service routine can be Interrupt-disable.
 Last instruction of an interrupt service routine can be Interrupt-enable.
Interrupts (contd..)
 Multiple I/O devices may be connected to the processor
and the memory via a bus. Some or all of these devices
may be capable of generating interrupt requests.
 Each device operates independently, and hence no definite order
can be imposed on how the devices generate interrupt requests.
 How does the processor know which device has
generated an interrupt?
 How does the processor know which interrupt service
routine needs to be executed?
 When the processor is executing an interrupt service
routine for one device, can other device interrupt the
processor?
 If two interrupt-requests are received simultaneously,
then how to break the tie?
Interrupts (contd..)
 Consider a simple arrangement where all devices send
their interrupt-requests over a single control line in the
bus.
 When the processor receives an interrupt request over
this control line, how does it know which device is
requesting an interrupt?
 This information is available in the status register of the
device requesting an interrupt:
 The status register of each device has an IRQ bit which it sets to
1 when it requests an interrupt.
 Interrupt service routine can poll the I/O devices
connected to the bus. The first device with IRQ equal to 1
is the one that is serviced.
 Polling mechanism is easy, but time consuming to query
the status bits of all the I/O devices connected to the bus.
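The polling scheme above amounts to scanning the status registers in a fixed order. In this sketch the IRQ bit is assumed to be bit 0 of each status register, purely for illustration:

```python
# Poll the device status registers in order and return the first device
# whose IRQ bit is set. The IRQ bit position (bit 0) is an assumption.
def poll_for_interrupt(status_registers):
    for device, status in enumerate(status_registers):
        if status & 1:          # IRQ bit set: this device is serviced
            return device
    return None                 # no device is requesting an interrupt
```

Note that the polling order itself fixes the priority: a device earlier in the list is always serviced first.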
Interrupts (contd..)
• The device requesting an interrupt may identify itself
directly to the processor.
• Device can do so by sending a special code (4 to 8 bits) to the
processor over the bus.
• Code supplied by the device may represent a part of the
starting address of the interrupt-service routine.
• The remainder of the starting address is obtained by the
processor based on other information such as the range of
memory addresses where interrupt service routines are
located.
• Usually the location pointed to by the interrupting
device is used to store the starting address of the
interrupt-service routine.
Interrupts (contd..)
 Previously, before the processor started executing
the interrupt service routine for a device, it disabled
the interrupts from the device.
 In general, same arrangement is used when multiple
devices can send interrupt requests to the processor.
 During the execution of the interrupt service routine of a device,
the processor does not accept interrupt requests from any other
device.
 Since the interrupt service routines are usually short, the delay
that this causes is generally acceptable.
 However, for certain devices this delay may not be
acceptable.
 Which devices can be allowed to interrupt a processor when it is
executing an interrupt service routine of another device?
Interrupts (contd..)
• I/O devices are organized in a priority structure:
• An interrupt request from a high-priority device is accepted
while the processor is executing the interrupt service routine of
a low priority device.
• The processor is assigned a priority level that can be
changed under program control.
• Priority level of a processor is the priority of the program that is
currently being executed.
• When the processor starts executing the interrupt service
routine of a device, its priority is raised to that of the device.
• If the device sending an interrupt request has a higher priority
than the processor, the processor accepts the interrupt
request.
Interrupts (contd..)
• Processor’s priority is encoded in a few bits of the
processor status register.
• Priority can be changed by instructions that write into the
processor status register.
• Usually, these are privileged instructions, or instructions that
can be executed only in the supervisor mode.
• Privileged instructions cannot be executed in the user mode.
• Prevents a user program from accidentally or intentionally
changing the priority of the processor.
• If there is an attempt to execute a privileged
instruction in the user mode, it causes a special type
of interrupt called as privilege exception.
Interrupts (contd..)
IN T R 1 INTR p
Processor

Device 1 Device 2 Device p

INTA1 INTA p

Priority arbitration

• Each device has a separate interrupt-request and interrupt-


acknowledge line.
• Each interrupt-request line is assigned a different priority level.
• Interrupt requests received over these lines are sent to a priority
arbitration circuit in the processor.
• If the interrupt request has a higher priority level than the priority
of the processor, then the request is accepted.
Interrupts (contd..)
 Which interrupt request does the processor accept if
it receives interrupt requests from two or more
devices simultaneously?
 If the I/O devices are organized in a priority
structure, the processor accepts the interrupt
request from a device with higher priority.
 Each device has its own interrupt request and interrupt
acknowledge line.
 A different priority level is assigned to the interrupt request line
of each device.
 However, if the devices share an interrupt request
line, then how does the processor decide which
interrupt request to accept?
Interrupts (contd..)
Polling scheme:
• The processor polls the status registers of the I/O devices to determine which
device is requesting an interrupt.
• In this case the priority is determined by the order in which the devices are polled.
• The first device with status bit set to 1 is the device whose interrupt request is
accepted.
Daisy chain scheme:

[Figure: devices 1 to n share a single INTR line to the processor; the INTA line passes through the devices in a daisy chain]

• Devices are connected to form a daisy chain.
• Devices share the interrupt-request line, and the interrupt-acknowledge line is
connected to form a daisy chain.
• When devices raise an interrupt request, the interrupt-request line is activated.
• The processor in response activates interrupt-acknowledge.
• It is received by device 1; if device 1 does not need service, it passes the signal to device 2.
• Device that is electrically closest to the processor has the highest priority.
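The propagation of the acknowledge signal along the chain can be sketched as follows; the function and its list-based model are illustrative only:

```python
# Each device either absorbs INTA (if it requested service) or forwards
# it to the next device, so the electrically closest requester wins.
def daisy_chain(requesting):
    """requesting: booleans, index 0 = device electrically closest."""
    for device, wants_service in enumerate(requesting):
        if wants_service:
            return device       # this device absorbs the acknowledge
        # otherwise INTA is passed on to the next device in the chain
    return None
```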
Interrupts (contd..)
• When I/O devices were organized into a priority structure, each device had its own
interrupt-request and interrupt-acknowledge line.
• When I/O devices were organized in a daisy chain fashion, the devices shared an
interrupt-request line, and the interrupt-acknowledge propagated through the
devices.
• A combination of the priority structure and the daisy chain scheme can also be used.

[Figure: devices organized into groups; each group has its own interrupt-request line (INTR1 … INTRp) and interrupt-acknowledge line (INTA1 … INTAp) into the priority arbitration circuit, and the devices within a group are daisy-chained]

• Devices are organized into groups.
• Each group is assigned a different priority level.
• All the devices within a single group share an interrupt-request line, and are
connected to form a daisy chain.
Interrupts (contd..)
 Only those devices that are being used in a program
should be allowed to generate interrupt requests.
 To control which devices are allowed to generate
interrupt requests, the interface circuit of each I/O device
has an interrupt-enable bit.
 If the interrupt-enable bit in the device interface is set to 1, then
the device is allowed to generate an interrupt-request.
 Interrupt-enable bit in the device’s interface circuit
determines whether the device is allowed to generate an
interrupt request.
 Interrupt-enable bit in the processor status register or
the priority structure of the interrupts determines
whether a given interrupt will be accepted.
Direct Memory Access
 Direct Memory Access (DMA):
 A special control unit may be provided to transfer a block of data
directly between an I/O device and the main memory, without
continuous intervention by the processor.
 Control unit which performs these transfers is a part
of the I/O device’s interface circuit. This control unit is
called a DMA controller.
 DMA controller performs functions that would be
normally carried out by the processor:
 For each word, it provides the memory address and all the control
signals.
 To transfer a block of data, it increments the memory addresses
and keeps track of the number of transfers.
Direct Memory Access (contd..)
 DMA controller can transfer a block of data from an
external device to the processor, without any
intervention from the processor.
 However, the operation of the DMA controller must be
under the control of a program executed by the processor.
That is, the processor must initiate the DMA transfer.
 To initiate the DMA transfer, the processor informs
the DMA controller of:
 Starting address,
 Number of words in the block.
 Direction of transfer (I/O device to the memory, or memory to the
I/O device).
 Once the DMA controller completes the DMA transfer,
it informs the processor by raising an interrupt signal.
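The controller's job after initiation can be sketched as a simple loop; the function name and data structures are invented for illustration:

```python
# Behavioral sketch of a DMA block transfer. The processor supplies the
# starting address, the word count, and the direction; the controller
# generates the addresses and counts the transfers itself.
def dma_transfer(memory, device_buffer, start_addr, word_count, to_memory):
    addr = start_addr
    for i in range(word_count):
        if to_memory:
            memory[addr] = device_buffer[i]   # I/O device -> main memory
        else:
            device_buffer[i] = memory[addr]   # main memory -> I/O device
        addr += 1          # controller increments the memory address
    # completion would be signalled by raising an interrupt (not modeled)
```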
Registers in a DMA interface
• Two registers are used for storing the starting address
and the word count.
• The third register contains status and control flags.
• The R/W bit determines the direction of the transfer.
• When R/W bit is set to 1 by a program instruction, the
controller performs a read operation, that is, it
transfers data from the memory to the I/O device.
Otherwise, it performs a write operation.
• When the controller has completed transferring a block
of data and is ready to receive another command, it
sets the Done flag to 1.
• Bit 30 is the Interrupt-enable flag, IE. When this flag is
set to 1, it causes the controller to raise an interrupt
after it has completed transferring a block of data.
• Finally, the controller sets the IRQ bit to 1 when it has
requested an interrupt.
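The flag handling described above can be sketched with simple bit operations. Only IE = bit 30 is fixed by the text; the positions assumed here for R/W, Done, and IRQ are illustrative:

```python
# Status/control register flags. IE = bit 30 comes from the text; the
# R/W, Done, and IRQ positions below are assumptions for illustration.
RW_BIT, DONE_BIT, IE_BIT, IRQ_BIT = 0, 1, 30, 31

def set_bit(reg, bit):
    return reg | (1 << bit)

def bit_is_set(reg, bit):
    return (reg >> bit) & 1 == 1

# Program side: select a read operation and enable the completion interrupt.
ctrl = set_bit(set_bit(0, RW_BIT), IE_BIT)

# Controller side, after the block has been transferred:
ctrl = set_bit(ctrl, DONE_BIT)       # ready for another command
if bit_is_set(ctrl, IE_BIT):
    ctrl = set_bit(ctrl, IRQ_BIT)    # raise the interrupt request
```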
Direct Memory Access

[Figure: system bus connecting the processor and main memory with a disk/DMA controller (two disks), a DMA controller attached to a network interface, and a keyboard and printer]

• DMA controller connects a high-speed network to the computer bus.
• Disk controller, which controls two disks, also has DMA capability. It provides two
DMA channels.
• It can perform two independent DMA operations, as if each disk has its own DMA
controller. The registers to store the memory address, word count, and status and
control information are duplicated.
Direct Memory Access (contd..)
 Processor and DMA controllers have to use the bus in an
interwoven fashion to access the memory.
 DMA devices are given higher priority than the processor to access the
bus.
 Among different DMA devices, high priority is given to high-speed
peripherals such as a disk or a graphics display device.
 Processor originates most memory access cycles on the
bus.
 DMA controller can be said to “steal” memory access cycles from the
bus. This interweaving technique is called “cycle stealing”.
 An alternate approach is to provide a DMA controller with
exclusive capability to initiate transfers on the bus, and
hence exclusive access to the main memory. This is
known as the block or burst mode.
Bus arbitration
 Processor and DMA controllers both need to initiate data
transfers on the bus and access main memory.
 The device that is allowed to initiate transfers on the bus
at any given time is called the bus master.
 When the current bus master relinquishes its status as
the bus master, another device can acquire this status.
 The process by which the next device to become the bus master is
selected and bus mastership is transferred to it is called bus
arbitration.
 Centralized arbitration:
 A single bus arbiter performs the arbitration.
 Distributed arbitration:
 All devices participate in the selection of the next bus master.
Centralized Bus Arbitration

[Figure: processor and two DMA controllers share the BR and BBSY lines; the bus-grant signal is daisy-chained from the processor as BG1 to DMA controller 1 and on as BG2 to DMA controller 2]

• Bus arbiter may be the processor or a separate unit
connected to the bus.
• Normally, the processor is the bus master, unless it grants
bus mastership to one of the DMA controllers.
• A DMA controller requests control of the bus by
asserting the Bus Request (BR) line.
• In response, the processor activates the Bus-Grant1 (BG1)
line, indicating that the controller may use the bus when
it is free.
• BG1 signal is connected to all DMA controllers in a daisy
chain fashion.
• When the BBSY signal is 0, the bus is busy. When
BBSY becomes 1, the DMA controller which asserted BR
can acquire control of the bus.
Centralized arbitration

[Timing diagram: DMA controller 2 asserts BR; the processor asserts BG1, which propagates through the chain to DMA controller 2 as BG2; the processor relinquishes the bus by setting BBSY to 1, and bus mastership passes from Processor to DMA controller 2 and back to Processor]
Distributed arbitration
 All devices waiting to use the bus share the
responsibility of carrying out the arbitration process.
 Arbitration process does not depend on a central arbiter and hence
distributed arbitration has higher reliability.
 Each device is assigned a 4-bit ID number.
 All the devices are connected using 5 lines, 4 arbitration
lines to transmit the ID, and one line for the Start-
Arbitration signal.
 To request the bus a device:
 Asserts the Start-Arbitration signal.
 Places its 4-bit ID number on the arbitration lines.
 The pattern that appears on the arbitration lines is the
logical-OR of all the 4-bit device IDs placed on the
arbitration lines.
Distributed arbitration
• Arbitration process:
• Each device compares the pattern that appears
on the arbitration lines to its own ID, starting
with MSB.
• If it detects a difference, it transmits 0s on the
arbitration lines for that and all lower bit
positions.
• The pattern that appears on the arbitration lines
is the logical-OR of all the 4-bit device IDs placed
on the arbitration lines.
Distributed arbitration
•Device A has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
•Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
•Pattern that appears on the arbitration lines is the logical OR of the patterns:
- Pattern 0111 appears on the arbitration lines.

Arbitration process:
•Each device compares the pattern that appears on the arbitration lines to its
own ID, starting with MSB.
•If it detects a difference, it transmits 0s on the arbitration lines for that and
all lower bit positions.
•Device A compares its ID pattern 0101 with the pattern 0111 on the
arbitration lines.
•It detects a difference at bit position 1; as a result, it transmits the
pattern 0100 on the arbitration lines.
•The pattern that appears on the arbitration lines is the logical-OR of 0100
and 0110, which is 0110.
•This pattern is the same as the device ID of B, and hence B has won the
arbitration.
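The example above can be replayed in a short behavioral simulation of the arbitration lines (a sketch, not a hardware model; the function name is invented):

```python
# Simulate distributed arbitration over open-collector lines: each bit
# position, MSB first, the line carries the OR of all driven bits, and a
# device that sees a 1 where its ID has a 0 drops out, driving 0s on
# that and all lower bit positions.
def arbitrate(ids, width=4):
    """Return the winner among devices competing with the given IDs."""
    pattern = {d: format(d, f'0{width}b') for d in ids}   # ID bit strings
    drive = {d: list(pattern[d]) for d in ids}            # bits each drives
    lost = {d: False for d in ids}
    for bit in range(width):                              # MSB first
        line = max(int(drive[d][bit]) for d in ids)       # wired-OR
        for d in ids:
            if not lost[d] and int(pattern[d][bit]) < line:
                lost[d] = True                  # difference detected:
                for k in range(bit, width):     # transmit 0s here and
                    drive[d][k] = '0'           # on all lower positions
    return [d for d in ids if not lost[d]][0]   # surviving device wins
```

Running `arbitrate([5, 6])` reproduces the slide's example: device A (ID 5) drops out at bit position 1 of the comparison and device B (ID 6) wins. The highest competing ID always wins, which is exactly the priority rule this scheme implements.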
Speed, Size, and Cost
 A big challenge in the design of a computer system is to
provide a sufficiently large memory, with a reasonable
speed at an affordable cost.
 Static RAM:
▪ Very fast, but expensive, because a basic SRAM cell has a complex
circuit making it impossible to pack a large number of cells onto a
single chip.
 Dynamic RAM:
▪ Simpler basic cell circuit, hence are much less expensive, but
significantly slower than SRAMs.
 Magnetic disks:
▪ Storage provided by DRAMs is higher than SRAMs, but is still less
than what is necessary.
▪ Secondary storage such as magnetic disks provide a large amount
of storage, but is much slower than DRAMs.
Memory Hierarchy

[Figure: memory hierarchy from processor registers at the top, through L1 and L2 caches and main memory, down to magnetic-disk secondary storage; going down the hierarchy, size increases while speed and cost per bit decrease]

•Fastest access is to the data held in processor registers. Registers are at the
top of the memory hierarchy.
•Relatively small amount of memory can be implemented on the
processor chip. This is the processor cache.
•Two levels of cache. Level 1 (L1) cache is on the processor chip. Level 2 (L2)
cache is in between main memory and the processor.
•Next level is main memory, implemented as SIMMs. Much larger,
but much slower than cache memory.
•Next level is magnetic disks. Huge amount of inexpensive storage.
•Speed of memory access is critical; the idea is to bring instructions and data
that will be used in the near future as close to the processor as possible.
Cache Memories
 Processor is much faster than the main memory.
▪ As a result, the processor has to spend much of its time
waiting while instructions and data are being fetched from
the main memory.
▪ Major obstacle towards achieving good performance.
 Speed of the main memory cannot be increased
beyond a certain point.
 Cache memory is an architectural arrangement
which makes the main memory appear faster to
the processor than it really is.
 Cache memory is based on the property of
computer programs known as “locality of
reference”.
Locality of Reference
 Analysis of programs indicates that many instructions
in localized areas of a program are executed
repeatedly during some period of time, while the
others are accessed relatively less frequently.
▪ These instructions may be the ones in a loop, nested loop,
or a few procedures calling each other repeatedly.
▪ This is called “locality of reference”.
 Temporal locality of reference:
▪ Recently executed instruction is likely to be executed again
very soon.
 Spatial locality of reference:
▪ Instructions with addresses close to a recently executed
instruction are likely to be executed soon.
Cache memories

[Figure: Processor connected to the Cache, which is connected to the Main memory]
• When the processor issues a Read request, a block of words is transferred
from the main memory to the cache, one word at a time.
• Subsequent references to the data in this block of words are
found in the cache.
• At any given time, only some blocks in the main memory are held
in the cache. Which blocks in the main memory are in the cache
is determined by a “mapping function”.
• When the cache is full, and a block of words needs to be
transferred from the main memory, some block of words in the
cache must be replaced. This is determined by a “replacement
algorithm”.
Cache hit
• Existence of a cache is transparent to the processor. The
processor issues Read and Write requests in the same
manner.
• If the data is in the cache it is called a Read or Write hit.
• Read hit:
▪ The data is obtained from the cache.
• Write hit:
▪ Cache has a replica of the contents of the main memory.
▪ Contents of the cache and the main memory may be updated
simultaneously. This is the write-through protocol.
▪ Update the contents of the cache, and mark it as updated by
setting a bit known as the dirty bit or modified bit. The contents
of the main memory are updated when this block is replaced.
This is write-back or copy-back protocol.
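The two write-hit protocols can be contrasted in a short sketch; the dictionary-based cache line and the function name are illustrative only:

```python
# Write-hit handling under the two protocols described above.
def write_hit(line, main_memory, addr, value, write_through):
    line['data'][addr & 0xF] = value      # update the cached copy
    if write_through:
        main_memory[addr] = value         # write-through: update memory too
    else:
        line['dirty'] = True              # write-back: mark block modified
```

With write-back, main memory is updated only later, when the dirty block is replaced; with write-through, cache and memory never disagree but every write costs a memory access.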
Cache miss
• If the data is not present in the cache, then a Read miss or
Write miss occurs.
• Read miss:
▪ Block of words containing this requested word is transferred from the
memory.
▪ After the block is transferred, the desired word is forwarded to the
processor.
▪ The desired word may also be forwarded to the processor as soon as
it is transferred without waiting for the entire block to be
transferred. This is called load-through or early-restart.
• Write-miss:
▪ If the write-through protocol is used, then the contents of the main
memory are updated directly.
▪ If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word
is overwritten with new information.
Cache Coherence Problem
• A bit called as “valid bit” is provided for each block.
• If the block contains valid data, then the bit is set to 1, else it is 0.
• Valid bits are set to 0, when the power is just turned on.
• When a block is loaded into the cache for the first time, the valid
bit is set to 1.
• Data transfers between main memory and disk occur directly
bypassing the cache.
• When the data on a disk changes, the main memory block is also
updated.
• However, if the data is also resident in the cache, then the valid bit
is set to 0.
• What happens if the data in the disk and main memory changes
and the write-back protocol is being used?
• In this case, the data in the cache may also have changed and is
indicated by the dirty bit.
• The copies of the data in the cache, and the main memory are
different. This is called the cache coherence problem.
• One option is to force a write-back before the main memory is
updated from the disk.
Mapping functions
 Mapping functions determine how memory blocks
are placed in the cache.
 A simple processor example:
▪ Cache consisting of 128 blocks of 16 words each.
▪ Total size of cache is 2048 (2K) words.
▪ Main memory is addressable by a 16-bit address.
▪ Main memory has 64K words.
▪ Main memory has 4K blocks of 16 words each.
 Three mapping functions:
▪ Direct mapping
▪ Associative mapping
▪ Set-associative mapping.
Direct mapping

[Figure: main-memory blocks 0–4095 mapped onto the 128 cache blocks; the 16-bit memory address is split into a 5-bit tag, a 7-bit block field, and a 4-bit word field]

•Block j of the main memory maps to block j modulo 128 of the cache.
Blocks 0 and 128 map to cache block 0; block 129 maps to cache block 1.
•More than one memory block is mapped onto the same position in the
cache.
•May lead to contention for cache blocks even if the cache is not full.
•Resolve the contention by allowing the new block to replace the old
block, leading to a trivial replacement algorithm.
•Memory address is divided into three fields:
- Low order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine
which cache block this new block is placed in.
- High order 5 bits determine which of the possible 32 memory blocks
that map to this cache position is currently present in the cache.
These are tag bits.
•Simple to implement but not very flexible.
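The three-field split for this example cache can be checked with a few shifts and masks (the function name is invented; the field widths come from the slide):

```python
# Field extraction for the example cache: 16-bit address, 16-word blocks,
# 128 cache blocks, so the split is tag(5) | block(7) | word(4).
def direct_map_fields(address):
    word  = address & 0xF            # low 4 bits: word within the block
    block = (address >> 4) & 0x7F    # next 7 bits: cache block number
    tag   = (address >> 11) & 0x1F   # high 5 bits: tag
    return tag, block, word
```

For instance, memory block 129 starts at address 129 × 16 = 2064, which decodes to tag 1, cache block 1, word 0, matching the mapping rule 129 mod 128 = 1.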
Associative mapping

[Figure: any main-memory block 0–4095 can be placed in any of the 128 cache blocks; the 16-bit memory address is split into a 12-bit tag and a 4-bit word field]

•Main memory block can be placed into any cache position.
•Memory address is divided into two fields:
- Low order 4 bits identify the word within a block.
- High order 12 bits or tag bits identify a memory block when it is
resident in the cache.
•Flexible, and uses cache space efficiently.
•Replacement algorithms can be used to replace an existing block in the
cache when the cache is full.
•Cost is higher than direct-mapped cache because of the need to search
all 128 tag patterns to determine whether a given block is in the cache.
Set-Associative mapping

[Figure: cache blocks grouped into 64 two-block sets; main-memory blocks 0, 64, 128, … map to set 0; the 16-bit memory address is split into a 6-bit tag, a 6-bit set field, and a 4-bit word field]

Blocks of cache are grouped into sets.
Mapping function allows a block of the main memory to reside in any
block of a specific set.
Divide the cache into 64 sets, with two blocks per set.
Memory blocks 0, 64, 128, etc. map to set 0, and each can occupy either
of the two positions within the set.
Memory address is divided into three fields:
- Low order 4 bits identify the word within a block.
- 6-bit field determines the set number.
- High order 6-bit tag field is compared to the tag fields of the two
blocks in the set.
Set-associative mapping is a combination of direct and associative
mapping.
Number of blocks per set is a design parameter.
- One extreme is to have all the blocks in one set, requiring no set bits
(fully associative mapping).
- Other extreme is to have one block per set, which is the same as
direct mapping.
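The field split for this two-way example can likewise be verified with shifts and masks (the function name is invented; the widths follow from 64 sets of 16-word blocks):

```python
# Field extraction for the two-way set-associative example: 64 sets of
# 16-word blocks, so the split is tag(6) | set(6) | word(4).
def set_assoc_fields(address):
    word = address & 0xF             # low 4 bits: word within the block
    set_ = (address >> 4) & 0x3F     # next 6 bits: set number
    tag  = (address >> 10) & 0x3F    # high 6 bits: tag
    return tag, set_, word
```

Memory blocks 0 and 64 both decode to set 0 and are distinguished only by their tags (0 and 1), which is why either may occupy either of the two positions in that set.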
