Module-4-23CS302

Direct Memory Access (DMA) allows data transfer between I/O devices and main memory without continuous CPU intervention, using a DMA controller that manages memory addresses and control signals. The processor initiates DMA transfers by providing the starting address, word count, and transfer direction, while the DMA controller can operate in cycle stealing or block transfer modes. Bus arbitration methods, including centralized and distributed arbitration, determine which device can access the bus, ensuring efficient memory access and performance optimization through techniques like cache memory and interleaving.

Direct Memory Access


⚫Direct Memory Access (DMA):
⚫A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.
⚫The control unit that performs these transfers is part of the I/O device's interface circuit and is called a DMA controller.
⚫The DMA controller performs functions that would normally be carried out by the processor:
⚫For each word, it provides the memory address and all the control signals.
⚫To transfer a block of data, it increments the memory address and keeps track of the number of transfers.
Direct Memory Access (contd..)
⚫However, the operation of the DMA controller must be under the
control of a program executed by the processor. That is, the
processor must initiate the DMA transfer.
⚫To initiate the DMA transfer, the processor informs the
DMA controller of:
⚫Starting address,
⚫Number of words in the block.
⚫Direction of transfer (I/O device to the memory, or memory to the
I/O device).
⚫Once the DMA controller completes the DMA transfer, it
informs the processor by raising an interrupt signal.
Registers in DMA interface
Status and control register (32 bits):
- Bit 31: IRQ (interrupt request)
- Bit 30: IE (interrupt enable)
- Bit 1: R/W (transfer direction)
- Bit 0: Done

Starting address register

Word count register
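The bit layout above can be sketched as a pair of pack/unpack helpers. This is a minimal sketch assuming the field positions shown on the slide (IRQ at bit 31, IE at bit 30, R/W at bit 1, Done at bit 0); bits 2-29 are left unused here.

```python
# Pack/unpack the 32-bit DMA status/control word.
# Assumed layout (from the slide): bit 31 IRQ, bit 30 IE, bit 1 R/W, bit 0 Done.

def pack_status(irq, ie, rw, done):
    """Build the 32-bit status/control word from its four flag bits."""
    return (irq << 31) | (ie << 30) | (rw << 1) | done

def unpack_status(word):
    """Extract (irq, ie, rw, done) from a 32-bit status/control word."""
    return ((word >> 31) & 1, (word >> 30) & 1, (word >> 1) & 1, word & 1)

word = pack_status(irq=0, ie=1, rw=1, done=0)
print(hex(word))            # 0x40000002
print(unpack_status(word))  # (0, 1, 1, 0)
```

A real DMA interface would memory-map this word at a device-specific address; the helpers only illustrate the bit positions.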
Direct Memory Access

[Figure: A processor and main memory connected to a system bus, together with a disk/DMA controller serving two disks, a DMA controller for a network interface, and a keyboard and printer.]
DMA Controller
• A hardwired controller called the DMA controller can enable direct data transfer between an I/O device (e.g. a disk) and memory without CPU intervention.
– No need to execute instructions to carry out the data transfer.
– The maximum data transfer speed is determined by the rate at which memory read and write operations can be carried out.
– Much faster than programmed I/O.
⚫The processor and the DMA controllers have to use the bus in an interleaved fashion to access the memory.
⚫DMA devices are given higher priority than the processor to access the bus.
⚫Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.
⚫The processor originates most memory access cycles on the bus.
⚫The DMA controller can be said to "steal" memory access cycles from the processor. This interweaving technique is called "cycle stealing".
⚫An alternative approach is to give a DMA controller exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as block or burst mode.
• DMA transfer can take place in two modes:
• a) DMA cycle stealing
• • The DMA controller requests the bus for only a few cycles (one or two).
• • Preferably when the CPU is not using the memory.
• • The DMA controller is said to steal cycles from the CPU without the CPU knowing it.
• b) DMA block transfer
• • The DMA controller transfers the whole block of data without interruption.
Bus arbitration
⚫The device that is allowed to initiate transfers on the bus at any
given time is called the bus master.
⚫When the current bus master relinquishes its status as the bus
master, another device can acquire this status.
⚫The process by which the next device to become the bus master is
selected and bus mastership is transferred to it is called bus arbitration.
⚫Centralized arbitration:
⚫A single bus arbiter performs the arbitration.
⚫Distributed arbitration:
⚫All devices participate in the selection of the next bus master.
Centralized Bus Arbitration (contd.)
• Bus arbiter may be the processor or a separate unit connected to the bus.

• DMA controller requests the control of the bus by asserting the Bus Request (BR) line.

• In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the
controller may use the bus when it is free.

• BG1 signal is connected to all DMA controllers in a daisy chain fashion.

• When the BBSY signal is 0, it indicates that the bus is busy. When BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
Centralized Bus Arbitration

[Figure: The processor drives the BG1 line to DMA controller 1, which passes it on as BG2 to DMA controller 2 in a daisy chain; the controllers share the BR (Bus Request) and BBSY (Bus Busy) lines with the processor.]
Distributed arbitration
⚫All devices waiting to use the bus share the responsibility of
carrying out the arbitration process.
⚫Each device is assigned a 4-bit ID number.
⚫All the devices are connected using 5 lines, 4 arbitration lines
to transmit the ID, and one line for the Start-Arbitration signal.
⚫To request the bus a device:
⚫Asserts the Start-Arbitration signal.
⚫Places its 4-bit ID number on the arbitration lines.

⚫The pattern that appears on the arbitration lines is the logical OR of all the 4-bit device IDs placed on the arbitration lines.
Distributed arbitration (contd..)
• Device A has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
• Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
• Pattern that appears on the arbitration lines is the logical OR of the
patterns:
- Pattern 0111 appears on the arbitration lines.

Arbitration process:
• Each device compares the pattern that appears on the arbitration lines to its own
ID, starting with MSB.
• If it detects a difference, it transmits 0s on the arbitration lines for that and all lower
bit positions.
• Device A compares its ID pattern 0101 to the pattern 0111 on the lines.
• It detects a difference at bit position 1; as a result, it transmits a pattern 0100 on the arbitration lines (0s in that and all lower bit positions).
• The pattern that appears on the arbitration lines is the logical-OR of 0100 and 0110,
which is 0110.
• This pattern is the same as the device ID of B, and hence B has won the arbitration.
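The arbitration process above can be sketched as a small simulation. The open-collector arbitration lines are modeled as a bitwise OR of whatever each device drives; scanning from the MSB, a device that sees a 1 on a line where its own ID bit is 0 withdraws by driving 0s from that bit position down. The surviving pattern is the highest requesting ID, so device B (ID 6) wins over device A (ID 5), as in the example.

```python
# Simulate distributed arbitration over 4 open-collector lines.

def arbitrate(ids):
    """Return the winning 4-bit ID among the requesting devices."""
    drive = {d: d for d in ids}          # bits each device currently drives
    for bit in (3, 2, 1, 0):             # compare starting with the MSB
        lines = 0
        for pattern in drive.values():   # lines carry the OR of driven patterns
            lines |= pattern
        for d in drive:
            if not (d >> bit) & 1 and (lines >> bit) & 1:
                # Device differs at this bit: transmit 0s on this and all
                # lower bit positions.
                drive[d] &= ~((1 << (bit + 1)) - 1) & 0xF
    result = 0
    for pattern in drive.values():
        result |= pattern
    return result

print(arbitrate([5, 6]))   # 6: device B wins, matching the worked example
```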
Fundamental Concepts

The Memory System


Some basic concepts

Memory Access Time:

It is a useful measure of the speed of the memory unit. It is the time that elapses between the initiation of an operation and the completion of that operation (for example, the time between READ and MFC).

Memory Cycle Time:

It is an important measure of the memory system. It is the minimum time delay required between the initiation of two successive memory operations (for example, the time between two successive READ operations). The cycle time is usually slightly longer than the access time.
Several techniques to increase the effective
size and speed of the memory:
▪ Cache memory (to increase the effective speed).
▪ Virtual memory (to increase the effective size).
Cache Memories
◼ Processor is much faster than the main memory.
▪ As a result, the processor has to spend much of its time waiting while instructions
and data are being fetched from the main memory.
◼ Cache memory is an architectural arrangement
which makes the main memory appear faster to
the processor than it really is.
◼ Cache memory is based on the property of
computer programs known as “locality of
reference”.
Locality of Reference
◼ Analysis of programs indicates that many
instructions in localized areas of a program are
executed repeatedly during some period of time,
while the others are accessed relatively less
frequently.
▪ These instructions may be the ones in a loop, nested loop or few procedures
calling each other repeatedly.
▪ This is called “locality of reference”.
◼ Temporal locality of reference:
▪ Recently executed instruction is likely to be executed again very soon.
◼ Spatial locality of reference:
▪ Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Cache memories

[Figure: Processor — Cache — Main memory]

● When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time.
● Subsequent references to the data in this block of words are found in the
cache.
● At any given time, only some blocks in the main memory are held in the
cache. Which blocks in the main memory are in the cache is determined by
a “mapping function”.
● When the cache is full, and a block of words needs to be transferred
from the main memory, some block of words in the cache must be
replaced. This is determined by a “replacement algorithm”.
Cache hit
● The existence of the cache is transparent to the processor. The processor issues Read and Write requests in the same manner.
● If the data is in the cache it is called a Read or Write hit.
● Read hit:
▪ The data is obtained from the cache.

● Write hit:
▪ The cache has a replica of the contents of the main memory.
▪ The contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol.
▪ Alternatively, only the contents of the cache are updated; the contents of the main memory are updated when the block is replaced. This is the write-back or copy-back protocol.
Cache miss
● If the data is not present in the cache, then a Read miss or Write miss
occurs.
● Read miss:
▪ Block of words containing this requested word is transferred from the
memory.
▪ After the block is transferred, the desired word is forwarded to the processor.
▪ The desired word may also be forwarded to the processor as soon as it is
transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.

● Write miss:
▪ If the write-through protocol is used, the contents of the main memory are updated directly.
▪ If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word
is overwritten with new information.
Mapping functions

◼ Mapping functions determine how memory blocks are placed in the cache.
◼ A simple processor example:
▪ Cache consisting of 128 blocks of 16 words each.
▪ Total size of cache is 2048 (2K) words.
▪ Main memory is addressable by a 16-bit address.
▪ Main memory has 64K words.
▪ Main memory has 4K blocks of 16 words each.
◼ Three mapping functions:
▪ Direct mapping
▪ Associative mapping
▪ Set-associative mapping.
Direct mapping

[Figure: Main memory blocks 0-4095 mapping onto cache blocks 0-127, each cache block holding a tag.]

•Block j of the main memory maps to block j modulo 128 of the cache: block 0 maps to cache block 0, block 129 maps to cache block 1.
•More than one memory block is mapped onto the same block position in the cache.
•May lead to contention for cache blocks even if the cache is not full.
•Contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
•The 16-bit memory address is divided into three fields (Tag: 5 bits, Block: 7 bits, Word: 4 bits):
- The low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.
- The high-order 5 bits determine which of the possible 32 blocks is currently present in the cache. These are the tag bits.
•Simple to implement but not very flexible.
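The three-field address split above can be sketched directly with shifts and masks. This is a minimal sketch for the example cache (128 blocks of 16 words, 16-bit addresses).

```python
# Direct-mapped address split: 5-bit tag | 7-bit cache block | 4-bit word.

def split_address(addr):
    """Return (tag, cache_block, word) for a 16-bit main-memory address."""
    word = addr & 0xF            # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: one of 128 cache blocks
    tag = (addr >> 11) & 0x1F    # high-order 5 bits: tag
    return tag, block, word

# Memory block j maps to cache block j mod 128:
# memory block 129 starts at address 129 * 16.
print(split_address(129 * 16))   # (1, 1, 0): cache block 1, tag 1
```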
Associative mapping

[Figure: Any main memory block (0-4095) may be placed in any cache block (0-127), each cache block holding a 12-bit tag.]

•A main memory block can be placed into any cache position.
•The 16-bit memory address is divided into two fields (Tag: 12 bits, Word: 4 bits):
- The low-order 4 bits identify the word within a block.
- The high-order 12 bits, or tag bits, identify a memory block when it is resident in the cache.
•Flexible, and uses cache space efficiently.
•Replacement algorithms can be used to replace an existing block in the cache when the cache is full.
•Cost is higher than a direct-mapped cache because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
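The associative lookup above can be sketched with a dictionary standing in for the parallel tag comparison (the hardware compares all 128 stored tags at once; a dict lookup is only a behavioral model of that search).

```python
# Associative cache lookup: 12-bit tag | 4-bit word offset.

cache = {}   # tag -> block contents; any cache position may hold any block

def access(addr):
    """Return (hit, tag, word) for a 16-bit address."""
    tag, word = addr >> 4, addr & 0xF   # 12-bit tag, 4-bit word offset
    return tag in cache, tag, word

cache[0x101] = "block data"             # block with tag 0x101 is resident
print(access(0x1015))                   # (True, 257, 5)  -> read hit (tag 0x101)
print(access(0x2000))                   # (False, 512, 0) -> read miss
```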
Performance considerations
◼ A key design objective of a computer system is to achieve
the best possible performance at the lowest possible cost.
▪ Price/performance ratio is a common measure of success.
◼ Performance of a processor depends on:
▪ How fast machine instructions can be brought into the processor for
execution.
▪ How fast the instructions can be executed.
Interleaving

◼ Divides the memory system into a number of memory modules. Each module has its own address buffer register (ABR) and data buffer register (DBR).
◼ Arranges addressing so that successive words in
the address space are placed in different
modules.
◼ When requests for memory access involve
consecutive addresses, the access will be to
different modules.
◼ Since parallel access to these modules is
possible, the average rate of fetching words
from the Main Memory can be increased.
Methods of address layouts

[Figure: Two module-addressing schemes; in each, every module has its own ABR and DBR.]

Consecutive words in a module (module number from the high-order k bits):
◼ Consecutive words are placed in a module.
◼ The high-order k bits of a memory address determine the module.
◼ The low-order m bits of a memory address determine the word within a module.
◼ When a block of words is transferred from main memory to cache, only one module is busy at a time.

Consecutive words in consecutive modules (module number from the low-order k bits):
• Consecutive words are located in consecutive modules.
• Consecutive addresses can be located in consecutive modules.
• While transferring a block of data, several memory modules can be kept busy at the same time.
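The two layouts above can be sketched for a toy memory with 2^k modules of 2^m words each (here k = 2 and m = 4 are assumed values for illustration). Consecutive addresses stay in one module under the first scheme and spread across successive modules under the second.

```python
# High-order vs. low-order interleaving for 2**K modules of 2**M words.

K, M = 2, 4   # assumed sizes: 4 modules, 16 words per module

def high_order(addr):
    """High-order scheme: module number from the top k bits."""
    return addr >> M, addr & ((1 << M) - 1)    # (module, word within module)

def low_order(addr):
    """Low-order scheme: module number from the bottom k bits."""
    return addr & ((1 << K) - 1), addr >> K    # (module, address in module)

# Four consecutive addresses: one busy module vs. four different modules.
print([high_order(a)[0] for a in range(4)])   # [0, 0, 0, 0]
print([low_order(a)[0] for a in range(4)])    # [0, 1, 2, 3]
```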
What happens on a write?
◼ Cache Write Strategies
◼ 1. Write Through / Store Through
◼ 2. Write Back / Copy Back

1. Write Through / Store Through
◼ Information is written to both the cache block and the
main memory block.
◼ Features: – Easier to implement
◼ – Read misses do not result in writes to the lower level
(i.e. MM).
◼ – The lower level (i.e. MM) has the most updated
version of the data
◼ – important for I/O operations and multiprocessor
systems
◼ – A write buffer is often used to reduce CPU
write stall time while data is written to main memory.
2. Write Back Strategy
◼ Information is written only to the cache block.
◼ • A modified cache block is written to MM only when it is
replaced.
◼ • Features:
◼ – Writes occur at the speed of cache memory.
◼ – Multiple writes to a cache block require only one write to
MM.
◼ • Write-back cache blocks can be clean or dirty.
◼ – A status bit called dirty bit or modified bit is associated with
each cache block, which indicates whether the block was
modified in the cache (0: clean, 1: dirty).
◼ – If the status is clean, the block is not written back to
MM while being replaced.
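The dirty-bit bookkeeping above can be sketched as a toy simulation: writes go only to the cache and set the dirty bit, and main memory (MM) is touched only when a dirty block is evicted. All names here are illustrative, not a real cache interface.

```python
# Write-back strategy: write to cache, mark dirty, write MM only on eviction.

memory = {0: 10, 1: 20}     # block number -> value (toy main memory)
cache = {}                   # block number -> [value, dirty_bit]
writebacks = []              # record of blocks actually written back to MM

def write(block, value):
    cache[block] = [value, 1]        # write only to the cache; mark dirty

def evict(block):
    value, dirty = cache.pop(block)
    if dirty:                        # clean blocks are discarded, not written
        memory[block] = value
        writebacks.append(block)

cache[0] = [memory[0], 0]            # blocks brought in clean
cache[1] = [memory[1], 0]
write(1, 99)                         # two writes to block 1, no MM traffic yet
write(1, 100)
evict(0)                             # clean: no write-back
evict(1)                             # dirty: exactly one write-back
print(memory, writebacks)            # {0: 10, 1: 100} [1]
```

Note how two writes to block 1 cost only one write to MM, which is the feature the slide highlights.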
Hit Rate and Miss Penalty

◼ Hit rate: the fraction of memory accesses that are found in the cache.
◼ Miss penalty: the extra time needed to bring the desired information into the cache.
◼ Hit rate can be improved by increasing block size, while
keeping cache size constant
◼ Block sizes that are neither very small nor very large give
best results.
◼ Miss penalty can be reduced if load-through approach is
used when loading new blocks into cache.
Other Performance Enhancements

Write buffer
◼ Write-through:
● Each write operation involves writing to the main memory.
● If the processor has to wait for the write operation to be complete, it slows
down the processor.
● Processor does not depend on the results of the write operation.
● Write buffer can be included for temporary storage of write requests.
● Processor places each write request into the buffer and continues execution.
● If a subsequent Read request references data which is still in the write
buffer, then this data is referenced in the write buffer.
◼ Write-back:
● Block is written back to the main memory when it is replaced.
● If the processor waits for this write to complete, before reading the new
block, it is slowed down.
● Fast write buffer can hold the block to be written, and the new
block can be read first.
Other Performance Enhancements
(Contd.,)
Prefetching
● New data are brought into the processor when they are first needed.
● The processor has to wait until the data transfer is complete.
● Prefetch the data into the cache before they are actually needed, or before a Read miss occurs.
● Prefetching can be accomplished through software by
including a special instruction in the machine language of
the processor.
▪ Inclusion of prefetch instructions increases the length of the
programs.
● Prefetching can also be accomplished using hardware:
▪ Circuitry that attempts to discover patterns in
memory references and then prefetches according
to this pattern.
Arithmetic
Multiplication
Multiplication of unsigned numbers

Product of 2 n-bit numbers is at most a 2n-bit number.


Unsigned multiplication can be viewed as addition of shifted
versions of the multiplicand.
Multiplication of unsigned numbers
(contd..)
 We add the partial products at each stage.

 Rules to implement multiplication are:

 If the ith bit of the multiplier is 1, shift the multiplicand and add the shifted multiplicand to the current value of the partial product.
 Hand over the partial product to the next stage.
 The value of the partial product at the start stage is 0.
Combinatorial array multiplier

[Figure: A 4x4 array multiplier. Multiplier bits q0-q3 gate shifted copies of the multiplicand (m3 m2 m1 m0) into successive rows of adders; partial products PP0-PP3 flow down through the array, producing the product bits p7..p0.]

The product is p7, p6, ..., p0.
The multiplicand is shifted by displacing it through an array of adders.
Typical multiplication cell

[Figure: Each cell contains a full adder (FA). Inputs: a bit of the incoming partial product PPi, the jth multiplicand bit gated by the ith multiplier bit, and a carry-in; outputs: a bit of the outgoing partial product PP(i+1) and a carry-out.]
Combinatorial array multiplier
(contd..)
• Combinatorial array multipliers are:
– Extremely inefficient.
– Have a high gate count for multiplying numbers of practical size such as 32-
bit or 64-bit numbers.
– Perform only one function, namely, unsigned integer product.

• Improve gate efficiency by using a mixture of combinatorial array techniques and sequential techniques.
Sequential multiplication
• Recall the rule for generating partial
products:
– If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand
to the current partial product.
– Multiplicand has been shifted left when added to the partial product.

• However, adding a left-shifted multiplicand to an unshifted partial product is equivalent to adding an unshifted multiplicand to a right-shifted partial product.
Sequential Circuit Multiplier

[Figure: Register A (initially 0) and multiplier register Q shift right together with a carry flip-flop C. An n-bit adder adds the multiplicand M (selected through a MUX under Add/Noadd control) to A, directed by a control sequencer.]
Sequential multiplication (contd..)

M = 1101 (multiplicand), Q = 1011 (multiplier)

                        C  A     Q
Initial configuration   0  0000  1011
Add M                   0  1101  1011
Shift                   0  0110  1101   First cycle
Add M                   1  0011  1101
Shift                   0  1001  1110   Second cycle
No add                  0  1001  1110
Shift                   0  0100  1111   Third cycle
Add M                   1  0001  1111
Shift                   0  1000  1111   Fourth cycle

Product = 10001111
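The cycle-by-cycle trace above can be sketched as a shift-and-add multiplier: when q0 = 1 the multiplicand M is added into A (with carry C), then C, A, Q shift right one position as a single value, for n cycles.

```python
# Sequential (shift-and-add) multiplication of two n-bit unsigned numbers.

def multiply(m, q, n=4):
    """Return the 2n-bit product of n-bit unsigned m and q."""
    a, c = 0, 0
    mask = (1 << n) - 1
    for _ in range(n):
        if q & 1:                       # q0 = 1: add the multiplicand to A
            s = a + m
            c, a = s >> n, s & mask
        # Shift C, A, Q right one position as a single (2n+1)-bit value.
        combined = ((c << 2 * n) | (a << n) | q) >> 1
        c, a, q = 0, (combined >> n) & mask, combined & mask
    return (a << n) | q                 # product sits in A,Q

print(multiply(13, 11))   # 143, matching the worked example (1101 x 1011)
```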
Signed Multiplication

• Considering 2's-complement signed operands, what happens to (-13) × (+11) if we follow the same method as unsigned multiplication, sign-extending each partial product?

          1 0 0 1 1    (-13)
        × 0 1 0 1 1    (+11)
  ---------------------
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  ---------------------
  1 1 0 1 1 1 0 0 0 1  (-143)

Sign extension of the negative multiplicand (shown in blue in the original figure) gives the correct result.


Signed Multiplication - Booth Algorithm

• A technique that works equally well for both negative and positive multipliers.
• In the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

Multiplier: 0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
Recoded:    0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Booth recoding of a multiplier.


Booth Algorithm

      0 1 1 0 1   (+13)        0  1  1  0  1
    × 1 1 0 1 0   (-6)         0 -1 +1 -1  0   (recoded)
  ---------------------
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  ---------------------
  1 1 1 0 1 1 0 0 1 0   (-78)

Booth multiplication with a negative multiplier.

Booth Algorithm

Multiplier         Version of multiplicand
Bit i   Bit i-1    selected by bit i
  0       0           0 × M
  0       1          +1 × M
  1       0          -1 × M
  1       1           0 × M

Booth multiplier recoding table.

FAST MULTIPLICATION
FAST MULTIPLICATION-Example
Integer Division
Manual Division

Decimal: 274 ÷ 13 = 21 remainder 1        Binary: 100010010 ÷ 1101 = 10101 remainder 1

      21                        10101
  13 )274                 1101 )100010010
      26                        1101
      --                        ----
      14                        10000
      13                         1101
      --                        -----
       1                         1110
                                 1101
                                -----
                                    1

Longhand division examples.
Longhand Division Steps
• Position the divisor appropriately with respect to the dividend and perform a subtraction.
• If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
• If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, the divisor is repositioned, and another subtraction is performed.
Circuit Arrangement

[Figure 6.21. Circuit arrangement for binary division: an (n+1)-bit register A and the n-bit dividend register Q shift left together. An (n+1)-bit adder/subtractor combines A with the divisor M under a control sequencer, which also sets the quotient bits q0..qn-1 in Q.]

Restoring Division
• Shift A and Q left one binary position
• Subtract M from A, and place the answer
back in A
• If the sign of A is 1, set q0 to 0 and add M
back to A (restore A); otherwise, set q0 to 1
• Repeat these steps n times
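The three steps above can be sketched in code, with ordinary signed integers standing in for the (n+1)-bit register A.

```python
# Restoring division of two n-bit unsigned numbers.

def restoring_divide(dividend, divisor, n=4):
    """Return (quotient, remainder) using the restoring algorithm."""
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= m                      # trial subtraction
        if a < 0:                   # sign of A is 1: set q0 = 0, restore A
            a += m
        else:                       # otherwise set q0 = 1
            q |= 1
    return q, a

print(restoring_divide(8, 3))   # (2, 2): 1000 / 0011, as in Figure 6.22
```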
Examples
Initially 0 0 0 0 0 1 0 0 0
0 0 0 1 1
Shift 0 0 0 0 1 0 0 0
Subtract 1 1 1 0 1 First cycle
Set q0 1 1 1 1 0
Restore 1 1
0 0 0 0 1 0 0 0 0
1 0 Shift 0 0 0 1 0 0 0 0
1 1 10 0 0 Subtract 1 1 1 0 1
1 1 Set q0 1 1 1 1 1 Second cycle
Restore 1 1
1 0 0 0 0 1 0 0 0 0 0
Shift 0 0 1 0 0 0 0 0
Subtract 1 1 1 0 1
Set q0 0 0 0 0 1 Third cycle

Shift 0 0 0 1 0 0 0 0 1
Subtract 1 1 1 0 1 0 0 1
Set q0 1 1 1 1 1 Fourth cycle
Restore 1 1
0 0 0 1 0 0 0 1 0

Remainder Quotient

Figure 6.22. A restoring-division example.


Nonrestoring Division
• Avoids the need for restoring A after an unsuccessful subtraction.
• Any idea?
• Step 1: (Repeat n times)
➢ If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
➢ Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
• Step 2: If the sign of A is 1, add M to A.
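The two steps above can be sketched as follows, again using signed integers in place of the (n+1)-bit register A. Note there is no restore inside the loop; a single corrective addition at the end fixes a negative remainder.

```python
# Nonrestoring division of two n-bit unsigned numbers.

def nonrestoring_divide(dividend, divisor, n=4):
    """Return (quotient, remainder) using the nonrestoring algorithm."""
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # Step 1: shift A,Q left; subtract or add M depending on sign of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a = a - m if a >= 0 else a + m
        if a >= 0:                  # sign of A is 0: set q0 = 1
            q |= 1
    if a < 0:                       # Step 2: one final corrective addition
        a += m
    return q, a

print(nonrestoring_divide(8, 3))   # (2, 2), matching the example below
```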
Examples

8 ÷ 3: dividend Q = 1000, divisor M = 00011.

Initially       A = 00000, Q = 1000
First cycle:    Shift: A = 00001, Q = 000_;  Subtract M: A = 11110;  Sign of A is 1, set q0 = 0: Q = 0000
Second cycle:   Shift: A = 11100, Q = 000_;  Add M: A = 11111;  Sign of A is 1, set q0 = 0: Q = 0000
Third cycle:    Shift: A = 11110, Q = 000_;  Add M: A = 00001;  Sign of A is 0, set q0 = 1: Q = 0001
Fourth cycle:   Shift: A = 00010, Q = 001_;  Subtract M: A = 11111;  Sign of A is 1, set q0 = 0: Q = 0010
Final step:     Sign of A is 1, add M to obtain the remainder: A = 11111 + 00011 = 00010

Remainder = 00010, Quotient = 0010

A nonrestoring-division example.
IEEE standard for floating point numbers