Lecture 7 - DMA - Shlomo

The document discusses Direct Memory Access (DMA) in digital computer architecture, explaining how DMA controllers enable peripherals to transfer data directly to memory without CPU intervention, resulting in faster data transfer rates. It covers various aspects of DMA, including its structure, transaction modes, and configuration, as well as the benefits of using DMA for high-bandwidth devices. Additionally, it describes different DMA operation modes, such as cycle stealing and burst mode, and the management of DMA descriptors for efficient data handling.


Digital Computer Structure

36114191

Lecture 6 – DMA

Shlomo Greenberg

Web site: http://moodle2.bgu.ac.il


DMA - Background

Course “Digital Computer Structure” (36114191)


© Roy Shor. 2017.
Background
 A DMA controller allows peripherals to interface directly with memory without CPU intervention.
 This allows the data transfer rate to approach the limit set by the memory access time, i.e. much higher data transfer rates.
 Used for DRAM refresh, or when large blocks of data are to be transferred, e.g.:
 Video, disk system R/W, LPT and COM ports, memory-to-memory R/W.
Why do we need DMA?
DMA in Digital Computer
DMA – General – Cont.
 DMA allows the I/O device to access the memory directly, without using the core. DMA can lead to a significant improvement in performance because data movement is one of the most common operations performed in processing applications.
 Only peripherals where data flow is significant (Kbps or greater) need to be DMA-capable.
 E.g.: video, audio and network interfaces.
 Lower-bandwidth peripherals can also be equipped with DMA capability, but it's less of an imposition on the core to step in and assist with data transfer on these interfaces.
Without DMA – Memory to Memory
 A sequential data transaction is done by a short program that takes pointers to the data as inputs.
 The CPU would loop (for 1KB of data, 1024 iterations are required) through:
 Fetch Read command (into a register)
 Fetch Write command (into a register)
 Fetch increment command (into a register)
 Commands decoding
 Execute three commands
 Initiator: programmer
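The loop above can be sketched as a small Python model. This is only an illustration of the bookkeeping: the function name `cpu_copy` and the choice to count exactly three instructions per iteration (read, write, increment) are assumptions taken from the bullet list, not a model of any real CPU.

```python
def cpu_copy(memory, src, dst, n_bytes):
    """Copy n_bytes one byte at a time, the way a CPU-driven loop would.

    Returns the number of instructions executed, assuming three
    instructions per iteration (read, write, increment).
    """
    instructions = 0
    for i in range(n_bytes):
        value = memory[src + i]   # "read" command
        memory[dst + i] = value   # "write" command
        instructions += 3         # read + write + increment
    return instructions


# 1KB of data: 1024 iterations, as stated on the slide.
memory = bytearray(4096)
memory[0:1024] = bytes(range(256)) * 4
executed = cpu_copy(memory, src=0, dst=2048, n_bytes=1024)
```

Every one of those instructions also has to be fetched and decoded, which is the overhead a DMA controller removes.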
Without DMA – IO to Memory
 On data transaction from IO to Memory,
data is already on the data bus.
 Giving Chip Select (CS) to the IO device
 Data transaction is done in multiple
transactions
 Initiator: IO

Remark: All IO devices should be port-mapped


Barcode 39 - An Example for IO transactions where DMA would be useful
 Each character is composed of nine elements: five bars and four spaces. Three of the nine elements in each character are wide (binary value 1), and six elements are narrow (binary value 0).
 For example, a nine-character ID number would use 81 elements (9 characters x 9 elements), loosely called 81 bars in what follows.
 A light source illuminates the barcode => a sensor receives the light reflected from the barcode
 The output of the sensors looks like:
 Pulse width is used to decode characters
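As a rough sketch of how pulse widths could be turned into the wide/narrow bits described above, the function below classifies nine measured widths and checks the "three wide out of nine" rule. The name `classify_elements`, the midpoint threshold, and the sample widths are assumptions for illustration; mapping the resulting bit pattern to an actual character needs the Code 39 table, which is not reproduced here.

```python
def classify_elements(widths):
    """Map nine measured pulse widths to bits: 1 = wide, 0 = narrow.

    Assumption: anything above the midpoint between the smallest and the
    largest measured width counts as wide.
    """
    if len(widths) != 9:
        raise ValueError("a character has nine elements")
    threshold = (min(widths) + max(widths)) / 2
    bits = [1 if w > threshold else 0 for w in widths]
    if sum(bits) != 3:
        raise ValueError("expected exactly three wide elements")
    return bits


# Hypothetical sensor readings (arbitrary time units) for one character.
sample = [10, 31, 11, 9, 30, 10, 12, 29, 10]
bits = classify_elements(sample)
```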
Example – Cont.
 Initiator of the transaction: IO (need to transfer the decoded characters to memory)
 A counter counts the number of decoded characters, and initiates a transaction upon reaching a certain threshold (e.g. after getting 81 bars, hence reading a full ID number)
 Decoded characters should be copied to memory, otherwise they will be overwritten by newer data
DMA Structure
 DMA controller includes:
 an address bus
 a data bus
 control registers
 An efficient DMA controller possesses the ability to request access to any resource it needs, without having the processor itself get involved
 It must have the capability to generate interrupts
 It has to be able to calculate addresses within the controller.
DMA Registers
 DMA Controller usually has at least the following registers:
 Block length (transfer block size) register
 Source address (for the next byte/word transfer) register
 Destination address (for the next byte/word transfer) register
 Byte counter - how many bytes have been transferred
 A temporary data register – the data after reading it from the source and before sending the data to the destination
 The user initializes the first 3 registers, then starts the DMA process
 Some DMA controllers have additional control registers to handle scheduling and priority of multiple DMA operations.
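The register set listed above can be pictured as a small record. The sketch below is a software model only: the class name `DmaRegisters`, its field names, and the `program` helper are hypothetical and do not correspond to any specific controller's register map.

```python
from dataclasses import dataclass


@dataclass
class DmaRegisters:
    block_length: int = 0         # transfer block size
    source_address: int = 0       # address of the next byte/word to read
    destination_address: int = 0  # address of the next byte/word to write
    byte_counter: int = 0         # how many bytes have been transferred
    temp_data: int = 0            # data held between the read and the write

    def program(self, source, destination, length):
        """The user initializes the first three registers before starting."""
        self.source_address = source
        self.destination_address = destination
        self.block_length = length
        self.byte_counter = 0

    def done(self):
        return self.byte_counter == self.block_length


regs = DmaRegisters()
regs.program(source=0x1000, destination=0x2000, length=81)
```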
DMA Channels
 Each DMA controller can support a certain number of independent channels (e.g. 64)
 Each channel supports a transaction
 Each of the channels is associated with a given event queue/transfer controller and a parameter set, which describes the transaction parameters
DMA data transaction modes
 Byte: Control is returned to the CPU after each byte. E.g. for 81 bars, there will be 81 loops of the order of execution shown in the former slide (DMA_REQ, …)
 Block: Use one request to transfer a whole block. Control is returned to the CPU only after the entire block has been transferred. E.g. for 81 bars, there will be only one DMA request.
 Burst: The block is divided, by the programmer, into a few fixed-size bursts. Control is returned to the CPU after each burst.
Remark: The data transaction mode determines who controls the data…
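To make the difference between the three modes concrete, the sketch below counts how many times control returns to the CPU while one block is moved. The function name `cpu_handovers` and the burst sizes in the usage lines are illustrative assumptions; the block size of 81 follows the barcode example.

```python
import math


def cpu_handovers(block_size, mode, burst_size=1):
    """Number of times control returns to the CPU while moving one block."""
    if mode == "byte":
        return block_size                          # after every byte
    if mode == "block":
        return 1                                   # only after the whole block
    if mode == "burst":
        return math.ceil(block_size / burst_size)  # after every burst
    raise ValueError(f"unknown mode: {mode}")


byte_mode = cpu_handovers(81, "byte")
block_mode = cpu_handovers(81, "block")
burst_mode = cpu_handovers(81, "burst", burst_size=9)
```

With a burst size that does not divide the block evenly, the last burst is simply shorter, which is why the count is rounded up.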
Transaction mode – data flows
[Figure: data flows in Block, Burst and Byte modes]
DMA Transaction Mode - Transparent
 Transparent mode takes the most time to transfer a
block of data, yet it is also the most efficient mode
in terms of overall system performance.
 In transparent mode, the DMA controller transfers
data only when the CPU is performing operations
that do not use the system buses.
 The primary advantage of transparent mode is that
the CPU never stops executing its programs and the
DMA transfer is free in terms of time.
 The disadvantage is that the hardware needs to
determine when the CPU is not using the system
buses, which can be complex.
Cycle “Stealing” DMA
 Memory accesses by the processor and DMA Controller are interwoven; DMA devices have higher priority than the processor over bus control
 Cycle Stealing mode: DMA Controller “steals” memory cycles from the processor, though the processor originates most memory accesses.
 Block or Burst mode: The DMA controller may be given exclusive access to the main memory to transfer a block of data without interruption
 Conflicts in DMA arise when two bus masters try to use the bus at the same time to access the main memory:
 Processor and DMA
 Two DMA controllers
 Synchronization mechanisms must be provided to avoid accessing non-updated information from RAM
Difference Between Burst Mode and Cycle Stealing Mode of DMA

| | Burst Mode of DMA | Cycle Stealing Mode of DMA |
|---|---|---|
| Definition | It is the DMA data transfer technique in which a number of data words are transferred continuously until the whole data is transferred. | It is the data transfer technique in which one data word is transferred and then control is returned to the CPU. |
| Data Transfer | Data transfer continues until the whole data is transferred. | Data is transferred only when the CPU is idle. |
| Speed | This is a very fast data transfer technique and is used to transfer data for fast devices. | It is a slow data transfer technique, as data is transferred only when the CPU is idle. |
| CPU Utilization | Low CPU utilization because the CPU remains idle until the whole data is transferred. | High CPU utilization because data is transferred when the CPU has no task to perform. |
| Extra | No need to check CPU… | Extra overhead because every… |
DMA Requests
 HOLD: Input to the CPU which is used to request a DMA cycle
 CPU suspends execution (could be in the middle of an instruction), places all buffers in Hi-Z state
 Higher priority than ALL other pins (NMI, INTR) except RESET
 HLDA: Output from the CPU to acknowledge the DMA request
Memory to Memory DMA steps (burst mode)
1. CPU writes to DMA controller to request a memory to
memory DMA operation
2. DMA starts and requests CPU for buses (Hold REQ)
3. CPU gives buses to DMA (Hold ACK) and disconnects itself
from buses
4. DMA puts source address and read signal on address and
control buses
5. DMA gets data from data bus
6. DMA puts destination address, data and write signal on
buses
7. DMA increments source and destination address registers
and byte counter by 1
8. If value in byte counter is not equal to block size, go back to
step 4; else, DMA gives buses back to CPU (withdraws Hold
REQ)
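The eight steps above can be walked through in a short software model. This is a sketch under simplifying assumptions: memory is a `bytearray`, the bus handshake is reduced to log entries, and the function name `memory_to_memory_dma` is hypothetical.

```python
def memory_to_memory_dma(memory, source, destination, block_size):
    """Model of the burst-mode steps: the buses are held for the whole block."""
    log = ["HOLD REQ", "HOLD ACK"]         # steps 2-3: DMA obtains the buses
    byte_counter = 0
    while byte_counter != block_size:      # step 8: compare counter to block size
        temp = memory[source]              # steps 4-5: read into the temp register
        memory[destination] = temp         # step 6: write to the destination
        source += 1                        # step 7: increment both addresses
        destination += 1
        byte_counter += 1                  #         and the byte counter
    log.append("HOLD REQ withdrawn")       # buses go back to the CPU
    return byte_counter, log


memory = bytearray(64)
memory[0:8] = b"DMA DATA"
moved, log = memory_to_memory_dma(memory, source=0, destination=32, block_size=8)
```

Step 1 (the CPU programming the controller) corresponds to the call itself; the CPU is not involved again until the hold request is withdrawn.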
Peripherals to Memory (=Read IO) DMA steps (burst mode)
1. CPU writes to DMA controller and peripheral device to request a peripheral to memory DMA operation
2. When ready, peripheral device sends a DMA request to DMA
3. DMA requests CPU for buses (Hold REQ)
4. CPU gives buses to DMA (Hold ACK) and disconnects itself from
buses
5. DMA gives DMA ACK signal back to peripheral device to signal
start of DMA
6. DMA sends read control signal to peripheral device and puts
destination address and read signal on address and control buses
for memory
7. Data transfers from peripheral to memory directly
8. DMA increments destination address register and byte counter by
1
9. If value in byte counter is not equal to block size, go back to step
6; else, DMA gives buses back to CPU (withdraws Hold REQ)
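A comparable sketch for the peripheral-to-memory case follows. Here the peripheral is modeled as a FIFO (`collections.deque`), and the key difference from the memory-to-memory flow is that only the destination address is incremented. The function name and the sample data are assumptions, and the model presumes the peripheral already holds at least `block_size` bytes.

```python
from collections import deque


def peripheral_to_memory_dma(fifo, memory, destination, block_size):
    """Move block_size bytes from a peripheral FIFO straight into memory."""
    byte_counter = 0
    while byte_counter != block_size:         # step 9: loop until block is done
        memory[destination] = fifo.popleft()  # steps 6-7: peripheral drives the data
        destination += 1                      # step 8: only the destination moves
        byte_counter += 1
    return byte_counter


fifo = deque(b"CODE39")   # hypothetical decoded characters waiting in the device
memory = bytearray(16)
moved = peripheral_to_memory_dma(fifo, memory, destination=4, block_size=6)
```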
DMA-CPU Optimized Operation
 In an optimized application, the core
would never have to move any data, but
rather only access it in L1 memory.
 The core wouldn't need to wait for data
to arrive, because the DMA engine would
have already made it available by the
time the core was ready to access it.
DMA Programming (Configuration)
DMA Configuration - Basic
 Basic DMA Configuration is specifying a starting source and a destination address for data.
 In the case of a peripheral DMA, the peripheral's FIFO serves as either the source or the destination.
 When the peripheral serves as the source, a memory location (internal or external) serves as the destination address.
 When the peripheral serves as the destination, a memory location (internal or external) serves as the source address.
DMA Configuration - Basic
 The 2nd basic configuration parameter is the number of words to transfer.
 The 3rd basic configuration parameter is the word size of each transfer, e.g. 8, 16 or 32 bits.
 This type of transaction represents a simple one-dimensional ("1D") transfer with a unity "stride".
Transfer Mode
 When a DMA channel is enabled and receives a trigger from its configured trigger source, it begins moving data as soon as the needed resources become available.
 As a result of the trigger event, the channel transfers either all or a subset of the block (this is configurable). The amount of data that is transferred in response to each trigger event (i.e. chunk) is determined by the DMA transfer mode
 E.g. chunk size can be: single word, line, and block.
 Typically, a DMA channel used in conjunction with a peripheral operates in a single word transfer mode (triggered by a receiver full or transmitter empty condition).
DMA Config. – 1D – unity stride
 As part of this transfer, the DMA controller keeps track of the source and destination addresses as they increment. With a unity stride, the address increments by 1 byte for 8-bit transfers, 2 bytes for 16-bit transfers, and 4 bytes for 32-bit transfers. The above parameters configure a basic 1D DMA transfer.
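The address sequence of a 1D unity-stride transfer can be written out directly from the three basic parameters (start address, word count, word size). The function name `unity_stride_addresses` and the example start address are hypothetical.

```python
def unity_stride_addresses(start, word_bits, count):
    """Addresses touched by a 1D transfer with unity stride."""
    if word_bits not in (8, 16, 32):
        raise ValueError("word size must be 8, 16 or 32 bits")
    step = word_bits // 8   # 1, 2 or 4 bytes per transferred word
    return [start + i * step for i in range(count)]


addresses_16 = unity_stride_addresses(0x100, word_bits=16, count=4)
addresses_32 = unity_stride_addresses(0x100, word_bits=32, count=3)
```

The controller applies the same rule independently to the source and the destination address.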
DMA translations - Examples
Allocating a DMA buffer
 Contiguous buffer: contiguous block of memory allocated
 Scatter/gather: allocated buffer can be fragmented in the physical memory and does not need to be allocated contiguously. The allocated physical memory blocks are mapped to a contiguous buffer in the calling process's virtual address space, thus enabling easy access to the allocated physical memory blocks.
 Bounce buffer: to allow devices with limited addressing to access all of virtual address space. A bounce buffer resides in memory low enough for a device to copy from and write data to. It is then copied to the desired user page in high memory
DMA Special Address Modes
 Circular buffer: Use a two-dimensional counter and a negative offset that wraps back to the buffer start address.
 Linear buffer with non-unit stride: Use a two-dimensional counter with one word per row. This method must be used with byte packing, which has a stride of three.
 A larger-than-normal field width in a two-dimensional counter: Concatenate two fields in a three-dimensional counter by specifying an offset value of one between them.
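The circular-buffer case can be sketched as a two-dimensional counter: an inner counter steps through the buffer with stride one, and when a row completes, a negative offset brings the address back to the start. The function name and parameters below are illustrative assumptions, not the register layout of a particular controller.

```python
def circular_addresses(start, buffer_len, total_words):
    """Address sequence of a circular buffer built from a 2D counter.

    Inner dimension: buffer_len words with an offset of +1.
    Row end: an offset of -(buffer_len - 1) wraps back to the start.
    """
    address = start
    inner = 0
    sequence = []
    for _ in range(total_words):
        sequence.append(address)
        inner += 1
        if inner == buffer_len:
            address -= buffer_len - 1   # negative offset at the end of a row
            inner = 0
        else:
            address += 1                # unit step inside a row
    return sequence


wrapped = circular_addresses(start=0x20, buffer_len=4, total_words=10)
```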
Descriptor Non-Continuous-Array Mode Management
Performed in either of the below two options:
 Linked List of Descriptors (Chained Descriptors)
 “Throttled” Descriptor Management (Processor Manual Descriptors’ Control)
Chained (Linked-List) Descriptors
 Setting up multiple descriptors that are chained
together. The term "chained" implies that one
descriptor points to the next descriptor, which is
loaded automatically. To complete the chain, the
last descriptor points back to the first descriptor,
and the process repeats.
 One reason to use this technique rather than the
Autobuffer mode is that descriptors allow more
flexibility in the size and direction of the transfers
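A minimal sketch of a descriptor chain follows, assuming each descriptor carries a source, a destination, a count, and a pointer to the next descriptor. The class `Descriptor`, the function `run_chain`, and the `max_descriptors` limit (needed here only so that a closed ring terminates in a simulation) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    source: int
    destination: int
    count: int
    next: Optional["Descriptor"] = None   # loaded automatically when this one ends


def run_chain(memory, first, max_descriptors):
    """Process descriptors in chain order; returns how many were processed."""
    processed = 0
    current = first
    while current is not None and processed < max_descriptors:
        src, dst, n = current.source, current.destination, current.count
        memory[dst:dst + n] = memory[src:src + n]
        processed += 1
        current = current.next
    return processed


# Two descriptors forming a ring: the last one points back to the first.
first = Descriptor(source=0, destination=16, count=4)
second = Descriptor(source=4, destination=24, count=4, next=first)
first.next = second

memory = bytearray(range(32))
processed = run_chain(memory, first, max_descriptors=2)
```

Because each descriptor has its own size and addresses, the two transfers in the ring do not need to match, which is the flexibility the slide contrasts with Autobuffer mode.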
“Throttled” Descriptor Management
 The processor manually manages the descriptor list. A descriptor is a structure in memory. Each descriptor contains a configuration word. Each configuration word contains an "Enable" bit which can regulate when a transfer starts.
 All the descriptors are set up in advance, but with the "Enable" bits cleared. When the processor determines the time is right to start a descriptor, it simply updates the descriptor in memory and then writes to a DMA register to start the stalled DMA channel.
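The throttling idea can be modeled with a list of configuration words and a single bit mask. The bit position `ENABLE = 0x01` and the function `run_until_stalled` are assumptions made for this sketch; real controllers define their own configuration-word layout.

```python
ENABLE = 0x01   # hypothetical position of the "Enable" bit


def run_until_stalled(config_words, position):
    """Advance past every descriptor whose Enable bit is set.

    Returns the index at which the channel stalls (or the list length
    when every remaining descriptor was enabled).
    """
    while position < len(config_words) and config_words[position] & ENABLE:
        position += 1
    return position


# All descriptors are set up in advance with the Enable bit cleared.
config_words = [0x00, 0x00, 0x00]
position = run_until_stalled(config_words, 0)        # channel stalls at once

# The processor decides it is time: update the descriptor, restart the channel.
config_words[0] |= ENABLE
position_after = run_until_stalled(config_words, position)
```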
Managing Descriptors’ Queue using Interrupts
 There are two general methods for managing a
descriptor queue using interrupts:
 Interrupting upon the completion of every descriptor. Use
this method only if you can guarantee that each interrupt
event will be serviced separately, with no interrupt overrun.
 Interrupting only on completion of the work transfer
specified by the last descriptor of a work block. A work
block is a collection of one or more descriptors.
 To maintain synchronization of the descriptor queue,
the non-interrupt software has to maintain a count of
descriptors added to the queue, while the interrupt
handler maintains a count of completed descriptors
removed from the queue.
 The counts are then equal only when the DMA channel
pauses after having processed all the descriptors.
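The two-counter bookkeeping described above can be sketched as follows. The class and method names are hypothetical; the point is only that one counter is written by the non-interrupt software, the other by the interrupt handler, and equality means the channel has drained the queue.

```python
class DescriptorQueueCounters:
    """Tracks descriptors added by software and completed by the DMA channel."""

    def __init__(self):
        self.added = 0       # maintained by the non-interrupt software
        self.completed = 0   # maintained by the interrupt handler

    def add_descriptor(self):
        self.added += 1

    def on_completion_interrupt(self):
        self.completed += 1

    def channel_paused(self):
        # The counts match only once every queued descriptor is processed.
        return self.added == self.completed


counters = DescriptorQueueCounters()
counters.add_descriptor()
counters.add_descriptor()
counters.on_completion_interrupt()
busy = not counters.channel_paused()
counters.on_completion_interrupt()
idle = counters.channel_paused()
```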
Channel Priority
 DMA channel priority determines if and when a DMA channel can be interrupted during a block transfer.
 An interruption occurs between word transfers. The current DMA word transfer is allowed to complete before the core or another DMA channel can take control of the resource that is under contention.
 The DMA channel priority arbitration occurs for each DMA word transfer; only enabled and already triggered channels can take part in this arbitration.
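Per-word arbitration can be illustrated with a small simulation. The sketch assumes that a lower priority number wins, that each word transfer takes one time step, and that a channel is "triggered" from its `trigger_at` step onward; all of these, along with the function name `run_arbitration`, are assumptions for illustration.

```python
def run_arbitration(channels):
    """Return the order of word transfers under per-word priority arbitration.

    channels: list of dicts with keys name, priority, words, trigger_at, enabled.
    """
    remaining = {c["name"]: c["words"] for c in channels}
    order = []
    step = 0
    while True:
        # Only enabled and already triggered channels take part.
        ready = [c for c in channels
                 if c["enabled"] and c["trigger_at"] <= step
                 and remaining[c["name"]] > 0]
        if not ready:
            pending = [c for c in channels
                       if c["enabled"] and remaining[c["name"]] > 0]
            if not pending:
                break
            step = min(c["trigger_at"] for c in pending)   # idle until a trigger
            continue
        winner = min(ready, key=lambda c: c["priority"])
        remaining[winner["name"]] -= 1   # one word transfer always completes
        order.append(winner["name"])
        step += 1
    return order


order = run_arbitration([
    {"name": "low", "priority": 2, "words": 4, "trigger_at": 0, "enabled": True},
    {"name": "high", "priority": 1, "words": 2, "trigger_at": 2, "enabled": True},
])
```

In this run the low-priority block is interrupted between its second and third word, and resumes once the high-priority channel has finished.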
Intel DMA Controller 8237A
 Widely used DMA controller.
 Integrated in ISP chipsets (e.g. 82357, includes PICs).
 A single 8237A supports up to 4 peripheral devices.
 8237A acts as a peripheral device itself
 Requires S/W initialization via internal registers.
Block Diagram & Pin Configuration
DMA Operation of the 8237A
 8237A contains four
independent DMA
channels (0-3), each
dedicated to a
particular peripheral
device.
 DREQ0-DREQ3: DMA
request lines.
 8237A monitors these
lines while in its idle
state.
 When a device wants
service, it activates its
DREQx line
DMA Operation of the 8237A
 In response to a DREQx, the 8237A activates “hold request” (HRQ), tied to HOLD on 8086.
 CPU responds by activating “hold acknowledge” (HLDA).
 After the buffers have gone hi-Z
 8237A now acknowledges to the requesting device by activating the corresponding DACKx line.
 Device now has direct access to the system buses.
NOTE: HRQ stays active until DMA device is done
HOLD/HLDA Waveform
 HRQ (HOLD) stays active until DMA device is done
DMA Operation of the 8237A
 8237A generates all address and control lines during DMA operation
 The data pins are used for memory-to-memory operations (8237 acts as a temporary holding place for the data via internal registers)
 To initiate DMA transfer, one has to program the 8237 on how to behave, where to move data to/from, how much data, etc.
Programming the 8237A
 Command control
 Decodes which register (see below) is to be
accessed and what operation is to be performed
 Twelve (12) internal registers: Loaded prior to
DMA operations
Examples: IO to/from memory
 Peripheral I/O write to
memory
 IOR used to signal device to
place data on bus.
 MEMW active to allow write to
memory.
 Data does not go through the
8237A!
 Peripheral I/O read from
memory
 IOW used to signal device to
latch data from bus.
 MEMR active to allow read
from memory.
Example: Memory-to-memory
 MEMR cycle (4 clock
cycles) stores data in a
temporary register inside
8237A
 MEMW cycle (4 clock
cycles) writes to memory.
 8 clock cycles total.
 READY line:
 Must be “1” before the
8237A will complete a
memory or I/O cycle
 Used by “slow” memory or
devices (WAIT states)
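The timing above lends itself to a quick calculation. The sketch below assumes one transferred byte per MEMR + MEMW pair (the 8237A is an 8-bit device), that each wait state inserted via READY adds one clock to a cycle, and a 5 MHz clock in the usage line; the wait-state cost and the clock frequency are assumptions, not values from the slide.

```python
def memory_to_memory_clocks(n_bytes, wait_states_per_cycle=0):
    """Clock cycles for an 8237A-style memory-to-memory transfer.

    Each byte needs one MEMR cycle and one MEMW cycle of 4 clocks each;
    wait states (assumed one clock each) stretch both cycles.
    """
    clocks_per_cycle = 4 + wait_states_per_cycle
    return n_bytes * 2 * clocks_per_cycle


def transfer_time_us(n_bytes, clock_hz, wait_states_per_cycle=0):
    """Transfer time in microseconds at the given clock frequency."""
    clocks = memory_to_memory_clocks(n_bytes, wait_states_per_cycle)
    return clocks / clock_hz * 1e6


clocks_per_byte = memory_to_memory_clocks(1)
time_1kb_us = transfer_time_us(1024, clock_hz=5_000_000)   # assumed 5 MHz clock
```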
8086 system with 8237A DMA
Intel DMA Controller 82380
Intel 32b DMA Controller 82380
 Built to accompany Intel’s 80386
 32b Data Bus
 24b Address Bus
 Maximum data rate of 40MBps at 20MHz
 8 x Independent programmable channels
 20 x source interrupt controller
 15 external, 5 internal
 Built to accompany Intel Interrupt Controller 8259
 4 x 16b Programmable-interval timers
 Programmable wait state generator
 DRAM refresh controller
 Shutdown detect and reset control
Intel DMAC 82380 Block Diagram
 20 independent general purpose registers for command and status.
Architecture of the DMAC 82380
Interrupt Controller – Block Diagram
Flow of DMA Controller Operation
Requester, Target, and DMAC Interconnection (2-cycle Configuration)
Buffer Transfer in Single Transfer Mode
Waveform – Single Transfer Mode
Buffer Transfer in Block Transfer Mode
Waveform – Block Transfer Mode
Buffer Transfer in Demand Transfer Mode
Waveform – Demand Transfer Mode
Flow of Events in the Buffer Chaining Process
Waveform – Beginning of a DMA process
 Channel priority resolution takes place during the bus state before HLDA is asserted, allowing the DMA controller to respond to HLDA without extra idle bus states.
Waveform – Termination of a DMA process
 Due to De-Asserting DREQn
Waveform – Termination of a DMA process
 Due to an external EOP#
DMA Programmed Priority
Bus Arbitration and DMA Sequence
DMA - Summary
 Hardware device which offloads the CPU from intense MEMCPY operations, and from IO <> Memory operations
 Many modes to support various operations.
 Indispensable in real-time optimization techniques to reach the required performance.
 An integral part of any multimedia system, and it's crucial to appreciate its complexities in order to fully optimize an application. However, other system resources like memory and the processor core must arbitrate with the DMA controller, and achieving the perfect balance between them all involves gaining a fundamental understanding of how data moves around a system.
