Chapter 4 Input Output System

This document discusses input/output systems and the Raspberry Pi GPIO port. It begins with an introduction to typical I/O system components like I/O devices, buses, and controllers. It then covers I/O ports that allow communication between devices and processors via input, output, and input/output ports. Methods of communication include memory-mapped I/O and I/O instructions. The document focuses on the Raspberry Pi GPIO port, which provides 54 I/O pins that are mapped to memory addresses to allow the processor to control I/O devices.
Chapter 4: Input/Output System

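Estructura de Computadores (Computer Structure)
Dept. de Arquitectura de Computadores (Department of Computer Architecture)

[Adapted from: PRÁCTICAS DE ENSAMBLADOR BASADAS EN RASPBERRY PI (assembly-language lab exercises based on the Raspberry Pi), AJ Villena Godoy - 2015, riuma.uma.es]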
Index
Introduction
Input/Output ports
◦ Raspberry GPIO
Communication between I/O devices and the processor
◦ Polling
◦ Exceptions
◦ DMA
Buses
Dept. Arquitectura de Computadores 2
INTRODUCTION



A typical I/O system
[Figure: the processor (with its cache) and main memory are connected by a memory - I/O bus; three I/O controllers on that bus drive the disks, the graphics output and the network, and signal the processor through interrupts.]




I/O devices
I/O devices are incredibly diverse with respect to:
◦ Behavior: input, output or storage
◦ Partner: human or machine
◦ Data rate: the peak rate at which data can be transferred between the I/O device and the main memory or processor

Data rates span 8 orders of magnitude:

Device             Behavior          Partner    Data rate (Mb/s)
Keyboard           input             human      0.0001
Mouse              input             human      0.0038
Laser printer      output            human      3.2
Magnetic disk      storage           machine    800-3000
Graphics display   output            human      800-8000
Network/LAN        input or output   machine    100-10000
I/O performance measures
I/O bandwidth (throughput): the amount of information that can be input (output) and communicated across an interconnect (e.g., a bus) to the processor/memory (I/O device) per unit time
1. How much data can we move through the system in a certain time?
2. How many I/O operations can we do per unit time?
I/O response time (latency): the total elapsed time to accomplish an input or output operation
◦ An especially important performance metric in real-time systems
Expandability: is there an easy way to connect another disk to the system?
Resilience: if this I/O controller (or the network) fails, does it affect the rest of the system?


INPUT/OUTPUT PORTS
I/O ports
Communication between CPU and I/O devices
How does the processor communicate with devices other than main memory?
◦ By using input/output ports
I/O ports
◦ Input port: transfers from an external device to the CPU
◦ Output port: transfers from the CPU to an external device
◦ Input/output port: transfers in both directions


I/O ports
[Figure: the CPU and an I/O device share the address, data and control buses; the device's port register is selected by its address, and the Rd and Wrt control lines read it onto, or write it from, the data bus.]


I/O Commands
I/O devices are managed by I/O controller hardware, which:
◦ Transfers data to/from the device
◦ Synchronizes operations with software
Ports in an I/O controller:
◦ Command registers: cause the device to do something
◦ Status registers: indicate what the device is doing and the occurrence of errors
◦ Data registers: a write transfers data to the device; a read transfers data from the device
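To make the three kinds of controller ports concrete, here is a minimal C sketch of a hypothetical controller whose command, status and data registers sit at consecutive 32-bit addresses. The struct layout, `CMD_WRITE` and `STATUS_ERROR` are invented for illustration. A real controller's datasheet defines its own layout and codes.

```c
#include <stdint.h>

/* Hypothetical I/O controller: three 32-bit ports at consecutive addresses.
   volatile makes every access really reach the device. */
typedef struct {
    volatile uint32_t command; /* CPU writes: tells the device what to do     */
    volatile uint32_t status;  /* CPU reads: what the device is doing, errors */
    volatile uint32_t data;    /* CPU writes/reads: data to/from the device   */
} io_controller;

#define CMD_WRITE    0x1u      /* hypothetical "output the data register" code */
#define STATUS_ERROR (1u << 1) /* hypothetical error flag in the status port   */

/* Output one word: fill the data register first, then issue the command. */
void ctrl_send(io_controller *c, uint32_t word) {
    c->data = word;
    c->command = CMD_WRITE;
}

/* Nonzero if the status register reports an error. */
int ctrl_has_error(const io_controller *c) {
    return (c->status & STATUS_ERROR) != 0;
}
```

On real hardware, `c` would point at the controller's fixed address. In a test, it can point at an ordinary variable.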


I/O Commands
[Figure: the CPU reaches main memory and the I/O device over the same address and data buses; the device exposes a command register, a status register, a read register and a write register.]


Communication of I/O Devices and Processor
User programs (processor in user, unprivileged mode) are prevented from issuing I/O operations directly, because the OS does not give them access to the I/O ports.
The I/O ports can be accessed only when the processor is in kernel (supervisor) mode.
How does the processor direct the I/O devices? Through the address space!
◦ Memory-mapped I/O
◦ Specific I/O instructions


How does the processor direct the I/O devices?
1. Memory-mapped I/O
◦ Portions of the high-order memory address space are assigned to each I/O device
◦ Reads and writes to those memory addresses are interpreted as commands to the I/O devices
◦ Loads/stores to the I/O address space can only be done by the OS
◦ MIPS processor:
Load instruction to read from an I/O device, e.g. lw $4, 100($5)
Store instruction to write to an I/O device, e.g. sw $4, 100($5)


Memory-mapped I/O
[Figure: System without I/O device: main memory occupies the whole address space, 00000000h-FFFFFFFFh. System with I/O device: main memory occupies 00000000h-000FFFFFh and the I/O device is mapped at 00100000h-FFFFFFFFh. In both cases the CPU reaches everything over the same address and data buses.]
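The address decoding implied by the figure can be sketched in C, assuming exactly that split (RAM up to 000FFFFFh, the device from 00100000h up). `decode_address` and the enum names are hypothetical. The point is that a load or store looks the same to the CPU, and only the address decides whether memory or the device answers.

```c
#include <stdint.h>

/* Address split taken from the figure (assumption: one device, no holes). */
#define MEM_LAST_ADDR 0x000FFFFFu /* main memory: 00000000h-000FFFFFh */

typedef enum { TARGET_MEMORY, TARGET_IO } bus_target;

/* Memory-mapped I/O: the same lw/sw (ldr/str) reaches either target;
   the address decoder routes it purely on the address value. */
bus_target decode_address(uint32_t addr) {
    return (addr <= MEM_LAST_ADDR) ? TARGET_MEMORY : TARGET_IO;
}
```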


How does the processor direct the I/O devices?
2. I/O instructions
◦ Separate instructions to access I/O registers
◦ Can only be executed in kernel mode
◦ Example: x86
Control signal IO/M (1 = memory, 0 = I/O)
Specific input instruction to read from an I/O device, e.g. in al, 37h (AL ← P(37h))
Specific output instruction to write to an I/O device, e.g. out 37h, al (P(37h) ← AL)


I/O instructions
[Figure: System without I/O device: main memory occupies 00000000h-FFFFFFFFh. System with I/O device: main memory still occupies the whole 00000000h-FFFFFFFFh range, and the I/O device lives in a separate I/O address space, 0000h-FFFFh, reached over the same address and data buses.]
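A toy C model makes the difference from memory-mapped I/O visible. The IO/M line selects one of two separate address spaces, so memory address 37h and port 37h are different locations. This is only a simulation: `toy_bus`, `port_in` and `port_out` are hypothetical names. Both spaces are 64 KiB here only to keep the model small. Real x86 code would use the `in`/`out` instructions themselves.

```c
#include <stdint.h>

/* IO/M control line, with the values used in the x86 example. */
#define IO_M_MEMORY 1
#define IO_M_IO     0

/* Two separate address spaces on one bus (sizes chosen for the model). */
typedef struct {
    uint8_t mem[0x10000]; /* stand-in for main memory   */
    uint8_t io[0x10000];  /* I/O port space 0000h-FFFFh */
} toy_bus;

uint8_t bus_read(const toy_bus *b, int io_m, uint16_t addr) {
    return (io_m == IO_M_MEMORY) ? b->mem[addr] : b->io[addr];
}

void bus_write(toy_bus *b, int io_m, uint16_t addr, uint8_t value) {
    if (io_m == IO_M_MEMORY)
        b->mem[addr] = value;
    else
        b->io[addr] = value;
}

/* in al, 37h   ~  al = port_in(b, 0x37)
   out 37h, al  ~  port_out(b, 0x37, al) */
uint8_t port_in(const toy_bus *b, uint16_t port) {
    return bus_read(b, IO_M_IO, port);
}

void port_out(toy_bus *b, uint16_t port, uint8_t value) {
    bus_write(b, IO_M_IO, port, value);
}
```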


Raspberry Pi: GPIO port
GPIO: 54 I/O signals
◦ 26 signals are available through a header with two rows of 13 pins
◦ We will use the inner row for connecting an external board


External board connected to GPIO
[Figure: external board connected to the GPIO header.]


GPIO pins
The Raspberry Pi manages up to 54 pins
◦ Only the ones shown on the header are accessible (the figure also marks the Tx and Rx pins)
GPIO ports are mapped in memory, starting at 0x3F200000


GPIO memory mapping
GPFSELn: GPIO Function Select Registers
◦ The 54 pins are configured through 6 memory ports, GPFSEL0 to GPFSEL5
◦ Each port defines 10 groups, FSEL0 to FSEL9 (the last port, GPFSEL5, only 4)
◦ A group consists of 3 bits (3*10 = 30, so the 2 MSBs are unused)
◦ GPFSEL0 controls GPIO0 to GPIO9, GPFSEL1 controls GPIO10 to GPIO19, …
GPSETn: GPIO Pin Output Set Registers
◦ GPSET0 sets pins 0 to 31, and GPSET1 sets pins 32 to 53
GPCLRn: GPIO Pin Output Clear Registers
◦ GPCLR0 clears pins 0 to 31, and GPCLR1 clears pins 32 to 53
A SET or CLR operation on a pin just needs a 1 in the corresponding bit position and only affects that pin (a 0 leaves the pin unmodified)
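The index arithmetic on this slide fits in a few C helpers. This is a sketch, and the function names are our own. Writing a whole GPFSELn word at once also resets the other nine pins in that register to input (000). `gpfsel_update` shows the read-modify-write alternative, which changes only one pin's field.

```c
#include <stdint.h>

/* Byte offsets from the GPIO base, from the register map. */
#define GPFSEL0_OFF 0x00u
#define GPSET0_OFF  0x1Cu
#define GPCLR0_OFF  0x28u

#define FSEL_INPUT  0u /* 000 */
#define FSEL_OUTPUT 1u /* 001 */

/* GPFSELn: 10 pins per register, 3 bits per pin. */
uint32_t gpfsel_offset(unsigned pin) { return GPFSEL0_OFF + 4u * (pin / 10u); }
unsigned gpfsel_shift(unsigned pin)  { return 3u * (pin % 10u); }

/* GPSETn / GPCLRn: 32 pins per register, 1 bit per pin. */
uint32_t gpset_offset(unsigned pin) { return GPSET0_OFF + 4u * (pin / 32u); }
uint32_t gpclr_offset(unsigned pin) { return GPCLR0_OFF + 4u * (pin / 32u); }
uint32_t pin_mask(unsigned pin)     { return 1u << (pin % 32u); }

/* New GPFSELn value: change only this pin's 3-bit field, keep the other nine. */
uint32_t gpfsel_update(uint32_t old, unsigned pin, uint32_t func) {
    unsigned shift = gpfsel_shift(pin);
    return (old & ~(7u << shift)) | ((func & 7u) << shift);
}
```

For GPIO9, this gives GPFSEL0 with 001 at bits 29-27 (0x08000000), and bit 9 (0x200) of GPSET0 or GPCLR0. These are the same constants written in binary in the assembly examples.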


GPIO memory mapping
3F20 0000  GPFSEL0
3F20 0004  GPFSEL1
3F20 0008  GPFSEL2
3F20 000C  GPFSEL3
3F20 0010  GPFSEL4
3F20 0014  GPFSEL5
3F20 0018  --
3F20 001C  GPSET0
3F20 0020  GPSET1
3F20 0024  --
3F20 0028  GPCLR0
3F20 002C  GPCLR1

GPFSELn layout: X (bits 31-30, unused) | FSEL9 | FSEL8 | … | FSEL1 | FSEL0 (3 bits each)
• 000: Input pin
• 001: Output pin
• 010-111: Other modes
GPSETn layout: SET31 … SET0      GPCLRn layout: CLR31 … CLR0

1. Set GPIO9 as output: write 001 into FSEL9 of GPFSEL0
2. Set GPIO9 (turn the LED on): write 1 into SET9 of GPSET0
3. Clear GPIO9 (turn the LED off): write 1 into CLR9 of GPCLR0
Example 1
Code for turning on a red LED (GPIO9):

        .set GPBASE, 0x3F200000    @ GPIO base address
        .set GPFSEL0, 0x00         @ offset of GPFSEL0
        .set GPSET0, 0x1c          @ offset of GPSET0
        .text
        ldr r0, =GPBASE
        /* bit guide: xx999888777666555444333222111000 */
        mov r1, #0b00001000000000000000000000000000   @ FSEL9 = 001: GPIO9 output
        str r1, [r0, #GPFSEL0]
        /* bit guide: 10987654321098765432109876543210 */
        mov r1, #0b00000000000000000000001000000000   @ bit 9 = 1
        str r1, [r0, #GPSET0]                         @ set GPIO9: LED on
infi:   b infi                                        @ loop forever
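For comparison, the same two stores in C. This is a sketch, and `led_gpio9_on` is a hypothetical name. The GPIO block is accessed as an array of 32-bit words, so a register's index is its byte offset divided by 4.

```c
#include <stdint.h>

/* Word indices into the GPIO block (byte offset / 4), from the register map. */
#define GPFSEL0 (0x00u / 4u)
#define GPSET0  (0x1Cu / 4u)

/* Same two stores as Example 1: FSEL9 = 001 (output), then SET9 = 1. */
void led_gpio9_on(volatile uint32_t *gpio) {
    gpio[GPFSEL0] = 1u << 27; /* bits 29-27 = 001; the other pins of GPFSEL0 become inputs */
    gpio[GPSET0]  = 1u << 9;  /* 1 sets GPIO9 high; 0 bits leave the other pins alone */
}
```

On the board targeted by these slides (GPIO base 0x3F200000), a bare-metal program would call `led_gpio9_on((volatile uint32_t *)0x3F200000u)`. Passing an ordinary array instead lets you check the stored values off-target.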


GPIO memory mapping (cont.)
Other GPIO ports
GPLEVn: GPIO Pin Level Registers
◦ Return 0 if the pin level is 0 V, or 1 if it is 3.3 V
GPEDSn: GPIO Event Detect Status Registers
◦ Record the events (interrupt requests) detected on each pin
GPRENn / GPFENn: GPIO Rising / Falling Edge Detect Enable Registers
◦ Enable interrupts fired by a rising edge / falling edge
◦ Synchronous edge detection (sampled by the system clock), which suppresses glitches
GPHENn / GPLENn: GPIO High / Low Detect Enable Registers
◦ Enable interrupts fired by a high / low level
GPARENn / GPAFENn: GPIO Asynchronous Rising Edge / Falling Edge Detect Enable Registers
◦ Enable interrupts fired by an asynchronous rising edge / falling edge
◦ For detecting edges of very short duration
GPPUD / GPPUDCLKn: GPIO Pull-up/down Register / Pull-up/down Clock Registers


GPIO memory mapping (cont.)
3F20 0000  GPFSEL0
---- ----
3F20 0034  GPLEV0
3F20 0038  GPLEV1
3F20 003C  --
3F20 0040  GPEDS0
3F20 0044  GPEDS1
3F20 0048  --
3F20 004C  GPREN0
3F20 0050  GPREN1
3F20 0054  --
3F20 0058  GPFEN0
3F20 005C  GPFEN1
---- ----
Many more ports, some of which will be explained later!

GPFSELn field codes: 000 input pin; 001 output pin; 010-111 other modes
GPLEVn layout: bits 31 … 0, one level bit per pin
1. Set GPIO2 as input: write 000 into FSEL2 of GPFSEL0
2. Read GPIO2: load GPLEV0 and test bit 2


Example 2
Code for checking a push button (GPIO2) and turning the LED on:

        .set GPBASE, 0x3F200000
        .set GPFSEL0, 0x00
        .set GPSET0, 0x1c
        .set GPLEV0, 0x34
        .text
        ldr r0, =GPBASE
        /* bit guide: xx999888777666555444333222111000 */
        mov r1, #0b00001000000000000000000000000000   @ GPIO9 output; FSEL2 = 000 keeps GPIO2 input
        str r1, [r0, #GPFSEL0]
        /* mask for testing GPIO2 */
        mov r2, #0b00000000000000000000000000000100
bucle:  ldr r3, [r0, #GPLEV0]                         @ read the level of pins 0-31
        tst r3, r2                                    @ test bit 2 (GPIO2)
        bne bucle                                     @ keep polling while GPIO2 reads 1
        /* bit guide: 10987654321098765432109876543210 */
        mov r1, #0b00000000000000000000001000000000
        str r1, [r0, #GPSET0]                         @ GPIO2 read 0: turn the LED on
infi:   b infi
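Example 2 in C follows the same shape: one store to GPFSEL0, a busy-wait on GPLEV0, one store to GPSET0. The assembly leaves the loop when GPIO2 reads 0. That fits a button that pulls the pin low when pressed, but this wiring is our assumption, because the external board is not described here. The function name is hypothetical.

```c
#include <stdint.h>

/* Word indices into the GPIO block (byte offset / 4). */
#define GPFSEL0 (0x00u / 4u)
#define GPSET0  (0x1Cu / 4u)
#define GPLEV0  (0x34u / 4u)

#define BUTTON_MASK (1u << 2) /* GPIO2 */
#define LED_MASK    (1u << 9) /* GPIO9 */

/* Poll GPLEV0 until GPIO2 reads 0, then light the LED on GPIO9. */
void wait_button_then_led(volatile uint32_t *gpio) {
    gpio[GPFSEL0] = 1u << 27;          /* GPIO9 output; FSEL2 = 000 keeps GPIO2 input */
    while (gpio[GPLEV0] & BUTTON_MASK) {
        /* busy-wait: the same loop as tst/bne */
    }
    gpio[GPSET0] = LED_MASK;
}
```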


COMMUNICATION BETWEEN I/O DEVICES AND THE PROCESSOR


How I/O devices communicate with the processor
Polling: the processor periodically checks the status of an I/O device (through the OS) to determine its need for service
◦ The processor is totally in control, but does all the work
◦ In real-time embedded applications, I/O rates are predetermined, which makes the I/O overhead predictable (helpful for real time)
◦ Can waste a lot of processor time due to speed differences
Interrupt-driven I/O: the I/O device issues an interrupt to indicate that it needs attention
◦ Advantage: relieves the processor from having to continuously poll for an I/O event; user program progress is only suspended during the actual transfer of I/O data to/from user memory space
◦ Disadvantage: special hardware is needed to indicate the I/O device causing the interrupt, to save the necessary information before servicing the interrupt, and to resume normal processing afterwards


Polling
Periodically check I/O status register
◦ If device ready, do operation
◦ If error, take action
Common in small or low-performance
real-time embedded systems
◦ Predictable timing
◦ Low hardware cost
In other systems, wastes CPU time

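The polling rule above ("if device ready, do operation; if error, take action") can be sketched in C as a loop over a status register. The READY/ERROR bit positions and the names are hypothetical. The iteration bound is an addition that keeps a dead device from hanging the CPU; it does not appear in the slides.

```c
#include <stdint.h>

/* Hypothetical status bits; a real device's datasheet defines its own. */
#define STATUS_READY (1u << 0)
#define STATUS_ERROR (1u << 1)

typedef enum { POLL_READY, POLL_ERROR, POLL_TIMEOUT } poll_result;

/* Re-read the status register until the device is ready or reports an
   error; give up after max_polls reads. */
poll_result poll_status(volatile const uint32_t *status, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t s = *status; /* one read per iteration */
        if (s & STATUS_ERROR)
            return POLL_ERROR;
        if (s & STATUS_READY)
            return POLL_READY;
    }
    return POLL_TIMEOUT;
}
```

The loop checks the error bit first, so a device that reports both bits at once is treated as failed.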


Raspberry Pi: System Timer
64-bit counter (CHI:CLO, high and low words)
C0 to C3: compare registers (4 timer channels)
CS: control/status register
◦ Bit Mx (M0-M3) is set to 1 when CLO equals Cx
Clock frequency: 1 MHz (each increment is 1 microsecond)

3F00 3000  CS    Control/status (bits M3 M2 M1 M0)
3F00 3004  CLO   Ascending counter, low 32 bits
3F00 3008  CHI   Ascending counter, high 32 bits
3F00 300C  C0    Compare registers: if any one of them is equal to
3F00 3010  C1    CLO, the corresponding bit Mx in CS is set and
3F00 3014  C2    an interrupt is raised (if it is enabled)
3F00 3018  C3


System Timer
[Figure: the 64-bit counter (read through the CHI and CLO ports) feeds four comparators, C0-C3; a match sets the corresponding bit M0-M3 in the CS port and sends a request to the interrupt controller.]
C0 and C2 are used by the GPU.
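Scheduling a compare match takes only a few register accesses. The C sketch below uses channel 1, since C0 and C2 belong to the GPU. The word indices follow the register table (base 0x3F003000), and the function names are hypothetical. Clearing Mx by writing 1 to it comes from the BCM2835 peripherals documentation rather than from these slides.

```c
#include <stdint.h>

/* Word indices into the System Timer block (byte offset / 4). */
enum { ST_CS = 0, ST_CLO = 1, ST_CHI = 2, ST_C0 = 3, ST_C1 = 4, ST_C2 = 5, ST_C3 = 6 };

#define ST_M1 (1u << 1) /* match flag of channel 1 in CS */

/* Program C1 so that M1 is set delay_us microseconds from now.
   The sum wraps modulo 2^32, exactly like CLO does. */
void st_arm_c1(volatile uint32_t *st, uint32_t delay_us) {
    st[ST_C1] = st[ST_CLO] + delay_us;
}

/* Nonzero once CLO has reached C1. */
int st_c1_matched(volatile const uint32_t *st) {
    return (st[ST_CS] & ST_M1) != 0;
}

/* Acknowledge channel 1: writing 1 clears M1; 0 bits leave M0, M2, M3 alone. */
void st_ack_c1(volatile uint32_t *st) {
    st[ST_CS] = ST_M1;
}
```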


Example 3: red LED blinking
We must:
1. Configure GPIO9
2. Turn the led on
3. Wait some time
4. Turn the led off
5. Wait some time
6. Repeat steps 2-5 forever
We need:
◦ A routine that “waits”



Example 3: red LED blinking
        .set GPBASE, 0x3F200000    @ GPIO9 configuration, and
        .set GPFSEL0, 0x00         @ turning the LED on and off
        .set GPSET0, 0x1c
        .set GPCLR0, 0x28
        .set STBASE, 0x3F003000    @ Timer read
        .set STCLO, 0x04

Our “waiting” routine will:
1. Read the waiting time (input parameter)
2. Repeat: read the current timer value
3. While it is lower than the ending time (start time + waiting time)


Example 3: red LED blinking
Routine implementation:
• We can use registers r0 and r1 as input parameters
  • r0 contains the timer port address
  • r1 contains the waiting time
• We must preserve the registers modified inside our routine
  • r4 contains the ending time
  • r5 loads the current timer value

espera: push {r4, r5}          @ Save r4 and r5 on the stack
        ldr  r4, [r0, #STCLO]  @ Load the CLO timer
        add  r4, r1            @ Add the waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]  @ Waiting loop: load the current CLO value
        cmp  r5, r4            @ Compare current time with ending time
        blo  ret1              @ If lower, read the timer again
        pop  {r4, r5}          @ Restore r4 and r5
        bx   lr                @ Return from routine
Example 3: red LED blinking
Our main program must:
• Initialize the stack
• Configure GPIO9
• Initialize the timer access (r0) and waiting time (r1) parameters
• Turn the LED on and off, calling the “waiting” routine between them
Example 3: red LED blinking

Program Status Register


31 28 27 24 23 19 16 15 10 9 8 7 6 5 4 0

N Z C V Q de J GE[3:0] IT cond_abc E A I F T mode


f s x c

It is not strictly necessary but it is better


to make sure SVC mode is enabled

Example 3: red LED blinking
        .set GPBASE, 0x3F200000
        .set GPFSEL0, 0x00
        .set GPSET0, 0x1c
        .set GPCLR0, 0x28
        .set STBASE, 0x3F003000
        .set STCLO, 0x04
        .text
        mov r0, #0b11010011    @ SVC mode, IRQ and FIQ disabled
        msr cpsr_c, r0
        mov sp, #0x08000000    @ Init stack in SVC mode
        ldr r4, =GPBASE
        mov r5, #0b00001000000000000000000000000000
        str r5, [r4, #GPFSEL0] @ Configure GPIO9 as output
        mov r5, #0b00000000000000000000001000000000   @ mask for GPIO9
        ldr r0, =STBASE        @ r0 is an input parameter (ST base address)
        ldr r1, =500000        @ r1 is an input parameter (waiting time in microseconds)
bucle:  bl  espera             @ Call waiting routine
        str r5, [r4, #GPSET0]  @ Turn LED on
        bl  espera             @ Call waiting routine
        str r5, [r4, #GPCLR0]  @ Turn LED off
        b   bucle
espera: push {r4, r5}          @ Save r4 and r5 on the stack
        ldr  r4, [r0, #STCLO]  @ Load CLO timer
        add  r4, r1            @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]  @ Waiting loop: load current CLO value
        cmp  r5, r4            @ Compare current time with ending time
        blo  ret1              @ If lower, read the timer again
        pop  {r4, r5}          @ Restore r4 and r5
        bx   lr                @ Return from routine
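espera computes an absolute ending time, CLO + r1, and loops while CLO is lower. If that addition wraps past 0xFFFFFFFF, CLO is already higher than the wrapped ending time, so the routine returns at once instead of waiting. Below is a C sketch of a wrap-safe variant that compares the elapsed time instead. `delay_elapsed` and `wait_us` are hypothetical names, and `st` is assumed to point at STBASE.

```c
#include <stdint.h>

#define ST_CLO (0x04u / 4u) /* word index of CLO in the System Timer block */

/* Nonzero once at least delay_us microseconds have passed since start.
   The unsigned subtraction is taken modulo 2^32, so it stays correct when
   the 32-bit CLO counter wraps around (about every 71.6 minutes). */
int delay_elapsed(uint32_t start, uint32_t now, uint32_t delay_us) {
    return (uint32_t)(now - start) >= delay_us;
}

/* Busy-wait like espera, but on elapsed time rather than an ending time. */
void wait_us(volatile const uint32_t *st, uint32_t delay_us) {
    uint32_t start = st[ST_CLO];
    while (!delay_elapsed(start, st[ST_CLO], delay_us)) {
        /* keep polling CLO */
    }
}
```

In assembly, the same idea replaces `add r4, r1` and `cmp r5, r4` with a `sub` of the start time, followed by a `cmp` against r1.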
Example 4: sound generation
A square wave can approximate a sound
◦ A pure tone is a sinusoidal waveform with a single frequency
Very similar to LED blinking
◦ We use GPIO4 instead of GPIO9
◦ The waiting time is equal to half the period of the pure tone: $t_{\text{wait}} = \frac{T}{2} = \frac{1}{2f}$
◦ E.g.: a 440 Hz tone (A4, "la" in Spanish musical notation) has a period of $T = \frac{1}{440\,\text{Hz}} \approx 2.272\,\text{ms}$, so the needed waiting time is $t_{\text{wait}} = T/2 \approx 1136\,\mu\text{s}$
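The waiting time for any other note follows from the same formula. Here is a one-function C sketch with a hypothetical name. It uses truncating integer division, which reproduces the 1136 µs used for 440 Hz.

```c
#include <stdint.h>

/* Half period of a square wave of freq_hz (must be nonzero), in microseconds:
   t_wait = T / 2 = 1 / (2 f) = 1000000 / (2 f) us, truncated to an integer. */
uint32_t half_period_us(uint32_t freq_hz) {
    return 1000000u / (2u * freq_hz);
}
```

A different note only changes the value loaded into r1. For example, `half_period_us(880)` gives 568 for the A one octave higher.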
Example 4: sound generation
        .set GPBASE, 0x3F200000
        .set GPFSEL0, 0x00
        .set GPSET0, 0x1c
        .set GPCLR0, 0x28
        .set STBASE, 0x3F003000
        .set STCLO, 0x04
        .text
        mov r0, #0b11010011    @ SVC mode, IRQ and FIQ disabled
        msr cpsr_c, r0
        mov sp, #0x08000000    @ Init stack in SVC mode
        ldr r4, =GPBASE
        mov r5, #0b00000000000000000001000000000000
        str r5, [r4, #GPFSEL0] @ Configure GPIO4 as output (FSEL4 = 001)
        mov r5, #0b00000000000000000000000000010000   @ mask for GPIO4
        ldr r0, =STBASE        @ r0 is an input parameter (ST base address)
        ldr r1, =1136          @ r1 is an input parameter (half period in microseconds)
bucle:  bl  espera             @ Call waiting routine
        str r5, [r4, #GPSET0]  @ GPIO4 high (speaker on)
        bl  espera             @ Call waiting routine
        str r5, [r4, #GPCLR0]  @ GPIO4 low (speaker off)
        b   bucle
espera: push {r4, r5}          @ Save r4 and r5 on the stack
        ldr  r4, [r0, #STCLO]  @ Load CLO timer
        add  r4, r1            @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]  @ Waiting loop: load current CLO value
        cmp  r5, r4            @ Compare current time with ending time
        blo  ret1              @ If lower, read the timer again
        pop  {r4, r5}          @ Restore r4 and r5
        bx   lr                @ Return from routine
Exceptions
“Unexpected” events requiring a change in the flow of instruction execution
◦ Branches and jumps are excluded (they are “expected” changes)
Two possible sources of exceptions
◦ Internal exceptions: e.g., undefined opcode, overflow, syscall, …
◦ External exceptions: INTERRUPTS, from an external device (not memory)
Dealing with them without sacrificing performance is hard


Dealing with Exceptions
Different ISAs use the terms differently
◦ Traps, exceptions, interrupts …
◦ e.g., Intel x86 distinguishes exceptions and interrupts
Convention:
◦ Exception: any event (other than branches and jumps) that changes the normal flow of instructions
◦ If it is an external event, the exception is called an interrupt


Dealing with Exceptions
Exceptions are just another form of control hazard. Exceptions (interrupts) arise from:
◦ Arithmetic overflow (internal, exception)
◦ Trying to execute an undefined instruction (internal, exception)
◦ An OS service request (e.g., a page fault) (internal, exception)
◦ A hardware malfunction (internal or external)
◦ An I/O device request (external, interrupt)
◦ Invoking the OS from the user program (internal, software interrupt or system call)
The software (OS) / hardware looks at the cause of the exception and “deals” with it


Two Types of Exceptions
Internal exception: synchronous to program execution
◦ Caused by internal events
◦ The condition must be remedied by the trap handler:
stop the offending instruction midstream in the pipeline
pass control to the OS trap handler
◦ The offending instruction may be retried (or simulated by the OS) and the program may continue, or it may be aborted


Two Types of Exceptions
External exceptions (interrupts): asynchronous to program execution
◦ Caused by external events
◦ May be handled between instructions:
let the prior instructions currently active in the pipeline complete
pass control to the OS interrupt handler
◦ Simply suspend and resume the user program
[Figure: an I/O device raises an interrupt (int) to the CPU.]
Interrupt Driven I/O
An I/O interrupt is asynchronous with respect to instruction execution
◦ It is not associated with any instruction, so it doesn't prevent any instruction from completing
You can pick your own convenient point to handle the interrupt
The control unit only needs to check for a pending I/O interrupt when it starts a new instruction
With I/O interrupts
◦ Need a way to identify the device generating the interrupt
Vectored interrupts: the device sends a vector (an identifier) to the processor, which uses it to index the interrupt vector table, from where it gets the address of the handler.
Non-vectored interrupts: the device places a status field in the Cause register, and the processor jumps to a handler at a fixed address.
Auto-vectored interrupts: each exception type has a vector associated with it.
When the handler gets control, it knows the identity of the device and can immediately start the I/O operation
◦ Can have different urgencies (so we need a way to prioritize them)
I/O interrupts have lower priority than internal exceptions
The UNIX OS uses four to six levels
Interrupt priority levels (IPLs) assigned by the OS to each process can be raised and lowered via changes to the Interrupt mask field of the Status register
• Lowest IPL: all interrupts are permitted
• Highest IPL: all interrupts are blocked


Exceptions in ARM
ARM’s exception system is auto-vectored
◦ There are 8 exception types, NI = 0..7
◦ Each NI has an exception vector associated with it
The exception vector is a branch to a handler
NI*4 is the offset into the exception vector table

Exception                Type                 Offset   Mode
Reset                    Interrupt            0x00     SVC
Undefined instruction    Exception            0x04     Undefined
SW interrupt             Software interrupt   0x08     SVC
Prefetch abort           Exception            0x0C     Abort
Data abort               Exception            0x10     Abort
Reserved                 -                    0x14     -
IRQ                      Interrupt            0x18     IRQ
FIQ                      Interrupt            0x1C     FIQ
Exceptions in ARM
Types of exceptions:
◦ Reset: the pins in P6 trigger a reboot (bootload)
◦ Undefined instruction: invalid opcode
◦ Software interrupts: system calls
◦ Prefetch abort / data abort: memory misalignment, access privilege errors
◦ IRQ: interrupts from external devices
◦ FIQ: fast interrupts


Exception priorities
When multiple exceptions arise at the same time, a fixed priority system determines the order in which they are handled:
Priority       Exception
1 (highest)    Reset
2              Precise data abort
3              FIQ
4              IRQ
5              Prefetch abort
6              Imprecise data abort
7 (lowest)     BKPT, Undefined instruction, SVC, SMC
Status register again: cpsr_{fsxc}
31 N | 30 Z | 29 C | 28 V | 27 Q | 26-25 IT[de] | 24 J | 19-16 GE[3:0] | 15-10 IT[cond, abc] | 9 E | 8 A | 7 I | 6 F | 5 T | 4-0 mode
Field masks: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
Condition code flags
◦ N = Negative result from ALU
◦ Z = Zero result from ALU
◦ C = ALU operation Carried out
◦ V = ALU operation oVerflowed
Sticky overflow flag: Q flag
◦ Indicates if saturation has occurred
SIMD condition code bits: GE[3:0]
◦ Used by some SIMD instructions
IF-THEN status bits: IT[abcde]
◦ Control conditional execution of Thumb instructions
T bit
◦ T = 0: processor in ARM state
◦ T = 1: processor in Thumb state
J bit
◦ J = 1: processor in Jazelle state
Mode bits
◦ Specify the processor mode
Interrupt disable bits
◦ I = 1: disables IRQ
◦ F = 1: disables FIQ
E bit
◦ E = 0: data load/store is little endian
◦ E = 1: data load/store is big endian
A bit
◦ A = 1: disables imprecise data aborts
Banking of registers
Mode encodings (mode bits 4-0): User 10000, FIQ 10001, IRQ 10010, SVC 10011, Abort 10111, Undef 11011
• ARM has 37 registers, all 32 bits long
• A subset of these registers is accessible in each mode:
  ◦ r0-r7, r15 (pc) and cpsr are shared by all modes
  ◦ r8-r12 are shared by all modes except FIQ, which has its own r8_fiq-r12_fiq
  ◦ Each exception mode (FIQ, IRQ, SVC, Abort, Undef) has its own r13 (sp), r14 (lr) and spsr
• The banked registers of the current mode do not have to be preserved by its handler
• Note: System mode uses the User mode register set.
[Figure: register set of each mode; the current mode's registers are in use, the others are banked out.]
Handling exceptions
When an exception occurs, the core:
1. Copies CPSR into SPSR_<mode>
2. Sets the appropriate CPSR bits
◦ Change to ARM state
◦ Change to exception mode
◦ Disable interrupts (if appropriate)
3. Stores the return address in LR_<mode>
4. Sets PC to the vector address
To return, the exception handler needs to:
1. Restore CPSR from SPSR_<mode>
2. Restore PC from LR_<mode>
This can only be done in ARM state.

Vector Table
0x1C  FIQ
0x18  IRQ: b irq_handler
0x14  (Reserved)
0x10  Data Abort
0x0C  Prefetch Abort
0x08  Supervisor Call
0x04  Undefined Instruction
0x00  Reset
The vector table can also be at 0xFFFF0000 on most cores.


Handling exceptions (again)
1. Save processor status
◦ Stores PC in LR_<mode>, adjusting LR based on the exception type: if the main application has just executed instruction i at address X (followed by i+1 at X+4 and i+2 at X+8), X+8 is stored in LR_<mode>
◦ Copies CPSR into SPSR_<mode>
2. Change processor status for the exception
◦ Forces the CPSR mode bits to a value (depending on the exception)
◦ Sets PC to the vector address
3. Execute the exception handler
◦ <user code>
4. Return to the main application
◦ Restore CPSR from SPSR_<mode>
◦ Restore PC: PC ← LR_<mode> - 4 (= X+4, so execution resumes at i+1)
Steps 1 and 2 are performed automatically by the core; steps 3 and 4 are the responsibility of software.


User mode → FIQ mode
[Figure: on an FIQ, r0-r7, r15 (pc) and cpsr stay shared with User mode, while r8-r14 switch to the FIQ mode banked registers r8_fiq-r14_fiq; the return address is copied into r14_fiq and cpsr into spsr_fiq, and User mode's r8-r14 become banked-out registers.]


Exception handler
Basic structure of an exception handler
◦ Interrupt: the return is done with lr-4
◦ Internal exception (such as data abort): the return is done with lr-8
◦ The user must manage the A, I and F flags to disable/enable nesting of new exceptions and interrupts. Initially, interrupts are disabled (I = F = 1).
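The following C sketch summarizes, for ARM state, the vector offset of each exception (NI*4) and the amount a handler subtracts from LR_<mode> on return. The lr-4 (interrupts) and lr-8 (data abort) values come from this slide. The values for SVC, undefined instruction and prefetch abort are taken from the ARM Architecture Reference Manual. The enum and function names are ours.

```c
#include <stdint.h>

/* Exception numbers NI = 0..7, in vector-table order. */
typedef enum {
    EXC_RESET = 0, EXC_UNDEF, EXC_SVC, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_RESERVED, EXC_IRQ, EXC_FIQ
} arm_exception;

/* Vector address relative to the table base (0x00000000 or 0xFFFF0000). */
uint32_t vector_offset(arm_exception ni) {
    return 4u * (uint32_t)ni;
}

/* Amount to subtract from LR_<mode> to get the return address (ARM state). */
uint32_t return_adjust(arm_exception e) {
    switch (e) {
    case EXC_IRQ:
    case EXC_FIQ:
    case EXC_PREFETCH_ABORT:
        return 4u; /* IRQ/FIQ: resume the interrupted flow; prefetch abort: retry */
    case EXC_DATA_ABORT:
        return 8u; /* retry the load/store that aborted */
    case EXC_SVC:
    case EXC_UNDEF:
        return 0u; /* LR already holds the address of the next instruction */
    default:
        return 0u; /* reset and the reserved vector never return */
    }
}
```

An IRQ handler can therefore end with `subs pc, lr, #4`. With the S suffix and pc as destination, that single instruction also copies SPSR_irq back into CPSR, covering both return steps.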


Memory map
[Figure: memory map with regions of 16 KB, 16 KB, 128 MB, 256 MB, 256 MB and 256 MB; the vector table starts at address 0 (or at 0xFFFF0000).]


Steps to set up interrupts
1. Initialize the interrupt vector (IRQ or FIQ) in the vector table
2. Disable interrupts and initialize the stack in each mode:
sp_fiq <- 0x00004000
sp_irq <- 0x00008000
sp_svc <- 0x08000000
3. Initialize GPIO ports
4. Enable sources of interruption
5. Enable the interrupt flags in cpsr


1- Initialize Vector Table
To write in the vector table we can use a macro, ADDEXC, that computes the offset of the exception handler and writes the vector in the vector table.

        mov r0, #0                @ Vector table base = 0
        ADDEXC 0x18, irq_handler  @ IRQ vector (offset 0x18) -> irq_handler


macro ADDEXC offset, dirDest
The IRQ handler (irq_handler) is located at dirDest
The vector table stores a branch instruction to the IRQ handler, b disp, located at offset (0x18)
disp is the number of bytes between dirDest and offset, divided by 4
While b disp executes, pc has already been incremented twice (pc = 0x18 + 8)
Thus, disp must store
$\text{disp} = \frac{\text{dirDest} - \text{offset} - 8}{4}$


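macro ADDEXC offset, dirDest
In ARM encoding, b disp (condition AL) has 1110 1010 (0xEA) in its top byte and the 24-bit displacement disp in the remaining bits:
$\text{instr} = \texttt{0xEA000000} + \frac{\text{dirDest} - \text{offset} - 8}{4} = \texttt{0xE9FFFFFE} + \frac{\text{dirDest} - \text{offset}}{4} = \frac{\texttt{0x3A7FFFFF8} + \text{dirDest} - \text{offset}}{4}$
The constant 0x3A7FFFFF8 needs 9 hex digits, but a register only holds 8!
Only the low 32 bits of the numerator, 0xA7FFFFF8 + dirDest - offset, fit in a register. The two bits lost would have become bits 31-30 of the result, and those are 11 in every such branch.
We can divide by 4 by rotating the numerator right by 2, after adding 3 to preserve the two most significant bits: adding 3 sets the two low bits to 11 (they are otherwise 00, since dirDest - offset is a multiple of 4), and the rotation moves them into bits 31-30.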
macro ADDEXC offset, dirDest
The macro adds 0xA7FFFFFB (the low 32 bits of 0x3A7FFFFF8, plus 3), rotates right by 2 and stores the resulting b instruction at the vector offset (r0 holds the vector table base):

        .macro ADDEXC offset, dirDest
        ldr r1, =(\dirDest-\offset+0xA7FFFFFB)  @ numerator (low 32 bits) + 3
        ror r1, #2                              @ divide by 4, keeping bits 31-30 = 11
        str r1, [r0, #\offset]                  @ write b disp into the vector table
        .endm
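The rotate trick can be checked off-target. The C sketch below computes the branch word both directly (0xEA000000 | imm24) and the way ADDEXC does (add 0xA7FFFFFB, rotate right by 2), so the two can be compared. The function names are ours. As the derivation assumes, the trick is valid only when dirDest ≥ offset + 8, which holds for handlers placed after the vector table.

```c
#include <stdint.h>

/* Direct form: B with condition AL is 0xEA000000 | imm24, where imm24 is the
   word displacement measured from pc = offset + 8. */
uint32_t branch_word(uint32_t offset, uint32_t dir_dest) {
    uint32_t disp = (dir_dest - offset - 8u) >> 2; /* bytes / 4 */
    return 0xEA000000u | (disp & 0x00FFFFFFu);
}

/* Rotate right by n bits, 0 < n < 32 (what "ror r1, #2" does). */
static uint32_t ror32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* ADDEXC's trick: add 0xA7FFFFFB, then rotate right by 2. */
uint32_t addexc_word(uint32_t offset, uint32_t dir_dest) {
    return ror32(dir_dest - offset + 0xA7FFFFFBu, 2u);
}
```

For example, an irq_handler at 0x8000 gives 0xEA001FF8 either way: (0x8000 - 0x18 - 8) / 4 = 0x1FF8.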
2- Disable Interrupts and initialize the stack
Each mode has its own stack pointer (sp)
◦ Change the mode (via cpsr_c)
◦ Instructions msr (sr <- reg) and mrs (reg <- sr)
◦ Initialize the corresponding sp register
The initial state in bare metal is SVC
◦ sp_fiq = 0x4000, sp_irq = 0x8000, sp_svc = 0x08000000:

        mov r0, #0                @ Pointer to vector table
        ADDEXC 0x18, irq_handler
        ADDEXC 0x1c, fiq_handler
        mov r0, #0b11010001       @ FIQ mode, FIQ and IRQ disabled
        msr cpsr_c, r0
        mov sp, #0x4000
        mov r0, #0b11010010       @ IRQ mode, FIQ and IRQ disabled
        msr cpsr_c, r0
        mov sp, #0x8000
        mov r0, #0b11010011       @ SVC mode, FIQ and IRQ disabled
        msr cpsr_c, r0
        mov sp, #0x08000000
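The three cpsr_c constants above follow a simple pattern: mode bits 4-0, I in bit 7, F in bit 6, and T = 0 for ARM state. Below is a C sketch that builds them. The macro and function names are ours, and the mode encodings are the ones listed in the register-banking slide.

```c
#include <stdint.h>

/* Mode field encodings (CPSR bits 4-0). */
#define MODE_USR 0x10u /* 10000 */
#define MODE_FIQ 0x11u /* 10001 */
#define MODE_IRQ 0x12u /* 10010 */
#define MODE_SVC 0x13u /* 10011 */
#define MODE_ABT 0x17u /* 10111 */
#define MODE_UND 0x1Bu /* 11011 */

#define CPSR_I 0x80u   /* bit 7: 1 = IRQ disabled */
#define CPSR_F 0x40u   /* bit 6: 1 = FIQ disabled */
                       /* bit 5 (T) stays 0: ARM state */

/* Byte to write with "msr cpsr_c, rX". */
uint8_t cpsr_c_value(unsigned mode, int irq_off, int fiq_off) {
    unsigned v = mode & 0x1Fu;
    if (irq_off)
        v |= CPSR_I;
    if (fiq_off)
        v |= CPSR_F;
    return (uint8_t)v;
}
```

The same helper describes step 5 of the setup. Staying in SVC mode with both IRQ and FIQ enabled corresponds to `cpsr_c_value(MODE_SVC, 0, 0)`, i.e. 0b00010011.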


4- Enable sources of interruption
[Figure: Raspberry Pi interrupt controller. The IRQ basic, IRQ 1 and IRQ 2 register groups combine the interrupt sources into the IRQ line, and FIQ control (FIQ source select and FIQ enable) routes one source to the FIQ line. IRQ 1 bits 0-3 are the System Timer channels (Systim_c0-c3); IRQ 2 bits 17-20 are the GPIO interrupts (Gpio_int0-3); other sources include the ARM timer, mailbox and doorbells (IRQ basic), USB, AUX, I2C, SPI, PCM, UART and SD host.]
4- Enable sources of interruption
Three groups: pending, enable and disable
In each group:
• IRQ basic: summary
• IRQs 1 and 2: in detail
There is also one port for FIQ control.

3F00 B200  IRQ basic pending     Pending
3F00 B204  IRQ pending 1
3F00 B208  IRQ pending 2
3F00 B20C  FIQ control
3F00 B210  Enable IRQs 1         Enable
3F00 B214  Enable IRQs 2
3F00 B218  Enable Basic IRQs
3F00 B21C  Disable IRQs 1        Disable
3F00 B220  Disable IRQs 2
3F00 B224  Disable Basic IRQs
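The IRQ 1 / IRQ 2 split amounts to numbering the sources 0-63 and picking the register by source / 32 and the bit by source % 32. Below is a C sketch using the addresses from this slide. The names are ours, and numbering Gpio_int3 as source 52 (bit 20 of IRQ 2) follows from that split.

```c
#include <stdint.h>

#define IRQ_BASE      0x3F00B200u /* interrupt controller, as on the slide */
#define IRQ_PENDING_1 (IRQ_BASE + 0x04u)
#define IRQ_PENDING_2 (IRQ_BASE + 0x08u)
#define IRQ_ENABLE_1  (IRQ_BASE + 0x10u)
#define IRQ_ENABLE_2  (IRQ_BASE + 0x14u)

/* Sources 0-31 are the bits of the "IRQ 1" ports, 32-63 those of "IRQ 2". */
uint32_t irq_enable_addr(unsigned irq)  { return irq < 32u ? IRQ_ENABLE_1 : IRQ_ENABLE_2; }
uint32_t irq_pending_addr(unsigned irq) { return irq < 32u ? IRQ_PENDING_1 : IRQ_PENDING_2; }
uint32_t irq_mask(unsigned irq)         { return 1u << (irq % 32u); }
```

System Timer channel 1 is source 1 and Gpio_int3 is source 52, so enabling both means writing 0x2 to 3F00 B210 and 0x100000 to 3F00 B214. Writing only the wanted bit should be enough: according to the BCM2835 documentation, 0 bits written to the enable ports leave the other sources unchanged.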


4- Enable sources of interruption
[Figure: the same interrupt-controller diagram (columns Basic, IRQ 1, IRQ 2, FIQ control) with two sources enabled: bit 1 of IRQ 1 (Systim_c1, interrupt triggered by C1 in the System Timer) and bit 20 of IRQ 2 (Gpio_int3, interrupt triggered by any pin of GPIO).]
4- Enable sources of interruption
Set the corresponding bit in the appropriate Enable IRQs port
◦ GPIO: choose the triggering condition of each pin with GPRENn, GPFENn, GPHENn, GPLENn, GPARENn and GPAFENn, and enable the GPIO source in Enable IRQs 2
◦ System Timer: each counter (C0-C3) can be enabled/disabled in the Enable IRQs 1 / Disable IRQs 1 ports
When an interruption occurs, the handler must identify its source by reading the IRQ pending ports
◦ GPIO interruption detection: use GPEDSn
◦ System Timer interruption detection: STCS flags which of the C0 to C3 counters has matched
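The source-identification step can be sketched as a pure C function over the values a handler would read from the pending ports and GPEDS0. This is a host-side illustration only: the snapshot struct, the constants and `identify_sources` are hypothetical names, and the bit positions (bit 1 of IRQ 1 for Systim_c1, bit 20 of IRQ 2 for "any GPIO pin") are the ones used in the figures and examples of this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the ports an IRQ handler would read. */
typedef struct {
    uint32_t pending1;   /* IRQ pending 1 (3F00 B204) */
    uint32_t pending2;   /* IRQ pending 2 (3F00 B208) */
    uint32_t gpeds0;     /* GPEDS0 (3F20 0040): per-pin event flags */
} IrqSnapshot;

enum { SRC_NONE = 0, SRC_TIMER_C1 = 1, SRC_GPIO = 2 };

#define PEND1_SYSTIM_C1 (1u << 1)   /* bit 1 of IRQ 1: system timer C1 */
#define PEND2_GPIO_INT3 (1u << 20)  /* bit 20 of IRQ 2: any GPIO pin   */

/* Returns the set of sources needing service; for GPIO it also reports,
 * through *gpio_pins, which pins raised an event. */
static int identify_sources(const IrqSnapshot *s, uint32_t *gpio_pins)
{
    int sources = SRC_NONE;
    *gpio_pins = 0;
    if (s->pending1 & PEND1_SYSTIM_C1)
        sources |= SRC_TIMER_C1;
    if (s->pending2 & PEND2_GPIO_INT3) {
        sources |= SRC_GPIO;
        *gpio_pins = s->gpeds0;      /* which pin(s) actually fired */
    }
    return sources;
}
```

Several sources may be pending at once, so a real handler would service every flagged source rather than stopping at the first one.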


GPIO memory mapping
3F20 0000 GPFSEL0
...
3F20 0034 GPLEV0
3F20 0038 GPLEV1
3F20 003C --
3F20 0040 GPEDS0    Event Detect Status: set to 1 when an interrupt is requested; must be cleared by writing 1 once the interrupt has been serviced
3F20 0044 GPEDS1
3F20 0048 --
3F20 004C GPREN0    Rising edge ENable: enable interrupt request on a rising edge (synchronous, glitches avoided)
3F20 0050 GPREN1
3F20 0054 --
3F20 0058 GPFEN0    Falling edge ENable: enable interrupt request on a falling edge (synchronous, glitches avoided)
3F20 005C GPFEN1
The suppression of glitches is done by sampling the pin with the system clock and then looking for a "011" (rising edge) or "100" (falling edge) pattern in the sampled signal.
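The glitch filter described above can be modelled as a 3-sample shift register that is updated once per clock and compared against the two patterns. The C sketch below is a host-side approximation of that sampling scheme, not Broadcom's actual circuit; the type and function names are hypothetical. A pulse that lasts a single sample never produces "011", so it is not reported as a rising edge.

```c
#include <assert.h>
#include <stdint.h>

/* Last three samples of the pin, oldest in bit 2, newest in bit 0. */
typedef struct {
    uint8_t history;
} EdgeDetector;

typedef enum { EDGE_NONE, EDGE_RISING, EDGE_FALLING } Edge;

/* Shifts in one sample (0 or 1) and reports an edge if the pattern matches. */
static Edge sample_pin(EdgeDetector *d, int level)
{
    d->history = (uint8_t)(((d->history << 1) | (level & 1)) & 0x7);
    if (d->history == 0x3) return EDGE_RISING;   /* pattern 011 */
    if (d->history == 0x4) return EDGE_FALLING;  /* pattern 100 */
    return EDGE_NONE;
}

/* Feeds n samples, starting from an all-zero history, and counts edges of one kind. */
static int count_edges(const int *samples, int n, Edge kind)
{
    EdgeDetector d = { 0 };
    int count = 0;
    for (int i = 0; i < n; i++)
        if (sample_pin(&d, samples[i]) == kind)
            count++;
    return count;
}
```

For example, the samples 0, 0, 1, 1, 1 yield exactly one rising edge, while the one-sample pulse 0, 1, 0, 0, 0 yields none.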


GPIO memory mapping
3F20 0000 GPFSEL0
...
3F20 0064 GPHEN0     High ENable: enable interrupt request while the pin is 1
3F20 0068 GPHEN1
3F20 006C --
3F20 0070 GPLEN0     Low ENable: enable interrupt request while the pin is 0
3F20 0074 GPLEN1
3F20 0078 --
3F20 007C GPAREN0    Async. Rising edge ENable: enable interrupt request on a rising edge (asynchronous, glitches possible)
3F20 0080 GPAREN1
3F20 0084 --
3F20 0088 GPAFEN0    Async. Falling edge ENable: enable interrupt request on a falling edge (asynchronous, glitches possible)
3F20 008C GPAFEN1
3F20 0090 --
3F20 0094 GPPUD      Pull Up Down control
3F20 0098 GPPUDCLK0  Pull Up Down clock
3F20 009C GPPUDCLK1
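Every paired port in this map (GPxxx0 / GPxxx1) follows the same layout: register 0 covers GPIO0 to GPIO31 and register 1 covers GPIO32 to GPIO53, one bit per pin. The following C sketch computes the port offset (relative to GPBASE) and the bit mask for a given pin; the helper names are hypothetical, and the offsets are the ones listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets (from GPBASE = 0x3F200000) of the first port of each pair. */
enum {
    GPEDS0 = 0x40, GPREN0 = 0x4C, GPFEN0 = 0x58, GPHEN0 = 0x64,
    GPLEN0 = 0x70, GPAREN0 = 0x7C, GPAFEN0 = 0x88, GPPUDCLK0 = 0x98
};

/* Port n of a pair covers pins 32*n .. 32*n+31, so step 4 bytes per 32 pins. */
static uint32_t bank_offset(uint32_t first_port_offset, unsigned pin)
{
    assert(pin < 54);                 /* the GPIO port has 54 pins */
    return first_port_offset + 4u * (pin / 32u);
}

/* Bit of the pin inside its port. */
static uint32_t bank_mask(unsigned pin)
{
    assert(pin < 54);
    return 1u << (pin % 32u);
}
```

For instance, enabling falling-edge detection on GPIO2 means writing `bank_mask(2)` (0b100) to offset `bank_offset(GPFEN0, 2)` (0x58), which is exactly what Examples 6 and 7 do.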
Example 5: turn on a red LED after 4 seconds
.include "inter.inc"
.text
mov r0, #0
ADDEXC 0x18, irq_handler          @ install IRQ vector (r0 = 0: vector table base)
mov r0, #0b11010010               @ IRQ mode, IRQ and FIQ disabled
msr cpsr_c, r0
mov sp, #0x8000                   @ stack init for IRQ mode (the handler uses push/pop)
ldr r0, =GPBASE
ldr r1, =0b00001000000000000000000000000000
str r1, [r0, #GPFSEL0]            @ set GPIO9 as output
ldr r0, =STBASE
ldr r1, [r0, #STCLO]              @ load CLO,
add r1, #0x400000                 @ add 4 sec (0x400000 us = 4.19 seconds)
str r1, [r0, #STC1]               @ and store the result in C1
ldr r0, =INTBASE
mov r1, #0b0010
str r1, [r0, #INTENIRQ1]          @ enable C1 interruption
mov r0, #0b01010011               @ SVC mode, IRQ enabled
msr cpsr_c, r0                    @ clear I flag
buc: b buc

irq_handler:
push {r0, r1}
ldr r0, =GPBASE
mov r1, #0b00000000000000000000001000000000
str r1, [r0, #GPSET0]             @ turn on RED LED (GPIO9)
ldr r0, =STBASE
mov r1, #0b0010
str r1, [r0, #STCS]               @ clear C1 match so the IRQ is not taken again
pop {r0, r1}
subs pc, lr, #4                   @ PC = LR - 4: return from IRQ
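The delay in Example 5 relies on the system timer counting microseconds: 0x400000 = 4,194,304 ticks, about 4.19 s at 1 MHz (the rate is assumed here from the "4.19 seconds" comment in the example). Because CLO is a 32-bit counter, the compare value wraps modulo 2^32 exactly as the `add` instruction does, so the match still occurs 4.19 s later even if CLO is close to its maximum. The C sketch below checks this arithmetic on the host; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_HZ 1000000u   /* assumed system timer rate: 1 MHz */

/* Compare value for "now + delay"; uint32_t arithmetic wraps modulo 2^32,
 * like the 32-bit CLO counter and the ARM add instruction. */
static uint32_t compare_value(uint32_t clo_now, uint32_t delay_ticks)
{
    return clo_now + delay_ticks;
}

/* A channel matches when the low 32 bits of the counter equal Cn. */
static int channel_matches(uint32_t clo, uint32_t cn)
{
    return clo == cn;
}

/* Converts timer ticks into seconds. */
static double ticks_to_seconds(uint32_t ticks)
{
    return (double)ticks / TIMER_HZ;
}
```

With CLO = 0xFFF00000, for example, the compare value becomes 0x00300000, which the counter reaches after it wraps around.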


Example 5: inter.inc file
@ ADDEXC vector, dirRTI: store a "b dirRTI" instruction at address r0 + vector
@ (r0 must hold 0, the base of the vector table). Clobbers r1.
.macro ADDEXC vector, dirRTI
ldr r1, =(\dirRTI-\vector+0xa7fffffb)
ror r1, #2
str r1, [r0, #\vector]
.endm
@ GPIO
.set GPBASE, 0x3F200000
.set GPFSEL0, 0x00
.set GPFSEL1, 0x04
.set GPFSEL2, 0x08
.set GPFSEL3, 0x0c
.set GPFSEL4, 0x10
.set GPFSEL5, 0x14
.set GPFSEL6, 0x18        @ 0x18 is reserved: only GPFSEL0 to GPFSEL5 exist
.set GPSET0, 0x1c
.set GPSET1, 0x20
.set GPCLR0, 0x28
.set GPCLR1, 0x2c
.set GPLEV0, 0x34
.set GPLEV1, 0x38
.set GPEDS0, 0x40
.set GPEDS1, 0x44
.set GPFEN0, 0x58
.set GPFEN1, 0x5c
.set GPPUD, 0x94
.set GPPUDCLK0, 0x98
@ System timer
.set STBASE, 0x3F003000
.set STCS, 0x00
.set STCLO, 0x04
.set STC1, 0x10
.set STC3, 0x18
@ Interrupt controller
.set INTBASE, 0x3F00b000
.set INTFIQCON, 0x20c
.set INTENIRQ1, 0x210
.set INTENIRQ2, 0x214
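ADDEXC writes a branch instruction into the vector table. The C sketch below reproduces the macro's arithmetic on the host and compares it with the standard ARM encoding of an unconditional B (condition 1110, opcode 1010, 24-bit word offset relative to the vector address + 8). Adding 0xa7fffffb and rotating right by 2 produces the same word whenever the handler is word-aligned and lies after the vector, within branch range. The function names and the sample addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate right, like the ARM ror instruction (0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* What ADDEXC computes: (dirRTI - vector + 0xa7fffffb) ror 2. */
static uint32_t addexc_word(uint32_t vector, uint32_t handler)
{
    return ror32(handler - vector + 0xa7fffffbu, 2);
}

/* Reference encoding of "b handler" placed at address vector:
 * 0xEA000000 | ((handler - (vector + 8)) / 4), offset kept to 24 bits. */
static uint32_t arm_branch(uint32_t vector, uint32_t handler)
{
    uint32_t offset_words = (handler - vector - 8u) >> 2;
    return 0xEA000000u | (offset_words & 0x00FFFFFFu);
}
```

For a handler at 0x8100 installed at the IRQ vector 0x18, both functions give 0xEA002038.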


Example 6: turn on a red LED after pushing a button
.include "inter.inc"
.text
mov r0, #0
ADDEXC 0x18, irq_handler
mov r0, #0b11010010
msr cpsr_c, r0                          @ stack init for IRQ mode
mov sp, #0x8000
mov r0, #0b11010011
msr cpsr_c, r0                          @ stack init for SVC mode
mov sp, #0x8000000
ldr r0, =GPBASE
mov r1, #0b00001000000000000000000000000000
str r1, [r0, #GPFSEL0]                  @ set GPIO9 as output
mov r1, #0b00000000000000000000000000000100
str r1, [r0, #GPFEN0]                   @ enable falling-edge interruptions through GPIO2
ldr r0, =INTBASE
mov r1, #0b00000000000100000000000000000000
str r1, [r0, #INTENIRQ2]                @ allow interruptions from any GPIO pin
mov r0, #0b01010011
msr cpsr_c, r0                          @ set SVC mode with IRQ enabled
bucle: b bucle
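Examples 5 and 6 configure GPIO9 by storing 1 << 27 into GPFSEL0. Each GPFSELn port holds ten 3-bit function fields, so pin 9 uses bits 27 to 29 of GPFSEL0, and 001 selects output (000 is input). A plain `str` also writes 000 into the fields of GPIO0 to GPIO8, turning them into inputs; the hypothetical `fsel_update` helper below shows a read-modify-write alternative that leaves the other fields unchanged. This sketch only computes values on the host and touches no hardware.

```c
#include <assert.h>
#include <stdint.h>

#define FSEL_INPUT  0u   /* 000 */
#define FSEL_OUTPUT 1u   /* 001 */

/* Offset (from GPBASE) of the GPFSELn port holding the pin's field. */
static uint32_t fsel_offset(unsigned pin)
{
    assert(pin < 54);
    return 4u * (pin / 10u);          /* 10 pins per port */
}

/* Position of the pin's 3-bit field inside that port. */
static unsigned fsel_shift(unsigned pin)
{
    assert(pin < 54);
    return 3u * (pin % 10u);
}

/* New port value with only the selected pin's function field changed. */
static uint32_t fsel_update(uint32_t old, unsigned pin, uint32_t function)
{
    unsigned shift = fsel_shift(pin);
    return (old & ~(7u << shift)) | ((function & 7u) << shift);
}
```

Starting from 0, `fsel_update(0, 9, FSEL_OUTPUT)` gives 0x08000000, the same binary constant used in the examples.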


Example 6: IRQ handler
irq_handler:
push {r0, r1, r2}
ldr r0, =GPBASE
ldr r2, [r0, #GPEDS0]
ands r2, #0b00000000000000000000000000000100    @ check whether GPIO2 was pressed
movne r1, #0b00000000000000000000001000000000
strne r1, [r0, #GPSET0]                        @ if so, turn on the GPIO9 red LED
movne r1, #0b00000000000000000000000000000100
strne r1, [r0, #GPEDS0]                        @ and clear the GPIO2 event
fin: pop {r0, r1, r2}
subs pc, lr, #4
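GPEDSn is a write-1-to-clear port: the hardware sets a pin's bit when an event is detected, and software clears it by writing 1 to that bit, while bits written as 0 are unaffected. That is why the handler writes the GPIO2 mask back to GPEDS0 instead of writing 0; as long as the bit stays set the GPIO interrupt remains pending, so presumably the handler would be re-entered right after `subs pc, lr, #4`. The C sketch below is a hypothetical host-side model of that behaviour, mirroring the handler's logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an event detect status port such as GPEDS0. */
typedef struct {
    uint32_t status;
} EventReg;

/* Hardware side: a detected edge sets the pin's bit. */
static void hw_event(EventReg *r, unsigned pin)
{
    r->status |= 1u << pin;
}

/* Software side: each 1 bit written clears that flag; 0 bits leave it alone. */
static void write_w1c(EventReg *r, uint32_t value)
{
    r->status &= ~value;
}

/* Mirrors Example 6: returns 1 (and clears the flag) if GPIO2 had an event. */
static int service_gpio2(EventReg *r)
{
    const uint32_t gpio2 = 1u << 2;
    if ((r->status & gpio2) == 0)
        return 0;
    write_w1c(r, gpio2);
    return 1;
}
```

Events on other pins (GPIO5 in the test below) stay flagged until their own bit is written with 1.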


Using FIQ
FIQ control port: Select FIQ Source (7 bits, 128 sources) and FIQ Enable (bit 7)
◦ 0-31 represent the 32 interruption sources of IRQ 1
◦ 32-63 represent the 32 interruption sources of IRQ 2
◦ 64-95 represent the 32 interruption sources of IRQ basic
[Figure: FIQ control port next to the IRQ 1 sources (Systim_c0 to Systim_c3, Usb_con, Aux_int) and the IRQ 2 sources (I2c_spi_slv_int, pwa0, pwa1, smi, Gpio_int0 to Gpio_int3, I2c_int, Spi_int, Pcm_int, Uart_int, Sd_host)]
Enable FIQ for C1 of SysTimer
◦ Bit 1 of IRQ 1: code 1
◦ Also 1 in FIQ Enable
◦ Result: 0b10000001 = 0x81
Enable FIQ for C3 of SysTimer
◦ Bit 3 of IRQ 1: code 3
◦ Also 1 in FIQ Enable
◦ Result: 0b10000011 = 0x83
Enable FIQ for GPIO_int3
◦ Bit 20 of IRQ 2: code 20 + 32 = 52
◦ Also 1 in FIQ Enable
◦ Result: 0b10110100 = 0xB4
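The three worked values above follow one rule: source code = 32 × group + bit, with the FIQ Enable bit (bit 7) set on top. A short C sketch of that computation, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define FIQ_ENABLE (1u << 7)

/* Source groups as numbered in the Select FIQ Source field. */
enum FiqGroup { FIQ_GROUP_IRQ1 = 0, FIQ_GROUP_IRQ2 = 1, FIQ_GROUP_BASIC = 2 };

/* FIQ control value: bits 0-6 select the source, bit 7 enables FIQ. */
static uint32_t fiq_control(enum FiqGroup group, unsigned bit)
{
    assert(bit < 32);
    return FIQ_ENABLE | (32u * (uint32_t)group + bit);
}
```

`fiq_control(FIQ_GROUP_IRQ2, 20)` reproduces the 0xB4 that Example 7 writes to INTFIQCON.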


Using FIQ
Advantages:
◦ Registers r8 to r14 are banked in FIQ mode, so the handler can use them without saving them
◦ FIQ is the last entry of the vector table (0x1C), so the FIQ handler can start right there: no need to branch
Disadvantage: only one source of interruption can be handled as FIQ
Vector Table
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Supervisor Call
0x04 Undefined Instruction
0x00 Reset


Example 7: Putting it all together
Turn the red LED at GPIO9 on and off every 4 seconds, or whenever the push button at GPIO2 is pressed.
Use IRQ to handle the timer and FIQ to handle the push button.
Use a variable in memory to keep the LED state (on or off).


Example 7: Timer in IRQ and push button in FIQ
.include "inter.inc"
.text
mov r0, #0                              @ vector table base, required by ADDEXC
ADDEXC 0x18, irq_handler
ADDEXC 0x1c, fiq_handler
mov r0, #0b11010001
msr cpsr_c, r0                          @ stack init for FIQ mode
mov sp, #0x4000
mov r0, #0b11010010
msr cpsr_c, r0                          @ stack init for IRQ mode
mov sp, #0x8000
mov r0, #0b11010011
msr cpsr_c, r0                          @ stack init for SVC mode
mov sp, #0x8000000
ldr r0, =GPBASE
mov r1, #0b00001000000000000000000000000000
str r1, [r0, #GPFSEL0]                  @ set GPIO9 as output
mov r1, #0b00000000000000000000000000000100
str r1, [r0, #GPFEN0]                   @ enable falling-edge interruptions through GPIO2
ldr r0, =INTBASE
mov r1, #0b00000000000100000000000000000000
str r1, [r0, #INTENIRQ2]                @ allow interruptions from any GPIO pin
ldr r0, =STBASE
ldr r1, [r0, #STCLO]                    @ program timer to
add r1, #0x400000                       @ interrupt in 4 seconds
str r1, [r0, #STC1]
ldr r0, =INTBASE
mov r1, #0b00000010
str r1, [r0, #INTENIRQ1]                @ enable C1 interruption
mov r1, #0b10110100
str r1, [r0, #INTFIQCON]                @ enable FIQ for GPIO_int3
mov r0, #0b00010011
msr cpsr_c, r0                          @ set SVC mode with FIQ and IRQ enabled
bucle: b bucle
Example 7: IRQ and FIQ handlers
fiq_handler:
push {r0, r1, r2}
ldr r0, =GPBASE
ldr r1, =onoff
ldr r2, [r1]                            @ update onoff variable
eors r2, #1
str r2, [r1]                            @ and test if it is 0 or 1
mov r1, #0b00000000000000000000001000000000
streq r1, [r0, #GPCLR0]                 @ turn red LED off (onoff = 0)
strne r1, [r0, #GPSET0]                 @ or on (onoff = 1)
mov r1, #0b00000000000000000000000000000100
str r1, [r0, #GPEDS0]                   @ clear GPIO2 interrupt
pop {r0, r1, r2}
subs pc, lr, #4
irq_handler:
push {r0, r1, r2}
ldr r0, =GPBASE
ldr r1, =onoff
ldr r2, [r1]                            @ update onoff variable
eors r2, #1
str r2, [r1]                            @ and test if it is 0 or 1
mov r1, #0b00000000000000000000001000000000
strne r1, [r0, #GPSET0]                 @ turn red LED on
streq r1, [r0, #GPCLR0]                 @ or off
ldr r0, =STBASE
mov r1, #0b0010
str r1, [r0, #STCS]                     @ clear timer interrupt
ldr r1, [r0, #STCLO]                    @ program timer to
add r1, #0x400000                       @ interrupt in 4 seconds
str r1, [r0, #STC1]
pop {r0, r1, r2}                        @ must match the push above
subs pc, lr, #4
onoff: .word 0                          @ LED state variable
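When the ARM core takes an IRQ it sets only the I bit in CPSR; the F bit is left unchanged, so in Example 7 the push-button FIQ can preempt `irq_handler`. Both handlers update `onoff` with an ldr / eors / str sequence, so if the FIQ arrives between the IRQ handler's `ldr` and `str`, the FIQ's toggle is overwritten. The host-side C sketch below replays that interleaving step by step; the names are hypothetical and only the memory variable is modelled, not the LED. One possible remedy, not part of the original example, is to set the F bit around the read-modify-write in `irq_handler`.

```c
#include <assert.h>
#include <stdint.h>

/* Shared LED state, like the onoff word in Example 7. */
typedef struct {
    uint32_t onoff;
} Shared;

/* A complete toggle, as the FIQ handler performs it. */
static void toggle(Shared *s)
{
    s->onoff ^= 1u;
}

/* Replays the IRQ handler's toggle, optionally letting the FIQ run
 * between its load and its store. Returns the final onoff value. */
static uint32_t irq_toggle_with_fiq(Shared *s, int fiq_between)
{
    uint32_t r2 = s->onoff;      /* ldr  r2, [r1] */
    if (fiq_between)
        toggle(s);               /* FIQ preempts and toggles onoff */
    r2 ^= 1u;                    /* eors r2, #1   */
    s->onoff = r2;               /* str  r2, [r1]: overwrites the FIQ's update */
    return s->onoff;
}
```

Two toggles should bring `onoff` back to 0, but with the FIQ in between the result is 1: one toggle is lost.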
Direct Memory Access (DMA)
For high-bandwidth devices (like disks), polling or interrupt-driven I/O would consume a lot of processor cycles
With DMA, the DMA controller can transfer large blocks of data directly to/from memory without involving the processor
1. The processor initiates the DMA transfer by supplying the I/O device address (identity), the operation to be performed, the memory destination/source address, and the number of bytes to transfer
2. The DMA controller manages the entire transfer (possibly thousands of bytes long), arbitrating for the bus
3. When the DMA transfer is complete (or in case of error), the DMA controller interrupts the processor to let it know that the transfer has finished
There may be multiple DMA devices in one system
◦ E.g., in systems with a single memory bus and multiple I/O buses, each I/O bus controller often contains a DMA controller
◦ Processor and DMA controllers contend for bus cycles and for memory
The processor can be delayed when the memory is busy doing a DMA transfer
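The three steps above can be sketched as a host-side simulation: the processor fills in a request with the four items of step 1, the "controller" moves the whole block on its own, and completion is signalled once through a callback that plays the role of the interrupt. Everything here (`DmaRequest`, `dma_run`, `on_dma_complete`) is hypothetical and uses `memcpy` in place of real bus transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 1: what the processor supplies to start a transfer. */
typedef struct {
    int      device_id;   /* I/O device address (identity) */
    int      to_memory;   /* operation: 1 = device -> memory, 0 = memory -> device */
    uint8_t *memory;      /* memory destination/source */
    size_t   nbytes;      /* number of bytes to transfer */
} DmaRequest;

typedef void (*DmaDoneFn)(const DmaRequest *req, int error);

/* Example "interrupt handler": counts completions and records errors. */
static int dma_interrupts = 0;
static int dma_last_error = 0;
static void on_dma_complete(const DmaRequest *req, int error)
{
    (void)req;
    dma_interrupts++;
    dma_last_error = error;
}

/* Steps 2 and 3: move the whole block, then signal completion exactly once. */
static void dma_run(const DmaRequest *req, uint8_t *device_buffer, DmaDoneFn done)
{
    if (req->memory == NULL || device_buffer == NULL) {
        done(req, 1);                        /* error path also interrupts */
        return;
    }
    if (req->to_memory)
        memcpy(req->memory, device_buffer, req->nbytes);
    else
        memcpy(device_buffer, req->memory, req->nbytes);
    done(req, 0);
}
```

The point of the design is visible in the call count: however large `nbytes` is, the processor sees a single completion interrupt.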


Direct Memory Access (DMA)
[Figure: "Without DMA" and "With DMA" panels, each showing memory, the CPU with its register file, and an I/O device]


Direct Memory Access (DMA)
The processor works in parallel with the DMA controller
◦ Processor dealing with the cache buses
◦ DMA controller dealing with the main memory bus
[Figure: main memory, DMA controller and hard disk on the memory bus and control bus; MIPS processor with bus controller, instruction cache and data cache]


Direct Memory Access (DMA)
Example: on a data cache miss (or an update in a write-through cache), the processor must use the main memory bus, so it has to share it with the DMA controller
[Figure: same system as above, with the processor now using the memory bus]


The DMA Stale Data or Coherence Problem
In systems with caches, there can be two copies of a data item, one in the cache and one in main memory
◦ For a DMA input (from disk to memory), the processor will be using stale data if that location is also in the cache
◦ For a DMA output (from memory to disk) with a write-back cache, the I/O device will receive stale data if the data is in the cache and has not yet been written back to memory
The coherence problem can be solved by
1. Routing all I/O activity through the cache: expensive, with a large negative performance impact
2. Having the OS invalidate all the cache entries for an I/O input, or force write-backs for an I/O output (called a cache flush)
3. Providing hardware to selectively invalidate cache entries, i.e., a snooping cache controller
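Both stale-data cases, and solution 2 from the list, can be reproduced with a one-word toy system: a single write-back cache line in front of a single memory word, with "DMA" reading and writing memory directly. This is a hypothetical host-side model (struct and function names are illustrative), not a description of any real cache controller.

```c
#include <assert.h>
#include <stdint.h>

/* One memory word and one write-back cache line that may hold it. */
typedef struct {
    uint32_t memory;   /* main-memory copy */
    uint32_t cache;    /* cached copy */
    int      valid;    /* cache line holds the word */
    int      dirty;    /* cached copy is newer than memory */
} Toy;

static uint32_t cpu_read(Toy *t)
{
    if (!t->valid) {                /* miss: fill the line from memory */
        t->cache = t->memory;
        t->valid = 1;
        t->dirty = 0;
    }
    return t->cache;
}

static void cpu_write(Toy *t, uint32_t v)   /* write-back: memory not updated */
{
    t->cache = v;
    t->valid = 1;
    t->dirty = 1;
}

static void dma_input(Toy *t, uint32_t v) { t->memory = v; }      /* disk -> memory */
static uint32_t dma_output(const Toy *t)  { return t->memory; }   /* memory -> disk */

/* Solution 2: the OS invalidates before using DMA input data... */
static void cache_invalidate(Toy *t) { t->valid = 0; t->dirty = 0; }

/* ...and flushes (writes back) before starting a DMA output. */
static void cache_flush(Toy *t)
{
    if (t->valid && t->dirty) {
        t->memory = t->cache;
        t->dirty = 0;
    }
}
```

Without the invalidate, the processor keeps reading the old cached value after a DMA input; without the flush, a DMA output sends the old memory value to the device.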


DMA and Virtual Memory Considerations
Should the DMA work with virtual addresses or physical addresses?
If working with physical addresses
◦ All DMA transfers must be constrained to stay within one page, because a buffer that crosses a page boundary is not necessarily contiguous in physical memory
◦ If the transfer won't fit in a single page, it can be broken into a series of transfers (each of which fits in a page) that are handled individually and chained together
If working with virtual addresses
◦ The DMA controller has to translate each virtual address to a physical address (i.e., it needs a TLB-like structure)
Whichever is used, the OS must cooperate by not remapping pages while a DMA transfer involving them is in progress
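Breaking a transfer into page-sized pieces, as described for physical-address DMA, is a simple loop: each piece runs from the current address to the end of its page (or to the end of the buffer). The C sketch below assumes 4096-byte pages; `Chunk` and `split_by_page` are hypothetical names. Each resulting chunk would then be translated separately and handed to the DMA controller as one link of the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */

typedef struct {
    uint32_t addr;
    uint32_t len;
} Chunk;

/* Splits [addr, addr + len) into pieces that never cross a page boundary.
 * Writes at most max chunks into out and returns how many were written. */
static size_t split_by_page(uint32_t addr, uint32_t len, Chunk *out, size_t max)
{
    size_t n = 0;
    while (len > 0 && n < max) {
        uint32_t room = PAGE_SIZE - (addr % PAGE_SIZE);   /* bytes left in this page */
        uint32_t piece = len < room ? len : room;
        out[n].addr = addr;
        out[n].len = piece;
        n++;
        addr += piece;
        len -= piece;
    }
    return n;
}
```

A 5000-byte buffer starting at address 4000, for instance, becomes three transfers of 96, 4096 and 808 bytes.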


BUSES
A Typical I/O System
[Figure: processor with cache, connected by the memory - I/O bus to main memory and three I/O controllers (disks, graphics, network); the controllers send interrupts to the processor]


I/O System Interconnect Issues
A bus is a shared communication link (a single set of
wires used to connect multiple subsystems) that
needs to support a range of devices with widely
varying latencies and data transfer rates
◦ Advantages
Versatile – new devices can be added easily and can be moved
between computer systems that use the same bus standard
Low cost – a single set of wires is shared in multiple ways
◦ Disadvantages
Creates a communication bottleneck – bus bandwidth limits the
maximum I/O throughput

The maximum bus speed is largely limited by


◦ The length of the bus
◦ The number of devices on the bus



Types of Buses
Processor-memory bus ("Front Side Bus", proprietary)
◦ Short and high speed
◦ Matched to the memory system to maximize the memory-processor bandwidth
◦ Optimized for cache block transfers
I/O bus (industry standard, e.g., SCSI, USB, Firewire)
◦ Usually longer and slower
◦ Needs to accommodate a wide range of I/O devices
◦ Uses either the processor-memory bus or a backplane bus to connect to memory
Backplane bus (industry standard, e.g., ATA, PCIexpress)
◦ Allows processor, memory and I/O devices to coexist on a single bus
◦ Used as an intermediary bus connecting I/O buses to the processor-memory bus


Synchronous and Asynchronous Buses
Synchronous bus (e.g., processor-memory buses)
◦ Includes a clock in the control lines and has a fixed protocol for communication that is relative to the clock
◦ Advantage: involves very little logic and can run very fast
◦ Disadvantages:
Every device communicating on the bus must use the same clock rate
To avoid clock skew, fast synchronous buses cannot be long
Asynchronous bus (e.g., I/O buses)
◦ It is not clocked, so it requires a handshaking protocol and additional control lines (ReadReq, Ack, DataRdy)
◦ Advantages:
Can accommodate a wide range of devices and device speeds
Can be lengthened without worrying about clock skew or synchronization problems
◦ Disadvantage: slow(er)


ATA Cable Sizes
Companies have transitioned from synchronous, parallel wide buses to synchronous narrow buses
◦ Signal reflections on the wires and clock skew make it difficult to run 16 to 64 parallel wires at a high clock rate (e.g., ~400MHz), so companies have moved to buses with a few one-way wires running at a very high "clock" rate (~2GHz)
◦ Serial ATA cables (red) are much thinner than parallel ATA cables (green)


Asynchronous Bus Handshaking
Output (read) data from memory to an I/O device
[Timing diagram: ReadReq, Data (addr, then data), Ack and DataRdy lines, with the transitions numbered 1 to 7 as in the protocol below]
Protocol
The I/O device signals a request by raising ReadReq and putting the addr on the data lines
1. Memory sees ReadReq, reads addr from the data lines, and raises Ack
2. I/O device sees Ack and releases the ReadReq and data lines
3. Memory sees ReadReq go low and drops Ack
4. When memory has the data ready, it places it on the data lines and raises DataRdy
5. I/O device sees DataRdy, reads the data from the data lines, and raises Ack
6. Memory sees Ack, releases the data lines, and drops DataRdy
7. I/O device sees DataRdy go low and drops Ack
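The seven-step protocol can be replayed as a sequential C sketch in which both sides share a struct of bus lines. This is a simplified, hypothetical model: real devices run concurrently and each wait is a loop on a signal, whereas here every `assert` stands for "the other side sees the expected level" before the step proceeds. The `steps` array records the order in which the numbered steps ran.

```c
#include <assert.h>
#include <stdint.h>

/* Bus lines shared by the I/O device and memory (toy model). */
typedef struct {
    int      read_req, ack, data_rdy;
    uint32_t data;                 /* data lines: carry addr, then data */
} Bus;

/* Replays the read handshake; mem is indexed by address. */
static uint32_t async_read(Bus *b, const uint32_t *mem, uint32_t addr, int steps[7])
{
    int k = 0;
    b->read_req = 1; b->data = addr;                 /* device: request + addr */
    assert(b->read_req);                             /* 1: memory latches addr, raises Ack */
    uint32_t latched = b->data; b->ack = 1; steps[k++] = 1;
    assert(b->ack);                                  /* 2: device releases ReadReq and data */
    b->read_req = 0; b->data = 0; steps[k++] = 2;
    assert(!b->read_req);                            /* 3: memory drops Ack */
    b->ack = 0; steps[k++] = 3;
    b->data = mem[latched]; b->data_rdy = 1;         /* 4: data on lines, DataRdy up */
    steps[k++] = 4;
    assert(b->data_rdy);                             /* 5: device reads data, raises Ack */
    uint32_t value = b->data; b->ack = 1; steps[k++] = 5;
    assert(b->ack);                                  /* 6: memory releases data, drops DataRdy */
    b->data = 0; b->data_rdy = 0; steps[k++] = 6;
    assert(!b->data_rdy);                            /* 7: device drops Ack */
    b->ack = 0; steps[k++] = 7;
    return value;
}
```

After the last step every control line is low again, so the next transfer can start from a clean state.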
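Key Characteristics of I/O Standards
Firewire
◦ Use: external; devices per channel: 63; max length: 4.5 meters; data width: 4
◦ Peak bandwidth: 50MB/sec (400), 100MB/sec (800)
◦ Hot pluggable: yes
USB 2.0
◦ Use: external; devices per channel: 127; max length: 5 meters; data width: 2
◦ Peak bandwidth: 0.2MB/sec (low), 1.5MB/sec (full), 60MB/sec (high)
◦ Hot pluggable: yes
PCIe
◦ Use: internal; devices per channel: 1; max length: 0.5 meters; data width: 2 per lane
◦ Peak bandwidth: 250MB/sec per lane (1x); comes as 1x, 2x, 4x, 8x, 16x, 32x
◦ Hot pluggable: depends
Serial ATA
◦ Use: internal; devices per channel: 1; max length: 1 meter; data width: 4
◦ Peak bandwidth: 300MB/sec
◦ Hot pluggable: yes
SA SCSI
◦ Use: external; devices per channel: 4; max length: 8 meters; data width: 4
◦ Peak bandwidth: 300MB/sec
◦ Hot pluggable: yes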
A Typical I/O System
[Figure: two Intel Xeon 5300 processors on the Front Side Bus (1333MHz, 10.5GB/sec) to the 5000P Memory Controller Hub (north bridge), which connects FB DDR2 667 main memory DIMMs (5.3GB/sec), PCIe 8x (2GB/sec) and PCIe 4x (1GB/sec) links, disks, and, through ESI (2GB/sec), the I/O Controller Hub (south bridge). The south bridge serves Serial ATA disks (300MB/sec), keyboard and mouse over LPC (1MB/sec), USB 2.0 ports (60MB/sec) and a Parallel ATA CD/DVD (100MB/sec); PCI-X buses (1GB/sec) and an Enterprise South Bridge 2 complete the system.]


Interfacing I/O Devices to the Processor, Memory, and OS
The operating system acts as the interface between the I/O hardware and the program requesting I/O because
◦ Multiple programs using the processor share the I/O system
◦ I/O systems usually use interrupts, which are handled by the OS
◦ Low-level control of an I/O device is complex and detailed
Thus the OS must handle interrupts generated by I/O devices, supply routines for low-level I/O device operations, provide equitable access to the shared I/O resources, protect the I/O devices/activities to which a user program doesn't have access, and schedule I/O requests to enhance system throughput. To do so:
◦ The OS must be able to give commands to the I/O devices
◦ The I/O device must be able to notify the OS about its status
◦ Data must be transferable between memory and the I/O device
