ch3 1
I/O devices
[Figure: the CPU connects to a device through its status register and data register, which front the device mechanism.]
Serial communication
Characters are transmitted separately:
[Figure: signal over time: no char, start bit, bit 0, bit 1, ...]
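The framing in the figure can be sketched as a function that packs one character into a frame. This is a minimal sketch, not a description of any particular UART: it assumes 8 data bits sent starting with bit 0, a start bit of 0, a single stop bit of 1, and no parity. The stop bit and the name `frame_char` are assumptions that do not appear on the slide.

```c
#include <stdint.h>

/* Build a 10-bit frame for one character.
   Assumed format: 1 start bit (0), 8 data bits with bit 0 first,
   1 stop bit (1), no parity. Bit 0 of the result is transmitted first. */
uint16_t frame_char(uint8_t c) {
    uint16_t frame = 0;              /* frame bit 0: start bit = 0 */
    frame |= (uint16_t)c << 1;       /* frame bits 1..8: data bits 0..7 */
    frame |= (uint16_t)1 << 9;       /* frame bit 9: stop bit = 1 */
    return frame;
}
```

A transmitter would shift this frame out one bit per bit time, lowest bit first, and hold the line high between characters.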
Serial communication parameters
[Figure: the CPU connects to the 8251 through an 8-bit status register and an 8-bit data register; the 8251's xmit/rcv lines drive the serial port.]
Programming I/O
Two types of instructions can support I/O:
special-purpose I/O instructions;
memory-mapped load/store instructions.
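With memory-mapped I/O, device registers are read and written with ordinary loads and stores, which C can express through pointers. The sketch below shows one possible pair of `peek` and `poke` helpers. The array `fake_regs` and the names `DEV_STATUS` and `DEV_DATA` are hypothetical stand-ins so the code runs on a host machine; on real hardware they would be fixed bus addresses.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device's register block, so the sketch
   runs on a host. On real hardware these would be fixed bus addresses. */
static volatile uint8_t fake_regs[2];
#define DEV_STATUS (&fake_regs[0])   /* assumed status register */
#define DEV_DATA   (&fake_regs[1])   /* assumed data register */

/* Read one byte from a memory-mapped location (a load). */
uint8_t peek(volatile uint8_t *location) {
    return *location;
}

/* Write one byte to a memory-mapped location (a store). */
void poke(volatile uint8_t *location, uint8_t newval) {
    *location = newval;
}
```

The `volatile` qualifier keeps the compiler from caching or removing accesses, which matters because a device register can change on its own.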
Busy/wait output
Simplest way to program device.
Use instructions to test when device is ready.
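current_char = mystring;
while (*current_char != '\0') {
    poke(OUT_CHAR, *current_char);    /* send the character */
    while (peek(OUT_STATUS) != 0);    /* busy-wait until the device is done */
    current_char++;                   /* advance to the next character */
}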
Simultaneous busy/wait input and output
while (TRUE) {
    /* read */
    while (peek(IN_STATUS) == 0);
    achar = (char)peek(IN_DATA);
    /* write */
    poke(OUT_DATA, achar);
    poke(OUT_STATUS, 1);
    while (peek(OUT_STATUS) != 0);
}
2008 Wayne Wolf
Interrupt I/O
Busy/wait is very inefficient.
CPU can't do other work while testing the device.
Hard to do simultaneous I/O.
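Interrupt interface
[Figure: the CPU (with PC and IR) connects to the device through intr request, intr ack, and data/address lines; the device contains a status register, a data register, and the device mechanism.]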
Interrupt behavior
Based on subroutine call mechanism.
Interrupt forces next instruction to
be a subroutine call to a
predetermined location.
Return address is saved to resume
executing foreground program.
Interrupt physical
interface
CPU and device are connected by
CPU bus.
CPU and device handshake:
device asserts interrupt request;
CPU asserts interrupt acknowledge
when it can handle the interrupt.
Example: interrupt-driven main program
main() {
    while (TRUE) {
        if (gotchar) {
            poke(OUT_DATA, achar);
            poke(OUT_STATUS, 1);
            gotchar = FALSE;
        }
    }
}
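The main program depends on an input handler that sets `gotchar` and fills `achar`. A minimal sketch of such a handler follows, assuming it reads the data register, raises the flag, and resets the input status. The `regs` array and the `foreground_step` helper are hypothetical, added only so the example runs on a host without real device registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file so the sketch runs on a host machine. */
enum { IN_STATUS, IN_DATA, OUT_STATUS, OUT_DATA, NUM_REGS };
static volatile uint8_t regs[NUM_REGS];

static uint8_t peek(int reg) { return regs[reg]; }
static void poke(int reg, uint8_t val) { regs[reg] = val; }

/* Shared between the handler and the foreground code, hence volatile. */
volatile bool gotchar = false;
volatile char achar;

/* Input handler: runs when the input device raises an interrupt. */
void input_handler(void) {
    achar = (char)peek(IN_DATA);   /* grab the character */
    gotchar = true;                /* signal the foreground loop */
    poke(IN_STATUS, 0);            /* reset the device status */
}

/* One pass of the foreground loop body from main(). */
void foreground_step(void) {
    if (gotchar) {
        poke(OUT_DATA, (uint8_t)achar);
        poke(OUT_STATUS, 1);
        gotchar = false;
    }
}
```

On a real system the handler would be installed in the interrupt vector table and invoked by the hardware rather than called directly.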
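[Figure: character queue with head and tail pointers.]
[Figure: sequence diagram with lifelines :input, :output, and :queue; queue contents over time: empty, a, empty, b, bc, c.]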
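A queue with head and tail pointers, as in the figure, is commonly built as a circular buffer. The sketch below is one possible version, not necessarily the one used in the lecture. The capacity `Q_SIZE` and the function names are assumptions, and one slot is left unused so that a full queue can be told apart from an empty one.

```c
#include <stdbool.h>

#define Q_SIZE 8                   /* assumed capacity; one slot stays unused */
static char q[Q_SIZE];
static int head = 0, tail = 0;     /* remove at head, add at tail */

bool queue_empty(void) { return head == tail; }
bool queue_full(void)  { return (tail + 1) % Q_SIZE == head; }

/* Add a character at the tail; returns false if the queue is full. */
bool add_char(char c) {
    if (queue_full()) return false;
    q[tail] = c;
    tail = (tail + 1) % Q_SIZE;
    return true;
}

/* Remove the character at the head; returns false if the queue is empty. */
bool remove_char(char *out) {
    if (queue_empty()) return false;
    *out = q[head];
    head = (head + 1) % Q_SIZE;
    return true;
}
```

In an interrupt-driven design the input handler would call `add_char` and the output side would call `remove_char`, so a real version would also need to protect `head` and `tail` from concurrent updates.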
Prioritized interrupts
[Figure: device 1, device 2, ..., device n drive interrupt request lines L1, L2, ..., Ln into the CPU; the CPU returns an interrupt acknowledge to the devices.]
Interrupt prioritization
Masking: interrupt with priority lower
than current priority is not
recognized until pending interrupt is
complete.
Non-maskable interrupt (NMI):
highest-priority, never masked.
Often used for power-down.
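The masking rule can be sketched as a single comparison. The sketch assumes that a larger number means a higher priority and that a request at the same level as the current one is also held off; neither detail is stated on the slide. `NMI_LEVEL` is a hypothetical value used to model the non-maskable interrupt.

```c
#include <stdbool.h>

#define NMI_LEVEL 255   /* hypothetical level used to model the NMI */

/* Decide whether a request is recognized right now.
   Assumption: a larger number means a higher priority. */
bool recognized(int request_level, int current_level) {
    if (request_level == NMI_LEVEL)
        return true;                        /* the NMI is never masked */
    return request_level > current_level;   /* others must outrank the current level */
}
```

A masked request is not lost; it stays pending and is tested again once the current handler completes.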
[Figure: sequence diagram with lifelines :foreground, :A, :B, and :C; interrupt arrivals are labeled B, C, A, and A,B.]
Interrupt vectors
Allow different devices to be handled
by different code.
Interrupt vector table:
[Figure: the interrupt vector table head points to a table whose entries are handler 0, handler 1, handler 2, and handler 3.]
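In C, a vector table maps naturally onto an array of function pointers indexed by the vector the device supplies. The following is a sketch under that assumption. The handler bodies, the `last_handled` variable, and the `dispatch` function are hypothetical and exist only to make the example observable.

```c
#include <stddef.h>

typedef void (*handler_t)(void);

static int last_handled = -1;   /* records which handler ran, for illustration */

static void handler0(void) { last_handled = 0; }
static void handler1(void) { last_handled = 1; }
static void handler2(void) { last_handled = 2; }
static void handler3(void) { last_handled = 3; }

/* The table: entry i holds the handler for vector i. */
static handler_t vector_table[] = { handler0, handler1, handler2, handler3 };
#define NUM_VECTORS (sizeof vector_table / sizeof vector_table[0])

/* Call the handler selected by the vector.
   Returns 0 on success, -1 if the vector is out of range. */
int dispatch(unsigned vector) {
    if (vector >= NUM_VECTORS)
        return -1;
    vector_table[vector]();
    return 0;
}
```

Because the table is data, the handler for a device can be changed by rewriting one entry, without touching the dispatch code.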
Overheads for Computers as
Components 2nd ed.
Interrupt vector acquisition
[Figure: sequence diagram between :CPU and :device with the steps receive request, receive ack, and receive vector.]
Generic interrupt mechanism
[Figure: flowchart. Decision nodes: intr?, vector?, timeout?. Outcomes: continue execution, ignore, bus error, call table[vector].]
Interrupt sequence
Sources of interrupt
overhead
ARM interrupts
ARM7 supports two types of
interrupts:
Fast interrupt requests (FIQs).
Interrupt requests (IRQs).
Handler responsibilities:
Restore proper PC.
Restore CPSR from SPSR.
Clear interrupt disable flags.
C55x interrupts
Latency is between 7 and 13 cycles.
Maskable interrupt sequence:
Supervisor mode
May want to provide protective
barriers between programs.
Avoid memory corruption.
On ARM, the SWI instruction enters supervisor mode:
Sets PC to 0x08.
Argument to SWI is passed to supervisor mode code.
Saves CPSR in SPSR.
Exception
Exception: internally detected error.
Exceptions are synchronous with
instructions but unpredictable.
Build exception mechanism on top of
interrupt mechanism.
Exceptions are usually prioritized
and vectorized.
Trap
Trap (software interrupt): an exception
generated by an instruction.
Call supervisor mode.
Co-processor
Co-processor: added function unit that
is called by instruction.
Floating-point units are often structured as
co-processors.
Available extensions:
DCT/IDCT.
Pixel interpolation.
Motion estimation.
DCT/IDCT
2-D DCT/IDCT is
computed from
two 1-D
DCT/IDCT.
Put data in
different banks
to maximize
throughput.
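The decomposition of a 2-D DCT into two 1-D passes can be sketched as follows, assuming an 8x8 block and an orthonormal DCT-II. This is plain C for illustration and does not model the coprocessor or its memory banks. The names `dct1d`, `dct2d`, and `cos16` are hypothetical, and the cosine values are tabulated to eight decimal places instead of being computed with a math library.

```c
#define N 8

/* cos(j*pi/16) for j = 0..8; every angle the 8-point DCT needs follows by symmetry. */
static const double COS16[9] = {
    1.0, 0.98078528, 0.92387953, 0.83146961, 0.70710678,
    0.55557023, 0.38268343, 0.19509032, 0.0
};

/* cos(j*pi/16) for any j >= 0. */
static double cos16(int j) {
    j %= 32;                                     /* the period 2*pi is 32 steps */
    if (j > 16) j = 32 - j;                      /* cos(2*pi - x) = cos(x) */
    return (j > 8) ? -COS16[16 - j] : COS16[j];  /* cos(pi - x) = -cos(x) */
}

/* Orthonormal 1-D DCT-II of 8 samples, read and written with a stride. */
static void dct1d(const double *in, double *out, int stride) {
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n * stride] * cos16((2 * n + 1) * k);
        out[k * stride] = sum * (k == 0 ? 0.35355339 : 0.5);
    }
}

/* 2-D DCT as two 1-D passes: columns into an interim block, then rows. */
void dct2d(double in[N][N], double out[N][N]) {
    double interim[N][N];
    for (int c = 0; c < N; c++)
        dct1d(&in[0][c], &interim[0][c], N);     /* column pass */
    for (int r = 0; r < N; r++)
        dct1d(interim[r], out[r], 1);            /* row pass */
}
```

Doing the column pass first or the row pass first gives the same result, because the 2-D transform is separable.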
[Figure: data flow: block, Column DCT, interim, Row DCT, DCT.]
Special:
ACy=copr(k8,ACx,ACy)
Software pipelined
load/compute/store for DCT
[Figure: three overlapped iterations (i-1, i, i+1). Stages listed for each iteration, in order: Dual_load; 4 empty; 3 Dual_load; 8 compute; empty; 4 Long_store.]
op_i(0), load_i+1(0,1)
op_i(1), store_i-1(0,1)
op_i(2), store_i-1(2,3)
op_i(2), store_i-1(4,5)
op_i(2), store_i-1(6,7)
op_i(2), load_i+1(2,3)
Motion estimation
Accuracy:
Full-pixel vs. half-pixel.
Algorithms:
3-step algorithm (distance 4,2,1).
4-step algorithm (distance 8,4,2,1).
4-step with half-pixel refinement.
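The 3-step algorithm with distances 4, 2, 1 can be sketched in software as below. This is a full-pixel version that uses the sum of absolute differences (SAD) as the matching cost, which is an assumption because the slide does not name a cost function. The frame size, block size, and function names are also hypothetical. The caller must keep the block at least 7 pixels away from every frame edge, since the search can move up to 4 + 2 + 1 pixels in each direction.

```c
#include <stdlib.h>

#define W 48   /* assumed frame width */
#define H 48   /* assumed frame height */
#define B 8    /* assumed block size */

/* SAD between the current block at (bx,by) and the reference block
   displaced from that position by (dx,dy). */
static long sad(unsigned char cur[H][W], unsigned char ref[H][W],
                int bx, int by, int dx, int dy) {
    long total = 0;
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            total += labs((long)cur[by + y][bx + x]
                          - (long)ref[by + dy + y][bx + dx + x]);
    return total;
}

/* Three-step search: test the centre and its 8 neighbours at distance 4,
   recentre on the best candidate, then repeat at distances 2 and 1. */
void three_step_search(unsigned char cur[H][W], unsigned char ref[H][W],
                       int bx, int by, int *best_dx, int *best_dy) {
    int cx = 0, cy = 0;
    long best = sad(cur, ref, bx, by, 0, 0);
    for (int step = 4; step >= 1; step /= 2) {
        int nx = cx, ny = cy;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                int dx = cx + i * step, dy = cy + j * step;
                long cost = sad(cur, ref, bx, by, dx, dy);
                if (cost < best) { best = cost; nx = dx; ny = dy; }
            }
        }
        cx = nx;   /* recentre for the next, finer step */
        cy = ny;
    }
    *best_dx = cx;
    *best_dy = cy;
}
```

The search evaluates at most 27 candidates instead of the 225 positions in the full range of -7 to +7, at the risk of settling on a local minimum. A 4-step variant would start the loop at distance 8 and needs a correspondingly larger margin.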
Pixel interpolation
coprocessor operations
Load pixels and compute:
ACy=copr(k8,AC,Lmem)