Unit II
Embedded code must not only provide rich functionality, it must also often run at a
required rate to meet system deadlines, fit into the allowed amount of memory, and
meet power consumption requirements. Designing code that simultaneously meets
multiple design constraints is a considerable challenge, but luckily there are
techniques and tools that we can use to help us through the design process.
Two such techniques are covered first: state machines, which are well suited to reactive
systems such as user interfaces, and circular buffers and queues, which are useful in
digital signal processing.
Software State Machine
When inputs appear intermittently rather than as periodic
samples, it is often convenient to think of the system as reacting
to those inputs. The reaction of most systems can be
characterized in terms of the input received and the current
state of the system. This leads naturally to a finite-state machine
description of the system's behavior. As an example, consider a seat belt controller.
The controller’s job is to turn on a buzzer if a person sits in a seat and does not
fasten the seat belt within a fixed amount of time.
The inputs are a seat sensor that detects when a person has sat down, a seat
belt sensor that tells when the belt is fastened, and a timer that goes off when
the required time interval has elapsed.
The output is the buzzer.
[Figure: state diagram of the seat belt controller. States: idle, seated, belted, buzzer. Edge labels (input/output): no seat/-, seat/timer on, no belt and no timer/-, belt/-, timer/buzzer on, belt/buzzer off, no seat/buzzer off, no belt/timer on.]
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
    case IDLE:
        if (seat) { state = SEATED; timer_on = TRUE; }
        break;
    case SEATED:
        if (belt) state = BELTED;        /* belt fastened in time */
        else if (timer) state = BUZZER;  /* timer expired first */
        break;
    …
}
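The fragment above can be filled out into a complete step function. The following is a minimal sketch, not the original implementation: the BELTED and BUZZER cases, the `outputs_t` type, and the function name `seatbelt_step` are assumptions, with transitions chosen to match the edge labels of the state diagram.

```c
#include <stdbool.h>

#define IDLE   0
#define SEATED 1
#define BELTED 2
#define BUZZER 3

/* Outputs of one step. This type is hypothetical, introduced for the sketch. */
typedef struct {
    bool timer_on;   /* pulse: start the timer on this step */
    bool buzzer_on;  /* level: buzzer is currently sounding */
} outputs_t;

/* Advance the controller by one step and return the next state.
   The BELTED and BUZZER transitions are assumed, not taken from the source. */
int seatbelt_step(int state, bool seat, bool belt, bool timer, outputs_t *out)
{
    out->timer_on = false;
    switch (state) {
    case IDLE:
        if (seat) { state = SEATED; out->timer_on = true; }
        break;
    case SEATED:
        if (!seat) state = IDLE;
        else if (belt) state = BELTED;
        else if (timer) { state = BUZZER; out->buzzer_on = true; }
        break;
    case BELTED:
        if (!seat) state = IDLE;
        else if (!belt) { state = SEATED; out->timer_on = true; }
        break;
    case BUZZER:
        if (!seat) { state = IDLE; out->buzzer_on = false; }
        else if (belt) { state = BELTED; out->buzzer_on = false; }
        break;
    }
    return state;
}
```

The caller would invoke `seatbelt_step` once per input event or polling interval, feeding back the returned state.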
Circular buffer
The circular buffer is a data structure that lets us handle
streaming data in an efficient way.
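[Figure: a data stream x1, x2, ..., x6 with a sliding window shown at times t1, t2, t3, and the corresponding circular buffer, in which new samples x5, x6, x7 overwrite the oldest entries x1, x2, x3.]
[Figure: circular buffers holding d2, d3, d4, with separate input and use positions.]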
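A circular buffer can be sketched in a few lines of C. This is an illustrative sketch under assumed names (`cbuf_t`, `cbuf_put`, `cbuf_get`) and an assumed window length of 4; the newest sample overwrites the oldest, as in the figure where x5 replaces x1.

```c
#include <stddef.h>

#define CBUF_SIZE 4  /* window length; an assumed value for this sketch */

typedef struct {
    int data[CBUF_SIZE];
    size_t head;     /* index where the next sample will be written */
} cbuf_t;

/* Clear the buffer and reset the write position. */
void cbuf_init(cbuf_t *cb)
{
    for (size_t i = 0; i < CBUF_SIZE; i++) cb->data[i] = 0;
    cb->head = 0;
}

/* Store a new sample, overwriting the oldest one. */
void cbuf_put(cbuf_t *cb, int x)
{
    cb->data[cb->head] = x;
    cb->head = (cb->head + 1) % CBUF_SIZE;
}

/* Return the sample written `age` puts ago (0 = newest).
   The caller must keep age < CBUF_SIZE. */
int cbuf_get(const cbuf_t *cb, size_t age)
{
    return cb->data[(cb->head + CBUF_SIZE - 1 - age) % CBUF_SIZE];
}
```

No data is ever moved when the window advances; only the index changes, which is why the structure suits streaming data.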
Original basic block:
x = a + b;
y = c - d;
z = x * y;
y = b + d;
Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
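Data flow graph
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
single assignment form
[Figure: DFG. Inputs a, b, c, d feed a + node producing x, a - node producing y, and a second + node producing y1; x and y feed a * node producing z.]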
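DFGs and partial orders
• The DFG defines a partial order: a+b and c-d must both precede x*y, while b+d depends only on the inputs.
• Operations with no path between them in the graph, such as a+b and c-d, can be done in any order.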
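To make the partial order concrete, the sketch below evaluates the single-assignment block in two different legal orders. The struct and function names are hypothetical; the point is only that any order respecting the DFG's edges gives the same values.

```c
/* Values computed by the basic block in single-assignment form. */
typedef struct { int x, y, z, y1; } block_result_t;

/* Order as written in the source: x, y, z, y1. */
block_result_t block_order_a(int a, int b, int c, int d)
{
    block_result_t r;
    r.x  = a + b;
    r.y  = c - d;
    r.z  = r.x * r.y;
    r.y1 = b + d;
    return r;
}

/* A different legal order: y1, y, x, z. Only z must come after x and y. */
block_result_t block_order_b(int a, int b, int c, int d)
{
    block_result_t r;
    r.y1 = b + d;
    r.y  = c - d;
    r.x  = a + b;
    r.z  = r.x * r.y;
    return r;
}
```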
Control-data flow graph
• CDFG: represents control and data.
• Uses data flow graphs as components.
• Two types of nodes:
– decision;
– data flow.
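Data flow node
x = a + b;
y = c + d;
Write operations in basic block form for simplicity.
Control
[Figure: two equivalent forms of decision node: a node testing cond with T and F exits, and a node testing value with exits v1, v2, v3, v4.]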
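CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
    …
}
[Figure: CDFG. Decision node cond1 selects bb1() (T) or bb2() (F); both paths lead to bb3(), followed by a decision node test1 whose branches (one labeled c2) lead to bb4(), bb5(), and bb6().]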
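for loop
for (i=0; i<N; i++)
    loop_body();
equivalent while loop
i=0;
while (i<N) {
    loop_body(); i++; }
[Figure: CDFG of the loop. After i=0, the decision node i<N takes the T branch to loop_body() and returns to the test, or takes the F branch to exit.]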
Assembly, linking, loading
• Assembly and linking are the last steps in the compilation process.
• They turn a list of instructions into an image of the program’s bits in
memory.
• Loading actually puts the program in memory so that it can be executed.
[Figure: HLL -> compiler -> assembly -> assembler -> object code -> linker -> executable binary -> loader.]
• As the figure shows, most compilers do not directly generate machine
code, but instead create the instruction-level program in the form of
human-readable assembly language.
• The assembler’s job is to translate symbolic assembly language
statements into bit-level representations of instructions known as
object code.
• A linker allows a program to be stitched together out of several
smaller pieces. The linker operates on the object files created by the
assembler and modifies the assembled code to make the necessary
links between files.
• The linker produces an executable binary file.
• That file is not necessarily located in the CPU’s memory, however,
unless the linker happens to create the executable directly in RAM.
• The program that brings the program into memory for execution is
called a loader.
Assemblers
• Major tasks:
– generate binary for symbolic instructions;
– translate labels into addresses;
– handle pseudo-ops (data, etc.).
• Generally one-to-one translation.
• Assembly labels:
ORG 100
label1 ADR r4,c
Pseudo-operations
• Pseudo-ops do not generate instructions:
– ORG sets the program location counter (PLC).
– EQU generates a symbol table entry without
advancing the PLC.
– Data statements define data blocks.
Linking
Understanding how the compiler works can help you know when
you cannot rely on the compiler.
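HLL expression: a*b + 5*(c-d)
[Figure: DFG of the expression. Inputs a, b, c, d feed a * node and a - node; the result of the - node is multiplied by the constant 5, and the two products are then combined. W, X, Y, Z are temporary variables labeling the intermediate results.]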
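Compilation of arithmetic expressions, cont’d.
[Figure: the DFG with nodes numbered in code-generation order (1: a*b, 2: c-d, 3: 5*(c-d), 4: final +), shown beside the code.]
    ADR r4,a
    LDR r1,[r4]     ; load a
    ADR r4,b
    LDR r2,[r4]     ; load b
    MUL r3,r1,r2    ; node 1: a*b
    ADR r4,c
    LDR r1,[r4]     ; load c
    ADR r4,d
    LDR r5,[r4]     ; load d
    SUB r6,r1,r5    ; node 2: c-d
    MOV r2,#5       ; constant 5 (MUL takes register operands)
    MUL r7,r6,r2    ; node 3: 5*(c-d)
    ADD r8,r7,r3    ; node 4: a*b + 5*(c-d)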
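The same node-by-node order can be written in C with explicit temporaries. This is only a sketch: the function name is hypothetical, and the assignment of the temporaries w, x, y, z to particular nodes is an assumption, since the figure's labeling is not recoverable here.

```c
/* Evaluate a*b + 5*(c-d) one DFG node at a time.
   Which temporary belongs to which node is assumed for illustration. */
int eval_expr(int a, int b, int c, int d)
{
    int w = a * b;   /* node 1 */
    int x = c - d;   /* node 2 */
    int y = 5 * x;   /* node 3 */
    int z = w + y;   /* node 4 */
    return z;
}
```

Each temporary corresponds to one register in the generated assembly, which is why a walk over the DFG translates so directly into code.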
Similarly for control code generation
if (a+b > 0)
    x = 5;
else
    x = 7;
[Figure: CDFG with decision node a+b>0 selecting x=5 or x=7.]
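Control code generation, cont’d.
[Figure: the CDFG with blocks numbered 1 (a+b>0), 2 (x=5), 3 (x=7), shown beside the code.]
        ADR r5,a
        LDR r1,[r5]     ; load a
        ADR r5,b
        LDR r2,[r5]     ; load b
        ADDS r3,r1,r2   ; block 1: a+b, setting the condition flags
        BLE label3      ; a+b <= 0: take the false branch
        MOV r3,#5       ; block 2: x = 5
        ADR r5,x
        STR r3,[r5]
        B stmtent
label3  MOV r3,#7       ; block 3: x = 7
        ADR r5,x
        STR r3,[r5]
stmtent ...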
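The branch structure of the assembly above can be mirrored in C with goto statements, which makes the mapping from the CDFG to a linear instruction stream easier to see. This is an illustrative sketch; the function name `select_x` and the label `stmtend` are hypothetical.

```c
/* The if/else written the way the assembly lays it out:
   a conditional branch over the true block, then an unconditional
   branch over the false block. */
int select_x(int a, int b)
{
    int x;
    if (a + b <= 0) goto label3;  /* corresponds to BLE label3 */
    x = 5;                        /* true block */
    goto stmtend;                 /* corresponds to the unconditional B */
label3:
    x = 7;                        /* false block */
stmtend:
    return x;
}
```

Note that the branch tests the inverse of the source condition: the code falls through into the true block and jumps only when a+b > 0 fails.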
Procedure linkage
Another major code generation problem is the creation of
procedures.
• Need code to:
– call and return;
– pass parameters and results.
• Parameters and return values are passed on the stack.
– Procedures with few parameters may use
registers.
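Procedure stacks
proc1(int a) {
    proc2(5);
}
[Figure: the stack, with its direction of growth marked, holding the frames of proc1 and proc2. FP, the frame pointer, defines the end of the last frame. SP, the stack pointer, defines the end of the current frame. The parameter 5 passed to proc2 is accessed relative to SP.]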
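[Figure: one-dimensional array layout. a points to a[0]; a[0], a[1], a[2] are stored in consecutive locations, so a[1] = *(a + 1).]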
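Two-dimensional arrays
• Row-major layout (the C convention): the elements of each row are stored consecutively.
[Figure: an N x M array stored as a[0,0], a[0,1], ..., a[1,0], a[1,1], ...; element a[i,j] is at a[i*M+j], where M is the number of elements in a row.]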
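The index formula a[i*M+j] corresponds to storing each row contiguously, which is how C lays out two-dimensional arrays. A small sketch, with hypothetical names and assumed dimensions:

```c
#include <stddef.h>

#define ROWS 3  /* N in the figure; an assumed size */
#define COLS 4  /* M in the figure; an assumed size */

/* Map a two-dimensional index (i, j) to its position in the flat array. */
size_t flat_index(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Read element (i, j) from an array stored row by row. */
int get2d(const int *a, size_t i, size_t j)
{
    return a[flat_index(i, j)];
}
```

Moving along a row changes the address by one element, while moving down a column changes it by COLS elements, which matters for cache behavior in nested loops.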
Structures
struct mystruct {
    int field1;
    char field2;
};
struct mystruct a, *aptr = &a;
[Figure: memory layout. aptr points to field1, which occupies 4 bytes; field2 follows at byte offset 4, shown as *(aptr+4).]
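Field access by offset can be checked in C with `offsetof`. The sketch below reads field2 through a byte pointer rather than through the member name; the helper `read_field2` is hypothetical. The figure's offset of 4 assumes a 4-byte int, so the code asks the compiler for the offset instead of hard-coding it.

```c
#include <stddef.h>

struct mystruct {
    int field1;
    char field2;
};

/* Read field2 by adding its byte offset to the structure's base address.
   This is what the compiler does implicitly for aptr->field2. */
char read_field2(const struct mystruct *aptr)
{
    const char *base = (const char *)aptr;
    return *(base + offsetof(struct mystruct, field2));
}
```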
Using your compiler
[Figure: CDFG of a loop. The test i<N either exits the loop or executes the body f = f + c[i] * x[i]; i = i + 1.]
Instruction timing
Once we know the execution path of the program, we have to measure the execution time of
the instructions executed along that path.
The simplest estimate counts the instructions on the path and assumes that each takes the same time.
However, even ignoring cache effects, this technique is simplistic for the reasons summarized
below.
• Not all instructions take the same amount of time.
– Multi-cycle instructions (these occur even on RISC machines with fixed-length instructions).
– Fetches.
• Execution times of instructions are not independent.
– Register bypassing: many CPUs use it to speed up instruction
sequences in which the result of one instruction is used by the next.
– Pipeline interlocks.
– Cache effects.
• Execution times may vary with operand value.
– This is clearly true of floating-point instructions, in which a different
number of iterations may be required to calculate the result.
– Some multi-cycle integer operations.
Measurement-driven performance analysis
We can avoid N*M - 1 unnecessary executions of this statement by moving it
before the loop, as shown in Figure 2.
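The statement and figure referred to are not reproduced in these notes. As a hedged illustration of the idea, assume a loop whose bound is the product N*M, recomputed at every loop test; the function names below are hypothetical.

```c
/* Before: the product n * m is evaluated at every loop test. */
void add_arrays_naive(int *z, const int *a, const int *b, int n, int m)
{
    for (int i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After: the loop-invariant product is computed once, before the loop. */
void add_arrays_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    int limit = n * m;
    for (int i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```

Many compilers perform this transformation themselves when they can prove the expression is invariant, so it is worth checking the generated code before rewriting by hand.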
Induction variable elimination
An induction variable is a variable whose value is derived from the loop iteration
variable’s value. The compiler often introduces induction variables to help
it implement the loop.
In the above code, zptr and bptr are pointers to the heads of the z and b arrays
and zbinduct is the shared induction variable.
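The code this passage refers to is not reproduced in these notes. The sketch below is one plausible reconstruction using the names zptr, bptr, and zbinduct from the text; the function names and the array-copy loop itself are assumptions. The first version derives zbinduct from i and j on every iteration; the second eliminates that computation by incrementing zbinduct directly.

```c
/* With the induction variable computed from i and j each time. */
void copy_indexed(int *zptr, const int *bptr, int n, int m)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            int zbinduct = i * m + j;   /* multiply and add per iteration */
            *(zptr + zbinduct) = *(bptr + zbinduct);
        }
    }
}

/* After elimination: zbinduct is simply incremented. */
void copy_induct(int *zptr, const int *bptr, int n, int m)
{
    int zbinduct = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            *(zptr + zbinduct) = *(bptr + zbinduct);
            zbinduct++;                 /* one add per iteration */
        }
    }
}
```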
Strength reduction
• Strength reduction helps us reduce the cost of a
loop iteration.
• Consider the following assignment:
y = x * 2;
– In integer arithmetic, we can use a left shift rather
than a multiplication by 2 (as long as we properly
keep track of overflows).
– If the shift is faster than the multiply, we probably
want to perform the substitution.
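As a small sketch of the substitution, the two hypothetical functions below compute the same result. Unsigned operands are used here so that overflow wraps around identically in both versions; with signed integers, overflow needs the care noted above.

```c
/* Multiplication by 2. */
unsigned times2_mul(unsigned x)
{
    return x * 2u;
}

/* Strength-reduced form: a left shift by one bit position. */
unsigned times2_shift(unsigned x)
{
    return x << 1;
}
```

Most optimizing compilers make this substitution automatically, so the manual rewrite mainly matters when optimization is disabled or the compiler is known to miss it.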
Performance optimization hints
• Use registers efficiently.
• Use page mode memory accesses.
• Analyze cache behavior:
– instruction conflicts can be handled by rewriting
code, rescheduling;
– conflicting scalar data can easily be moved;
– conflicting array data can be moved, padded.
PROGRAM-LEVEL ENERGY AND POWER ANALYSIS
AND OPTIMIZATION
Energy/power optimization
• Energy: ability to do work.
– Most important in battery-powered systems.
• Power: energy per unit time.
– Important even in wall-plug systems, because power
becomes heat.
Opportunities for saving power
■ We may be able to replace the algorithms with
others that do things in clever ways that consume
less power.
■ Memory accesses are a major component of power
consumption in many applications. By optimizing
memory accesses we may be able to significantly
reduce power.
■ We may be able to turn off parts of the system—
such as subsystems of the CPU, chips in the system,
and so on—when we do not need them in order to
save power.
Measuring energy consumption