VHDL Coding
Exercise 4: FIR Filter
Where to start?
[Figure: design flow from Algorithm through Architecture and RTL block diagram to VHDL code, with design-space exploration, optimization, and feedback between the steps]
Algorithm
• High-Level System Diagram
  Context of the design
  Inputs and Outputs
  Throughput/rates
  Algorithmic requirements
• Algorithm Description
  Mathematical Description: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
  Performance Criteria
  Accuracy
  Optimization constraints
  Implementation constraints
  Area
  Speed
[Figure: FIR block with input x(k) and output y(k)]
Architecture (1)
• Isomorphic Architecture:
Straightforward implementation of the algorithm
[Figure: FIR structure with input x(k), coefficients b0 to bN, and output y(k)]
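A minimal sketch of how the isomorphic architecture could look in VHDL, assuming a 4-tap filter with 8-bit signed samples. The entity name `fir_direct`, the port names, the reset `RSTxRBI`, and the coefficient values are illustrative assumptions, not taken from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical direct-form FIR: one multiplier per coefficient,
-- a delay line on the input, and a combinatorial adder chain.
entity fir_direct is
  generic (
    WIDTH : natural := 8                       -- sample/coefficient width (assumed)
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                   -- async reset, active low (assumed)
    XxDI    : in  signed(WIDTH-1 downto 0);    -- x(k)
    YxDO    : out signed(2*WIDTH+1 downto 0)   -- y(k), 2 guard bits for 4 taps
  );
end entity fir_direct;

architecture rtl of fir_direct is
  constant N : natural := 3;                   -- filter order, N+1 taps
  type coeff_t is array (0 to N) of signed(WIDTH-1 downto 0);
  type delay_t is array (1 to N) of signed(WIDTH-1 downto 0);
  -- Example coefficients only
  constant B : coeff_t := (to_signed(1, WIDTH), to_signed(3, WIDTH),
                           to_signed(3, WIDTH), to_signed(1, WIDTH));
  signal DelayxDP : delay_t;                   -- x(k-1) .. x(k-N)
begin
  -- Delay line: shifts the input samples by one position per clock
  p_delay : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DelayxDP <= (others => (others => '0'));
    elsif CLKxCI'event and CLKxCI = '1' then
      DelayxDP(1) <= XxDI;
      for i in 2 to N loop
        DelayxDP(i) <= DelayxDP(i-1);
      end loop;
    end if;
  end process p_delay;

  -- Multipliers and adder chain: y(k) = sum of b_i * x(k-i)
  p_sum : process (XxDI, DelayxDP)
    variable Acc : signed(2*WIDTH+1 downto 0);  -- intermediate result only
  begin
    Acc := resize(B(0) * XxDI, Acc'length);
    for i in 1 to N loop
      Acc := Acc + resize(B(i) * DelayxDP(i), Acc'length);
    end loop;
    YxDO <= Acc;
  end process p_sum;
end architecture rtl;
```

In this sketch the critical path runs through one multiplier and the whole adder chain, which is what the following architecture steps try to shorten.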
Architecture (2)
• Pipelining/Retiming:
Improve timing
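[Figure: FIR structure with input x(k), coefficients b0 to bN, output y(k), and added registers]
Insert register(s) at the inputs or outputs
Increases Latency
Perform Retiming:
Move registers through the logic without changing functionality
[Figure: backward and forward retiming of a register across a logic block]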
Architecture (3)
• Retiming and simple transformation:
Optimization
[Figure: FIR structure with input x(k), coefficients b0 to bN, and output y(k), shown with the reversed adder chain]
Reverse the adder chain
Perform Retiming
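One possible result of reversing the adder chain and retiming is the transposed form, where the registers sit between the adders instead of in the input delay line. The sketch below assumes the same hypothetical 4-tap, 8-bit setup; entity, port, and signal names and the coefficient values are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical transposed-form FIR: x(k) feeds all multipliers,
-- partial sums are registered between the adders.
entity fir_transposed is
  generic (
    WIDTH : natural := 8                       -- sample/coefficient width (assumed)
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                   -- async reset, active low (assumed)
    XxDI    : in  signed(WIDTH-1 downto 0);    -- x(k)
    YxDO    : out signed(2*WIDTH+1 downto 0)   -- y(k)
  );
end entity fir_transposed;

architecture rtl of fir_transposed is
  constant N    : natural := 3;                -- filter order, N+1 taps
  constant ACCW : natural := 2*WIDTH + 2;      -- product width plus 2 guard bits
  type coeff_t is array (0 to N) of signed(WIDTH-1 downto 0);
  type sum_t   is array (1 to N) of signed(ACCW-1 downto 0);
  -- Example coefficients only
  constant B : coeff_t := (to_signed(1, WIDTH), to_signed(3, WIDTH),
                           to_signed(3, WIDTH), to_signed(1, WIDTH));
  signal SumxDP : sum_t;                       -- registered partial sums
begin
  -- SumxDP(i) holds b_i*x(k-1) + b_(i+1)*x(k-2) + ... + b_N*x(k-N+i-1)
  p_sums : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      SumxDP <= (others => (others => '0'));
    elsif CLKxCI'event and CLKxCI = '1' then
      SumxDP(N) <= resize(B(N) * XxDI, ACCW);
      for i in 1 to N-1 loop
        SumxDP(i) <= resize(B(i) * XxDI, ACCW) + SumxDP(i+1);
      end loop;
    end if;
  end process p_sums;

  -- Output: b_0*x(k) plus the first registered partial sum
  YxDO <= resize(B(0) * XxDI, ACCW) + SumxDP(1);
end architecture rtl;
```

Every register-to-register path in this sketch contains one multiplier and one adder, independent of the filter order.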
Architecture (4)
• More pipelining:
Add one pipelining stage to the retimed circuit
[Figure: retimed FIR structure with input x(k), coefficients b0 to bN, output y(k), and one additional pipeline stage]
The longest path is given by the multiplier
Unbalanced: The delay from input to the first pipeline stage is
much longer than the delay from the first to the second stage
Architecture (5)
• More pipelining:
Add one pipelining stage to the retimed circuit
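[Figure: retimed FIR structure with input x(k), coefficients b0 to bN, output y(k), and the pipeline registers placed inside the multipliers]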
Move the pipeline registers into the multiplier:
Paths between pipeline stages are balanced
Improved timing
Tclock = (Tadd + Tmult)/2 + Treg
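As a worked example of the formula above, assume the purely hypothetical delays $T_{mult} = 6$ ns, $T_{add} = 2$ ns and $T_{reg} = 1$ ns. For the unbalanced circuit the clock period is taken to be set by the longer of the two stages, which is the multiplier; for the balanced circuit the combined delay is split evenly, which is an idealization.

```latex
\begin{align*}
T_{clock}^{\text{unbalanced}} &= \max(T_{mult},\, T_{add}) + T_{reg} = 6 + 1 = 7~\text{ns} \\
T_{clock}^{\text{balanced}}   &= \frac{T_{add} + T_{mult}}{2} + T_{reg} = \frac{2 + 6}{2} + 1 = 5~\text{ns}
\end{align*}
```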
Architecture (6)
• Iterative Decomposition:
Reuse Hardware
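[Figure: FIR structure with input x(k), coefficients b0 to bN, and output y(k)]
Identify regularity and reusable hardware components
Add control, multiplexers, and storage elements
Increases Cycles/Sample
[Figure: iterative architecture with a Control block, storage for x(k), coefficients b0 to bN, an accumulator initialized with 0, and output y(k)]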
RTL-Design
• Choose an architecture under the following constraints:
It meets ALL timing specifications/constraints:
  Throughput
  Latency
It consumes the smallest possible area
It requires the least possible amount of power
[Figure: Iterative Decomposition architecture with storage for x(k), coefficients b0 to bN, an accumulator initialized with 0, and output y(k)]
• Decide which additional functions are needed and
how they can be implemented efficiently:
Storage of samples x(k) => MEMORY
Storage of coefficients b_i => LUT
Address generators for MEMORY and LUT => COUNTERS
Control => FSM
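RTL-Design
• RTL Block-diagram:
Datapath: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
[Figure: datapath with storage for x(k), coefficients b0 to bN, an accumulator initialized with 0, and output y(k)]
• FSM:
Interface protocols
Datapath control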
RTL-Design
• How it works: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
IDLE:
  Wait for new sample
  Store to input register
NEW DATA:
  Store new sample to memory
RUN:
  Compute $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
  Store result to output register
DATA OUT:
  Output result / Wait for ACK
IDLE: …
Translation into VHDL
• Some basic VHDL building blocks:
Signal Assignments:
Outside a process:
[Figure: AxD drives YxD]
• This is NOT allowed !!!
[Figure: AxD and BxD both drive YxD]
Within a process (sequential execution):
[Figure: AxD and BxD are assigned to YxD one after another]
• Sequential execution
• The last assignment is kept when the process terminates
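The assignment rules above can be sketched as follows. `AxD`, `BxD` and `YxD` are the signal names from the slide; the entity name, the select signal `SELxS`, and the second output `ZxD` are assumptions added to make the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    AxD   : in  std_logic;
    BxD   : in  std_logic;
    SELxS : in  std_logic;   -- hypothetical select input
    YxD   : out std_logic;   -- driven by a concurrent assignment
    ZxD   : out std_logic    -- driven from within a process
  );
end entity assign_demo;

architecture rtl of assign_demo is
begin
  -- Outside a process: exactly one concurrent assignment per signal.
  YxD <= AxD;
  -- YxD <= BxD;  -- NOT allowed: a second driver for the same signal.
  --              -- For std_logic this may still compile, but the two
  --              -- drivers conflict and synthesis tools reject it.

  -- Within a process: statements execute in order and the last
  -- assignment that was executed is kept when the process suspends.
  p_seq : process (AxD, BxD, SELxS)
  begin
    ZxD <= AxD;              -- first assignment
    if SELxS = '1' then
      ZxD <= BxD;            -- overrides the first one when SELxS = '1'
    end if;
  end process p_seq;
end architecture rtl;
```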
Translation into VHDL
• Some basic VHDL building blocks:
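Multiplexer:
[Figure: multiplexer selecting AxD, BxD, or CxD onto YxD with SELxS; the code uses a Default Assignment]
Conditional Statements:
[Figure: OUTxD selected from AxD, BxD, CxD, and DxD depending on SelAxS, SelBxS, and STATExDP]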
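A minimal sketch of a multiplexer written with a default assignment. The signal names `AxD`, `BxD`, `CxD`, `SELxS` and `YxD` come from the slide; the 8-bit width, the 2-bit select encoding, and the choice of `AxD` as the default are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
  port (
    AxD   : in  std_logic_vector(7 downto 0);
    BxD   : in  std_logic_vector(7 downto 0);
    CxD   : in  std_logic_vector(7 downto 0);
    SELxS : in  std_logic_vector(1 downto 0);  -- encoding is an assumption
    YxD   : out std_logic_vector(7 downto 0)
  );
end entity mux_demo;

architecture rtl of mux_demo is
begin
  p_mux : process (AxD, BxD, CxD, SELxS)
  begin
    YxD <= AxD;                   -- default assignment: YxD always gets a value
    case SELxS is
      when "01"   => YxD <= BxD;
      when "10"   => YxD <= CxD;
      when others => null;        -- keep the default
    end case;
  end process p_mux;
end architecture rtl;
```

Because of the default assignment, every path through the process assigns `YxD`, so the process describes pure combinatorial logic.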
Translation into VHDL
• Common mistakes with conditional statements:
Example:
[Figure: OUTxD selected from AxD and BxD depending on SelAxS, SelBxS, and STATExDP, with undefined (??) inputs for the unassigned cases]
• NO default assignment
• NO else statement
• ASSIGNING NOTHING TO A SIGNAL IS NOT A
WAY TO KEEP ITS VALUE !!!!! => Use FlipFlops !!!
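The original listing is not preserved here, so the following is only a guess at the kind of code the slide warns about, together with one way to repair it. Signal names follow the slide; the entity name, the 1-bit `STATExDP`, the widths, and the all-zero default value are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cond_demo is
  port (
    AxD      : in  std_logic_vector(7 downto 0);
    BxD      : in  std_logic_vector(7 downto 0);
    SelAxS   : in  std_logic;
    SelBxS   : in  std_logic;
    STATExDP : in  std_logic;   -- assumed 1-bit present state of a flip-flop
    OUTxD    : out std_logic_vector(7 downto 0)
  );
end entity cond_demo;

architecture rtl of cond_demo is
begin
  -- Faulty variant (kept as a comment): without a default assignment
  -- or ELSE branches, OUTxD is unassigned on some paths, which
  -- describes an unwanted latch instead of combinatorial logic.
  --   if STATExDP = '0' then
  --     if SelAxS = '1' then OUTxD <= AxD; end if;
  --   else
  --     if SelBxS = '1' then OUTxD <= BxD; end if;
  --   end if;

  -- Repaired variant: the default assignment covers every path.
  p_out : process (AxD, BxD, SelAxS, SelBxS, STATExDP)
  begin
    OUTxD <= (others => '0');   -- default value is an assumption
    if STATExDP = '0' then
      if SelAxS = '1' then
        OUTxD <= AxD;
      end if;
    else
      if SelBxS = '1' then
        OUTxD <= BxD;
      end if;
    end if;
  end process p_out;
end architecture rtl;
```

If the old value of `OUTxD` really has to be kept, it belongs in a flip-flop with its own present-state and next-state signals.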
Translation into VHDL
• Some basic VHDL building blocks:
Register:
[Figure: flip-flop with input DataREGxDN and output DataREGxDP]
Register with ENABLE:
[Figure: two variants of a flip-flop with enable, each with input DataREGxDN and output DataREGxDP]
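A minimal sketch of a register with enable, using separate next-state (`DataREGxDN`) and present-state (`DataREGxDP`) signals. The entity name, the data ports, the 8-bit width, and the active-low asynchronous reset `RSTxRBI` are assumptions; `CLKxCI` and `DataRegENxS` appear elsewhere in the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_en is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;                     -- async reset, active low (assumed)
    DataRegENxS : in  std_logic;                     -- enable
    DataxDI     : in  std_logic_vector(7 downto 0);  -- hypothetical data input
    DataxDO     : out std_logic_vector(7 downto 0)   -- hypothetical data output
  );
end entity reg_en;

architecture rtl of reg_en is
  signal DataREGxDN : std_logic_vector(7 downto 0);  -- next state
  signal DataREGxDP : std_logic_vector(7 downto 0);  -- present state
begin
  -- Enable as a multiplexer in front of the flip-flop:
  -- load new data when enabled, otherwise feed back the present state.
  DataREGxDN <= DataxDI when DataRegENxS = '1' else DataREGxDP;

  -- The flip-flop itself: only clock and reset in the sensitivity list.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DataREGxDP <= (others => '0');
    elsif CLKxCI'event and CLKxCI = '1' then
      DataREGxDP <= DataREGxDN;
    end if;
  end process p_reg;

  DataxDO <= DataREGxDP;
end architecture rtl;
```

The clock reaches the flip-flop unmodified; the enable only affects the data input.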
Translation into VHDL
• Common mistakes with sequential processes:
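[Figure: register from DataREGxDN to DataREGxDP with CLKxCI and DataRegENxS]
• Cannot be translated into hardware and is NOT allowed
[Figure: register from DataREGxDN to DataREGxDP whose clock comes from a multiplexer with inputs 0 and 1]
• Clocks are NEVER generated within any logic
[Figure: register from DataREGxDN to DataREGxDP with CLKxCI combined with DataRegENxS]
• Gated clocks are more complicated than this
• Avoid them !!!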
Translation into VHDL
• Some basic rules:
Sequential processes (FlipFlops)
Only CLOCK and RESET in the sensitivity list
Logic signals are NEVER used as clock signals
Combinatorial processes
Multiple assignments to the same signal are ONLY possible within
the same process => ONLY the last assignment is valid
Something must be assigned to each signal in any case OR
There MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
Use separate signals for the PRESENT state and the
NEXT state of every FlipFlop in your design.
Use variables ONLY to store intermediate results or even
avoid them whenever possible in an RTL design.
Translation into VHDL
• Write the ENTITY definition of your design to specify:
Inputs, Outputs and Generics
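A sketch of what such an ENTITY definition might look like for the iterative FIR filter. All names, generics, widths, and the handshake signals are assumptions chosen to match the IDLE / NEW DATA / RUN / DATA OUT description; the exercise may prescribe a different interface.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical interface of the FIR filter
entity fir is
  generic (
    DATA_WIDTH : natural := 8;   -- width of samples and coefficients (assumed)
    ORDER      : natural := 7    -- filter order N (assumed)
  );
  port (
    CLKxCI   : in  std_logic;                           -- global clock
    RSTxRBI  : in  std_logic;                           -- async reset, active low
    DataxDI  : in  signed(DATA_WIDTH-1 downto 0);       -- input sample x(k)
    ValidxSI : in  std_logic;                           -- new sample available
    DataxDO  : out signed(2*DATA_WIDTH-1 downto 0);     -- result y(k)
    ReqxSO   : out std_logic;                           -- result available
    AckxSI   : in  std_logic                            -- result accepted (ACK)
  );
end entity fir;
```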
Translation into VHDL
• Describe the functional units in your block diagram
one after another in the architecture section:
Register with ENABLE
Register with CLEAR
Counter
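As an illustration of two of these units, the sketch below combines an accumulator register with synchronous CLEAR and a counter that could serve as address generator. Entity and port names, the widths (16-bit products, 8 taps), and the reset are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc_counter is
  port (
    CLKxCI   : in  std_logic;
    RSTxRBI  : in  std_logic;               -- async reset, active low (assumed)
    ClearxSI : in  std_logic;               -- synchronous clear
    ENxSI    : in  std_logic;               -- accumulate and count
    ProdxDI  : in  signed(15 downto 0);     -- product b_i * x(k-i)
    AccxDO   : out signed(18 downto 0);     -- running sum, 3 guard bits for 8 taps
    CntxDO   : out unsigned(2 downto 0)     -- index i, wraps after 7
  );
end entity acc_counter;

architecture rtl of acc_counter is
  signal AccxDN, AccxDP : signed(18 downto 0);
  signal CntxDN, CntxDP : unsigned(2 downto 0);
begin
  -- Next-state logic for the register with CLEAR and the counter
  p_comb : process (ClearxSI, ENxSI, ProdxDI, AccxDP, CntxDP)
  begin
    AccxDN <= AccxDP;                       -- default: keep present state
    CntxDN <= CntxDP;
    if ClearxSI = '1' then
      AccxDN <= (others => '0');
      CntxDN <= (others => '0');
    elsif ENxSI = '1' then
      AccxDN <= AccxDP + resize(ProdxDI, AccxDP'length);
      CntxDN <= CntxDP + 1;
    end if;
  end process p_comb;

  -- Flip-flops
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      AccxDP <= (others => '0');
      CntxDP <= (others => '0');
    elsif CLKxCI'event and CLKxCI = '1' then
      AccxDP <= AccxDN;
      CntxDP <= CntxDN;
    end if;
  end process p_reg;

  AccxDO <= AccxDP;
  CntxDO <= CntxDP;
end architecture rtl;
```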
• The FSM is described with one sequential process
and one combinatorial process
MEALY
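A sketch of such a two-process FSM for the IDLE / NEW DATA / RUN / DATA OUT sequence described earlier. The state names follow the slides; the entity name, all port names, and the exact cycle in which each control signal is asserted are assumptions that depend on the datapath.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_fsm is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;   -- async reset, active low (assumed)
    ValidxSI    : in  std_logic;   -- new sample available
    LastxSI     : in  std_logic;   -- counter has reached the last tap
    AckxSI      : in  std_logic;   -- result accepted
    InRegENxSO  : out std_logic;   -- enable of the input register
    MemWExSO    : out std_logic;   -- write enable of the sample memory
    ClearxSO    : out std_logic;   -- clear accumulator and counter
    MacENxSO    : out std_logic;   -- enable accumulator and counter
    OutRegENxSO : out std_logic;   -- enable of the output register
    ReqxSO      : out std_logic    -- result available
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_t is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal STATExDN, STATExDP : state_t;
begin
  -- Sequential process: state register only
  p_state : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      STATExDP <= IDLE;
    elsif CLKxCI'event and CLKxCI = '1' then
      STATExDP <= STATExDN;
    end if;
  end process p_state;

  -- Combinatorial process: next state and outputs
  p_next : process (STATExDP, ValidxSI, LastxSI, AckxSI)
  begin
    -- Default assignments for every signal driven here
    STATExDN    <= STATExDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    ClearxSO    <= '0';
    MacENxSO    <= '0';
    OutRegENxSO <= '0';
    ReqxSO      <= '0';

    case STATExDP is
      when IDLE =>                 -- wait for new sample
        if ValidxSI = '1' then
          InRegENxSO <= '1';       -- MEALY: depends on state and input
          STATExDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>             -- store new sample to memory
        MemWExSO <= '1';
        ClearxSO <= '1';
        STATExDN <= RUN;
      when RUN =>                  -- accumulate b_i * x(k-i)
        MacENxSO <= '1';
        if LastxSI = '1' then
          OutRegENxSO <= '1';      -- MEALY: depends on state and input
          STATExDN    <= DATA_OUT;
        end if;
      when DATA_OUT =>             -- output result, wait for ACK
        ReqxSO <= '1';
        if AckxSI = '1' then
          STATExDN <= IDLE;
        end if;
    end case;
  end process p_next;
end architecture rtl;
```

The default assignments at the top of `p_next` mean that no IF in the CASE branches needs an explicit ELSE.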
Translation into VHDL
• Complete and check the code:
Declare the signals and components
Check and complete the sensitivity lists of ALL combinatorial
processes with ALL signals that are:
used as condition in any IF or CASE statement
being assigned to any other signal
used in any operation with any other signal
Check that the sensitivity lists of ALL sequential processes contain
ONLY one global clock and one global async. reset signal,
and no other signals
Other Good Ideas
• Keep things simple
• Partition the design (Divide et Impera):
Example:
Start processing the next sample, while the previous
result is waiting in the output register:
Just add a FIFO at the output of your filter
• Do NOT try to optimize each Gate or FlipFlop
• Do not try to save cycles if not necessary
• VHDL code
Is usually long and that is good !!
Is just a representation of your block diagram
Does not mind hierarchy