Top-Down Design and Synthesis Issues For Sequential Processes (That's Two Tips!)
The Doulos Design Fitness Challenge is an annual event held at an EDA-oriented UK show. The focus of the Challenge this year was on measuring engineers' ability to use VHDL for a real design task. Engineers were asked to design a counter in VHDL by writing a complete register transfer level VHDL architecture from the following specification and entity declaration. The VHDL description was to be synthesizable, and had to be legal VHDL!

The circuit is an 8-bit synchronous counter, with an enable, a parallel load, and an asynchronous reset. The counter loads or counts only when the enable is active. When the load is active, Data is loaded into Count. The counter has two modes: binary and decade. In binary mode, it is an 8-bit binary counter. In decade mode, it counts in two 4-bit nibbles, each nibble counting from 0 to 9, and the bottom nibble carrying into the top nibble, such that it counts from 00 to 99 decimal.

The truth table of the counter is as follows (- means don't care):

    Reset  Clock  Enable  Load  Mode  Count
    0      -      -       -     -     0
    1      ^      1       -     -     Count
    1      ^      0       0     -     Data
    1      ^      0       1     0     Count + 1 (binary)
    1      ^      0       1     1     Count + 1 (decade)

and this is the VHDL entity declaration:

    library IEEE;
    use IEEE.Std_logic_1164.all;

    entity COUNTER is
      port (Clock : in  Std_logic;
            Reset : in  Std_logic;
            Enable: in  Std_logic;
            Load  : in  Std_logic;
            Mode  : in  Std_logic;
            Data  : in  Std_logic_vector(7 downto 0);
            Count : out Std_logic_vector(7 downto 0));
    end;

So, how do you apply top-down design principles, a knowledge of synthesizable VHDL constructs and good coding finesse to this problem? Let's have a look...

It is important to understand that conceptually, a counter is a register with the output fed back to the input via an incrementer. Hence the VHDL code will reflect this concept. For example,

    -- inside a process
    Q <= Q + 1;

The design challenge introduces some key VHDL coding aspects that need to be borne in mind for synthesis. The most fundamental is the use of the classic asynchronous-reset-plus-clock single-process style of VHDL code.

    process (Clock, Reset)
    begin
      if Reset = '0' then
        -- reset register, Q <= 0
      elsif Clock'event and Clock = '1' then
        -- increment register, Q <= Q + 1;
      end if;
    end process;

The essence of the code structure is that the clock and reset need to be in the sensitivity list; an appropriate event on either signal will cause one of the two 'if' branches to execute. Now that we have defined the basic structure of the process, we will go on to fill in the two 'if' branches. The reset branch is very simple:

    -- inside a process
    if Reset = '0' then
      Q <= "00000000";   -- alternatively, Q <= (others => '0');

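As a point of reference, here is a minimal sketch of the same template written out as a complete design unit: a plain 8-bit binary up-counter with an active-low asynchronous reset and nothing else. The entity name SIMPLE_COUNTER and the use of ieee.numeric_std for the + operator are assumptions made for this sketch; neither is part of the challenge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration only; not the challenge's COUNTER.
entity SIMPLE_COUNTER is
  port (Clock : in  std_logic;
        Reset : in  std_logic;   -- asynchronous, active low
        Count : out std_logic_vector(7 downto 0));
end entity SIMPLE_COUNTER;

architecture RTL of SIMPLE_COUNTER is
  -- Internal register; the unsigned type brings "+" from numeric_std.
  signal Q : unsigned(7 downto 0);
begin

  -- Only the clock and the asynchronous reset appear in the sensitivity list.
  process (Clock, Reset)
  begin
    if Reset = '0' then
      Q <= (others => '0');                  -- reset branch
    elsif Clock'event and Clock = '1' then
      Q <= Q + 1;                            -- clock branch, wraps from 255 to 0
    end if;
  end process;

  -- The output port is driven from the internal register.
  Count <= std_logic_vector(Q);

end architecture RTL;
```

The register is held in an internal signal and copied to the port with a concurrent assignment, which is the same arrangement the model solution uses.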
The clock branch needs to contain the functionality of the other four truth table entries; the code reflects the priority of those inputs directly. The enable signal has the highest priority, with nothing happening when it is high; next in priority is the load signal. Hence inside the enable block we will have,

    -- inside a process
    if Enable = '0' then
      -- enable counter functionality
      if Load = '0' then
        -- load data
      else
        -- implement counting based on mode signal
      end if;
    end if;

For the actual increment statement (remember, Q <= Q + 1;), it is desirable to combine this functionality for either mode to ensure that only one piece of incrementer hardware is synthesised. So far, the detailed structure of the process has been derived from the truth table in a top-down design manner. Now we need to code the VHDL with a view to its implementation.

    -- inside a process
    if Load = '0' then
      -- load data
    else
      if lower_nibble_count /= max_in_either_mode then
        -- increment lower nibble
      else
        -- wrap lower nibble back to zero
        if upper_nibble_count /= max_in_either_mode then
          -- increment upper nibble
        else
          -- wrap upper nibble back to zero
        end if;
      end if;
    end if;

Although we are only providing a structure for the detail of the VHDL code in the above code snippet, it is notable that the word 'increment' appears only twice and that it applies to nibbles: the code is structured to synthesise two 4-bit incrementers. This is a subtle point that was missed by all who entered the challenge!

OK, let's fill out the detail of the code structure presented so far in order to generate a model solution. This is given at the end of the section. Note that the + operator needs to be overloaded because + is not defined in the IEEE std_logic_1164 package for std_logic_vector types. Typically, it is defined in a synthesis vendor's tool-specific package (or you can write your own!). In the model solution, you will see the use clause,

    use ieee.std_logic_unsigned.all;

which makes the overloaded + operator visible to this architecture.

A more subtle aspect is the use of the + operator to produce the incrementer hardware. In order to avoid duplication of incrementers, each nibble is regarded separately. Ignoring this approach may lead to the creation of duplicate incrementers. For example,

    -- inside a clocked process
    if mode = '0' then                       -- 8-bit binary
      Q <= Q + '1';
    else                                     -- two decade counters
      if Q(3 downto 0) = "1001" then         -- wrap lower decade
        Q(3 downto 0) <= "0000";
        if Q(7 downto 4) /= "1001" then      -- increment upper decade
          Q(7 downto 4) <= Q(7 downto 4) + '1';
        else                                 -- wrap upper decade
          Q(7 downto 4) <= "0000";
        end if;
      else                                   -- increment lower decade
        Q(3 downto 0) <= Q(3 downto 0) + '1';
      end if;
    end if;

This leads to the creation of an 8-bit incrementer for the binary mode of counting, plus one or two individual 4-bit incrementers for the decade count mode. The actual number of incrementers depends on the synthesis tool's ability to share resources.

Finally, note that an internal signal Q was used, rather than the port signal Count, which, as an output, may not be read in a process.

In summary, we have applied top-down design principles to create a synthesizable VHDL architecture containing a single process. The detailed code implementation was produced with the pitfalls of synthesis clearly borne in mind. The rest of this section gives a model solution:

    -- counter
    -- 8 bits
    -- synchronous, positive edge
    -- binary/decade
    -- asynchronous reset, active low
    -- synchronous parallel load, active low
    -- synchronous enable, active low
    -- binary counter mode 0 = 8 bits
    -- decade counter mode 1 = 2x4 bits
    -- reset has priority over enable over load

    library IEEE;
    use IEEE.Std_logic_1164.all;
    use IEEE.Std_logic_unsigned.all;

    entity COUNTER is
      port (Clock : in  Std_logic;
            Reset : in  Std_logic;
            Enable: in  Std_logic;
            Load  : in  Std_logic;
            Mode  : in  Std_logic;
            Data  : in  Std_logic_vector(7 downto 0);
            Count : out Std_logic_vector(7 downto 0));
    end;

    architecture Model_Solution of Counter is
      constant nibble_max  : std_logic_vector(3 downto 0) := "1111";
      constant decade_max  : std_logic_vector(3 downto 0) := "1001";
      constant zero_nibble : std_logic_vector(3 downto 0) := "0000";
      constant zero_byte   : std_logic_vector(7 downto 0) := "00000000";
      signal Q             : Std_logic_vector(7 downto 0);
    begin
      process (Clock, Reset)
      begin
        if Reset = '0' then
          Q <= zero_byte;
        elsif Clock'event and Clock = '1' then
          if Enable = '0' then
            if Load = '0' then
              Q <= Data;
            elsif (Mode = '0' and Q(3 downto 0) /= nibble_max) or
                  (Mode = '1' and Q(3 downto 0) /= decade_max) then
              Q(3 downto 0) <= Q(3 downto 0) + '1';
            else
              Q(3 downto 0) <= zero_nibble;
              if (Mode = '0' and Q(7 downto 4) /= nibble_max) or
                 (Mode = '1' and Q(7 downto 4) /= decade_max) then
                Q(7 downto 4) <= Q(7 downto 4) + '1';
              else
                Q(7 downto 4) <= zero_nibble;
              end if;
            end if;
          end if;
        end if;
      end process;

      Count <= Q;

    end;

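The model solution depends on ieee.std_logic_unsigned for its + operator. That package is widely available, but it is not part of the IEEE VHDL standards, whereas ieee.numeric_std is. What follows is a minimal sketch of the same counter written against numeric_std, assuming that Data is an 8-bit Std_logic_vector and using a hypothetical architecture name, Numeric_Std_Sketch. One difference to be aware of: in this sketch any value of Mode other than '1' selects binary counting, which matches the model solution for '0' and '1' but may differ in simulation for values such as 'X'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Entity repeated so the sketch compiles on its own.
-- Assumption: Data is an 8-bit vector, like Count.
entity COUNTER is
  port (Clock : in  std_logic;
        Reset : in  std_logic;
        Enable: in  std_logic;
        Load  : in  std_logic;
        Mode  : in  std_logic;
        Data  : in  std_logic_vector(7 downto 0);
        Count : out std_logic_vector(7 downto 0));
end entity COUNTER;

-- Hypothetical architecture name; not part of the original solution.
architecture Numeric_Std_Sketch of COUNTER is
  signal Q : unsigned(7 downto 0);
begin

  process (Clock, Reset)
    -- Terminal count of one nibble; assigned before use on every clock edge.
    variable nibble_max : unsigned(3 downto 0);
  begin
    if Reset = '0' then
      Q <= (others => '0');
    elsif Clock'event and Clock = '1' then
      if Enable = '0' then
        if Mode = '1' then
          nibble_max := "1001";                   -- decade mode: 9
        else
          nibble_max := "1111";                   -- binary mode: 15
        end if;

        if Load = '0' then
          Q <= unsigned(Data);
        elsif Q(3 downto 0) /= nibble_max then
          Q(3 downto 0) <= Q(3 downto 0) + 1;     -- lower 4-bit incrementer
        else
          Q(3 downto 0) <= "0000";
          if Q(7 downto 4) /= nibble_max then
            Q(7 downto 4) <= Q(7 downto 4) + 1;   -- upper 4-bit incrementer
          else
            Q(7 downto 4) <= "0000";
          end if;
        end if;
      end if;
    end if;
  end process;

  Count <= std_logic_vector(Q);

end architecture Numeric_Std_Sketch;
```

As in the model solution, the + operator appears only twice and only on 4-bit slices, so the intent is still two 4-bit incrementers. Selecting the terminal count with a variable is just one way to keep the two comparisons short; the constants of the model solution would serve equally well.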
OK. Concept. Design for Debug. Yes. This probably means thinking more about the code before we write it, and probably writing more code, too. Is it worth it? Well, we generally find that it takes 4 to 5 times longer for engineers to debug the code they write on our training courses, than it does for them to think about the design and write the code. Yep, the old 80:20 rule strikes again.

Let's look at one approach to designing easily debuggable code. We'll call the approach destination assignment design (TLA = DAD!). Let's write a testbench for an arbitrary design. It can be any design, but the key issue is that the design has four modes of operation. For each mode we will need to define the stimulus for the design. The source of the stimulus we write is the mode of operation. Thus we can define the stimulus for each mode in a process, as follows...

    -- inside an architecture
    -- assume signals are std_logic
    mode1: process
    begin
      -- assignments to input_1, input_2, input_3
    end process mode1;

    mode2: process
    begin
      -- assignments to input_1, input_2, input_3
    end process mode2;

    mode3: process
    begin
      -- assignments to input_1, input_2, input_3
    end process mode3;

    mode4: process
    begin
      -- assignments to input_1, input_2, input_3
    end process mode4;

    DUT: arbitrary
      port map (in_1 => input_1, in_2 => input_2, in_3 => input_3,
                out_1 => output_1, out_2 => output_2);

Using this coding style, we will need to ensure that the signal assignments are made in the correct temporal order, too. Let's imagine that during simulation, we notice that the waveform of the signal input_3 is not as we intended. Yes, we need to debug the code. Where do we begin? Any one (or two!) of the four processes could need examination; they might all need looking into.

There's worse to come. It is possible there is a clash between the drivers in two of the processes on the signal input_3; this would show up as X's in the waveform viewer. This kind of problem is particularly hard to debug. Oh, of course! When you've found the clash between the mode1 and mode4 processes, you run the simulation again only to discover the clash is now between the mode1 and mode3 processes. Aaaaaarrrgh!

It is better to write the code as one process, as follows...

    all_modes: process
    begin
      -- mode variable assigned here
      case mode is
        when mode_1 =>
          -- assignments to input_1, input_2, input_3
        when mode_2 =>
          -- assignments to input_1, input_2, input_3
        when mode_3 =>
          -- assignments to input_1, input_2, input_3
        when mode_4 =>
          -- assignments to input_1, input_2, input_3
      end case;
    end process all_modes;

In the single process style, we don't have to worry about temporal ordering of signal assignments and there is no possibility of multiple drivers.

We can take this approach one stage further. Each stimulus signal and the assignments to that signal are made in independent processes.

    -- one of four processes
    sig_3: process (mode)
    begin
      case mode is
        when mode_1 =>
          -- assignment to input_3
        when mode_2 =>
          -- assignment to input_3
        -- and so on...
      end case;
    end process sig_3;
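
To make the one-process-per-signal idea concrete, here is a minimal sketch of how such a testbench might be laid out. Everything specific in it is an assumption made for illustration: the entity name DAD_TB, the enumerated type mode_type, the sequencer process that steps through the modes every 100 ns, and the stimulus values themselves. The DUT instance is left out so that the sketch compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench skeleton; names and stimulus values are placeholders.
entity DAD_TB is
end entity DAD_TB;

architecture Bench of DAD_TB is
  type mode_type is (mode_1, mode_2, mode_3, mode_4);
  signal mode : mode_type := mode_1;
  signal input_1, input_2, input_3 : std_logic;
begin

  -- The only process that assigns mode: steps through the four modes in turn.
  sequencer: process
  begin
    for m in mode_type loop
      mode <= m;
      wait for 100 ns;
    end loop;
    wait;                                  -- stop after the last mode
  end process sequencer;

  -- One process per destination signal, so each signal has exactly one driver.
  sig_1: process (mode)
  begin
    case mode is
      when mode_1 | mode_3 => input_1 <= '0';
      when mode_2 | mode_4 => input_1 <= '1';
    end case;
  end process sig_1;

  sig_2: process (mode)
  begin
    case mode is
      when mode_1 | mode_2 => input_2 <= '0';
      when mode_3 | mode_4 => input_2 <= '1';
    end case;
  end process sig_2;

  sig_3: process (mode)
  begin
    case mode is
      when mode_1 => input_3 <= '1';
      when others => input_3 <= '0';
    end case;
  end process sig_3;

  -- The DUT would be instantiated here and connected to input_1 .. input_3.

end architecture Bench;
```

With this layout, a wrong waveform on input_3 points to exactly one place to look, the sig_3 process, and because no other process assigns input_3 there is only one driver to consider.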