CMU-ECE-1996-012
Carnegie Mellon
Department of Electrical and Computer Engineering
Behavioral Level HDL To Layout: Physical Design Path For Synthesis Tools
Pinar Ceyhan
1996
Advisor: Prof. Thomas
Behavioral Level HDL To Layout: Physical Design Path For Synthesis Tools
Pinar Ceyhan
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
April 1996
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering.
Behavioral Level HDL To Layout: Physical Design Path For Synthesis Tools
Pinar Ceyhan
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
May 1996
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering.
Acknowledgment
First and foremost I would like to thank my advisor, Don Thomas, for his invaluable guidance throughout this project. I would also like to thank Shawn Blanton for reading this report and for his useful feedback. I would like to thank Herman Schmit and John Hagerman for their patience and help in answering my many questions. I would also like to thank Sari "Scou" Coumeri, Alok "Tub Boy/Snake/Lokie" Jain, Mark "Lingerie Boy" Mescher, and David Pursley for their patience during my many anxiety attacks and for helping me stay relatively sane. Last but not least, I would like to thank my parents and my brother for their constant love and support.
Abstract
Synthesis is the process of transforming a design from an abstract description into a detailed implementation. The synthesis process can be divided into three subprocesses: high-level synthesis, logic synthesis, and layout. High-level synthesis is the process of transforming a design from a behavioral description into a register-transfer level description. Logic synthesis is the set of transformations that are performed on the register-transfer level description to create the gate-level implementation of the design. Finally, layout is the process of translating the design from logic gates to a geometric representation. There are a number of Computer-Aided Design (CAD) tools to aid the designer in these processes. System Architect's Workbench (SAW), Synopsys's Design Compiler, and Cascade Design Automation's Epoch are three such tools. Each of these tools was developed to solve a part of the synthesis process. SAW was developed to perform high-level synthesis. Design Compiler's capabilities include logic synthesis and logic optimization. Epoch, on the other hand, can be used for logic synthesis and layout. In this project, we integrated the three tools into a synthesis path that covers the entire synthesis spectrum. In doing so, we wrote translations to aid the design flow between the various CAD tools.
Contents
1 Introduction
    1.1 Motivation and Goals
    1.2 Related Work
    1.3 Our Approach
2 Design Process
    2.1 Levels of Abstraction
    2.2 Synthesis Tools
        2.2.1 SAW
        2.2.2 Synopsys's Design Compiler
        2.2.3 Cascade Design Automation's Epoch
3 Our Implementation
    3.1 Path 1: SAW-Design Compiler-Epoch
    3.2 Path 2: SAW-Epoch
4 Examples and Results
    4.1 Example In Detail: Greatest Common Divisor
    4.2 Other Examples
        4.2.1 Trigonometric Sine Function (Trig)
        4.2.2 Handshaking Problem (Hand)
        4.2.3 Fifth-Order Elliptic Wave Filter (Filt)
        4.2.4 Discrete Cosine Transform
    4.3 Summary
5 Conclusions and Future Work
6 Appendix A
7 Appendix B
8 Bibliography
Chapter 1 Introduction
1.1 Motivation and Goals
The design process for integrated circuits (ICs) has been moving towards higher levels of abstraction. As the levels get higher, the process of translating a circuit from the initial description to the final layout becomes more complex. Fortunately, automated computer-aided design (CAD) tools ease the process for the designer. These tools perform in a fraction of the time the tasks that would otherwise be done by the designer by hand.
Synthesis is the process of transforming a design from an abstract description into a detailed hardware implementation. The process consists of transforming and adding detail to the design from the higher to the lower levels. A design can be represented in different levels of abstraction, Figure 1.
[Figure 1. Levels of abstraction, from the high level down: System/Behavioral Level, Register-Transfer/Structural Level, Gate Level.]
The highest level describes the basic function of the desired design. At the next level, the register-transfer description comprises two parts: the controller and the datapath. The former is a finite state machine that sets the control lines that activate and deactivate the datapath modules, which consist of functional units, registers, multiplexers, I/O units, and buses. At the next level, the design is mapped onto logic gates of the specified technology library. Finally, these gates are represented at the physical level by their transistor models, which can then be placed, routed, and fabricated onto the chip [10][21][8].
There are many different CAD tools that are aimed at performing different tasks of the design process. SAW [21], Synopsys's Design Compiler [20], and Cascade Design Automation's Epoch are three such synthesis tools. Each was developed with the aim of solving a different problem in the design process. System Architect's Workbench (SAW), developed at Carnegie Mellon University, performs high-level synthesis. Design Compiler performs logic synthesis and logic optimization. Epoch, on the other hand, performs synthesis as well as layout operations such as placement, routing, and buffer sizing.
In this project, we integrated these three tools into a synthesis path that covers the levels of abstraction. We used SAW for high-level synthesis, Design Compiler for logic synthesis and optimization, and Epoch for layout. We developed translations of the design descriptions between the different systems. We also explored another synthesis path by using Epoch for logic synthesis and layout.
1.2 Related Work
Synthesis has been a topic of research for a number of years. Research has been performed on different aspects of the synthesis process. Research in the area of high-level synthesis has been done on the algorithms for the scheduling and resource binding problems in order to improve the final design for different performance objectives. Heijligers et al. [13] propose to use genetic algorithms to solve the scheduling and allocation problem, whereas Potkonjak and Wolf [18] integrate techniques from operating systems and high-level synthesis to lead to efficient resource sharing and scheduling. Another approach to improving the scheduling problem is proposed by Achatz [1]. He proposes building several schedules based upon the already existing one and thus letting the allocation and binding phases of the process choose the best fitting schedule. The scheduling problem has even been considered for low power. Musoll and Cortadella [16] consider power consumption issues at the scheduling and resource binding stages of the design process.
Research has also been done at the other phases of the synthesis process. Hartenstein and Kress [12] consider the synthesis of the datapath modules for mapping onto a reconfigurable datapath architecture. Alidina et al. [3] work to lower the power consumption by precomputing the output logic values of the circuit to reduce the switching activity in the circuit. By reusing already available designs or design modules, Girczyc and Carlson [11] propose a method to eliminate the cost overhead generated during optimization and to increase the integration of the design so that more can be packed onto a single chip. Research is also being conducted on behavioral synthesis for mixed hardware/software systems [2] and for multichip modules [24], as well as array mapping within synthesis [19].
1.3 Our Approach
The goal of this project is to present a path through the synthesis process for high-level systems. The tools that are used in this path are SAW, Design Compiler, and Epoch. In order to enable interaction between the tools, we explored translation methods between the systems. Chapter 2 of this report describes the synthesis process and the three synthesis tools in more detail. Chapter 3 describes our approach to finding these paths and the translations between the systems. Chapter 4 analyzes the different examples tested through this environment. Finally, Chapter 5 offers conclusions and future work.
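Chapter 2 Design Process
2.1 Levels of Abstraction
[Figure 2. Levels of abstraction and the synthesis steps between them: (a) System/Behavioral Level, (b) Register-Transfer/Structural Level, (c) Gate Level; (e) High-Level Synthesis, (f) Logic Synthesis, (g) Layout.]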
At the highest level, the design is an algorithm, Figure 2(a). The description at this level is similar to a program; it is a set of operations, variables, and dependencies. It specifies the function of the design without being concerned about how it is realized.
The set of transformations necessary to translate the design from the behavioral level to the register-transfer level, Figure 2(b), is called High-Level Synthesis (HLS), Figure 2(e). During high-level synthesis, the operations implementing the design are determined. These operations are then mapped onto datapath modules and assigned control steps in the scheduling process. The control states create a finite state machine that implements the operation schedule. Figure 3 illustrates the Greatest Common Divisor (GCD) example through high-level synthesis. Figure 3(a) is the behavioral description. During high-level synthesis, >, <=, and - are identified as the operators of the design. These operations are scheduled at states 8, 6, and 7 respectively. Finally, the datapath modules are determined to be three functional processors, two comparators and one subtracter, Figure 3(b).
At the register-transfer level, the design is separated into two parts: the controller and the datapath. The controller is a finite state machine that operates the datapath modules by sending them control signals according to the schedule of operations determined during HLS. The datapath modules comprise memory units (registers, multiplexers), functional units, and interconnect units (busses, I/O).
Logic Synthesis (LS), Figure 2(f), is the set of transformations that are performed on the register-transfer level description to create the gate-level implementation of the design. During logic synthesis, logic gates are mapped into the design to realize the function of the controller and the datapath modules, Figure 2(c). Optimization for criteria such as area, timing, and power is also carried out during logic synthesis to achieve the best possible implementation of the design at the gate level. Figure 3(c) shows the gate-level Verilog of the GCD example and Figure 4 illustrates the gate-level schematic of the GCD example.
Finally, a physical representation of the design must be generated in order to fabricate the design onto the chip, Figure 2(d). At this level, the design is transformed from logic gates to layers of metal, polysilicon, contacts, etc. during the layout process, Figure 2(g). The geometric representation of the design is produced. Partitioning and placement of the cells occur at this level, most often reducing the overall size of the layout. The interconnections between the cells are provided during the routing process.
    always begin : gcd
      wait (Reset);
      done = 0;
      wait (start == 1);
      x = xi;
      y = yi;
      while (x > 0) begin
        if (x <= y) begin
          temp = y;
          y = x;
          x = temp;
        end
        x = x - y;
      end
      out = y;
      done = 1;
      wait (start == 0);
    end
(a) Behavioral Description
    fu_procr_leq(x, y, xleqy);
    fu_procr_gtr(x, 0, xgtr0);
    fu_procr_minus(x, y, out);
    always @(x or y or state) begin
      case (state)
        1: begin case (Reset) 0: state = 1; 1: state = 2; endcase end
        2: begin case (start) 0: state = 2; 1: state = 3; endcase end
        3: begin x = xi; y = yi; state = 8; end
        4: begin out = y; done = 1; state = 5; end
        5: begin case (start) 0: state = 1; 1: state = 5; endcase end
        6: begin case (xleqy) 0: state = 7; 1: state = 9; endcase end
        7: begin x = out; state = 8; end
        8: begin case (xgtr0) 0: state = 4; 1: state = 6; endcase end
        9: begin temp = y; y = x; x = temp; state = 7; end
      endcase
    end
(b) Register-Transfer Level Description
[Figure 3(c): gate-level Verilog netlist of the GCD example.]
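[Figure 4. Gate-Level Implementation of the GCD Example]
2.2 Synthesis Tools
2.2.1 SAW
The System Architect's Workbench (SAW) was developed at Carnegie Mellon University [21]. SAW was developed to transform behavioral descriptions into register-transfer descriptions. Thus, it is a purely high-level synthesis tool, Figure 5.
[Figure 5. SAW shown against the System/Behavioral, RT/Structural, and Gate levels.]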
SAW has two internal paths that can be used to synthesize the design. The first is the CSTEP/EMUCS route. This is a two-step synthesis route. During CSTEP, control steps are assigned to the different operators that have been allocated to implement the function of the design. The control is created here. Using a list scheduling algorithm, CSTEP creates a control flow graph that represents the design, where each state of the graph corresponds to a control step. EMUCS is a datapath synthesis tool. Given the flow graph that was created during CSTEP, EMUCS determines, allocates, and binds every necessary datapath module.
Constraints can be imposed on the design during both stages of CSTEP/EMUCS. During CSTEP, timing constraints can first be imposed to set maximum and minimum times between operations. Hardware constraints are also set during CSTEP to avoid any hardware scheduling conflicts. EMUCS uses cost tables to set its constraints. The first cost table lists the cost of allocating new registers, wires, functional units, etc. The second cost table is called the Add-Function Table. This table states the cost for grouping a certain function with another function. These cost tables are used in determining the least costly implementation of the design [21][22].
SAW's second internal path is the Scheduling, Allocation, and Mapping (SAM) option. SAM performs both of the individual goals of CSTEP and EMUCS in one single tool. SAM performs the scheduling, allocation, and mapping tasks in parallel. It considers every possible option for the three tasks in every cycle, thus allowing the algorithm to consider all consequences of any action immediately [7].
CSTEP/EMUCS is the original path that was developed in SAW and is therefore a more established route than SAM. Thus, we will use CSTEP/EMUCS in this project. Every reference to SAW in the rest of this report is for the CSTEP/EMUCS path of SAW.
2.2.2 Synopsys's Design Compiler
Synopsys's Design Compiler is a widely used commercial logic synthesis tool [20]. It is designed to synthesize a design from the register-transfer level to the gate level, Figure 6. During this process, it also performs logic optimization, making sure that the transformed design meets certain performance specifications, such as area, timing, and power. Design Compiler performs this optimization so that the end result is the best possible circuit that can be realized with the desired technology library and that meets the functional and performance requirements of the target design.
In optimizing the design, Design Compiler takes into consideration two types of constraints: design rule constraints and optimization constraints. The design rule constraints are those that are set automatically when the target technology library is chosen. Constraints such as maximum fanout or maximum transition fall under this category. Performance constraints such as area and speed are specified by the user and are thus considered after the design rule constraints. Design Compiler allows the user to constrain all synchronous paths in the design by specifying the clock for the system. Design Compiler assigns costs to these constraints and optimizes the design such that the final design is the one with the minimal cost functions, both for the design rule constraints and for the user-specified constraints [15][20].
[Figure 6. Design Compiler in the Design Process. Levels shown: System/Behavioral, RT/Structural, Gate, Transistor/Physical.]
2.2.3 Cascade Design Automation's Epoch
Cascade Design Automation's Epoch tool is designed to perform synthesis as well as layout [6]. Given a very structured register-transfer level description, Epoch has the capability to perform logic synthesis. Epoch also performs automatic layout operations such as placement, routing, and buffer/rail sizing, Figure 7.
[Figure 7. Epoch in the Design Process. Levels shown: RT/Structural, Gate, Transistor/Physical.]
The input for Epoch may be one of two possible formats. It is required to be either a register-transfer level description or a gate-level implementation. In performing the synthesis process, Epoch flattens the design hierarchy, except for modules specified as fixed blocks. After flattening is complete, Epoch takes the design and partitions it into two new groups, the datapath and the chip core. Finally, it builds leaf cells.
The implementation of the datapath modules can be realized in three different ways. First, all the modules can be implemented as standard cells only. Or, they can be implemented as already-blocked Epoch Datapath library modules. Finally, they can be implemented as a combination of the two previous methods. The goal of the tool is to find a good balance between both types of implementations, as using only standard cells will lead to too many blocks in the design and using only Epoch's ready modules will lead to too few blocks.
Epoch allows the user to either automatically place and route the design or perform these tasks manually. The automatic compilation option deals with the design from the lowest level of the hierarchy to the current level. For placement, the user has the option to choose from different methods of placement. One option is to carry out the task based on area efficiency and minimization of wiring. Another option considers these criteria as well as optimizing for timing requirements. A third option is called Incremental Placement. Here, if there is a change in the netlist, the existing placement is preserved as much as possible while redoing the placement. Finally, the last possible way of performing placement is by optimizing the datapath groups for such constraints as routing and track usage.
Epoch also comes with an in-depth set of library parts that can be utilized by the user instead of redesigning these parts. These library parts range from simple multiplexers to ALUs to RAMs. All of these library parts also come with functional Verilog descriptions and can be used to simulate the design after mapping in these library parts [6].
Chapter 3 Our Implementation
[...] necessarily cover the entire spectrum. As seen in Figure 8(a), the tools used in this project, SAW, Design Compiler, and Epoch, each cover a certain range of the synthesis spectrum. Epoch also holds the added advantage of covering multiple levels of the synthesis process.
[Figure 8. Possible Synthesis Paths. Surviving labels: Algorithmic, Behavioral, RT, Structural; key: SAW, Design Compiler, Epoch.]
In this project, we consider two possible paths that cover the entire spectrum. Figure 8(b) illustrates these two paths. In the first path, all three tools are used. SAW is used for high-level synthesis, Design Compiler is used for logic synthesis, and Epoch is used for layout. In the second path, we utilize Epoch for logic synthesis instead of Design Compiler. [...] flowgraph, clarifying [...] only a portion of the spectrum are subsets of the "complete" paths that cover the entire scope, from a behavioral description to layout. [...] we will describe [...]
[Figure 9. Surviving labels: Register-Transfer, Gate Level, Controller.]
3.1 Path 1: SAW-Design Compiler-Epoch
In performing high-level synthesis through SAW, we avoid any extra transformations and perform only the necessary steps in carrying out the synthesis. We set resource constraints of 4 add operators, 4 subtract operators, 2 multiply operators, and 1 divide operator for scheduling of the control states. One of our goals in using this design path is to be able to use Epoch's library parts in our datapath. In order to properly map these library parts to the datapath modules, we specify an add-function table that prevents more than one function from being grouped together.
The synthesized design is a single Verilog file that contains the code for all the modules in the design: the root module, the controller module, the top level datapath module, and all of the individual datapath modules. Before running the design through Design Compiler, however, changes must be made to the Verilog code. First, the header, the root module, the controller, and the top level datapath module are separated into individual files. The controller module is then rewritten in a format acceptable by Design Compiler.
The current SAW output of the controller consists of two always statements. The first statement disables the finite state machine when the Reset line is set. The other always statement is the finite state machine that makes up the controller. It is clocked at each state of the case statement and the current state register is simply rewritten.
In order for the code to be Design Compiler compatible, the first always statement is rewritten so that it is clocked by the positive edge of the Clock and the disabling edge of the Reset lines. Within the always loop, it consists of an if-else statement. If the Reset line is activated, the state register is initialized. Otherwise, the state register is updated to the next state. The second always statement is again an FSM. However, this always statement's sensitivity list consists of the current state register and the select lines that are accessed within the finite state machine. Each state of the FSM consists of the next state register being updated.
One has to be careful in writing Verilog code for Design Compiler to include every possible scenario in a case statement or in if-else statements, and all necessary values in sensitivity lists. If something is omitted, Design Compiler will automatically infer a latch in the design and this may affect the simulation of the design. Thus, a default state updating the next state register to the initial state is added. Figure 10 shows the "before" and "after" controller descriptions [23].
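The latch pitfall described above can be illustrated with a minimal sketch. The module below is not SAW output: the module name fsm_sketch, the three state encodings, and the sel input are hypothetical, chosen only to show the two-always style with a default branch. It uses nonblocking assignments in the clocked block, which is the common idiom, whereas the figures in this report use blocking assignments.

```verilog
// Hypothetical three-state controller; names and encodings are illustrative only.
module fsm_sketch (Clock, Reset, sel, current_state);
  input        Clock, Reset, sel;
  output [1:0] current_state;
  reg    [1:0] current_state, next_state;

  parameter S_INIT = 2'd0, S_TEST = 2'd1, S_WORK = 2'd2;

  // State register: asynchronous active-low reset, updated on the clock edge.
  always @(posedge Clock or negedge Reset) begin
    if (!Reset)
      current_state <= S_INIT;
    else
      current_state <= next_state;
  end

  // Next-state logic: every signal that is read appears in the sensitivity
  // list, and next_state is assigned on every path through the case statement.
  always @(current_state or sel) begin
    case (current_state)
      S_INIT:  next_state = S_TEST;
      S_TEST:  begin
                 if (sel) next_state = S_WORK;
                 else     next_state = S_INIT;
               end
      S_WORK:  next_state = S_TEST;
      // Without this default, the unused encoding 2'd3 would leave
      // next_state unassigned on that path, and a latch would be inferred.
      default: next_state = S_INIT;
    endcase
  end
endmodule
```

Removing the default branch, or the else inside S_TEST, leaves a path on which next_state keeps its old value, which is the situation a synthesis tool implements with a latch.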
    always begin : MAIN
      wait (Reset);
      @(posedge Clock)
      current_state = 65537;
      stack_pointer = 0;
      forever begin
        case (current_state)
          65537: begin
            @(posedge Clock)
            current_state = 131073;
          end
          131073: begin
            @(posedge Clock)
            casex (state131073_select)
              1'b1: begin
                current_state = 196609;
              end
              1'bX: begin
                current_state = 262145;
              end
            endcase
          end
          196609: begin
            @(posedge Clock)
            current_state = 327681;
          end
        endcase
      end
    end
(a) Post-SAW
    always @(posedge Clock or negedge Reset) begin
      if (!Reset) begin
        current_state = 65537;
        stack_pointer = 0;
      end
      else begin
        current_state = next_state;
      end
    end
    always @(current_state or state131073_select or state393217_select or state458753_select or state917505_select) begin
      case (current_state)
        65537: next_state = 131073;
        131073: begin
          casex (state131073_select)
            1'b1: begin
              next_state = 196609;
            end
            default: begin
              next_state = 262145;
            end
          endcase
        end
        196609: next_state = 327681;
        default: next_state = 65537;
      endcase
    end
(b) Pre-Design Compiler
Figure 10. The Controller Verilog
Once the new controller code is written, it is run through Design Compiler for logic synthesis and optimization. Because we will be using Epoch for module generation and layout later in the design path, we specify the target libraries in Design Compiler to be those provided by Epoch.
The next step in the process is to map the datapath modules to Epoch's library modules. We are unable to map any functional unit with multiple functions directly onto Epoch. Thus, these modules have to be rewritten. The post-SAW format is in terms of assign statements, whereas the new format consists of a single always statement. This always statement is made up of a case statement where each state corresponds to a function, Figure 11.
    assign #0 out1 = ({ctrl} == opconcat) ? (/*{ op: concat }*/ {in1, in2}) : ... ;
    assign #0 out1 = ({ctrl} == oppad0) ? (/*{ op: pad0 }*/ {'b0, in1}) : ... ;
    assign #0 out1 = (({ctrl} != opconcat) && ({ctrl} != oppad0)) ? 'bx : ... ;
(a) Post-SAW
    reg out1;
    always @(in1 or in2 or ctrl) begin
      case (ctrl)
        opconcat: out1 = {in1, in2};
        oppad0:   out1 = {'d0, in1};
        default:  out1 = 0;
      endcase
    end
(b) Rewritten
There are some functions that cannot be realized by a single Epoch module at the top level but can be realized by multiple modules at a lower level. An example of this is the "greater than or equal to" function. Epoch has one module that determines whether two values are equal and another module that determines whether one number is greater than the other. However, there is no single module to satisfy the >= function. Thus, in this case, the module is rewritten so that it instantiates the equality checker and the comparator and "OR"s the results of the two instantiations to acquire the desired result. Figure 12 illustrates this example.
    module procr (in1, in2, out1, ctrl);
      input [3:0] in1, in2;
      input ctrl;
      output out1;
      reg out1;
      wire xgy, xly, eql;        // driven by the two Epoch instances below
      wire geq = eql | xgy;      // "OR" of the equality and greater-than results
      equal eq (in1, in2, eql);
      mcomp gtr (in1, in2, xgy, xly);
      always @(geq or ctrl) begin
        case (ctrl)
          opgeq: out1 = geq;
          default: out1 = 0;
        endcase
      end
    endmodule
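The rewrite in Figure 12 can be checked in simulation with a small sketch. Epoch's equal and mcomp modules are not available outside the tool, so equal_model and mcomp_model below are behavioral stand-ins whose port order is assumed from Figure 12; geq_sketch and the testbench are likewise hypothetical names. The testbench compares the OR of the two results against the built-in >= operator for every pair of 4-bit operands.

```verilog
// Behavioral stand-ins for the two library parts (port order is an assumption).
module equal_model (a, b, eq);
  input  [3:0] a, b;
  output       eq;
  assign eq = (a == b);
endmodule

module mcomp_model (a, b, gt, lt);
  input  [3:0] a, b;
  output       gt, lt;
  assign gt = (a > b);
  assign lt = (a < b);
endmodule

// "Greater than or equal to" built by OR-ing the two instantiation results.
module geq_sketch (in1, in2, out1);
  input  [3:0] in1, in2;
  output       out1;
  wire         eql, xgy, xly;

  equal_model eq  (in1, in2, eql);
  mcomp_model gtr (in1, in2, xgy, xly);
  assign out1 = eql | xgy;
endmodule

// Exhaustive self-check over all 4-bit operand pairs.
module geq_sketch_tb;
  reg  [3:0] a, b;
  wire       y;
  integer    i, j, errors;

  geq_sketch dut (a, b, y);

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        a = i;
        b = j;
        #1;                                   // let the combinational logic settle
        if (y !== (a >= b)) errors = errors + 1;
      end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

With only 256 operand pairs an exhaustive sweep is cheap; for wider operands a random sample would be the more practical choice.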
Another module construct that cannot be mapped directly into the Epoch library is the description for register modules where values are written to the register. The new construct for this case is similar in style to the format that was described earlier for the controller, Figure 10. Figure 13 illustrates the old and the new Verilog code styles for the register modules.
    initial if (!Reset) state_reg = 0;
    always @(negedge Reset) state_reg = 0;
    always @(posedge Clock)
      case (tmp_ctrl)
        `opfwrite: state_reg = /*{ op: fwrite }*/ tmp_data;
      endcase
    endmodule
(a) Post-SAW
    always @(posedge Clock or negedge Reset) begin
      if (!Reset) state_reg = 0;
      else state_reg = n_state_reg;
    end
    always @(tmp_ctrl or tmp_data or state_reg) begin
      case (tmp_ctrl)
        `opfwrite: n_state_reg = /*{ op: fwrite }*/ tmp_data;
        default: n_state_reg = state_reg;
      endcase
    end
    endmodule
(b) Rewritten
[...] discussed above. The datapath modules that can be implemented by the Epoch modules are mapped directly in the top level datapath module file. Table 1 lists some of the frequently seen operations and their mapped Epoch modules.
TABLE 1. Frequently Seen Functions and Their Mapped Epoch Modules
    SAW operations                  Epoch Modules
    FREAD                           barrelright
    EQL                             equal
    GTR                             mcomp
    LSS                             mcomp
    PLUS                            alu
    MINUS                           alu
    MULT                            mult
    MUX (2-, 3-, 4-input only)      mux2, mux3, mux4
    NOT                             inv
    OR (2-input)                    or2
    AND (2-input)                   and2
Finally, Design Compiler is used to perform logic synthesis and Epoch is used to acquire the layout of the design. In order to preserve the hierarchy of the design in the layout, the controller and the unmapped datapath modules are classified as fixed blocks, so that Epoch preserves these modules when it flattens the rest of the design. For these modules, the comment
    //epoch set_attribute FIXEDBLOCK = 1
is added to each of the synthesized modules before running Epoch. To the top level datapath module, the comment
    //epoch pre_compiled <module_name>
is added for each module that was declared to be "FIXEDBLOCK = 1". We proceed with Epoch by compiling all of the unmapped datapath modules and the controller. We then compile the top level datapath module that instantiates these modules.
3.2 Path 2: SAW-Epoch
[...] take place, the register-transfer level Verilog description needs to be translated to a format acceptable by Epoch. However, Epoch's Verilog compiler accepts the identical format of Design Compiler. Thus, the translations that need to be done are identical to those discussed in the previous section in preparation for Design Compiler. However, the same comment lines that were added in the previous section after logic synthesis must be added here before synthesis with Epoch can be performed:
    //epoch set_attribute FIXEDBLOCK = 1      added to individual datapath modules
    //epoch pre_compiled <module_name>        added to the top level datapath module
Once the translation is complete, the design is compiled to layout through Epoch. The next section will discuss the various examples we ran through these paths and their results.
Chapter 4 Examples and Results
[...] being the smallest and the DCT example being the largest design within the set.
While running either path, we need to simulate the resulting design at each level, before and after the translations have been made, in order to check that the function of the design has been preserved. A very important goal of the design path is to give the best possible implementation of the design without losing its functionality.
The test vectors used in these simulations were selected specifically for each design. GCD is a mathematical function. Thus, its simulation test vectors are a set of randomly generated numbers. For the three following examples, the simulation file sends the design a stimulus and appropriate requests and acknowledgments. In the Trig example, the stimulus starts at 0 and increments by 1536 at each iteration, whereas in the handshaking example, it increments by only 6. The Filter example is different in that the initial stimulus is 1 and the rest are [...]. In this example, the state registers are not initialized between iterations and thus lead to unique results at each iteration for the same stimulus. Finally, the DCT design is tested by putting an actual image through it.
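One way to carry out the before/after check described above is a testbench that drives both versions of a module with the same random vectors and counts disagreements. The sketch below is an assumption-laden illustration, not the simulation files used in this project: fu_before and fu_after are hypothetical 2-bit versions of the multi-function unit of Figure 11, the opcode values are invented, and the vector count of 200 is arbitrary.

```verilog
`define OPCONCAT 1'b0
`define OPPAD0   1'b1

// Original style: a continuous assignment selecting between the two functions.
module fu_before (in1, in2, ctrl, out1);
  input  [1:0] in1, in2;
  input        ctrl;
  output [3:0] out1;
  assign out1 = (ctrl == `OPCONCAT) ? {in1, in2} : {2'b00, in1};
endmodule

// Rewritten style: a single always block containing a case statement.
module fu_after (in1, in2, ctrl, out1);
  input  [1:0] in1, in2;
  input        ctrl;
  output [3:0] out1;
  reg    [3:0] out1;
  always @(in1 or in2 or ctrl) begin
    case (ctrl)
      `OPCONCAT: out1 = {in1, in2};
      `OPPAD0:   out1 = {2'b00, in1};
      default:   out1 = 4'b0;
    endcase
  end
endmodule

// Drive both versions with identical random vectors and count mismatches.
module fu_equiv_tb;
  reg  [1:0] in1, in2;
  reg        ctrl;
  wire [3:0] y_before, y_after;
  integer    n, errors;

  fu_before u_before (in1, in2, ctrl, y_before);
  fu_after  u_after  (in1, in2, ctrl, y_after);

  initial begin
    errors = 0;
    for (n = 0; n < 200; n = n + 1) begin
      in1  = $random;      // truncated to the 2-bit operand width
      in2  = $random;
      ctrl = $random;      // truncated to 1 bit
      #1;                  // let both outputs settle before comparing
      if (y_before !== y_after) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

The same structure applies at each level of the path: the instance on one side is replaced by the translated or synthesized version while the stimulus stays fixed.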
During optimization, Design Compiler performs some buffer sizing and uses special gates that have been buffer-sized. This, however, has no effect on the function of the gate. Therefore, for simulation, we remove all of the buffer-sizing information and replace those gates by instantiating their basic implementations. However, when running Epoch, we use the buffer-sized gates in the input. In the following sections, each of the designs that were passed through the synthesis paths will be described.
The smallest design among the examples is the Greatest Common Divisor (GCD). The GCD example is a simple algorithm that finds the largest number that divides the two given integers evenly. The algorithm repetitively computes the difference between the two integers, replacing the larger integer with this difference at each iteration, until the difference of the two integers is no longer greater than zero. At this point, the other number is returned as the greatest common divisor. The main behavioral loop of the GCD example was seen in Figure 3 on page 6.

The GCD example was synthesized into a register-transfer level description with SAW. The Verilog code for the controller and the datapath is illustrated in Figure 14. There are 14 functional processors, 3 multiplexers, 3 inputs, 2 outputs, and 1 register module in the datapath. Before continuing with the synthesis process through Design Compiler, transformations are made to the design. The 14 functional processors are mapped to 7 barrel shift units, 1 ALU, 1 magnitude comparator, 2 equality checkers, and 1 inverter from the Epoch libraries, Figure 15. The unmapped datapath modules are transformed into synthesizable Verilog and synthesized to gate-level using Design Compiler for path 1 or to layout using Epoch for path 2. The gate-level implementation of the controller for path 1 can be seen in Figure 16.
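The subtraction-based algorithm described above can be modeled in a few lines of Python. This is a sketch for illustration rather than the behavioral Verilog of Figure 3; the function name `gcd_by_subtraction` is hypothetical, and rejecting non-positive inputs is an assumption made so that the loop always terminates.

```python
def gcd_by_subtraction(a, b):
    """Greatest common divisor by repeated subtraction.

    The larger value is replaced by the difference of the two values until
    the difference reaches zero, i.e. until both values are equal.
    """
    if a <= 0 or b <= 0:
        raise ValueError("both inputs must be positive integers")
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


result = gcd_by_subtraction(48, 18)
```

Only subtraction and comparison are needed, which is why the design maps onto simple datapath units such as an ALU, comparators, and equality checkers.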
[Figure panel: (a) Datapath]
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \qquad \text{(EQ 1)}$$
Figure 16. Gate-Level Implementation of GCD Controller

The behavioral description of this equation consists of a single while statement that computes each of the quotients and adds them successively, Figure 17. Since the original expansion (EQ. 1) implies that there is no upper limit on the summation, a limit is imposed to stop the computation when the summation is no longer changing "significantly".

    always begin: trig
        /*{RESET}*/ wait(!Reset);
        InSync = 1;
        qx = InVal;
        #100 wait(1); //tnext
        InSync = 0;
        qterm = qx;
        qtsin = qx;
        qi = 1;
        qtemp = qx * qx;
        qx2 = qtemp[29:14];
        while (qterm) begin
            qi = qi + 2;
            qtemp = qterm * qx2;
            qtop = qtemp[29:14];
            qbot = qi * (qi-1);
            qterm = qtop / qbot;
            if (qi[1]) qtsin = qtsin - qterm;
            else qtsin = qtsin + qterm;
        end
        OutVal = qtsin;
        OutSync = 1;
        #100 wait(1); //tnext
        OutSync = 0;
    end

Figure 17. Series Expansion Algorithm for Sin(x) Function
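The loop of Figure 17 can be checked with a small software model. The sketch below is an assumption-laden illustration: it treats values as non-negative fixed-point numbers with 14 fractional bits (suggested by the [29:14] bit selects in the figure), it does not model 16-bit overflow, and the name `sin_series_fixed` is hypothetical.

```python
FRAC_BITS = 14  # assumed fixed-point format: 14 fractional bits


def sin_series_fixed(qx):
    """Series expansion of sin(x) on a fixed-point input, mirroring Figure 17.

    Each new term is the previous term times x^2, divided by i*(i-1).
    The loop stops once the term has shrunk to zero.
    """
    qterm = qx
    qtsin = qx
    qi = 1
    qx2 = (qx * qx) >> FRAC_BITS
    while qterm:
        qi += 2
        qtop = (qterm * qx2) >> FRAC_BITS
        qbot = qi * (qi - 1)
        qterm = qtop // qbot
        # Bit 1 of qi is set for i = 3, 7, 11, ... (the subtracted terms).
        if (qi >> 1) & 1:
            qtsin -= qterm
        else:
            qtsin += qterm
    return qtsin


half = sin_series_fixed(8192)  # 8192 represents 0.5 in this format
```

For the input 0.5, the model needs three passes through the loop before the term becomes zero, and the result differs from sin(0.5) by well under one part in a thousand.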
The magnitude of the current term of the series is used as the controlling value, i.e., if the term x^i/i! is smaller than the error value, this term is considered insignificant to the summation and the computation is terminated.

This example was also successful when run through Path 1. Due to the nature of Eq. 1, this design requires a functional unit that performs the divide operation. However, there is no such unit that can be mapped through the Epoch library parts. Additionally, Design Compiler does not allow the division operation to be performed on variables, only on static numbers. In order to maintain the functionality of this design, we wrote a behavioral description for a dividing unit that is based on subtraction, similar to the GCD example, Figure 18. The divide function was written in order to simulate the design correctly. Because this module was simply instantiated from the functional unit, Design Compiler was able to synthesize the functional unit by classifying the divide unit as an "unresolved reference" and instantiating it from the gate-level description. However, in order to create the layout of the complete system, the divide function needs to be synthesizable. The description in Figure 18 cannot be directly synthesized through Design Compiler or Epoch. Thus, we wrote it at the register-transfer level before synthesizing it.

    module div(xi, yi, out);
    // class: BEHAVIOR
        input [15:0] xi, yi;
        output [15:0] out;
        reg [15:0] x, y, temp, out, count;

        always @(xi or yi) begin
            x = xi;
            y = yi;
            count = 0;
            while (x > 0) begin
                if (x < y) begin
                    x = 0;
                end
                else if (y == 0) begin
                    count = 0;
                end
                else begin
                    x = x - y;
                    if (x >= 0) count = count + 1;
                end
            end
            out = count;
        end // always
    endmodule

Figure 18. The Division Problem

One problem that commonly occurs in synthesis is the treatment of delays in the original description. Most synthesis tools, including SAW, Design Compiler, and Epoch, ignore these delays, and consequently the behavior of the synthesized model changes due to the missing delays. The Trig example includes two places where delays are essential for it to simulate correctly. In order to indicate the start and end of the computation, a pulse is created. When this pulse is removed, the design will iterate indefinitely. Thus, it is essential to describe this event with a delay statement. However, during the synthesis process, it is lost. Because Trig is a relatively small example, we were able to locate the multiplexer that controls these pulses and add the appropriate delay, restoring the original behavior of the design. We were then able to proceed with the synthesis and layout processes for both paths from the altered RTL description.
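The subtraction-based divide unit of Figure 18 can be modeled in software to see what it computes. The following is a minimal sketch, not the register-transfer level description mentioned above; the name `divide_by_subtraction` is hypothetical, inputs are assumed to be non-negative integers, and returning 0 for a zero divisor is an assumption that mirrors the zero count Figure 18 assigns in that case.

```python
def divide_by_subtraction(x, y):
    """Integer quotient of x / y, counting how often y can be subtracted from x."""
    if y == 0:
        return 0  # assumption: a zero divisor yields a zero count
    count = 0
    while x >= y:
        x -= y
        count += 1
    return count


quotient = divide_by_subtraction(17, 5)
```

The number of loop iterations equals the quotient, so the running time depends on the operands, which is one reason such a description is convenient for simulation but has to be restructured before it can be synthesized.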
4.2.2
means of control between the two functions. Figure 19 illustrates this algorithm. We note that the structure of this example at the algorithmic level consists mainly of wait statements and register value assignments. These assignments and statements enact the sending and receipt of acknowledgments and requests that comprise the handshaking procedure. Like the two previous examples, this design was easily translated for, synthesized through, and simulated for both paths.

    always begin: hand
        /*{reset}*/ wait(!Reset);
        forever begin
            wait(inReq);
            in1 = inVal;
            /*{lpg:in}*/ wait(1);
            inAck = 1;
            wait(!inReq);
            inAck = 0;
            go1 = 1;
            in2 = 'h60 - in1;
            go2 = 1;
            wait(!rdy1);
            go1 = 0;
            wait(!rdy2);
            go2 = 0;
            wait(rdy1);
            wait(rdy2);
            outVal = out1 + out2;
            /*{lpg:out}*/ wait(1);
            outReq = 1;
            wait(outAck);
            outReq = 0;
            wait(!outAck);
        end
    end

Figure 19. Handshaking Example
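The request/acknowledge sequence on the input side of Figure 19 follows a four-phase pattern: request up, acknowledge up, request down, acknowledge down. A minimal sketch of that ordering is given below. It is a step-by-step software model under stated assumptions, not the design itself: the producer is an invented stand-in for the simulation file, and the names `simulate_four_phase`, `req`, and `ack` are hypothetical.

```python
def simulate_four_phase(values):
    """Step a producer and a consumer through a four-phase req/ack handshake."""
    pending = list(values)  # values the producer still has to send
    received = []           # values the consumer has latched
    events = []             # ordered log of signal transitions
    req = ack = 0
    data = None

    while pending or req or ack:
        # Producer: raise req with new data when idle, drop it once acknowledged.
        if not req and not ack and pending:
            data = pending.pop(0)
            req = 1
            events.append(("req", 1))
        elif req and ack:
            req = 0
            events.append(("req", 0))

        # Consumer: latch data and raise ack on a request, drop ack once req falls.
        if req and not ack:
            received.append(data)
            ack = 1
            events.append(("ack", 1))
        elif not req and ack:
            ack = 0
            events.append(("ack", 0))

    return received, events


received, events = simulate_four_phase([3, 5])
```

Each transfer contributes exactly four events to the log, and a new request is only raised after the previous acknowledge has fallen, which is the property the wait statements in Figure 19 enforce.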
    forever begin
        wait(inreq);
        io1 = inval;
        /*{lpg:in}*/ wait(1);
        inack = 1;
        wait(!inreq);
        inack = 0;
        op3 = io1 + sv2;
        op32 = sv33 + sv39;
        op12 = op3 + sv13;
        op20 = op12 + sv26;
        op25 = op20 + op32;
        mult(op21, c1, op25);
        mult(op24, c2, op25);
        op19 = op12 + op21;
        op27 = op24 + op32;
        op11 = op12 + op19;
        op22 = op19 + op25;
        op29 = op27 + op32;
        mult(op9, c3, op11);
        sv26 = op22 + op27;
        mult(op30, c4, op29);
        op8 = op3 + op9;
        op31 = op30 + sv39;
        op7 = op3 + op8;
        op10 = op8 + op19;
        op28 = op27 + op31;
        op41 = op31 + sv39;
        mult(op6, c5, op7);
        op15 = op10 + sv18;
        op35 = sv38 + op28;
        mult(io43, c6, op41);
        op4 = io1 + op6;
        mult(op16, c7, op15);
        mult(op36, c8, op35);
        sv39 = op31 + io43;
        sv2 = op4 + op8;
        sv18 = op16 + sv18;
        sv38 = sv38 + op36;
        sv13 = op15 + sv18;
        sv33 = sv38 + op35;
        outval = io43;
        /*{lpg:out}*/ wait(1);
        outreq = 1;
        wait(outack);
        outreq = 0;
        wait(!outack);
    end
    end

Figure 20. Main Function of the Filter Example

The SAW-synthesized output of this design is not simulatable. However, in order to verify that the translations are valid for the larger designs, this example is put through the design paths. The parsing of this example into both Design Compiler and Epoch, for paths 1 and 2 respectively, was successful, as was the logic synthesis through Design Compiler in Path 1. However, in both cases, the design was too big for Epoch, and it was unable to finish the compilation of the design into its layout after overnight runs.
4.2.4
The Discrete Cosine Transform (DCT) algorithm is a transformation from the spatial domain to the frequency domain. This particular DCT example is the first part of the JPEG (Joint Photographic Experts Group) function. In the spatial domain, the "image" is divided into 64 pixels, each consisting of 8 bits/pixel. In order to transform this 2-dimensional image into the frequency domain, the process is separated into two 1-dimensional DCT processes, one for the rows of pixels and the other for the columns. Within each 1-dimensional DCT function, the values representing the image are read and transformed through a series of adds, subtracts, and most importantly multiplies by cosine fractional coefficients. Because the two 1-dimensional DCT operations are so similar, we are only concerned with one of these operations.

Like the Filter of the previous example, this design also ceases to be simulatable once synthesized with SAW. Because of the large size and complexity of this design, it is nearly impossible to pinpoint the source of the simulation problem and alter the description to "fix it", as was done in the Trig example in Section 4.2.1. Once again, in order to verify the validity of the translations and the design paths, we transform and synthesize the SAW-output description through both paths. The description was parsed into Design Compiler and Epoch successfully, verifying the compatibility of the translations for both tools. However, the designs were again too large for the machine running the tools. Similar to the previous example, we also wrote a register-transfer level description to pass through Design Compiler and Epoch. The RT-level description was successfully synthesized through Design Compiler, but once again was too large to successfully finish through Epoch.
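The row-column separation described above can be illustrated with a short Python sketch. It is not the hardware algorithm of this design: it assumes the orthonormal DCT-II, which the text does not spell out, uses floating-point arithmetic in place of the fixed-point multiplies by cosine coefficients, and the names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(values):
    """One-dimensional DCT-II with orthonormal scaling (an assumed convention)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Two-dimensional DCT as a 1-D pass over the rows, then over the columns."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back to row order


flat_block = [[1.0] * 8 for _ in range(8)]  # 64 identical pixels
coeffs = dct_2d(flat_block)
```

For a block of identical pixels, all of the energy ends up in the single coefficient at position (0, 0) and every other coefficient is zero up to rounding. Because the same `dct_1d` routine serves both passes, only one of the two 1-dimensional operations needs to be designed.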
4.3 Summary
In order to verify our translations and the design paths, we synthesized five designs of various sizes, constructs, and complexity through these paths. We stopped and simulated each design before and after every translation was performed to verify that the new description preserved the function of the design. The smaller examples remained correctly simulatable throughout the process. However, the larger examples stopped simulating properly after the high-level synthesis process. SAW is known to contain some minor bugs that were not resolved during its development. Some of these bugs lead to bit mismatches, port size mismatches, and incorrect assignments of registers. Another property of SAW that may be the cause of the simulation problems is the cyclic nature of the register-transfer level description. This can affect the timing of the datapath modules and may lead to early or late assignments of register values. Finally, as discussed in Section 4.2, any delay statements in the behavioral code are ignored by the synthesis tool, which may also lead to incorrect simulation.
We have explored and presented possible design synthesis paths for high-level systems. We used System Architect's Workbench, Synopsys's Design Compiler, and Cascade Design Automation's Epoch tools in these paths. We explored a path that used all three tools: SAW for high-level synthesis, Design Compiler for logic synthesis, and Epoch for layout. We also explored an alternate path that used only SAW and Epoch. The smaller examples run through these tools showed that these paths are effective in the design process.

There are different ways in which the design process utilizing these paths can be extended. This project dealt with the actual presentation of these paths. An extension of this project is to compare the resulting designs of each path in terms of performance criteria, such as area, timing, power, etc.

In the previous section, we briefly discussed various reasons why some of the designs simulated incorrectly after high-level synthesis with SAW. One of these reasons was that the synthesis tools ignore delay statements in the original description of the design. Other reasons were bit mismatches, port mismatches, incorrect assignments, and the cyclic nature of the synthesized description. Research can be done to identify and solve these and other problems with SAW that lead to the incorrect simulation.

In this project, we used library parts that were made available from Epoch. Research is currently underway at Carnegie Mellon University in creating application specific libraries for Design Compiler. The timing, area, power, etc. specifications would be specific to the project for which they are created. Another extension of this project is to automate these paths. Finally, these paths can be utilized in an actual design project.
Appendix A
SAW-Design Compiler-Epoch Synthesis Path
Tutorial
Purpose: The SAW-Design Compiler-Epoch Synthesis Path is a design flow that covers the entire synthesis spectrum by utilizing SAW for high-level synthesis, Design Compiler for logic synthesis, and Epoch for layout. It is a multi-part system that takes a design through these levels.

Syntax:
Compilation Information: The programs saw2syn and syn2csd are shell scripts. Any necessary changes can be made directly to the files.

Saw2syn calls 3 programs: verivt, saw, and rtparse. The source files for these programs are found under the directory ~/SAW/src/. Every directory under ~/SAW/src/ has an individual Makefile. Therefore, any file that was additionally written needs to be linked during compilation by including it in the Makefile in the directory to which the file was added. To compile, you need to go to the top level directory, ~/SAW/src, and enter "make <program>" at the prompt, e.g. "make rtparse" to compile RTPARSE. The executable files will be stored in the directory ~/SAW/PATH/bin/.

Syn2csd uses 2 programs: precsd and precsdseq. The source files for these two programs can be found in the ~/SAW/src/ directory. They are individually compiled using the GNU C compiler, gcc.
Set-up
Your .cshrc file should contain the following lines:

    setenv CASCADE /afs/ece/usr/cascade
    setenv SYNOPSYS /afs/ece/usr/synopsys
    setenv DISPLAY <your-machine-name>:0.0

Your Andrew .cshrc file should contain the following lines:

    setenv CDA_LICENSE_FILE /afs/ece/usr/cascade/licenses/license.dat
    setenv CASCADE /afs/ece/usr/cascade
    setenv DISPLAY <your-machine-name>.ece:0.0
    setenv PATH ${PATH}:${CASCADE}/bin

Create the following directories:

    <design>
    <design>/verilog
    <design>/verilog/lib_parts.dir

Copy the following file named .synopsys_dc.setup into the <design>/verilog and the <design>/verilog/lib_parts.dir directories:

    designer = "<your name>";
    company = "CEDA at ECE";
    SYNOPSYS = get_unix_variable("SYNOPSYS");
    CASCADE = get_unix_variable("CASCADE");
    RULESET = CDA.7u3m1p
    LIBS_DIR = CASCADE + "/tech/cmos/" + RULESET
    SYMBOL_LIB = CASCADE + "/data/synopsys/epoch_std.sdb"
    STDCELL_LIB = RULESET + "_std_cmos2.db"
    DPCELL_LIB = RULESET + "_dp_cmos2.db"
    search_path = {., LIBS_DIR}
    link_library = {STDCELL_LIB, DPCELL_LIB}
    target_library = {STDCELL_LIB}
    symbol_library = {SYMBOL_LIB}

Copy the behavioral description into the <design>/verilog directory under the <design>.v.behav extension. Go to the <design>/verilog directory to begin the synthesis process.
    % cd ..
    % syn2csd

Log on to a SUN SPARC (hendrix.ece).

    % cd <design>
    % epoch                              activate Epoch

    Project -> Project -> New
        enter the path for the design and a 3-character identification
    Project -> Ruleset
        from the list, choose Ruleset 7u3m1p

For every gate-level file, do the following:

    Input -> Verilog Compile
        select the appropriate Verilog file
        select the Synopsys Buffer Sizes option
        select Run
        close all information windows
    Input -> Netlist Input
        select the current module
        select Run
        close all information windows
    Physical Design -> Automatic Compile
        select the current module
        select Run
        close all information windows
    Physical Design -> Open
        select the current design
        select Run
        all options, except "M" which stands for "Manual", should be green, signifying that the partitioning, placement, routing, and buffer sizing during compilation was successful

Once done for every file in the lib_parts.dir directory and the sequencer module, repeat the above steps for the module that instantiates the datapath and the controller.

    Simulation -> Simulation Output -> Verilog
        to save a Verilog description of the complete design
    Output -> CIF
        to save the layout in CIF format
Appendix B
FILE ORGANIZATION DURING SYNTHESIS FLOW FOR GCD EXAMPLE
    gcd.vtr
    gcd.all
    gcd.cfg
    gcd.cs
    gcd.vsl
    gcd.v.str

RTPARSE:

    gcd.v                                       top level module
    compute.v                                   instantiates all datapath modules
    compute_sequencer_syn.v                     controller
    lib_parts.dir/compute_fu_procr_1_syn.v      unmapped functional processor
    lib_parts.dir/compute_fu_procr_3_syn.v      unmapped functional processor
    lib_parts.dir/compute_p_output_1_syn.v      output register
    lib_parts.dir/compute_p_output_2_syn.v      output register
    lib_parts.dir/compute_regr_1_syn.v          internal register
    lib_parts.dir/compute_regr_2_syn.v          internal register

dc_shell

Files that are additionally created:

    compute_sequencer.v
    lib_parts.dir/compute_fu_procr_1.v
    lib_parts.dir/compute_fu_procr_3.v
    lib_parts.dir/compute_p_output_1.v
    lib_parts.dir/compute_p_output_2.v
    lib_parts.dir/compute_regr_1.v
    lib_parts.dir/compute_regr_2.v

syn2csd

In directory lib_parts.dir, every file with the extension _syn.v is moved to a directory syn_dir. The rest of the files remain where they are.
Bibliography
[1] H. Achatz. "Generating Several Solutions for the Scheduling Problem in High-Level Synthesis," EuroDAC Proceedings, 1995.
[2] J.K. Adams and D.E. Thomas. "Multiple-Process Behavioral Synthesis for Mixed Hardware/Software Systems."
[3] M. Alidina, J. Monteiro, and S. Devadas. "Precomputation-Based Sequential Logic Optimization for Low Power," Proceedings of the International Conference on Computer-Aided Design, 1994.
[4] M. Bombana, P. Cavalloro, S. Conigliaro, R.B. Hughes, G. Musgrave, and G. Zaza. "Design-Flow and Synthesis for ASICs: a case study," 32nd ACM/IEEE Design Automation Conference Proceedings, 1995, pp. 292-297.
[5] M.R. Burich. "Design of Module Generators and Silicon Compilers," Silicon Compilation, Reading: Addison-Wesley Publishing, 1988, pp. 49-94.
[6] Cascade Design Automation. Epoch User's Manual, 1994.
[7] R.J. Cloutier. "Synthesis of Pipelined Instruction Set Processors," Ph.D. Thesis, Carnegie Mellon University, January 1993.
[8] G. De Micheli. Synthesis and Optimization of Digital Circuits, New York: McGraw-Hill Inc., 1994.
[9] R.G. Dromey. How To Solve It By Computer, Prentice-Hall, 1982, pp. 62.
[10] D.D. Gajski. "Essential Issues and Possible Solutions in High-Level Synthesis," High-Level VLSI Synthesis, Boston: Kluwer Academic Publishers, 1991.
[11] E. Girczyc and S. Carlson. "Increasing Design Quality and Engineering Productivity Through Design Reuse," 30th ACM/IEEE Design Automation Conference Proceedings, 1993, pp. 48-53.
[12] R.W. Hartenstein and R. Kress. "A Datapath Synthesis System for the Reconfigurable Datapath Architecture," ASP-DAC Proceedings, 1995.
[13] M.J.M. Heijligers, L.J.M. Cluitmans, and J.A.G. Jess. "High-Level Synthesis Scheduling and Allocation Using Genetic Algorithms," ASP-DAC Proceedings, 1995.
[14] D. Knapp, T. Ly, D. MacMillen, and R. Miller. "Behavioral Synthesis Methodology for HDL-Based Specification and Validation," 32nd ACM/IEEE Design Automation Conference Proceedings, 1995, pp. 286-291.
[15] P. Kurup and T. Abbasi. Logic Synthesis Using Synopsys, Boston: Kluwer Academic Publishers, 1995.
[16] E. Musoll and J. Cortadella. "Scheduling and Resource Binding for Low Power," Proceedings of the Eighth International Symposium on System Synthesis, 1995, pp. 104-109.
[17] Y. Nakamura, K. Oguri, and A. Nagoya. "Synthesis From Pure Behavioral Descriptions," High-Level VLSI Synthesis, Boston: Kluwer Academic Publishers, 1990, pp. 205-229.
[18] M. Potkonjak and W. Wolf. "Cost Optimization in ASIC-implementation of Periodic Hard Real Time Systems Using Behavioral Synthesis Techniques," Proceedings of the International Conference on Computer-Aided Design, 1995, pp. 446-451.
[19] H. Schmit and D.E. Thomas. "Array Mapping in Behavioral Synthesis," Proceedings of the Eighth International Symposium on System Synthesis, 1995, pp. 90-95.
[20] Synopsys Inc. Synopsys Reference Manual.
[21] D.E. Thomas, E.D. Lagnese, R.A. Walker, J.A. Nestor, J.V. Rajan, and R.L. Blackburn. Algorithmic and Register-Transfer Level Synthesis: The System Architect's Workbench, Boston: Kluwer Academic Publishers, 1990.
[22] D.E. Thomas. The System Architect's Workbench User's Guide, Version 2.0 Edition, May 1991.
[23] D.E. Thomas and P.R. Moorby. The Verilog Hardware Description Language, Third Edition, Boston: Kluwer Academic Publishers, 1996.
[24] H. Wang, N. Dutt, A. Nicolau, and K.S. Siu. "High-Level Synthesis of Scalable Architectures for IIR Filters Using Multichip Modules," 30th ACM/IEEE Design Automation Conference Proceedings, 1993, pp. 336-342.