0% found this document useful (0 votes)
50 views109 pages

VDF Project Final

Uploaded by

ankit raj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views109 pages

VDF Project Final

Uploaded by

ankit raj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 109

Indraprastha Institute of

Information Technology

VLSI Design Flow


PROJECT PART 1

8-bit ALU implementing 16 logic functions


with 4-bit counter to count the number of
operations executed.

Group: 6
MT22189-Varun Singh
MT22184-Anand Dixit
MT22155-Bhanu Teja
Contents
1. Design Specification ............................................................................................... 1
1.1. Problem Statement: .......................................................................................... 1
1.2. Assumptions and modifications:..................................................................... 1
1.3. Inputs ports, output ports and functions: ....................................................... 1
1.4. Description ....................................................................................................... 2
2. Simulation and Code Coverage ............................................................................. 3
2.1. RTL Code ............................................................................................................. 3
2.2.1. Testbench-1 ...................................................................................................... 7
2.2.2. Testbench-2 .................................................................................................... 11
2.2.3. Testbench-3 .................................................................................................... 20
2.3. Analysis of Simulation and Code Coverage: .................................................. 29
2.3.1 Analysis of Testbench1:.............................................................................. 29
2.3.2 Analysis of Testbench2:.............................................................................. 29
2.3.3 Analysis of Testbench3:.............................................................................. 29
3. Design Synthesis and Constraints ...................................................................... 30
3.1. Synthesize for minimum area, keeping timing constraints highly relaxed. . 30
3.2. Synthesize for best timing, keep timing constraints tight. Keep making the
timing constraints tighter unless you observe a negative slack. The timing
analysis should show slight negative slack for this constraint. .......................... 34
3.3. Synthesize for timing constraints that is between (3.1) and (3.2) above. .... 38
4. Formal Equivalence checking .............................................................................. 43
4.1. Equivalence checking for minimum area constraint ..................................... 44
4.2. Equivalence checking for tight timing constraint .......................................... 46
4.3. Equivalence checking for relaxed constraint ................................................. 48
4.4. Bad Netlist for Formal Equivalence error ....................................................... 50
4.4.1 D Flip Flop commented out......................................................................... 50
4.4.2 AND gate changed to OR gate .................................................................... 55
4.4.3. Change of NET Name ................................................................................. 59
4.4.4. Addition of new port ................................................................................... 63
4.4.5. Rename of port ........................................................................................... 67
5. Static timing Analysis ........................................................................................... 71
5.1. STA for minimum area netlist .......................................................................... 72
5.2. STA for tight timing netlist ............................................................................... 74
5.3. STA for relaxed constraint netlist .................................................................... 76
6. Test Insertion (DFT – Design for Testability) ...................................................... 78
6.1. Minimum Area ................................................................................................... 79
6.2. Minimum Timing ............................................................................................... 83
6.3. Relaxed Constraint ........................................................................................... 87
7. Violation Reports..................................................................................................... 98
7.1. Violation Report after STA: .............................................................................. 98
7.2. Violation Report STA post DFT: .................................................................... 101
Appendix .................................................................................................................... 104
1. Design Specification

1.1. Problem Statement:

To design an 8-bit arithmetic and logic unit implementing 16 logic functions with a 4-
bit counter to count the number of operations executed.

1.2. Assumptions and modifications:

In the problem statement we have mentioned a 4-bit counter, but we have used an
8-bit counter to increase the count of number of operations, we have used two 8-bit
output ports so that we store the result of multiplication.

1.3. Inputs ports, output ports and functions:

Input ports:

In1: 8-bit input


In2: 8-bit input
Sel: 4-bit select line
clk: clock
rst: reset

Output ports:

Out1: 8-bit port


Out2: 8-bit port
Count: 8-bit port

Functions implemented:

1. Addition
2. Subtraction
3. Multiplication
4. Division
5. Logical Shift left
6. Logical shift right
7. Rotate left
8. Rotate right
9. Logical AND
10. Logical OR
11. Logical NAND

1|Page
12. Logical NOR
13. Logical XOR
14. Logical XNOR
15. Greater than comparison
16. Equal comparison

1.4. Description

In1, In2 are 8 bit inputs of the ALU, sel is a 4 bit select line, Out1 and Out2 are 8 bit
output ports, count is the counter output. Flip flops are used at the input ports and
the output ports to store the inputs and outputs respectively. For every rising edge of
clock, inputs In1, In2 and the select line are stored in the flip flop, based on the select
line the corresponding functionality is implemented. A counter is used to count the
number of functions executed, it gets updated at positive edge of clock. ”rst” is reset,
which is used to reset the counter and the outputs Out1 and Out2. At every positive
edge of clock cycle(when rst is set low) firstly the inputs are stored in the Flip flops,
select line selects the functionality to be implemented and they take inputs from the
flip flops and at the next positive edge of clock the output gets updated.

2|Page
2. Simulation and Code Coverage

This section describes the RTC code in Verilog, testbench used to test the
functionality and the code coverage report.

2.1. RTL Code

module topmodule (

input[7:0] in1,in2,

input[3:0] Sel,

input clk,

input rst,

output reg[7:0] Out1,count,

output reg[7:0] Out2 );

// in1,in2 8 bit inputs,Sel 4 bit select line, clk is clock,rst is reset

// Out1,Out2 8 bit output, count is 8 bit counter

reg [7:0] A,B;

reg [3:0] sel_f;

always @(posedge clk)

begin

A <= in1; //flip flop used to store in1

B <= in2; //flip flop used to store in2

sel_f <=Sel; //flip flop used to store sel

3|Page
//******counter**********
// to check is reset is low, else to increment the counter
if(rst) begin
//******counter**********
// to check is reset is low, else to increment the counter
if(rst) begin
count <= 8'b0;
Out2 <= 8'b0;
Out1 <= 8'b0;
end
else
count <= count + 8'b00000001;

//********* ALU Implementation *************


case (sel_f)

4'b0000: // Addition
begin
{Out2,Out1} <= A + B;
end

4'b0001: // Subtraction
begin
{Out2,Out1} <= A - B ;
end

4'b0010: // Multiplication
begin
{Out2,Out1}<= A * B;
end

4'b0011: // Division
begin
Out1 <= A/B;
Out2 <=8'b0;
end

4'b0100: // Logical left shift


begin
Out1 <= A<<1;
Out2 <= 8'b0;
end

4|Page
4'b0101: // Logical right shift
begin
Out1 <= A>>1;
Out2 <=8'b0;
end

4'b0110: // Rotate left


begin
Out1 <= {A[6:0],A[7]};
Out2 <=8'b0;
end

4'b0111: // Rotate right


begin
Out1 <= {A[0],A[7:1]};
Out2 <=8'b0;
end

4'b1000: // Logical AND


begin
Out1 <= A & B;
Out2 <=8'b0;
end

4'b1001: // Logical OR
begin
Out1 <= A | B;
Out2 <=8'b0;
end
4'b1010: // Logical XOR
begin
Out1 <= A ^ B;
Out2 <=8'b0;
end

4'b1011: // Logical NOR


begin
Out1 <= ~(A | B);
Out2 <=8'b0;
end

5|Page
4'b1100: // Logical NAND
begin
Out1 <= ~(A & B);
Out2 <=8'b0;
end

4'b1101: // Logical XNOR


begin
Out1 <= ~(A ^ B);
Out2 <=8'b0;
end

4'b1110: // Greater comparison


begin
Out1 <= (A>B)?8'd1:8'd0 ;
Out2 <=8'b0;
end

4'b1111: // Equal comparison


begin
Out1 <= (A==B)?8'd1:8'd0 ;
Out2 <=8'b0;

end
endcase

end
endmodule

6|Page
2.2.1. Testbench-1

module testbench;

reg clk,rst;

reg[3:0] Sel;

reg[7:0] A,B;

wire[7:0] Out1;

wire[7:0] Out2;

wire[7:0] count;

reg[8:0] i;

topmodule t1 (A,B,Sel,clk,rst,Out1,Out2,count);

initial begin

$dumpfile("testbench.vcd");

$dumpvars(0, testbench);

clk = 1'b0;

rst = 1'b1;

#15 rst = 1'b0;

end

always #5 clk = ~clk;

initial begin

//count = 8'd0;

A = 8'd8;

B = 8'd2;

Sel = 4'd0;

7|Page
for (i=0;i<=15;i=i+1)

begin

#10;

Sel = Sel + 4'd1;

end

rst<=1;

#10;

rst<=0;

#10;

for (i=0;i<=255;i=i+1)

begin

#10;

Sel = 4'd0;

end

$finish;

end

endmodule

8|Page
Coverage report for Testbench-1

9|Page
Simulation:

10 | P a g e
2.2.2. Testbench-2

module testbench;

reg clk,rst;

reg[3:0] Sel;

reg[7:0] A,B;

wire[7:0] Out1;

wire[7:0] Out2;

wire[7:0] count;

reg[8:0] i;

topmodule t1 (A,B,Sel,clk,rst,Out1,Out2,count);

initial begin

$dumpfile("testbench.vcd");

$dumpvars(0, testbench);

clk = 1'b0;

rst = 1'b1;

#15 rst = 1'b0;

end

always #5 clk = ~clk;

initial begin

//count = 8'd0;

A = 8'd8;

B = 8'd2;

Sel = 4'd0;

11 | P a g e
for (i=0;i<=15;i=i+1)
begin
#10;
Sel = Sel + 4'd1;
end
rst<=1;
#10;
rst<=0;
#10;

A = 8'd1;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd2;
B = 8'd0;
Sel = 4'd0;
#10;
A = 8'd4;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd8;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd16;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd32;
B = 8'd0;
Sel = 4'd0;
#10;

12 | P a g e
A = 8'd64;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd128;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd0;
B = 8'd0;
Sel = 4'd0;
#10;

B = 8'd1;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd2;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd4;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd8;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd16;
A = 8'd0;
Sel = 4'd0;
#10;

13 | P a g e
B = 8'd32;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd64;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd128;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd0;
A = 8'd0;
Sel = 4'd0;
#10;

A = 8'd16;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd16;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd4;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd8;
Sel = 4'd2;
#10;

14 | P a g e
A = 8'd128;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd32;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd64;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd128;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd2;
Sel = 4'd2;
#10;

for (i=0;i<=255;i=i+1)

begin

#10;

Sel = 4'd0;

end

$finish;

end

endmodule

15 | P a g e
Coverage report for Testbench-2

16 | P a g e
Simulation

17 | P a g e
18 | P a g e
19 | P a g e
2.2.3. Testbench-3

module testbench;

reg clk,rst;

reg[3:0] Sel;

reg[7:0] A,B;

wire[7:0] Out1;

wire[6:0] Out2;

wire[7:0] count;

reg[8:0] i;

topmodule t1 (A,B,Sel,clk,rst,Out1,Out2,count);

initial begin

$dumpfile("testbench.vcd");

$dumpvars(0, testbench);

clk = 1'b0;

rst = 1'b1;

#15 rst = 1'b0;

end

always #5 clk = ~clk;

initial begin

//count = 8'd0;

A = 8'd8;

B = 8'd2;

Sel = 4'd0;

20 | P a g e
for (i=0;i<=15;i=i+1)
begin
#10;
Sel = Sel + 4'd1;
end
rst<=1;
#10;
rst<=0;
#10;

A = 8'd1;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd2;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd4;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd8;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd16;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd32;
B = 8'd0;
Sel = 4'd0;
#10;

21 | P a g e
A = 8'd64;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd128;
B = 8'd0;
Sel = 4'd0;
#10;

A = 8'd0;
B = 8'd0;
Sel = 4'd0;
#10;

B = 8'd1;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd2;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd4;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd8;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd16;
A = 8'd0;
Sel = 4'd0;
#10;

22 | P a g e
B = 8'd32;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd64;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd128;
A = 8'd0;
Sel = 4'd0;
#10;

B = 8'd0;
A = 8'd0;
Sel = 4'd0;
#10;

A = 8'd16;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd16;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd4;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd8;
Sel = 4'd2;
#10;

23 | P a g e
A = 8'd128;
B = 8'd16;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd32;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd64;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd128;
Sel = 4'd2;
#10;

A = 8'd128;
B = 8'd2;
Sel = 4'd2;
#10;

for (i=0;i<=255;i=i+1)

begin

#10;

Sel = 4'd0;

end

$finish;

end

endmodule

24 | P a g e
Coverage report for Testbench-3

25 | P a g e
Simulation

26 | P a g e
27 | P a g e
28 | P a g e
2.3. Analysis of Simulation and Code Coverage:

2.3.1 Analysis of Testbench1:

The generated waveform produced by the simulation using nclaunch confirms


the desired functionality and satisfies the written testbench.

Code coverage shows 100% block coverage i.e. line coverage because we have
covered all of the 16 logic functions. The toggle coverage is 39.9% ( 26/66). The
total is 66. In the RTL code we have in1, in2, A, B, Out1, Out2, count of 8 bit
each therefore 7*8 = 56. Sel and self of 4 bit therefore 8, clk of 1 bit and rst of 1
bit. Hence total toggle = 66. Because we have not toggled many of the bits the
toggle coverage is bad. Overall code coverage is average of Block coverage and
toggle coverage and it is 69.7%.

2.3.2 Analysis of Testbench2:

The generated waveform produced by the simulation using nclaunch confirms


the desired functionality and satisfies the written testbench.

Code coverage shows 100% block coverage i.e. line coverage because we have
covered all of the 16 logic functions. The toggle coverage is 98.48% ( 65/66). The
reason for remaining 1 bit not toggling is because of Out2’s MSB which can not
be made 1 for any multiplication of A and B. Overall code coverage is average of
Block coverage and toggle coverage and it is 99.24%.

2.3.3 Analysis of Testbench3:

The generated waveform produced by the simulation using nclaunch confirms


the desired functionality and satisfies the written testbench.

Code coverage shows 100% block coverage i.e. line coverage because we have
covered all of the 16 logic functions. The toggle coverage is 100% ( 65/65). In
this testbench we have made Out2 of 7 bits hence total toggle = 65. After this we
were able to have 100% toggle coverage. Overall code coverage is average of
Block coverage and toggle coverage and it is 100%.

29 | P a g e
3. Design Synthesis and Constraints

3.1. Synthesize for minimum area, keeping timing constraints highly relaxed.

Constraint file (.sdc)

create_clock -name clk -period 14 [get_ports clk]

set_clock_transition -rise 0.4 [get_clocks clk]

set_clock_transition -fall 0.4 [get_clocks clk]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports in1]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports in2]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports Sel]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports rst]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out1]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out2]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports count]

In the above constraint file we have increased the clock time period to 14 ns and the
clock is having a 50% duty cycle. Because of the increased time period the tool can use
the smaller area cells because of extra margin in delay that can be there satisfying the
clock time constraint. Thus we get the minimum area. We confirmed if its the minimum
area by increasing the clock time further and observed no further decrease in the area
hence confirming its the minimum area.

30 | P a g e
Area Report

Power Report

31 | P a g e
Timing Report

32 | P a g e
Cell Report

We can observe the slack of 3202ps in the path from B_reg[6]/CK to Out1_reg[0]/D.
The area of 3633.877 um sq is obtained for the synthesized netlist. The power is
1.417 x 10^-4 W.

About 53% of the area is consumed by Multiplier and Divider modules.

In the timing report the worst path in terms of delay is taken which goes through the
divider which is a major contribution of delay.

In the power report as it is the minimum area the leakage power is high = 11.42%,
Internal power = 69.15% and Switching power = 19.43% of total power. The leakage
power percentage is high because smaller cells have small capacitances and there
is no output load in the constraint so the internal power and switching power is less.

33 | P a g e
3.2. Synthesize for best timing, keep timing constraints tight. Keep making the
timing constraints tighter unless you observe a negative slack. The timing
analysis should show slight negative slack for this constraint.

Constraint file (.sdc)

create_clock -name clk -period 2.8 [get_ports clk]

set_clock_transition -rise 0.4 [get_clocks clk]

set_clock_transition -fall 0.4 [get_clocks clk]

set_clock_uncertainity 0.4 [get_clocks clk]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports in1]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports in2]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports Sel]

set_input_delay -clock [get_clocks clk] 0.4 [get_ports rst]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out1]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out2]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports count]

set_load 1 [get_ports "count"]

set_load 1 [get_ports "Out1"]

set_load 1 [get_ports "Out2"]

In the above constraint file we have decreased the clock time period to 2.8 ns to make a
tight timing constraint and the clock is having a 50% duty cycle. Because of the
decreased time period the tool can use the bigger cells to decrease the delay that can
be there satisfying the clock time constraint. Also we have used load at the three
outputs of 1 pf. This increases the delay and less clock pushes the limit of the synthesis
tool to make the slack 0 but it gets slight negative slack. We confirmed if it’s slight
negative by increasing the clock time period to 2.9 and 3ns and observed slack of 0.

34 | P a g e
Area Report

Power Report

35 | P a g e
Timing Report

Cell Report

36 | P a g e
We can observe the slack of -65ps in the path from B_reg[2]/CK to Out1_reg[0]/D. The
area of 5280.891 um sq is obtained for the synthesized netlist. The power is 1.95 x 10^-
3 W.

About 61% of the area is consumed by Multiplier and Divider modules. The percentage
of area of the divider is 30% in minimum area constraint it was 18% so the increase in
divider is more because it's the critical path which the synthesis tool is trying to make
slack non negative. The total area is larger than that of the minimum area constraint.

In the timing report the worst path in terms of delay or critical path is taken which goes
through the divider which is a major contribution of delay.

In the power the leakage power percentage is low = 1.57%, Internal power = 40.67%
and Switching power = 57.77% of total power. The leakage power percentage is less
than minimum area constraint because of the use of larger cells. The Switching power
percentage is larger than minimum area constraint because of the large load at output
and bigger cells. The magnitude of the internal power is more than minimum area
constraint because of bigger cells hence more internal capacitances.

37 | P a g e
3.3. Synthesize for timing constraints that is between (3.1) and (3.2) above.

Constraint file (.sdc)

create_clock -name clk -period 7 [get_ports clk]

set_clock_transition -rise 0.1 [get_clocks clk]

set_clock_transition -fall 0.1 [get_clocks clk]

set_input_delay -clock [get_clocks clk] 0.2 [get_ports in1]

set_input_delay -clock [get_clocks clk] 0.2 [get_ports in2]

set_input_delay -clock [get_clocks clk] 0.2 [get_ports Sel]

set_input_delay -clock [get_clocks clk] 0.2 [get_ports rst]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out1]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports Out2]

set_output_delay -clock [get_clocks clk] 0.2 [get_ports count]

set_load 1 [get_ports "count"]

set_load 1 [get_ports "Out1"]

set_load 1 [get_ports "Out2"]

In the above constraint file we have set the clock time period to 7 ns that is in between
minimum area constraint and tight timing constraint and the clock is having a 50% duty
cycle. Because of this time period the tool can use the cells of size in between minimum
area constraint and tight timing constraint. Also we have used load at the three outputs
of 1 pf.

38 | P a g e
Area Report

Power Report

39 | P a g e
Timing Report

40 | P a g e
Cell Report

We can observe the slack of 1318ps in the path from B_reg[2]/CK to Out1_reg[0]/D.
The area of 4141.757 um sq is obtained for the synthesized netlist. The power is 7.011
x 10^-4 W.

About 54% of the area is consumed by Multiplier and Divider modules. The percentage
of area of the divider is 23%, in the minimum area constraint it was 18% and in the tight
timing constraint it was 30%. So the area is in between the minimum area and the tight
timing constraint.

In the timing report the worst path in terms of delay or critical path is taken which goes
through the divider which is a major contribution of delay.

In the power the leakage power is high = 3.03%, Internal power = 36.06% and
Switching power = 60.91% of total power. The leakage power percentage is in between
the minimum area and the tight timing constraint.

The magnitude of internal power is less than the tight timing constraint because here
the cells are smaller than tight timing constraint cells so the internal capacitance is less.

41 | P a g e
Constraint Power (W) Timing slack (ps) Area(um sq)

Minimum area
1.417 x 10^-4 3202 3633.877

Tight timing
1.95 x 10^-3 -65 5280.891

Intermediate
7.011 x 10^-4 1318 4141.757

42 | P a g e
4. Formal Equivalence checking

Conformal tools operate in setup and LEC mode. In the setup mode, it reads two
designs along with the standard library. Two designs are the golden design and the
revised design (netlist, generated after logic synthesis). In the transition from setup to
LEC, conformal flattens both the designs and automatically maps the key point where
the key point is defined as primary input/output, D Flip Flop, D-Latches, Black boxes,
etc. In LEC mode, conformal compares the points, and comparison examines these
points to determine whether it is equivalent or not.

.do file used

set log file logical_equivalence_checking.log -replace

read library slow.v -verilog -both

read design topmodule.v -verilog -golden

read design topmodule_min_area_synth.v -verilog -revised

set system mode lec

add compared point -all

compare

report messages -compare -verb

report compare data -noneq

report verification

set verification information Equivalence_checking

write verification information

exit

43 | P a g e
4.1. Equivalence checking for minimum area constraint

The above is the result generated by the conformal for the minimum area constraint. In
the verification report, the 6th point is compare result showing PASS which means RTL
and generated netlist is equivalent.

44 | P a g e
The above represents Mapping Manager which contains two columns. The left column
contains the golden design and the right column contains the revised design
information. Conformal tools employ three name-based methods to map key points.
Name-based mapping is useful for gate-to-gate comparisons. Unmapped points are the
points that conformal does not map. Mapped Points contains a list of all mapped points
in both the designs and Compared points contain all of the compared points.
Green circle represents compared point is Equivalent.

Golden (RTL) and Revised(Generated Netlist) Schematic Comparison

45 | P a g e
4.2. Equivalence checking for tight timing constraint

The above is the result generated by the conformal for the tight timing constraint. In the
verification report, the 6th point is compare result showing PASS which means RTL and
generated Netlist is equivalent.

46 | P a g e
The above represents Mapping Manager which contains two columns. The left column
contains the golden design (RTL) and the right column contains the revised design
(generated netlist) information.

Golden (RTL) and Revised (Generated Netlist) Schematic Comparison

47 | P a g e
4.3. Equivalence checking for relaxed constraint

The above is the result generated by the conformal for the intermediate/relaxed
constraint. In the verification report, the 6th point is compare result showing PASS
which means RTL and generated netlist is equivalent.

The above represents Mapping Manager, which contains two columns. The left column
contains the golden design (RTL), and the right column contains the revised design
(generated netlist) information.

48 | P a g e
Golden (RTL) and Revised (Generated Netlist) Schematic Comparison

49 | P a g e
4.4. Bad Netlist for Formal Equivalence error

4.4.1 D Flip Flop commented out

Good Netlist

In the above-generated netlist DFFQX1 (D-Flip Flop) is removed to create a bad netlist.
Equivalence Checking is performed for this bad netlist.

50 | P a g e
Bad Netlist

51 | P a g e
Formal Equivalence Check

The above is the result generated by the conformal when the revised design is manually
changed to make it a bad netlist. In the verification report, the 6th point is compare
result showing FAIL: NONEQ, which means RTL and netlist is not equivalent.

52 | P a g e
Mapping

In the Unmapped Points red circle represents the Not-mapped unmapped key point.
In the setup mode conformal maps key points, the unmapped key is classified into three
categories which are extra, unreachable, or not-mapped. Not-mapped are the key
points that are reachable but do not have a corresponding point in the logic fan-in of the
design.

53 | P a g e
Schematic of the error which shows in the golden design D flip flop is connected to input
pin of count[4] but in revised case manually d flip flop was removed.

54 | P a g e
4.4.2 AND gate changed to OR gate

Good Netlist

In the above-generated netlist AND2XL(AND gate) is replaced by OR2XL(OR gate) to


create a bad netlist. Equivalence Checking is performed for this bad netlist.

55 | P a g e
Bad Netlist

56 | P a g e
Formal Equivalence Check

The above, is the result generated by the conformal when the revised design is
manually changed to make it a bad netlist. In the verification report, the 6th point is
compare result showing FAIL: NONEQ, which means RTL and netlist is not equivalent.

57 | P a g e
Mapping

The above report shows that all the points are not mapped correctly. Red colour in
compared points indicates the comparison between two keys is different due to which it
is not equivalent. Since the AND gate is changed to OR gate.

58 | P a g e
4.4.3. Change of NET Name

Good Netlist

In the above-generated netlist net name(n_216) is changed to net name(n_218) to


create a bad netlist. Equivalence Checking is performed for this bad netlist.

59 | P a g e
Bad Netlist

60 | P a g e
Formal Equivalence Check

The above, is the result generated by the conformal when the revised design is
manually changed to make it a bad netlist. In the verification report, the 6th point is
compare result showing FAIL: NONEQ, which means RTL and netlist is not equivalent.

61 | P a g e
Mapping

The above report shows that all the points in golden design are not mapped correctly
with the revised design. Red colour in compared points indicates that the comparison
between two keys is different due to which it is not equivalent. Since the net name is
changed with other net name.

62 | P a g e
4.4.4. Addition of new port

Good Netlist

In the above-generated netlist new port(Out3) is added to create a bad netlist.


Equivalence Checking is performed for this bad netlist.

63 | P a g e
Bad Netlist

64 | P a g e
Formal Equivalence Check

The above, is the result generated by the conformal when the revised design is
manually changed to make it a bad netlist. In the verification report, the 6th point is
compare result showing FAIL: NONEQ, which means RTL and netlist is not equivalent.

65 | P a g e
Mapping

In the Unmapped Points orange circle with E letter written in it represents the Extra
unmapped key point. In the setup mode conformal maps key points, the unmapped
key is classified into three categories which are extra, unreachable, or not-mapped.
Extra unmapped points are key points that are present in only one of the designs,
Golden or Revised. Since in the revised design extra port Out3 is added which was not
present in the golden design.

66 | P a g e
4.4.5. Rename of port

Good Netlist

In the above-generated netlist rst port is renamed to reset port to create a bad netlist.
Equivalence Checking is performed for this bad netlist.

67 | P a g e
Bad Netlist

68 | P a g e
Formal Equivalence Check

The above, is the result generated by the conformal when the revised design is
manually changed to make it a bad netlist. In the verification report, the 6th point is
compare result showing FAIL: NONEQ, which means RTL and netlist is not equivalent.

69 | P a g e
Mapping

In the Unmapped Points orange circle with E letter written in it represents the Extra
unmapped key point. In the setup mode conformal maps key points, the unmapped
key is classified into three categories which are extra, unreachable, or not-mapped.
Extra unmapped points are key points that are present in only one of the designs,
Golden or Revised. Here it appears in the both the design since rst is not present in
revised design and reset is not present in golden design. rst port is present in the
golden design and reset port is present in the revised design.

70 | P a g e
5. Static timing Analysis

STA is used to verify the synchronicity of the design. In STA we give the constraints
including clock time period. It checks for the setup and hold violations using a
pessimistic approach. It considers the worst path, for setup its path with maximum delay
and for holds its path with minimum delay.

The slack is measured by the STA tool. For setup slack = Required Time - Arrival Time,
for hold slack = Arrival Time - Required time.

If the slack is negative for setup and hold we have corresponding setup violation and
hold violation respectively.

STA is done using the Timing Graph. Arrival Time is computed using Forward Traversal
of Timing Graph starting from the corresponding input port of the path. Required Time is
computed using Backward Traversal of Timing Graph starting from the corresponding
output port of the path.

The STA can do the analysis using GBA that is Graph Based Analysis which is safe
bound. It is much more pessimistic than PBA.

PBA is path specific but it is computationally difficult. In PBA we have to give the
maximum number of paths at which the tool has to measure the arrival time and check
for the slack. Thus PBA is more accurate than GBA.

If GBA is giving setup/hold violation for some worst path we can confirm if it would be
the violation in a real circuit using PBA, If there is no violation using PBA then we can
be sure there is no violation. If PBA also gives the corresponding violation then we need
to resolve it.

71 | P a g e
5.1. STA for minimum area netlist

GBA

The Path is mentioned above with an endpoint of Out1_reg[0]/D.

In the above GBA the Required Time = End Arrival Time (latency of clock which is 0 by
default but can be changed using constraint) - Setup + Phase Shift(clock time period
defined in constraint file) = 0 - 0.065 + 14 = 13.935

Arrival time is computed using Forward Traversal of Timing Graph. Arrival time = 10.769

Slack = Required Time - Arrival Time = 3.166

72 | P a g e
PBA

The Path 1 mentioned above has an endpoint of Out1_reg[0]/D which is the same as
the GBA worst path. The Slack = 3.5 which is higher than what GBA gave which was
3.166 because GBA is more pessimistic than PBA.

73 | P a g e
5.2. STA for tight timing netlist

GBA

The Path is mentioned above with an endpoint of Out1_reg[0]/D.

In the above GBA the Required Time = End Arrival Time (latency of clock which is 0 by
default but can be changed using constraint) - Setup + Phase Shift(clock time period
defined in constraint file) = 0 - 0.109 + 2.8 = 2.691

Arrival time is computed using Forward Traversal of Timing Graph. Arrival time = 2.776

Slack = Required Time - Arrival Time = -0.086 thus giving the setup violation.

74 | P a g e
PBA

The Path 1 mentioned above has an endpoint of Out1_reg[0]/D which is the same as
the GBA worst path. The Slack = -0.063 which is higher than what GBA gave which was
-0.086 because GBA is more pessimistic than PBA. PBA confirms the setup violation.

75 | P a g e
5.3. STA for relaxed constraint netlist

GBA

The Path is mentioned above with an endpoint of Out1_reg[0]/D.

In the above GBA the Required Time = End Arrival Time(latency of clock which is 0 by
default but can be changed using constraint) - Setup + Phase Shift(clock time period
defined in constraint file) = 0 - 0.088 + 7 = 6.912

Arrival time is computed using Forward Traversal of Timing Graph. Arrival time = 5.611

Slack = Required Time - Arrival Time = 1.301

76 | P a g e
PBA

The Path 1 mentioned above has an endpoint of Out1_reg[0]/D which is the same as
the GBA worst path. The Slack = 1.421 which is higher than what GBA gave which was
1.301 because GBA is more pessimistic than PBA.

Netlist Required Time (ps) Arrival Time(ps) Slack (ps) = RT - AT

Minimum Area
13.935 10.769 3.166

Tight Timing
2.691 2.776 -0.086

Intermediate
6.912 5.611 1.301

77 | P a g e
6. Test Insertion (DFT – Design for Testability)

Why DFT?

The Chip Manufacturing process is prone to defects, and the representation of a


defect that causes a circuit to fail in a specified manner is called Fault. To make the
task of detecting faults feasible, additional logic is added; this design technique is
referred to as DFT (Design for testability). To solve the problem of controllability and
observability of memory elements, extra primary ports(Test Mode, Scan Enable,
Scan In, Scan Out) are added, D Flip Flop is replaced with other memory elements
called scan cells. With this addition, there is improvement in controllability and
observability of memory elements in a sequential design. Scan cells are reconnected
to form a scan chain.

Purpose of new entities:

Purpose of all the new entities that appear in the scan-chain inserted netlist and how
it is used to detect failure:

DFT is used to detect the faults. Test vectors are generated to detect the majority of
faults. Structural testing is used to decrease the number of test patterns. Structural
testing tests the components that implement the logic function.

In a sequential circuit the output depends on the present state and the input which
reduces the controllability and observability because to reach a particular state it
needs to do state traversal which is exponential in terms of number of state
elements thus increasing the test sequence, increasing the time of ATPG tool.

Using scan-chain inserted netlist we replace the D flip flops with scan cells. The
connected scan cells form the shift register called scan chain. This improves the
controllability and observability by removing the need of state traversal.

In the netlist we can see DFF is replaced by the SDFF. With SDFF extra ports are
inserted SI(scan input), SE(scan enable), D( normal input), Q(output or SO).

If there are N scan cells then during testing the shift mode is done so SE = 1, it shifts
the test vector using SI port in N clock cycle. Next in capture mode SE = 0, here the
normal inputs are inserted in 1 clock cycle and then shift mode is used to shift the
captured output in N clock cycle. Thus a linear number of clock cycles are required
using the scan chain. The output can be compared to check for any fault.

78 | P a g e
6.1. Minimum Area

Area Report for scan chain inserted netlist

Area Report pre DFT

With Scan Chain Without Scan Chain


Area 3820.074 um2 3633.877 um2

79 | P a g e
Timing Report for scan chain inserted netlist

80 | P a g e
Timing Report pre DFT

With Scan Chain Without Scan Chain


Timing slack 3591ps 3202ps

81 | P a g e
Power Report for scan chain inserted netlist

Power Report pre DFT

With Scan Chain Without Scan Chain


Power 149.591uW 141.672uW

82 | P a g e
6.2. Minimum Timing

Area Report for scan chain inserted netlist

Area Report pre DFT

With Scan Chain Without Scan Chain


Area 5465.575 um2 5280.891 um2

83 | P a g e
Timing Report for scan chain inserted netlist

84 | P a g e
Timing Report for pre DFT

With Scan Chain Without Scan Chain


Timing slack -203ps -65ps

85 | P a g e
Power Report for scan chain inserted netlist

Power report pre DFT

With Scan Chain Without Scan Chain


Power 1971.18uW 1954.16uW

86 | P a g e
6.3. Relaxed Constraint

Area Report for scan chain inserted netlist

Area Report pre DFT

With Scan Chain Without Scan Chain


Area 4425.594 um2 4141.757 um2

87 | P a g e
Timing Report for scan chain inserted netlist

88 | P a g e
Timing Report for DFT

With Scan Chain Without Scan Chain


Timing slack 1592ps 1318ps

89 | P a g e
Power Report for scan chain inserted netlist

Power Report pre DFT

With Scan Chain Without Scan Chain


Power 733.851uW 701.121 uW

90 | P a g e
Analysis:

QoR comparison for design with and without scan insertion

constraint Minimum Area Tight Timing Relaxed

Pre DFT Post DFT Pre DFT Post DFT Pre DFT Post DFT

Area(um) 3633.877 3820.074 5280.891 5465.575 4141.757 4425.594

Timing slack (ps) 3202 3591 -65 -203 1318 1592

Power (uW) 141.672 149.591 1968.61 1971.18 701.121 733.851

Area: With the insertion of scan cells area of design increases because flip flop in the
design is replaced with scan cells and the popular scan cell used is MUXED-D scan cell
which consist of MUX and Flip Flop. In the above table we can observe increment in
area in all the cases.

Power: From the above table it is observed that power is increasing after scan cell is
inserted. From the above report mentioned leakage power, Internal and switching power
is increased after scan insertion in all the three cases.

Timing slack: Timing slack decreases in tight timing constraint this is because arrival
time increases due to inclusion of scan cells in the path. This can be observed from the
above table and from the below mentioned timing reports.

91 | P a g e
Timing Reports

Timing Report same path tight timing constraint (scan chain- performed STA post DFT)

92 | P a g e
Timing Report for tight timing constraint (pre DFT-STA report)

93 | P a g e
In case of Minimum area constraint timing slack is increasing this can be observed from
the above table and below mentioned timing report. From the report we can observe
increment in setup to 0.153 from 0.063 at the same time required time is decreased
from 13.937 to 13.847 but the decrease in arrival time is more compared to required
time since the tool has taken different logic gates after the DFT operation.

Timing Report same path min area constraint (scan chain- performed STA post DFT)

94 | P a g e
Timing Report for min area constraint ( pre DFT-STA report)

95 | P a g e
In case of Relaxed constraint timing slack is increasing this can be observed from the
table and below mentioned timing report. From the below report we can observe
increment in setup to 0.201 from 0.093 at the same time required time is decreased
from 6.907 to 6.799 but the decrease in arrival time is more compared to required time
since the tool has taken different logic gates after the DFT operation.

Timing Report for relaxed constraint (scan chain-performed STA post DFT)

Timing Report for relaxed constraint (pre DFT-STA report)

96 | P a g e
Schematic representing scan insertion

97 | P a g e
7. Violation Reports

7.1. Violation Report after STA:

Minimum Area Constraint

98 | P a g e
Relaxed/Intermediate Constraint

99 | P a g e
Tight Timing Constraint

In case of tight timing constraint setup violation occurs since the timing was kept tighter
unless a negative slack is observed.

100 | P a g e
7.2. Violation Report STA post DFT:

Minimum Area Constraint

101 | P a g e
Relaxed/Intermediate Constraint

102 | P a g e
Tight Timing Constraint

In case of tight timing constraint setup violation occurs since the timing was kept tighter
unless a negative slack is observed.

103 | P a g e
Appendix
1. Command for Formal Equivalence Checking

Tool Used: Conformal (Cadence)

Command Used to invoke tool:

tcsh

source /cadence/cshrc

lec -lpgxl -dofile logic_equ_check.do

2. Command for STA

Tool Used: Tempus (Cadence)

Command Used to invoke tool:

csh

source /cadence/cshrc

tempus -nowin

source staaftersynthesis.tcl

104 | P a g e
TCL File used for STA

The above STA Tcl file is used during STA.

report_timing > $report_dir/timing_report.rpt: It is for the Graph Based Analysis.

report_timing -retime path_slew_propagation -max_path 50 -nworst 50 -path_type


full_clock > $report_dir/pba.rpt : It is for the Path Based Analysis. Given input of
max_path of 50 for which PBA will be done.

3. Command for Test Insertion (DFT)

Tool Used: Genus (Cadence)

Command Used to invoke tool:

tcsh

source /cadence/cshrc

genus -legacy_ui

source dft_script.tcl

105 | P a g e
TCL File for DFT

106 | P a g e

You might also like