Week 11 B

Lecture 19

Performance Optimization

Xuan ‘Silvia’ Zhang


Washington University in St. Louis

http://classes.engineering.wustl.edu/ese461/

Project FAQ

• Correction
– typo in optical flow: Iy(i, j) = I1(i, j+1) - I1(i, j-1)
– I1(i, j+1) might not exist (e.g., at the image border)
• Mid-project report
– behavioral Verilog code and testbench
– show proof of working functional simulation
– ensure the code is synthesizable
• Use of external memory
– instantiate in the test bench
– used for large data arrays or buffers

Arrays, Vectors, and Memories
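
As a minimal sketch of the three kinds of declaration named in the title (the module and signal names are hypothetical, chosen only for illustration), a vector is a single multi-bit value, an array is a collection of elements, and a memory is an array of vectors:

```verilog
// Hypothetical simulation-only module; names and sizes are illustrative.
module decl_examples;
  reg [7:0] v;               // vector: one 8-bit value
  reg       flags [0:3];     // array: four 1-bit elements
  reg [7:0] mem   [0:255];   // memory: 256 words of 8 bits each

  initial begin
    v        = 8'hA5;
    flags[2] = v[0];         // bit-select of a vector
    mem[10]  = v;            // write one whole word of the memory
    // Part-select of a memory word (allowed in Verilog-2001)
    $display("upper nibble = %h, flag = %b", mem[10][7:4], flags[2]);
  end
endmodule
```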

Useful Verilog Features

• Display tasks
– $display; $displayb, $displayh, $displayo for binary, hex, and octal
– $write, $strobe, $monitor
• File I/O tasks
– $fopen, $fclose
– $fdisplay, $fwrite, $fstrobe, $fmonitor
– $readmemb, $readmemh: read a text file into memory
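
The tasks above can be combined in a small testbench. This is a sketch only: the file names `image.hex` and `dump.txt`, the module name, and the memory size are assumptions made for illustration.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; file names and sizes are assumptions.
module tb_readmem;
  reg [7:0] mem [0:15];      // 16-entry, 8-bit memory model
  integer   i, fd;

  initial begin
    // Load hex values (one per line) from a text file into the array.
    $readmemh("image.hex", mem);

    // Echo each entry to the console and to a log file.
    fd = $fopen("dump.txt", "w");
    for (i = 0; i < 16; i = i + 1) begin
      $display("mem[%0d] = %h", i, mem[i]);
      $fdisplay(fd, "%0d %h", i, mem[i]);
    end
    $fclose(fd);
    $finish;
  end
endmodule
```

The same pattern is one way to fill an external memory model instantiated in the test bench before simulation starts.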

Module Partitioning

• Where possible, register module outputs and keep the critical path in one block
• Design Registering
– pipelining
– restructure a long data path with several levels of logic
and break it up over multiple cycles

Pipelining
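
A minimal sketch of pipelining, assuming a hypothetical multiply-add data path `y = a * b + c`: the multiply and the add are placed in separate clock cycles, so each cycle only has to fit one of them. Module and signal names are illustrative.

```verilog
// Hypothetical example: y = (a * b) + c split over two clock cycles.
module mac_pipelined (
  input  wire        clk,
  input  wire [7:0]  a, b,
  input  wire [15:0] c,
  output reg  [16:0] y
);
  reg [15:0] prod_q;   // stage-1 register: holds a * b
  reg [15:0] c_q;      // c delayed one cycle so it stays aligned with prod_q

  always @(posedge clk) begin
    // Stage 1: multiply only
    prod_q <= a * b;
    c_q    <= c;
    // Stage 2: add only; the module output is registered
    y      <= prod_q + c_q;
  end
endmodule
```

The cost is one extra cycle of latency and the extra registers; throughput stays at one result per clock.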

Adding Structure

• Control the structure by using separate assignments and parentheses
• Example
– 32-bit arithmetic shift right
– design 1

– design 2

32-Bit Arithmetic Shift Right

• Design 3

32-Bit Arithmetic Shift Right

• Optimal structured design

32-Bit Arithmetic Shift Right

• Without specifying the mux instantiations
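
One way to express a structured 32-bit arithmetic shift right without instantiating muxes explicitly is a chain of five conditional assignments, one per bit of the shift amount. This is a sketch under that assumption, not necessarily the exact design intended here; the module and signal names are hypothetical.

```verilog
// Hypothetical module: 32-bit arithmetic shift right as five 2:1 mux stages.
module asr32 (
  input  wire [31:0] din,
  input  wire [4:0]  shamt,
  output wire [31:0] dout
);
  wire        s = din[31];            // sign bit, replicated into vacated positions
  wire [31:0] st16, st8, st4, st2;    // intermediate stage outputs

  // Each stage shifts by a fixed power of two or passes its input through.
  assign st16 = shamt[4] ? {{16{s}}, din[31:16]}  : din;
  assign st8  = shamt[3] ? {{ 8{s}}, st16[31:8]}  : st16;
  assign st4  = shamt[2] ? {{ 4{s}}, st8[31:4]}   : st8;
  assign st2  = shamt[1] ? {{ 2{s}}, st4[31:2]}   : st4;
  assign dout = shamt[0] ? {s, st2[31:1]}         : st2;
endmodule
```

Because each stage is a separate assignment, the logic depth is five mux levels regardless of the shift amount.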

Horizontal Partitioning

• Break the circuit into horizontal slices to minimize the maximum fan-in
• Example
– carry-lookahead adder: 32-bit adder broken into eight 4-bit blocks
– 32-bit priority encoder

32-Bit Priority Encoder

• Restructured with four 8-bit blocks
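
A sketch of the four-block restructuring, assuming bit 31 has the highest priority (the priority direction, module name, and signal names are assumptions): each 8-bit slice is encoded on its own, and a small second level picks the highest slice that has any request.

```verilog
// Hypothetical sketch: 32-bit priority encoder from four 8-bit slices.
module prienc32 (
  input  wire [31:0] req,
  output wire [4:0]  idx,     // index of the highest set bit
  output wire        valid    // high when any request bit is set
);
  // 8-bit priority encoder applied to each slice.
  function [2:0] enc8;
    input [7:0] r;
    begin
      casez (r)
        8'b1???????: enc8 = 3'd7;
        8'b01??????: enc8 = 3'd6;
        8'b001?????: enc8 = 3'd5;
        8'b0001????: enc8 = 3'd4;
        8'b00001???: enc8 = 3'd3;
        8'b000001??: enc8 = 3'd2;
        8'b0000001?: enc8 = 3'd1;
        default:     enc8 = 3'd0;
      endcase
    end
  endfunction

  // One "any request" flag per slice, computed in parallel.
  wire [3:0] any = {|req[31:24], |req[23:16], |req[15:8], |req[7:0]};

  // Second level: pick the highest slice with a request.
  wire [1:0] blk = any[3] ? 2'd3 : any[2] ? 2'd2 : any[1] ? 2'd1 : 2'd0;

  // Slice encoders run in parallel; a 4:1 mux selects one result.
  wire [2:0] e3 = enc8(req[31:24]);
  wire [2:0] e2 = enc8(req[23:16]);
  wire [2:0] e1 = enc8(req[15:8]);
  wire [2:0] e0 = enc8(req[7:0]);
  wire [2:0] lo = (blk == 2'd3) ? e3 :
                  (blk == 2'd2) ? e2 :
                  (blk == 2'd1) ? e1 : e0;

  assign idx   = {blk, lo};
  assign valid = |any;
endmodule
```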

Priority-Encoded Logic vs Balanced Logic

• If-Then-Else vs Case Statement
– redundant priority
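
The contrast can be sketched with a 4:1 selection written both ways (module and signal names are hypothetical). The conditions in the if-else chain are mutually exclusive, so the priority it describes is redundant; the case statement describes a balanced mux directly. A synthesis tool may or may not reduce the two to the same logic.

```verilog
// Hypothetical module comparing the two coding styles.
module sel_styles (
  input  wire [1:0] sel,
  input  wire       a, b, c, d,
  output reg        y_prio,
  output reg        y_case
);
  // if-else chain: written as a priority chain, although only one
  // condition can be true at a time.
  always @(*) begin
    if      (sel == 2'b00) y_prio = a;
    else if (sel == 2'b01) y_prio = b;
    else if (sel == 2'b10) y_prio = c;
    else                   y_prio = d;
  end

  // case statement: all branches at the same level (balanced 4:1 mux).
  always @(*) begin
    case (sel)
      2'b00:   y_case = a;
      2'b01:   y_case = b;
      2'b10:   y_case = c;
      default: y_case = d;
    endcase
  end
endmodule
```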

Hierarchy

• Collapse hierarchy (flattening)
– more efficient synthesis
• Add Hierarchy
– benefit results from structure preservation
– example: 32-bit decoder

– least-efficient implementation

32-Bit Decoder

• More concise representation

• A balanced tree decoder is even better

32-Bit Balanced-Tree Decoder
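
One common way to balance a 5-to-32 decoder is to predecode the address in two groups and AND the results, so every output sees the same shallow logic depth. The sketch below assumes a 2-bit/3-bit split; the split, module name, and signal names are assumptions.

```verilog
// Hypothetical sketch: 5-to-32 decoder from a 2-to-4 and a 3-to-8 predecoder.
module dec5to32_tree (
  input  wire [4:0]  a,
  output wire [31:0] y
);
  wire [3:0] hi;   // one-hot decode of a[4:3]
  wire [7:0] lo;   // one-hot decode of a[2:0]

  genvar i, j;
  generate
    for (i = 0; i < 4; i = i + 1) begin : g_hi
      assign hi[i] = (a[4:3] == i);
    end
    for (j = 0; j < 8; j = j + 1) begin : g_lo
      assign lo[j] = (a[2:0] == j);
    end
    // Final level: each output is a single 2-input AND of predecoded terms.
    for (i = 0; i < 4; i = i + 1) begin : g_out_hi
      for (j = 0; j < 8; j = j + 1) begin : g_out_lo
        assign y[i*8 + j] = hi[i] & lo[j];
      end
    end
  endgenerate
endmodule
```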

Performing Operations in Parallel

• Example
– linear search

Performing Operations in Parallel

• Example
– binary search

Performing Operations in Parallel

• Example
– parallel search
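
A sketch of a parallel search, assuming a hypothetical table of eight 8-bit entries passed in as one flat bus: every entry is compared with the key at the same time, so the result is available after one comparator delay plus an OR, instead of one comparison per step.

```verilog
// Hypothetical module: compare a key against eight entries at once.
module par_search (
  input  wire [7:0]  key,
  input  wire [63:0] entries,   // entry k occupies bits [8*k +: 8]
  output wire [7:0]  match,     // bit k is set when entry k equals key
  output wire        found
);
  genvar k;
  generate
    for (k = 0; k < 8; k = k + 1) begin : g_cmp
      assign match[k] = (entries[8*k +: 8] == key);
    end
  endgenerate

  assign found = |match;
endmodule
```

The trade-off is area: eight comparators instead of one shared comparator.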

MUX for Conditional Assignment

• Example: counter

MUX for Conditional Assignment

• Example: counter
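
One possible reading of this idea, assuming an up/down counter (the module below is hypothetical and may differ from the counter intended here): instead of describing an incrementer and a decrementer in two branches, a mux selects the step value and a single adder applies it.

```verilog
// Hypothetical up/down counter: a mux chooses the step, one adder applies it.
module updown_cnt (
  input  wire       clk,
  input  wire       rst,   // synchronous reset
  input  wire       up,    // 1: count up, 0: count down
  output reg  [7:0] cnt
);
  // 8'hFF is -1 in 8-bit two's complement.
  wire [7:0] step = up ? 8'd1 : 8'hFF;

  always @(posedge clk) begin
    if (rst) cnt <= 8'd0;
    else     cnt <= cnt + step;
  end
endmodule
```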

Replication

• Large fanout
– manual register duplication to reduce congestion
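
A sketch of manual register duplication, assuming a hypothetical enable signal that fans out to a 32-bit register: two copies of the enable register each drive half of the loads. The `keep` attribute is an assumption; the attribute that stops a tool from merging the duplicates is tool-specific.

```verilog
// Hypothetical module: duplicated enable register to halve the fan-out.
module dup_enable (
  input  wire        clk,
  input  wire        en_in,
  input  wire [31:0] d,
  output reg  [31:0] q
);
  // Two identical registers; attribute name is an assumption (tool-specific).
  (* keep = "true" *) reg en_a;
  (* keep = "true" *) reg en_b;

  always @(posedge clk) begin
    en_a <= en_in;
    en_b <= en_in;
    if (en_a) q[15:0]  <= d[15:0];    // en_a drives the lower half
    if (en_b) q[31:16] <= d[31:16];   // en_b drives the upper half
  end
endmodule
```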

Resource Sharing

• Optimizes area but hurts speed
– with resource sharing

Resource Sharing

• Optimizes area but hurts speed
– without resource sharing
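
The two cases can be sketched side by side, assuming a hypothetical selection between `a + b` and `c + d` (module and signal names are illustrative). With sharing, muxes choose the operands and one adder does the work; without sharing, two adders run in parallel and a mux chooses the result.

```verilog
// Hypothetical module showing both structures for the same function.
module share_demo (
  input  wire       sel,
  input  wire [7:0] a, b, c, d,
  output wire [8:0] y_shared,
  output wire [8:0] y_unshared
);
  // With sharing: operand muxes feed ONE adder (mux delay sits before the adder).
  wire [7:0] op1 = sel ? a : c;
  wire [7:0] op2 = sel ? b : d;
  assign y_shared = op1 + op2;

  // Without sharing: TWO adders in parallel, then a mux on the results.
  wire [8:0] s0 = a + b;
  wire [8:0] s1 = c + d;
  assign y_unshared = sel ? s0 : s1;
endmodule
```

Both outputs compute the same value; which structure is faster in practice depends on when `sel` arrives relative to the data inputs.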

Questions?

Comments?

Discussion?
