
VLSI Design

The document details an internship experience focused on VHDL and Verilog coding, culminating in the design of a QDR Controller IP implemented on a Xilinx Spartan 6 FPGA. The project involved writing application software in C++ and successfully testing the controller at a data rate of 100 megabytes per second. Additionally, the document covers the study of the EMIF bus for interfacing DSP with peripherals, including ADC and DAC, and outlines various VLSI concepts and applications.

Abstract

The first week of the internship was spent revising the basic concepts of VHDL and
Verilog coding. I was also introduced to other hardware design topics such as static
timing analysis, simulation, synthesis and the basics of PLDs (internal architecture, various
types, etc.). To become better acquainted with design and coding techniques, I wrote test code
for various peripherals, starting with basic LED tests and moving on to
tests for the ADC, DAC, RS232 and Flash memory. Through this code, I became familiar
with Xilinx design tools such as ISE and PlanAhead, as well as Altera tools such as
Quartus and the more advanced Nios.

After I had become familiar with these design techniques, work began on the QDR Controller
IP. The IP consisted of hardware design code and application software. The
hardware design code was written in VHDL and implemented on a Xilinx Spartan 6 FPGA.
The application software was written in C++ to exchange data and
commands between the computer and the FPGA. At the end of the design phase, the
controller was tested at a data rate of about 100 megabytes per second on the
Spartan 6 FPGA. Designing and successfully testing this controller took about 5 weeks.

The last two weeks were spent studying the EMIF (External Memory
Interface) bus, which is used to interface a DSP with its peripherals, including an FPGA. After
acquiring a detailed understanding of the EMIF bus, I wrote an application to accept data
from an ADC (interfaced to a Xilinx Virtex 4 FPGA) and transfer it to a Texas Instruments DSP.
This data was then looped back over the EMIF bus and displayed live on an oscilloscope
through a DAC, which was also interfaced to the FPGA. In this way, data taken from the ADC can be
processed by the DSP and then output to the DAC over the EMIF interface.

Table of Contents
Abstract
QDR Controller IP
Basics of QDR
Timing Issues
Outline of the Hardware Design
Specifications
Design Details
Application Software
Tests Performed
EMIF Bus Application
Basics of EMIF Bus
Design Details: Board Specifications
Basic Codes
FPGA Program
DSP Program
Programming DSP PLL using GEL File
Basics of GEL File
DSP Registers to be Programmed using GEL File
Configuring DSP Timing Characteristics
Tests
References
Conclusion

What is VLSI?
Before looking at VLSI, it helps to understand what is meant by integration technology.
An integrated circuit (IC) is a single chip on which all the active and passive components are fabricated.
As electronic devices demanded more functionality, more circuits had to be included, and building them from
separate components made the devices bulkier. Integration technology addressed this by packing more components onto a
single chip, which greatly increased the speed of devices and decreased their size. The
levels of integration are classified by the number of components (that is, transistors)
integrated on the chip:

• SSI (Small Scale Integration): 1-100 transistors

• MSI (Medium Scale Integration): 100-1000 transistors

• LSI (Large Scale Integration): 1000-10000 transistors

• VLSI (Very Large Scale Integration): 10000-1 million transistors

• ULSI (Ultra Large Scale Integration): 1 million-10 million transistors

• GSI (Giant Scale Integration): more than 10 million transistors

VLSI was conceived during the late 1970s. It is now one of the most widely used technologies for designing
components, integrated circuits and microprocessors. A VLSI chip can hold a very large number
of transistors integrated on a single microchip.

QDR Controller IP

Basics of QDR
QDR (Quad Data Rate) memory is a type of SRAM that can transfer up to four words in every clock cycle, hence the
name. It has separate read and write ports, and each port is Double
Data Rate (DDR). The main purpose of QDR is to enable fast, independent reads and
writes at high clock frequencies without losing bandwidth to bus turnaround time, which
DDR memories with a shared data bus do incur.
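
As a rough illustration of the "four words per clock" figure, the sketch below counts transfers per clock cycle for a memory with two ports that each move data on both clock edges. The 250 MHz clock and 18-bit word width are assumed example values, not figures taken from this report.

```python
def words_per_clock(ports: int, edges_per_clock: int) -> int:
    """Words moved per clock cycle, summed over all ports."""
    return ports * edges_per_clock


def peak_bandwidth_bits_per_s(clock_hz: float, word_bits: int,
                              ports: int = 2, edges_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth; real designs achieve less."""
    return clock_hz * word_bits * words_per_clock(ports, edges_per_clock)


qdr = words_per_clock(ports=2, edges_per_clock=2)  # separate read and write ports, both DDR
ddr = words_per_clock(ports=1, edges_per_clock=2)  # one shared, bidirectional DDR port
print("QDR words/clock:", qdr, "| DDR words/clock:", ddr)

# Assumed example values (hypothetical): 250 MHz clock, 18-bit words.
print(peak_bandwidth_bits_per_s(250e6, 18) / 1e9, "Gbit/s peak")
```

The shared-bus DDR case also loses cycles to bus turnaround whenever it switches between reading and writing, which this simple count does not model.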

Timing Issues
Each port of a Quad Data Rate memory transfers data twice every clock cycle and is designed to
operate at high frequencies (typically a few hundred MHz) in burst mode, whereas a design
implemented on an FPGA cannot operate at this frequency. This is one of the main challenges in
FPGA designs involving QDR. The problem was overcome by using the FPGA's input and output
DDR registers (IDDR and ODDR). Because timing is critical in QDR designs, tools such
as Xilinx PlanAhead were used to achieve good timing performance.

Outline of the Hardware Design
The steps involved are:
1. Implementing the design, file by file, in ISE
2. Downloading the .bit file into the Spartan 6 FPGA
3. Checking read and write functionality using GUI-based software

Specifications
• Cypress CY7C15632KV18 Quad Data Rate memory, 72 Mbit, 4-word burst
• Xilinx Spartan 6 FPGA

Design Details
Given below is the block diagram of the QDR Controller IP.

Data sent by the application software is passed to the FPGA over a USB connection. Within the FPGA, the
data is accepted by a USB controller. The USB controller consists of two FIFOs: one
accepts data from the USB and the other sends data to the USB. Both FIFOs are
controlled by the read/write logic of the USB controller.

The command structure consists of a start byte and an end byte, three bytes each for the
file size and the start address, and one byte for the actual command (in this case, read or write). The
command is decoded in the instruction decoder, and the relevant information (starting
address, file size, read/write selection, etc.) is sent to the QDR read-write module.
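
The nine-byte command described above can be sketched as follows. The report gives only the field sizes, so the marker values, the opcodes, the field order and the big-endian byte order used here are all assumptions made for illustration.

```python
START_BYTE = 0xA5  # hypothetical marker value
END_BYTE = 0x5A    # hypothetical marker value
CMD_READ = 0x01    # hypothetical opcode
CMD_WRITE = 0x02   # hypothetical opcode


def build_command(command: int, start_address: int, file_size: int) -> bytes:
    """Pack a frame: start, command, 3-byte address, 3-byte size, end (assumed order)."""
    for name, value in (("start_address", start_address), ("file_size", file_size)):
        if not 0 <= value < 1 << 24:
            raise ValueError(f"{name} must fit in three bytes")
    return (bytes([START_BYTE, command])
            + start_address.to_bytes(3, "big")
            + file_size.to_bytes(3, "big")
            + bytes([END_BYTE]))


def parse_command(frame: bytes) -> dict:
    """Decode a frame roughly the way an instruction decoder might."""
    if len(frame) != 9 or frame[0] != START_BYTE or frame[-1] != END_BYTE:
        raise ValueError("malformed frame")
    return {
        "command": frame[1],
        "start_address": int.from_bytes(frame[2:5], "big"),
        "file_size": int.from_bytes(frame[5:8], "big"),
    }


frame = build_command(CMD_WRITE, start_address=0x000100, file_size=4096)
print(frame.hex(), parse_command(frame))
```

Three bytes per field limit both the start address and the file size to values below 2^24.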

7 steps of the design process

• Define the problem. Crucial to solving any design problem is to begin by asking the right questions. ...
• Conduct research. ...
• Brainstorm and conceptualize. ...
• Create a prototype. ...
• Select and finalize. ...
• Product analysis. ...
• Improve.

RTL design:
RTL design is a crucial step in the VLSI design flow, which involves the creation of electronic circuits using
integrated circuits (ICs). It involves the specification of a digital circuit in terms of the flow of digital signals
between hardware registers, and the logical operations performed on those signals.

DFT in VLSI:

Design-for-testability (DFT) techniques attempt to reduce the high cost in time and effort required to
generate test vector sequences for VLSI circuits. The identification of faulty chips in the field can also be
greatly simplified if the chips are designed for testability.

Advantages & Applications of VLSI:

VLSI offers several advantages:

• It greatly decreases the size of circuits
• It needs less space; a relatively small area is enough
• It requires less power than discrete components
• It is more reliable
• It increases the operating speed of circuits
• It reduces the effective cost of devices

Because of these advantages, VLSI is widely used and preferred in many fields.
It is applied in many branches of engineering, such as Commercial Electronics,
Automobiles, Computers, Digital Signal Processing, Data and Voice Communication Networks, Medicine
and so on.

1. VLSI Starter
This online VLSI course is suitable for beginners. Using Xilinx ISE,
a tool widely used in industry, you will learn the basics of VLSI. You will program in
Verilog and build combinational circuits. As part of this course, you will learn the concepts of encoders
and multiplexers by building digital circuits and VLSI projects. You will also work with
comparators and logic gates.

2. VLSI Explorer
This online VLSI course is for those who want to know more about VLSI. It teaches
the basics of VLSI through Xilinx ISE, a tool widely used in industry. You will also learn to
program in Verilog and learn concepts such as multiplexers, encoders and decoders by building digital circuits.
Eventually, you will build combinational circuits by working with logic gates, comparators and so on.

3. VLSI Champion
As the name suggests, this course is for those who want to excel in VLSI technology. Apart from the
basics (programming in Verilog, building digital and combinational circuits, and concepts
such as decoders, encoders and multiplexers), you will learn to develop a Static
RAM design and a Traffic Light Control.

4. VLSI (Career Building Course)
This course adds weight to your profile if you are seeking a career with an Electronics and
Communication background. You will learn concepts such as multiplexers, encoders, decoders,
flip-flops, comparators and registers by building combinational circuits, digital circuits, sequential
circuits and so on. Eventually, you will learn to develop projects such as Traffic Light Control and Static RAM
Design.
VLSI design is an iterative cycle. Designing a VLSI chip involves several stages, such as functional
design, logic design, circuit design, and physical design. The design is verified for correctness by
simulation.

Figure 1: Top Level Block Diagram

The QDR read-write module accepts data coming from the USB and passes it to the QDR via
ODDRs during a write operation. Similarly, during a read operation, data is read from the QDR
and passed to the USB controller through IDDRs. The ODDRs and IDDRs reduce the
frequency at which the FPGA logic must operate for a desired QDR data rate. Data
is transferred to and from the QDR by state machines written to respect the
timing characteristics of the QDR memory chip.
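
To see why DDR registers relax the clock requirement, the sketch below models their behaviour in software: an output DDR register interleaves two words per clock onto the pins, and an input DDR register splits the double-rate stream back into a rising-edge word and a falling-edge word. This is only a behavioural model with hypothetical function names, not the Xilinx primitives themselves.

```python
def oddr(rising_words, falling_words):
    """Interleave two single-rate streams into one double-rate stream."""
    stream = []
    for rise, fall in zip(rising_words, falling_words):
        stream.extend([rise, fall])
    return stream


def iddr(ddr_stream):
    """Split a double-rate stream into rising-edge and falling-edge words."""
    return ddr_stream[0::2], ddr_stream[1::2]


burst = [0x11, 0x22, 0x33, 0x44]   # one 4-word burst as seen on the memory pins
rising, falling = iddr(burst)
print(rising, falling)             # the fabric handles two words per clock for two clocks
print(oddr(rising, falling) == burst)
```

The fabric side therefore sees a bus twice as wide that changes once per clock, instead of a narrow bus that changes on both clock edges.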

Figure 2: Detailed Block diagram

Application Software
Commands are accepted from the computer via the application software. Each command is
then processed in hardware, and data is read from or written to the QDR memory accordingly.
The application software reads data from the test files and sends it to the USB using a 512-byte
buffer. Similarly, data is accepted from the USB into the buffer and then written to the output
file. The input and output files are compared to verify the read and write operations performed
on the QDR memory.
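
The buffered transfer and file comparison can be sketched as below. The original application was written in C++ and talked to real USB hardware; here a small in-memory class (`FakeMemory`, a hypothetical stand-in for the USB link plus the QDR memory) is used so that the example runs anywhere.

```python
import io

BUFFER_SIZE = 512  # buffer size stated in the text


def send_file(source, write_chunk):
    """Read the source in 512-byte chunks and hand each chunk to the link."""
    total = 0
    while chunk := source.read(BUFFER_SIZE):
        write_chunk(chunk)
        total += len(chunk)
    return total


def receive_file(read_chunk, size, sink):
    """Pull `size` bytes back from the link in chunks of at most 512 bytes."""
    remaining = size
    while remaining > 0:
        chunk = read_chunk(min(BUFFER_SIZE, remaining))
        sink.write(chunk)
        remaining -= len(chunk)


class FakeMemory:
    """Hypothetical stand-in for the USB link and QDR memory."""

    def __init__(self):
        self.data = bytearray()
        self.read_pos = 0

    def write(self, chunk):
        self.data += chunk

    def read(self, count):
        chunk = bytes(self.data[self.read_pos:self.read_pos + count])
        self.read_pos += count
        return chunk


memory = FakeMemory()
test_input = io.BytesIO(bytes(range(256)) * 5)  # 1280 bytes: two full buffers plus a partial one
test_output = io.BytesIO()
sent = send_file(test_input, memory.write)
receive_file(memory.read, sent, test_output)
print(sent, test_output.getvalue() == test_input.getvalue())
```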

Tests Performed
• Simple Counter Test: Sequential values were written to consecutive memory locations, read back
and verified.
• Data Loopback: After the counter test succeeded, the IP was tested
through read/write loopback of files of various sizes. The IP performed
satisfactorily, with a data rate of about 100 megabytes per second.

EMIF Bus Application

Basics of EMIF Bus


EMIF stands for external memory interface. It is an interface used to connect DSP processors
to various peripherals. All devices connected to DSP are mapped in a memory map. DSP is
able to communicate with these devices by writing appropriate values to various control
registers of the EMIF bus and by reading or writing data to the correct address locations in
this memory map.

The FPGA-DSP interface is a little more complicated, as both the FPGA and the DSP need to be
programmed. The DSP is programmed with an application program written in C++, while the
FPGA is programmed with a .bit file.

Design Details: Board Specifications


• FPGA: Xilinx Virtex 4 (XC4VSX55-FF1148) with 55,296 logic cells.
• DSP: DaVinci processor from Texas Instruments (TMS320DM6467).
• ADC: High-speed ADC (ADS5527)
o 4 channels.
o 12-bit analog-to-digital converter from Texas Instruments.
o Sampling speed: 210 Msps.
o Analog input range: 2 Vpp.
• DAC: High-speed DAC (AD9742)
o 4 channels.
o 12-bit digital-to-analog converter from Analog Devices.
o Sampling speed: 210 Msps.
o Analog output range: 2 Vpp.
• EMIF Bus: 16-bit data width, asynchronous.

Basic Codes

FPGA Program
The FPGA is programmed using the .bit file in <FOLDER>. Data from the ADC is accepted into a
FIFO, with the clock output of the ADC used as the write clock for this FIFO. Data from this
FIFO is sent to the DSP over the bi-directional EMIF (External Memory Interface) bus. The
DSP loops the data back over the same EMIF bus. The returned data is accepted into another FIFO,
from which it is read by the DAC. The SYNC clock of the DAC is used as the read clock for this
DAC FIFO.

Figure 3: Top Level Block Diagram

Figure 4: Detail Block Diagram

DSP Program
The DSP is programmed using Code Composer Studio Version 4. Data is accepted from the FPGA over the
EMIF bus and then sent back over the same bus. A single data word is read and then written back
immediately; the next word is read only after the previous one has been written back to the FPGA.
This is done to maintain a continuous supply of data to the DAC.

Programming DSP PLL using GEL File

Basics of GEL File


A GEL file is a script written in a C-like language that is used to configure the DSP before the user
application program is loaded. The GEL file is loaded at the beginning of a “Debug” session in
CCSv4. Writing the correct settings in the GEL file is necessary for the application program to
function properly on the DSP.

DSP Registers to be Programmed using GEL File

PLL Control Register (PLL_CTL)


• Function: Controls all the important settings of the PLL.
• Location: This is a 32-bit register located at address 0x01C40900.
• Setting:
o CLKMODE (bit 8): Set to 1 when using an external clock source for the PLL (as in our case).
o PLLENSRC (bit 5): This bit must be cleared before PLLEN will have any effect.
o PLLDIS (bit 4): Setting this bit disables the PLL. When using the PLL, this bit must be cleared.
o PLLRST (bit 3): Clearing this bit to 0 asserts reset to the PLL. When using the PLL, this bit must be 1.
o PLLPWRDN (bit 1): Set this bit to 1 to power down the PLL; 0 for normal operation.
o PLLEN (bit 0): The PLL is bypassed when this bit is cleared. Set this bit to use the PLL.
• PLL Initialization: The following steps must be followed in the GEL file to initialize the PLL.
1. Set clock mode (CLKMODE)
2. Set PLL to bypass, wait for PLL to stabilize (PLLEN = 0, wait)
3. Reset PLL (PLLRST = 0)
4. Disable PLL (PLLDIS = 1)
5. Power up PLL (PLLPWRDN = 0)
6. Enable PLL (PLLDIS = 0)
7. Wait for PLL to stabilize
8. Load PLL multiplier (refer to the PLLM register)
9. Set PLL post dividers (refer to the PLLDIVn registers)
10. Wait for PLL to reset (PLLRST = 0, wait)
11. Release from reset (PLLRST = 1)
12. Wait for PLL to re-lock
13. Switch out of bypass mode (PLLEN = 1)
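
The steps above can be sketched as a sequence of read-modify-write operations on PLL_CTL. The sketch below only computes the register value after each step, starting from an assumed initial value of 0; it does not touch hardware, and a real GEL file would write these values to address 0x01C40900 with the required waits in between. Clearing PLLENSRC together with PLLEN in step 2 is an assumption based on the bit description above.

```python
# Bit positions taken from the list above.
CLKMODE, PLLENSRC, PLLDIS, PLLRST, PLLPWRDN, PLLEN = 8, 5, 4, 3, 1, 0


def set_bit(value, bit):
    return value | (1 << bit)


def clear_bit(value, bit):
    return value & ~(1 << bit)


# Steps that only wait, or that write PLLM/PLLDIVn (7 to 10 and 12), leave PLL_CTL unchanged.
STEPS = [
    ("1. CLKMODE = 1", lambda r: set_bit(r, CLKMODE)),
    ("2. PLLENSRC = 0, PLLEN = 0 (bypass)", lambda r: clear_bit(clear_bit(r, PLLENSRC), PLLEN)),
    ("3. PLLRST = 0 (assert reset)", lambda r: clear_bit(r, PLLRST)),
    ("4. PLLDIS = 1", lambda r: set_bit(r, PLLDIS)),
    ("5. PLLPWRDN = 0", lambda r: clear_bit(r, PLLPWRDN)),
    ("6. PLLDIS = 0", lambda r: clear_bit(r, PLLDIS)),
    ("11. PLLRST = 1 (release reset)", lambda r: set_bit(r, PLLRST)),
    ("13. PLLEN = 1 (leave bypass)", lambda r: set_bit(r, PLLEN)),
]


def pll_ctl_sequence(initial=0):
    """Return (step, PLL_CTL value) pairs for the register writes in the sequence."""
    reg = initial
    trace = []
    for label, operation in STEPS:
        reg = operation(reg)
        trace.append((label, reg))
    return trace


for label, value in pll_ctl_sequence():
    print(f"{label:40s} PLL_CTL = 0x{value:08X}")
```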

PLL Multiplier Control Register (PLLM)
• Function: Controls the PLL multiplier value.
• Location: This is a 32-bit register located at address 0x01C40910.
• Setting: Multiplier value = PLLM[4:0] + 1
• Note: There is an allowable range for the PLL multiplier (PLLM). There are minimum
and maximum operating frequencies for DEV_MXI/DEV_CLKIN, PLLOUT,
AUX_MXI/AUX_CLKIN, and the device clocks (SYSCLKs). The PLL controllers must be configured so as not to
exceed any of these constraints (certain combinations of
external clock inputs, internal dividers, and PLL multiply ratios might not be
supported). For our device, the allowed multiplier values are between 14 and 22. The PLLM
value used in the code is 21, which gives a multiplier of 22.
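
The relation between the register field and the multiplier can be checked with a few lines of arithmetic. The sketch below uses the allowed range quoted above; the 27 MHz input clock is an assumed example value, not a figure from this report.

```python
MIN_MULTIPLIER, MAX_MULTIPLIER = 14, 22  # allowed range stated in the text


def pllm_field(multiplier: int) -> int:
    """PLLM[4:0] value for a desired multiplier (multiplier = PLLM + 1)."""
    if not MIN_MULTIPLIER <= multiplier <= MAX_MULTIPLIER:
        raise ValueError(f"multiplier {multiplier} is outside {MIN_MULTIPLIER}..{MAX_MULTIPLIER}")
    return (multiplier - 1) & 0x1F


def pll_output_hz(input_hz: float, pllm: int) -> float:
    """PLL output frequency before any post dividers."""
    return input_hz * (pllm + 1)


print(pllm_field(22))                           # field value for a multiplier of 22
print(pll_output_hz(27e6, 21) / 1e6, "MHz")     # assumed 27 MHz input clock
```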

PLL Divider Control Registers (PLLDIVn)


PLLDIVn controls divider value for SYSCLKn. EMIF bus runs on SYSCLK3. Default value of
PLLDIV3 is 4. Values for PLLDIV1, 2, 3 are fixed while others are programmable.

Configuring DSP Timing Characteristics


• Function: DSP timing characteristics can be configured by writing appropriate values
to the Asynchronous n Configuration Registers (AnCR). There are four such registers,
one for each chip-select space. For our application, the register corresponding to the FPGA
(CS4/A3CR) was configured.
• Location: This is a 32-bit register located at address 0x20008018.
• Setting:
o EW (bit 30): Must be cleared to disable Extended Wait. The bit should NOT
be set if the device does not have an EM_WAIT pin.
o W_SETUP (bits 29:26): Write Setup Time = (W_SETUP + 1) * EMIF Clock Period.
o W_STROBE (bits 25:20): Write Strobe Width = (W_STROBE + 1) * EMIF Clock
Period.
o W_HOLD (bits 19:17): Write Hold Time = (W_HOLD + 1) * EMIF Clock Period.
o R_SETUP (bits 16:13): Read Setup Time = (R_SETUP + 1) * EMIF Clock Period.
o R_STROBE (bits 12:7): Read Strobe Width = (R_STROBE + 1) * EMIF Clock Period.
o R_HOLD (bits 6:4): Read Hold Time = (R_HOLD + 1) * EMIF Clock Period.
o TA (bits 3:2): Bus Turn Around Time = (TA + 1) * EMIF Clock Period.
o ASIZE (bits 1:0): 00 if the EMIF bus width is 8 bits, 01 if 16 bits.
• Note: The EMIF clock is DaVinci SYSCLK3. Its frequency can be
calculated from the PLL setting (enabled/bypassed) and the multiplier and divider values.
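
As an illustration of how the fields above combine into one 32-bit value, the sketch below packs a set of field values into an AnCR word and converts a required time into the smallest field value that satisfies the (N + 1) * period rule. The timing numbers in the example are hypothetical and are not the settings used in this project.

```python
import math

# name: (high bit, low bit), as listed above
FIELDS = {
    "EW": (30, 30), "W_SETUP": (29, 26), "W_STROBE": (25, 20), "W_HOLD": (19, 17),
    "R_SETUP": (16, 13), "R_STROBE": (12, 7), "R_HOLD": (6, 4), "TA": (3, 2), "ASIZE": (1, 0),
}


def pack_ancr(**values) -> int:
    """Combine named field values into a single register word."""
    reg = 0
    for name, value in values.items():
        high, low = FIELDS[name]
        width = high - low + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        reg |= value << low
    return reg


def cycles_to_field(time_ns: float, clock_period_ns: float) -> int:
    """Smallest field value N such that (N + 1) * period >= time."""
    return max(math.ceil(time_ns / clock_period_ns), 1) - 1


# Hypothetical timings, 16-bit bus (ASIZE = 01), Extended Wait disabled.
word = pack_ancr(EW=0, W_SETUP=1, W_STROBE=3, W_HOLD=1,
                 R_SETUP=1, R_STROBE=3, R_HOLD=1, TA=1, ASIZE=1)
print(f"AnCR = 0x{word:08X}")
print(cycles_to_field(25, 10))  # a 25 ns requirement with an assumed 10 ns EMIF clock
```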

Tests
Live data loopback takes place at a rate of about 400 kHz. Signals fed into the ADC can be
viewed on an oscilloscope connected to the DAC output. Signals with frequencies up to 200 kHz are looped back
successfully (with few or no glitches). Signals of higher frequencies are somewhat
degraded.
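
The 200 kHz limit is consistent with the sampling theorem if the 400 kHz loopback rate is read as the effective sample rate, which is an interpretation rather than something the report states. Under that assumption, a short calculation gives the Nyquist limit and the apparent frequency of tones above it:

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2


def alias_hz(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sampled tone after folding into 0..sample_rate/2."""
    return abs(signal_hz - round(signal_hz / sample_rate_hz) * sample_rate_hz)


loopback_rate = 400e3  # effective sample rate assumed equal to the loopback rate
print(nyquist_hz(loopback_rate))
for tone in (50e3, 150e3, 300e3):
    print(tone, "->", alias_hz(tone, loopback_rate))
```

A 300 kHz input, for example, would appear at the DAC as a 100 kHz tone under this model.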

References
• Xilinx Documentation
• Cypress CY7C15632KV18 Datasheet
• FTDI Chip FT2232H Datasheet
• Interfacing Xilinx FPGAs to TI DSP Platforms using EMIF Bus: XAPP753 (v2.0.1)
• TMS320DM646x User Guide for Asynchronous EMIF (SPRUEQ7C)
• Creating Device Initialisation GEL Files (SPRAA74A)
• Code Composer Studio v4 Documentation

SRAM with Sleep Transistors

One of the major techniques used to control sub-threshold leakage is the use of sleep transistors.
In essence, sleep transistors are used for power gating. The logic uses low-Vt transistors, so the gates
are fast but leaky, while the sleep transistors are high-Vt. They are switched off when the logic is idle
(usually NMOS sleep transistors alone are used) and can cut leakage power by 2-1000×.

Your goal is to design a 32 kbit SRAM (128 rows, 256 columns, 8 bit words) which uses sleep
transistors to reduce leakage power. There are a number of ways you can go: a single huge sleep
transistor, a sleep transistor per cell, or a sleep transistor per 4 cells, etc. There are power–delay
tradeoffs between these, which you should explore.

Some papers that may help:

• B. Mohammad, M. Saint-Laurent, P. Bassett and J. A. Abraham, “A Cache Design
for Low Power and High Yield,” ISQED, 2008, pp. 1-10.
• B. H. Calhoun, F. A. Honore and A. P. Chandrakasan, “A Leakage Reduction
Methodology for Distributed MTCMOS,” IEEE JSSC, vol. 39, May 2004, pp.
818-826.
• A. Ramalingam, B. Zhang, D. Z. Pan and A. Devgan, “Sleep Transistor Sizing
Using Timing Criticality and Temporal Currents,” Proc. Asia South Pacific Design
Automation Conference (ASPDAC), vol. 2, Jan. 2005, pp. 1094-1097.
• S. Vangal, M. Anders, N. Borkar and E. Seligman, “5-GHz 32-bit integer
execution core in 130-nm dual-Vt/CMOS,” IEEE Journal of Solid State Circuits,
Nov. 2002, pp. 1421-1432.
• V. Khandelwal and A. Srivastava, “Leakage Control Through Fine-Grained
Placement and Sizing of Sleep Transistors,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 2007, pp. 533-536.

Fast Fourier Transform Kernel

During the last 10 years, a lot of effort has been concentrated on mapping FFT
architectures to silicon while making tradeoffs among performance, silicon area, the number of
I/O pins, and other manufacturing issues. The objective of this project is to explore an FFT
architecture and implement it.

Implementation. The chip should take time-sampled input data at a set sampling
frequency of your choice and output the correct bin counts for all the points in the FFT,
within the range of error tolerance. For real-world applications, you are encouraged to aim
for 64/128-point high-precision FFT kernels that are compatible with the wireless
industry’s protocols. However, to keep the project simple, a 16/32-point FFT would be
good enough. Make your own specification for precision, power, area, and I/O bit width.
After completing the FFT design, verify it as described in the next paragraph.

Testing. Create a testing benchmark structure to test the FFT core with reasonably good
coverage using all types of signals (sine waves, noise, dc) and their random combinations,
below the chosen Nyquist frequency.

At the algorithm level, you may choose from radix-2, radix-4, specialized FFT
implementations, etc. The final chip must be presented at the layout level after synthesis; a
code file alone is not sufficient.
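
A software golden model is a common starting point for such a testing benchmark. The sketch below is one possible reference, not a required approach: a recursive radix-2 decimation-in-time FFT checked against a direct DFT on a 16-point test signal made of a sine tone plus a DC offset. The test signal and the error threshold are arbitrary choices for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; the length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 for t in range(n)]  # tone in bin 3 plus DC
spectrum = fft_radix2(signal)
worst_error = max(abs(a - b) for a, b in zip(spectrum, dft(signal)))
peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))  # skip the DC bin
print(peak_bin, worst_error < 1e-9)
```

A fixed-point hardware design would be compared against such a model within the error tolerance chosen in the specification.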

Project deliverables will include the FFT specification definitions, test files, testing
benchmark, test outputs reports (both simulated and physical), a Cadence layout of the
FFT hardware, code, or a schematic abstraction of your layout, and a report of your
algorithm (in the form of a paper or pseudo-code).

Below are several references on FFT hardware implementations.

• E. E. Swartzlander, W. K. W. Young and S. J. Joseph, “A radix 4 delay
commutator for fast Fourier transform processor implementation,” IEEE Journal of
Solid-State Circuits, vol. 19, Oct. 1984, pp. 702-709.
• K. Maharatna, E. Grass and U. Jagdhold, “A 64-point Fourier transform chip for
high-speed wireless LAN application using OFDM,” IEEE Journal of Solid-State
Circuits, vol. 39, Mar. 2004, pp. 484-493.
• M. Sala, F. Salidu, F. Stefani and C. Kutschenreiter, “Design Considerations and
Implementation of a DSP-Based Car-Radio IF Processor,” IEEE Journal of Solid
State Circuits, Jul. 2004, pp. 1110-1118.
• S. Ishiwata and T. Yamakage, “A single-chip MPEG-2 codec based on
customizable media embedded processor,” IEEE Journal of Solid State Circuits,
Mar. 2003, pp. 530-540.

Design of Robust CMOS Circuits for Soft-error Tolerance

With the continued scaling of technology, lower supply voltages and increasing operating
frequencies, integrated circuits become increasingly susceptible to single event upsets (SEUs)
caused by transient noise or high-energy particles. An SEU may cause a bit flip in a latch
or memory element, thereby altering the state of the system and leading to a ‘soft error’. The
main objective of this project is to introduce more robustness, with some redundancy, into
circuits to make them less susceptible to undesired errors. The focus is to explore various
circuit-level as well as system-level techniques to reduce the effect of soft errors in logic
and memory circuits.
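
One well-known redundancy technique is triple modular redundancy (TMR), named here only as an example; the project description does not prescribe it. The sketch below keeps three copies of a stored word and takes a bitwise majority vote, which masks a single upset in any one copy.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three copies of a word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit."""
    return word ^ (1 << bit)


stored = 0b10110010
one_upset = majority(stored, flip_bit(stored, 5), stored)
two_upsets = majority(flip_bit(stored, 5), flip_bit(stored, 5), stored)
print(one_upset == stored)   # a single upset is masked
print(two_upsets == stored)  # the same bit upset in two copies is not
```

The cost is roughly three times the storage plus the voter, which is the kind of robustness versus area tradeoff the project asks you to explore.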

Some of the references are:

• R. C. Baumann, “Soft errors in advanced computer systems,” IEEE Design & Test of
Computers, vol. 22, May/Jun. 2005, pp. 258-266.
• R. C. Baumann, “Radiation-Induced Soft Errors in Advanced Semiconductor
Technologies,” IEEE Trans. Device and Materials Reliability, vol. 5, Sept. 2005, pp. 305-316.
• A. U. Diril, “Circuit Level Techniques for Power and Reliability Optimization of
CMOS Logic,” PhD Dissertation, Department of Electrical and Computer
Engineering, Georgia Institute of Technology, May 2005.

On-chip Interconnection Network

In this project, you will gain an in-depth understanding of the VLSI design of modern on-chip
interconnection networks. To begin with, the following article serves as a good
introduction: “Architectural Choices in Large Scale ATM Switches,” by J. Turner
and N. Yamanaka, in the IEICE Transactions, 1998.

The major task of this project is to select and implement a switching architecture. For
instance, in the article above, a Batcher-Banyan based, self-routing network is chosen. Many
techniques have been proposed; please spend some time on selecting among them. You are
encouraged to invent new architectures and algorithms and analyze their strengths and
drawbacks. Here are some more articles that may be useful.

• Shin and Hodges, “A 250-Mbit/s CMOS Crosspoint Switch,” IEEE JSSC,
vol. 24, April 1989, pp. 478-486.
• Akata et al., “A 250-Mb/s CMOS Crosspoint LSI for ATM Switching,” IEEE
JSSC, vol. 25, Dec. 1990, pp. 1433-1439.
• Chemarin et al., “A High-speed CMOS Circuit for 1.2-Gb/s 16x16 ATM
Switching,” IEEE JSSC, vol. 27, July 1992, pp. 1116-1120.
• O’Neill et al., “A 200Mhz CMOS Broad-Band Switching Chip,” IEEE JSSC,
vol. 28, March 1993, pp. 269-275.

Please actively search IEEE Xplore or Google for new ideas and build upon them!

Once the architecture is defined, you may employ the skills developed in the labs to
implement a prototype (physical level). In view of the limited time, you may put most of
the effort into the core algorithm and structure and scale down the whole system. Please
consider from the very beginning how to establish the testing benchmark for your switch.
Again, your testing benchmark should have fairly good coverage. For the benchmark
setup, you may use C/C++ or scripting languages such as Tcl/Tk, Perl, etc. As this is
more of an open topic, your final grade will be based on your ideas,
implementation workload, testing mechanisms, etc. In particular, your implementation
should demonstrate a workload worthy of a serious project in our graduate class.

Design of Circuits for Sub-threshold Voltages

For ultra-low-power and portable applications, digital subthreshold logic has been
investigated, in which transistors are operated in the subthreshold region (the supply voltage,
corresponding to logic 1, is less than the threshold voltage of the transistor). In this
technique, the subthreshold leakage current of the device is used for computation. Standard
design techniques suitable for superthreshold design can be used in the subthreshold region.
However, it has been shown that a complete co-design at all levels of the hierarchy (device,
circuit, and architecture) is necessary to reduce the overall power consumption while achieving
acceptable performance (hundreds of kilohertz) in the subthreshold regime of operation.
Your goal in this project is to choose a suitable application, such as an adder, multiplier or
FFT module, and implement it using sub-threshold voltage logic.

Hardware Accelerated Monte Carlo Simulation

In its simplest form, an option gives the purchaser the right to buy an object (which could be
a stock or a commodity; we'll assume a stock for simplicity) for a fixed price at a given time
in the future. More generally, options exist in which the purchaser can buy the object
for a fixed price at any point up to a given time, or at the lowest price seen up to the given
time, etc.

When the purchase time is fixed, interest rates are constant, and the object price follows
Brownian motion, the Black-Scholes formula gives an analytical way to determine the fair
price of the option. This situation is rare, and analytical techniques do not exist for general
option pricing.

Monte Carlo simulation can be used to get an idea of the fair price; it is computationally
challenging, and the goal of this project is to use hardware acceleration for pricing. It is
most natural to use a finite time step for the simulation.
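
As a software baseline for what the hardware would accelerate, the sketch below prices a fixed-date call option two ways: with the Black-Scholes closed form and with a finite-time-step Monte Carlo simulation. It assumes geometric Brownian motion (the usual Black-Scholes setting) and uses arbitrary example parameters; none of the numbers come from the project description.

```python
import math
import random


def black_scholes_call(s0, strike, rate, sigma, t):
    """Closed-form price of a European call under geometric Brownian motion."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * t) * cdf(d2)


def monte_carlo_call(s0, strike, rate, sigma, t, steps, paths, seed=0):
    """Estimate the same price by simulating price paths with a finite time step."""
    rng = random.Random(seed)
    dt = t / steps
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(paths):
        log_price = math.log(s0)
        for _ in range(steps):
            log_price += drift + vol * rng.gauss(0.0, 1.0)
        payoff_sum += max(math.exp(log_price) - strike, 0.0)
    return math.exp(-rate * t) * payoff_sum / paths


# Arbitrary example parameters (assumptions).
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=10, paths=20000)
print(round(exact, 4), round(estimate, 4))
```

The inner loop over paths and time steps is independent across paths, which is what makes the problem a natural candidate for parallel hardware.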

One approach is to derive the exact distribution of the stock price. Given a distribution for a
discrete random variable X (the stock price), and a distribution for a discrete random
variable Y (its change), the distribution for X+Y is derived by convolving the two distributions. Direct
convolution can get expensive (quadratic in the range of the two variables), and FFT-based
convolution may be a good way to proceed. You may want to consider various distributions
for the increment, not just binomial, but something with a heavy tail.
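
The convolution step can be sketched directly. The example below assumes that X and Y are independent (which is what makes convolution the right operation) and represents each distribution as a list of probabilities indexed by value; the two-point increment is a hypothetical choice for illustration.

```python
def convolve(px, py):
    """Distribution of X + Y for independent X and Y, by direct convolution."""
    out = [0.0] * (len(px) + len(py) - 1)
    for i, p in enumerate(px):
        for j, q in enumerate(py):
            out[i + j] += p * q
    return out


step = [0.5, 0.5]  # hypothetical increment: +0 or +1 tick with equal probability
dist = [1.0]       # the price offset is known exactly at time zero
for _ in range(4):
    dist = convolve(dist, step)
print(dist)
```

The nested loops are the quadratic cost mentioned above; an FFT-based version would transform both lists, multiply them pointwise and transform back.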

On-silicon Delay Characterization

As variability increases, there is growing interest in making adaptive chips, where
parameters such as supply voltage and body biases can be set post-manufacturing to
overcome the effects of parametric variation.

The goal of this project is to study the cost and accuracy of on-chip delay characterization
structures. You should survey the state-of-the-art, as well as perform your own
experiments.

For example, Dhar et al. introduce an adaptive voltage scaling controller that uses an
inexpensive ring oscillator to measure speed. There could be multiple ring oscillators placed
throughout the design. The gate delay would be approximated based on the delay of the
nearest ring oscillator.
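
A ring oscillator with N inverting stages oscillates with a period of about 2 * N * t_gate, so a measured frequency gives an average gate delay. The sketch below applies that relation and picks the nearest oscillator by Manhattan distance; the coordinates, stage counts and frequencies are hypothetical values for illustration.

```python
def gate_delay_s(stages: int, freq_hz: float) -> float:
    """Average per-stage delay implied by a ring oscillator's frequency."""
    return 1.0 / (2 * stages * freq_hz)


def nearest_oscillator(gate_xy, oscillators):
    """Pick the oscillator closest to the gate (Manhattan distance)."""
    return min(oscillators,
               key=lambda o: abs(o["xy"][0] - gate_xy[0]) + abs(o["xy"][1] - gate_xy[1]))


# Hypothetical placement and measurements.
oscillators = [
    {"xy": (0, 0), "stages": 31, "freq_hz": 500e6},
    {"xy": (10, 10), "stages": 31, "freq_hz": 400e6},
]
chosen = nearest_oscillator((8, 7), oscillators)
print(chosen["xy"], gate_delay_s(chosen["stages"], chosen["freq_hz"]))
```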

Another promising approach would be to implement delay characterization based on
Razor by Ernst et al. By using a shadow latch and comparator logic, Razor has
mechanisms to monitor when a delay error has taken place. In the context of an FPGA, a
test input could run through the CLB at successively shorter clock periods until there is a
delay error. Additionally, neighboring CLBs could perform the shadow latching and comparator
logic needed for Razor testing using existing CLB resources. The test literature
(International Test Conference, Fault-Tolerant Computing) would also be a good place to
review.

References:

• R. Tayade and J. A. Abraham, “On-chip programmable capture for accurate path
delay test and characterization,” Int’l Test Conf., 2008, pp. 1-10.
• S. Dhar, D. Maksimovic and B. Kranzen, “Closed-loop adaptive voltage scaling
controller for standard-cell ASICs,” Int’l Sym. Low Power Elec. and Design, 2002,
pp. 103-107.
• D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. Kim and K. Flautner,
“Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation,”
IEEE Micro, vol. 24, 2004, pp. 10-20.

Comparison of Circuit Families

Many circuit families are used in digital systems. One of the most widely used is static
CMOS. It has good noise margins, is fast, consumes relatively little power, is insensitive
to device variations, is easy to design, and is widely supported by CAD tools. Other circuit
families are used for high-speed operation. For example, dynamic circuit families have been
used in high-performance processors.

Compare Static CMOS, Pseudo-nMOS, CVSL, Dynamic, Domino, Dual-rail Domino, CPL, DCVSPG
and so on against several criteria: the number of transistors (area), static power
consumption, ability to cascade (compose), robustness, and the existence of dynamic
nodes. Use HSPICE simulations to find suitable applications for each of them.

Logic Built-In Self-test (BIST)

As VLSI technology develops, process dimensions shrink and the logic becomes highly
complex. In this situation, it is difficult to test the VLSI logic, because test stimuli
applied through the external pins of the chip cannot fully reach its internal logic.

Furthermore, the outputs of complex sequential logic depend on its current state. Thus, a
complete test of the chip through its external pins may be impossible. To solve these
problems, several testing methods have been suggested, such as ad-hoc, scan, and BIST.
BIST has an advantage over the others, especially for complex systems on a chip: it is an
inexpensive testing method with high fault coverage. The following picture shows a typical
BIST system.

You could start with the information on test in the textbook. Some other references are:

• S. Hwang and J. A. Abraham, “Optimal BIST Using an Embedded
Microprocessor,” IEEE ITC, 2002, pp. 736-745.
• A. Chatterjee and J. A. Abraham, “Test generation, design-for-testability and built-in
self-test for arithmetic units based on graph labeling,” J. Electronic Testing, 1999,
pp. 351-372.
• R. Dandapani, J. H. Patel and J. A. Abraham, “Design of Test Pattern Generators
for Built-In Test,” IEEE ITC, 1984, pp. 315-319.
• L.-T. Wang, C.-W. Wu and X. Wen, “VLSI Test Principles and Architectures:
Design for Testability,” Morgan Kaufmann, 2006.
• M. Abramovici, M. A. Breuer and A. D. Friedman, “Digital Systems Testing
and Testable Design,” IEEE Press, Piscataway, NJ, 1994.
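
As a rough illustration of the BIST idea, the sketch below assumes one common arrangement: an
LFSR generates pseudo-random test patterns, the circuit under test responds, and a signature
register (MISR) compacts the responses into a single value that is compared with a known-good
signature. The 4-bit width, the polynomial x^4 + x^3 + 1, the 2x2-bit multiplier, and the
injected stuck-at fault are all choices made for this example.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
TAPS = 0b1100  # feedback taps for x^4 + x^3 + 1 (assumed polynomial, period 15)


def lfsr_step(state):
    """Advance a 4-bit Fibonacci LFSR by one clock."""
    feedback = bin(state & TAPS).count("1") & 1  # XOR of the tapped bits
    return ((state << 1) | feedback) & MASK


def multiplier(pattern):
    """Circuit under test: multiply the upper two bits by the lower two bits."""
    a = (pattern >> 2) & 0b11
    b = pattern & 0b11
    return a * b


def faulty_multiplier(pattern):
    """Same circuit with output bit 3 stuck at 0 (injected fault)."""
    return multiplier(pattern) & 0b0111


def run_bist(circuit, seed=0b0001, patterns=15):
    """Apply LFSR patterns to the circuit and compact the responses in a MISR."""
    state = seed
    signature = 0
    for _ in range(patterns):
        response = circuit(state)
        # MISR update: shift with feedback, then XOR in the circuit response.
        signature = lfsr_step(signature) ^ response
        state = lfsr_step(state)
    return signature


golden = run_bist(multiplier)  # reference signature from the fault-free circuit
print("fault-free circuit passes:", run_bist(multiplier) == golden)
print("stuck-at fault detected:", run_bist(faulty_multiplier) != golden)
```

Only the final signature has to be compared, which is why the on-chip hardware overhead stays
small even when many patterns are applied.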

Implementation of AMBA3 AXI

The complexity of embedded systems has increased dramatically, and they need a high-speed,
smart bus system. The reason is that in an embedded system, the communication architecture,
such as the bus, plays an important role in orchestrating data and control signal
transactions among the system components. Recently, the AMBA 3 bus protocol was
introduced by ARM Inc., and it provides several advanced features, such as multiple
outstanding requests, which improve system performance by reducing the bus-waiting
time for each component.

The objective of this project is to design your own AMBA AXI interconnect. The
AMBA3 AXI bus system needs several elements such as a decoder, an arbiter, a slave
interface, and a master interface. It is very similar to the Wishbone bus in that it uses a
handshaking protocol. The following figure shows the interface and interconnection of
the bus system.
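
Before writing RTL, the roles of the decoder, the arbiter, and the handshake can be checked
with a small cycle-based model. The sketch below is a heavy simplification and not the AXI
specification: it models a single shared channel rather than AXI's separate channels, uses
round-robin arbitration as an assumed policy, and all names, address regions, and ready
patterns are hypothetical. The one rule taken from the protocol is that a transfer happens
only in a cycle where VALID and READY are both high.

```python
def decode(address, regions):
    """Address decoder: index of the slave whose (base, size) region holds the address."""
    for index, (base, size) in enumerate(regions):
        if base <= address < base + size:
            return index
    return None


def pick_master(valid, last):
    """Round-robin arbiter: first requesting master after the last one served."""
    count = len(valid)
    for offset in range(1, count + 1):
        candidate = (last + offset) % count
        if valid[candidate]:
            return candidate
    return None


def simulate(queues, regions, slave_ready, cycles):
    """Run the interconnect model and log (cycle, master, slave, address) transfers."""
    last = len(queues) - 1
    log = []
    for cycle in range(cycles):
        valid = [len(q) > 0 for q in queues]  # a master asserts VALID while it has work
        master = pick_master(valid, last)
        if master is None:
            continue
        slave = decode(queues[master][0], regions)
        if slave is None:
            # Sketch only: a real interconnect would answer with a decode error.
            raise ValueError("unmapped address")
        if slave_ready(slave, cycle):  # VALID and READY both high: transfer completes
            log.append((cycle, master, slave, queues[master].pop(0)))
            last = master
        # Otherwise the master keeps VALID high and retries in the next cycle.
    return log


# Hypothetical setup: two masters, two slaves; slave 1 is ready only on even cycles.
regions = [(0x0000, 0x1000), (0x1000, 0x1000)]
queues = [[0x0010, 0x1010], [0x1020]]
log = simulate(queues, regions, lambda slave, cycle: slave == 0 or cycle % 2 == 0, cycles=6)
for entry in log:
    print(entry)
```

Extending the model with several outstanding requests per master would be the natural next
step toward the AXI feature mentioned above.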

Conclusion:

In conclusion, VLSI design is an essential component of modern electronics. It has revolutionized the
electronics industry by reducing the size of electronic devices and improving their functionality.
