VLSI Design
The first week of the internship was spent revising the basic concepts of VHDL and Verilog
coding. I was also introduced to other hardware design topics such as static timing analysis,
simulation, synthesis, and the basics of PLDs (internal architecture, various types, etc.).
To become better acquainted with design and coding techniques, I wrote test codes for the
various peripherals, starting with basic LED testing and moving on to tests for the ADC, DAC,
RS232, and Flash memory. Through these codes, I became familiar with Xilinx design tools such
as ISE and PlanAhead, as well as Altera tools such as Quartus and the more advanced Nios.
After getting well-versed with design techniques, the project for designing a QDR Controller
IP was started. The IP consisted of hardware design code and application software. The
hardware design code was written in VHDL and implemented on Xilinx Spartan 6 FPGA.
The application software was written in C++ programming language to transact data and
commands between computer and the FPGA. At the end of the designing phase, the
Controller was tested at a satisfactory data rate of about 100 megabytes per second on the
Spartan 6 FPGA. Designing and successful testing of this controller took about 5 weeks.
The last two weeks were spent studying the EMIF (External Memory Interface) bus, which is
used to interface a DSP with its peripherals, including an FPGA. After acquiring a detailed
understanding of the EMIF bus, an application was written to accept data from an ADC
(interfaced to a Xilinx Virtex-4 FPGA) and transfer it to a Texas Instruments DSP. This data
was then looped back over the EMIF bus and displayed live on an oscilloscope through a DAC,
also interfaced to the FPGA. With this arrangement, data taken from the ADC can be processed
by the DSP and then output to the DAC through the EMIF interface.
Table of Contents
Abstract
QDR Controller IP
  Basics of QDR
  Timing Issues
  Outline of the Hardware Design
    Specifications
  Design Details
  Application Software
  Tests Performed
EMIF Bus Application
  Basics of EMIF Bus
  Design Details: Board Specifications
  Basic Codes
    FPGA Program
    DSP Program
  Programming DSP PLL using GEL File
    Basics of GEL File
    DSP Registers to be Programmed using GEL File
  Configuring DSP Timing Characteristics
  Tests
References
Conclusion
What is VLSI?
To understand VLSI, it helps to first look at what integration technology means. An integrated
circuit (IC) is a single chip on which all the active and passive components of a circuit are
fabricated. As devices demanded more functionality, more circuits had to be included, and
systems built from separate parts became bulkier. Integration technology addressed this by
placing more components on a single chip, which greatly increased device speed and reduced
device size. Levels of integration are categorized by the number of components, that is,
transistors, integrated on a chip, ranging from small-scale integration (SSI) through
medium-scale (MSI) and large-scale (LSI) to very-large-scale integration (VLSI).
VLSI was conceived during the late 1970s. It is now one of the most widely used technologies
for designing components, integrated circuits, and microprocessors. A VLSI chip integrates a
very large number of transistors on a single microchip.
QDR Controller IP
Basics of QDR
QDR (Quad Data Rate) memory is a type of SRAM that can transfer up to four words in every
clock cycle, hence the name. It has separate read and write ports, and each port operates at
Double Data Rate (DDR). The main purpose of QDR is to enable fast, independent reads and
writes at high clock frequencies without losing bandwidth to bus turnaround time, unlike DDR
memories, which share a single data bus for reads and writes.
Timing Issues
Each port of a Quad Data Rate memory transfers data twice every clock cycle and is designed to
operate at high frequencies (typically a few hundred MHz) in burst mode, whereas a design
implemented on an FPGA cannot operate at this frequency. This is one of the main challenges in
FPGA designs involving QDR. The problem was overcome by using the FPGA's input and output
DDR registers (IDDR and ODDR). Because timing is critical in QDR designs, tools such as
Xilinx PlanAhead were used to achieve good timing performance.
Outline of the Hardware Design
The steps involved are:
1. Implementing the design file by file in ISE
2. Downloading the .bit file to the Spartan 6 FPGA
3. Checking read and write functionality using GUI-based software
Specifications
Cypress CY7C15632KV18 Quad Data Rate memory, 72 Mbit, 4 word burst
Xilinx Spartan 6 FPGA
Design Details
Given below is the block diagram of the QDR Controller IP.
Data sent by the application software is passed on to the FPGA via a USB connection. Within
the FPGA, the data is accepted by a USB controller. The USB controller consists of two FIFOs:
one accepts data from the USB and the other sends data to the USB. Both FIFOs are controlled
by the read/write logic of the USB controller.
The command structure consisted of a start byte and an end byte, three bytes each for passing the
file size and start address and one byte for the actual command (in this case read or write). The
command is decoded in the instruction decoder and important information such as the starting
address, file size, read/write etc. are sent to the QDR read-write module.
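The command structure described above can be modelled in a few lines. The following is a minimal Python sketch, not the report's actual implementation: the text gives only the field sizes, so the field order, the start and end marker values, the command codes, and the big-endian byte order are all assumptions made for illustration.

```python
# Assumed values: the report gives only field sizes, not these constants.
START_BYTE = 0xAA   # hypothetical start marker
END_BYTE = 0x55     # hypothetical end marker
CMD_READ = 0x01     # hypothetical command code
CMD_WRITE = 0x02    # hypothetical command code


def pack_command(command, start_address, file_size):
    """Build a 9-byte frame: start, command, 3-byte address, 3-byte size, end."""
    if not 0 <= start_address < 1 << 24 or not 0 <= file_size < 1 << 24:
        raise ValueError("address and size must each fit in three bytes")
    return (bytes([START_BYTE, command])
            + start_address.to_bytes(3, "big")   # assumed byte order
            + file_size.to_bytes(3, "big")
            + bytes([END_BYTE]))


def parse_command(frame):
    """Software mirror of the instruction decoder: check framing, extract fields."""
    if len(frame) != 9 or frame[0] != START_BYTE or frame[-1] != END_BYTE:
        raise ValueError("malformed command frame")
    return {
        "command": frame[1],
        "start_address": int.from_bytes(frame[2:5], "big"),
        "file_size": int.from_bytes(frame[5:8], "big"),
    }
```

Three bytes per field limit both the start address and the file size to values below 2^24, which is why the packer rejects anything larger.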
1. Define the problem. Crucial to solving any design problem is to begin by asking the right questions.
2. Conduct research.
3. Brainstorm and conceptualize.
4. Create a prototype.
5. Select and finalize.
6. Product analysis.
7. Improve.
RTL design:
Register-transfer level (RTL) design is a crucial step in the VLSI design flow for integrated
circuits (ICs). It specifies a digital circuit in terms of the flow of digital signals between
hardware registers and the logical operations performed on those signals.
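To make the register-transfer idea concrete, here is a small behavioural model in Python rather than in an HDL. It is only a sketch: the 8-bit accumulator, its port names, and the enable input are hypothetical and do not come from the report.

```python
class Accumulator:
    """Register-transfer model: one 8-bit register fed by adder logic."""

    WIDTH = 8

    def __init__(self):
        self.acc = 0  # the hardware register, cleared at reset

    def next_state(self, data_in, enable):
        # Combinational logic: depends only on inputs and the current register value.
        if enable:
            return (self.acc + data_in) & ((1 << self.WIDTH) - 1)
        return self.acc

    def clock(self, data_in, enable):
        # Clock edge: the register captures the combinational result.
        self.acc = self.next_state(data_in, enable)
        return self.acc
```

Separating `next_state` (logic between registers) from `clock` (the register update) mirrors the way RTL code separates combinational and clocked behaviour.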
DFT in VLSI:
Design-for-testability (DFT) techniques attempt to reduce the high cost in time and effort required to
generate test vector sequences for VLSI circuits. The identification of faulty chips in the field can also be
greatly simplified if the chips are designed for testability.
Advantages & Applications of VLSI:
Because of its advantages, VLSI is one of the most widely used and preferred technologies
across many fields. It is applied in many branches of engineering, such as commercial
electronics, automobiles, computers, digital signal processing, data and voice communication
networks, and medicine.
1. VLSI Starter
This VLSI online course is suitable if you are a beginner in VLSI. Using Xilinx ISE, a tool
widely used in industry, you will learn the basics of VLSI. You will program in Verilog and
build combinational circuits. As part of this course, you will learn the concepts of encoders
and multiplexers by building digital circuits and VLSI projects. You will also work with
comparators and logic gates.
2. VLSI Explorer
This VLSI online course is a good fit if you want to know more about VLSI. It teaches the
basics of VLSI through Xilinx ISE, a tool widely used in industry. You will also learn to
program in Verilog and learn concepts such as multiplexers, encoders, and decoders by building
digital circuits. Eventually, you will build combinational circuits by working with logic
gates, comparators, and so on.
3. VLSI Champion
As the name suggests, this course is aimed at helping you excel in VLSI technology. Apart from
the basics, such as programming in Verilog, building digital and combinational circuits, and
concepts such as decoders, encoders, and multiplexers, you will learn to develop a Static RAM
Design and Traffic Light Control.
4. VLSI (Career Building Course)
This course adds weight to your profile if you are seeking a career in Electronics and
Communication. You will learn concepts such as multiplexers, encoders, decoders, flip-flops,
comparators, and registers by building combinational, digital, and sequential circuits.
Eventually, you will learn to develop projects such as Traffic Light Control and Static RAM
Design.
VLSI design is an iterative cycle. Designing a VLSI chip involves several stages, such as
functional design, logic design, circuit design, and physical design. The design is verified
for correctness by simulation.
Figure 1: Top Level Block Diagram
The QDR read-write module accepts data coming from the USB and passes it to the QDR via
ODDRs during a write operation. Similarly, during a read operation, data is read from the QDR
and passed to the USB controller through IDDRs. The ODDRs and IDDRs reduce the frequency at
which the FPGA logic must operate for a desired data transaction rate with the QDR. Data is
transferred to and from the QDR by state machines written to respect the timing
characteristics of the QDR memory chip.
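The role of the ODDRs and IDDRs can be illustrated with a purely behavioural Python sketch. It ignores the real primitives' clocking modes, alignment, and timing. It only shows that the FPGA logic handles two words per clock cycle while the pin carries one word per clock edge. The function names are hypothetical.

```python
def oddr(rising_words, falling_words):
    """Interleave per-cycle word pairs into the edge-by-edge stream seen at the pin."""
    stream = []
    for rising, falling in zip(rising_words, falling_words):
        stream.append(rising)    # driven on the rising edge
        stream.append(falling)   # driven on the falling edge
    return stream


def iddr(stream):
    """Split an edge-by-edge pin stream back into per-cycle word pairs."""
    return stream[0::2], stream[1::2]
```

In this model, N clock cycles inside the FPGA move 2N words across the pin, which is the sense in which the DDR registers halve the frequency the internal logic needs for a given data rate.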
Application Software
Commands are accepted from the computer via the application software. The command is
then processed in hardware and data is read from or written to the QDR memory accordingly.
The application software reads data from the test files and sends it to the USB through a
512-byte buffer. Similarly, data is accepted from the USB into the buffer and then written to
the output file. The input and output files are compared to test the read and write
operations performed on the QDR memory.
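The buffered transfer and file comparison can be sketched as follows. The actual application software was written in C++ and talks to real USB hardware. This Python model only assumes the 512-byte buffer size from the text and replaces the USB/QDR path with an in-memory stand-in, so every name here is illustrative.

```python
import io

BUFFER_SIZE = 512  # buffer size stated in the report


def transfer(src, dst):
    """Copy src to dst through a fixed-size buffer; return the number of bytes moved."""
    total = 0
    while True:
        chunk = src.read(BUFFER_SIZE)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def loopback_ok(payload):
    """Write payload to a stand-in memory, read it back, and compare with the input."""
    memory = io.BytesIO()                  # stand-in for the USB/QDR path
    transfer(io.BytesIO(payload), memory)
    memory.seek(0)
    readback = io.BytesIO()
    transfer(memory, readback)
    return readback.getvalue() == payload
```

A payload whose length is not a multiple of 512 exercises the final partial buffer, which is a common place for off-by-one errors in this kind of loop.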
Tests Performed
Simple Counter Test: Sequential values were written to sequential memory locations, read
back, and verified.
Data Loopback: After the counter test succeeded, the IP was tested through read/write
loopback of files of various sizes. The IP performed satisfactorily, with a data rate of
about 100 megabytes per second.
The FPGA-DSP interface is a little more complicated, as both the FPGA and the DSP need to be
programmed. The DSP is programmed with an application written in C++, while the FPGA is
programmed with a .bit file.
• Sampling speed: 210 Msps
• Analog input range: 2 Vpp
EMIF bus: 16-bit data width, asynchronous.
Basic Codes
FPGA Program
The FPGA is programmed using the .bit file in <FOLDER>. Data from the ADC is accepted into a
FIFO, and the clock output of the ADC is used as the write clock for this FIFO. Data from this
FIFO is sent to the DSP over the bidirectional EMIF (External Memory Interface) bus. The DSP
loops the data back over the same EMIF bus. The returned data is accepted into a second FIFO,
from which it is read by the DAC. The SYNC clock of the DAC is used as the read clock for this
DAC FIFO.
Figure 4: Detail Block Diagram
DSP Program
The DSP is programmed using Code Composer Studio version 4. Data is accepted from the FPGA
over the EMIF bus and then sent back over the same bus. A single data word is read and then
written back immediately; the next word is read only after the previous one has been written
back to the FPGA. This is done to maintain a continuous supply of data to the DAC.
PLL Multiplier Control Register (PLLM)
Function: Controls PLL multiplier value
Location: This is a 32 bit register located at address 0x01C40910.
Setting: Multiplier value = PLLM[4:0] + 1
Note: There is an allowable range for the PLL multiplier (PLLM). There are minimum
and maximum operating frequencies for DEV_MXI/DEV_CLKIN, PLLOUT,
AUX_MXI/AUX_CLKIN, and the device clocks (SYSCLKs). The PLL controllers must be
configured so that none of the constraints documented in this section is exceeded
(certain combinations of external clock inputs, internal dividers, and PLL multiply
ratios might not be supported). For our device, the allowed multiplier values are
between 14 and 22. The PLLM value used in the code is 21, which gives a multiplier
value of 22.
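The relationship between the register field and the multiplier is simple arithmetic. The sketch below, in Python rather than GEL, is a sanity check on the numbers quoted above: the helper names are hypothetical, and the 14 to 22 range is taken from the text, not re-derived from the device documentation.

```python
PLLM_ADDRESS = 0x01C40910                 # register address given in the text
MIN_MULTIPLIER, MAX_MULTIPLIER = 14, 22   # allowed range given in the text


def multiplier_from_pllm(pllm):
    """Multiplier value = PLLM[4:0] + 1 (only the low five bits are used)."""
    return (pllm & 0x1F) + 1


def pllm_for_multiplier(multiplier):
    """Return the PLLM field value for a multiplier, rejecting unsupported values."""
    if not MIN_MULTIPLIER <= multiplier <= MAX_MULTIPLIER:
        raise ValueError("multiplier outside the allowed range 14..22")
    return multiplier - 1
```

With the value used in the code, `multiplier_from_pllm(21)` gives 22, the upper end of the allowed range.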
Tests
Live data loopback takes place at a rate of about 400 kHz. Signals fed into the ADC can be
viewed on an oscilloscope at the DAC output. Signals with frequencies up to 200 kHz are looped
back successfully (with few or no glitches). Signals of higher frequencies are somewhat
degraded.
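One plausible reading of these figures is the sampling theorem: if the 400 kHz loopback rate acts as the effective sample rate of the ADC-to-DAC path (an interpretation, not something the report states), the highest frequency that can be represented is half of it. A small Python sketch of that arithmetic, with hypothetical helper names:

```python
def nyquist_limit(sample_rate_hz):
    """Highest signal frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled tone after folding into 0..Nyquist."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)
```

Under this assumption, a 250 kHz input sampled at 400 kHz would appear at 150 kHz, which would be consistent with the degradation observed above 200 kHz.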
References
Xilinx Documentation
Cypress CY7C15632KV18 Datasheet
FTDI Chip FT2232H Datasheet
Interfacing Xilinx FPGAs to TI DSP Platforms using EMIF Bus: XAPP753 (v2.0.1)
TMS320DM646x User Guide for Asynchronous EMIF (sprueq7c)
Creating Device Initialisation GEL Files (SPRAA74A)
Code Composer Studio v4 Documentation
One of the major techniques used to control sub-threshold leakage is the use of sleep
transistors. In essence, sleep transistors are used for power gating. The logic uses low-Vt
transistors, whose gates are faster but leaky, while the sleep transistors are high-Vt
devices. They are switched off when the logic is idle (usually an NMOS sleep transistor alone
is used) and can reduce leakage power by 2-1000×.
Your goal is to design a 32 kbit SRAM (128 rows, 256 columns, 8 bit words) which uses sleep
transistors to reduce leakage power. There are a number of ways you can go: a single huge sleep
transistor, a sleep transistor per cell, or a sleep transistor per 4 cells, etc. There are power–delay
tradeoffs between these, which you should explore.
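The array organization and the sleep-transistor granularities mentioned above can be checked with a few lines of arithmetic. This Python sketch assumes that the words of a row are addressed contiguously and that the sleep transistors are shared evenly. Both are illustrative choices, not requirements of the project.

```python
ROWS, COLUMNS, WORD_BITS = 128, 256, 8   # organization given in the text
TOTAL_CELLS = ROWS * COLUMNS             # 32768 cells = 32 kbit
WORDS_PER_ROW = COLUMNS // WORD_BITS     # 32 words share each row
TOTAL_WORDS = ROWS * WORDS_PER_ROW       # 4096 words, i.e. a 12-bit word address


def decode_address(address):
    """Split a word address into (row, word-within-row): 7 row bits, 5 column bits."""
    if not 0 <= address < TOTAL_WORDS:
        raise ValueError("address out of range")
    return address // WORDS_PER_ROW, address % WORDS_PER_ROW


def sleep_transistor_count(cells_per_sleep_transistor):
    """Number of sleep transistors needed for a given sharing granularity."""
    return -(-TOTAL_CELLS // cells_per_sleep_transistor)  # ceiling division
```

The three options named in the text correspond to `sleep_transistor_count(TOTAL_CELLS)` (one huge device), `sleep_transistor_count(1)` (one per cell), and `sleep_transistor_count(4)` (one per four cells).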
Fast Fourier Transform Kernel
During the last 10 years, a lot of effort has gone into mapping FFT architectures to silicon
while making tradeoffs among performance, silicon area, the number of I/O pins, and other
manufacturing issues. The objective of this project is to explore an FFT architecture and
implement it.
Implementation. The chip should take the time-sampled data input at a set sampling
frequency of your choice and output the correct bin counts for all the points in the FFT,
within the range of error tolerance. For real-world applications, you are encouraged to aim
for 64/128-point high precision FFT kernels which are compatible with the wireless
industry’s protocols. However, to keep the project simple, 16/32-point precision would be
good enough. Make your own specification for precision, power, area, and I/O bit width.
After completing the FFT design, verify it by following the next paragraph.
Testing. Create a testing benchmark structure to test the FFT core with reasonably good
coverage using all types of signals (sine waves, noise, dc) and their random combinations,
below the chosen Nyquist frequency.
At the algorithm level, you may choose from radix-2, radix-4, and specialized FFT
implementations, etc. The final chip must be presented in layout level after synthesis, and a
code file alone would not be sufficient.
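As a software reference for the radix-2 option, the following Python sketch implements a recursive decimation-in-time FFT together with a direct DFT that can serve as a golden model in a testbench. It uses floating-point arithmetic. A hardware kernel would use a fixed-point format of your own choosing, which this sketch does not model.

```python
import cmath


def fft_radix2(samples):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft_radix2(samples[0::2])
    odd = fft_radix2(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly, upper output
        out[k + n // 2] = even[k] - twiddle   # butterfly, lower output
    return out


def dft(samples):
    """Direct O(n^2) DFT, useful as a reference when checking FFT outputs."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Comparing `fft_radix2` against `dft` for sine, noise, and DC inputs gives a simple starting point for the error-tolerance checks described in the testing paragraph.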
Project deliverables will include the FFT specification definitions, test files, testing
benchmark, test outputs reports (both simulated and physical), a Cadence layout of the
FFT hardware, code, or a schematic abstraction of your layout, and a report of your
algorithm (in the form of a paper or pseudo-code).
Design of Robust CMOS Circuits for Soft-error Tolerance
With the continued scaling of technology, lower supply voltages and increasing operating
frequency, integrated circuits become increasingly susceptible to single event upsets (SEU)
caused by transient noise or high energy particles. An SEU may cause a bit flip in some latch
or memory element, thereby altering the state of the system, leading to a ‘soft error’. The
main objective of this project is to introduce more robustness with some redundancy in
circuits to make them less susceptible to undesired errors. The focus is to explore various
circuit level as well as system level techniques to reduce the effect of soft errors for logic
and memory circuits.
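One well-known redundancy technique of the kind described here is triple modular redundancy (TMR): three copies of a stored value are kept, and a bitwise majority vote masks an upset in any single copy. TMR is offered only as an example. The project text does not prescribe a particular technique. A minimal Python sketch:

```python
def majority(a, b, c):
    """Bitwise majority vote over three copies of a word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word, bit):
    """Model a single event upset: invert one bit of a stored word."""
    return word ^ (1 << bit)
```

The vote fails only when two copies are corrupted at the same bit position, which is the case that more elaborate schemes, or periodic scrubbing, try to make unlikely.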
In this project, you will gain an in-depth understanding of the VLSI design of a modern
on-chip interconnection network. To begin with, the following article serves as a good
introduction: “Architectural Choices in Large Scale ATM Switches,” by J. Turner
and N. Yamanaka, in the IEICE Transactions, 1998.
The major task of this project is to select and implement a switching architecture. For
instance, in the article above, a Batcher-Banyan based, self-routing network is chosen. Many
techniques have been proposed; please spend some time on selecting among them. You are
encouraged to invent new architectures and algorithms and analyze their strengths and
drawbacks. Here are some more articles that may be useful.
• O’Neill et al., “A 200Mhz CMOS Broad-Band Switching Chip,” IEEE JSSC,
vol. 28, March 1993, pp. 269-275.
Please actively search IEEE Xplore or Google for new ideas and build upon them!
Once the architecture is defined, you may employ the skills developed in the labs to
implement a prototype (physical level). In view of the limited time, you may put most of
the effort into the core algorithm and structure and scale down the whole system. Please
consider from the very beginning how to establish the testing benchmark for your switch.
Again, your testing benchmark should have fairly good coverage. For the benchmark
setup, you may use C/C++ or scripting languages such as Tcl/Tk or Perl. As this is
more of an open topic, your final grade will be based on your ideas, implementation
workload, and testing mechanisms. In particular, your implementation should demonstrate
a workload worthy of a serious project in our graduate class.
Design of Circuits for Sub-threshold Voltages
For ultra-low-power and portable applications, digital subthreshold logic has been
investigated, in which transistors operate in the subthreshold region (the supply voltage,
which corresponds to logic 1, is less than the threshold voltage of the transistor). In this
technique, the subthreshold leakage current of the device is used for computation. Standard
design techniques suitable for superthreshold design can be used in the subthreshold region.
However, it has been shown that a complete co-design at all levels of hierarchy (device,
circuit, and architecture) is necessary to reduce the overall power consumption while achieving
acceptable performance (hundreds of kilohertz) in the subthreshold regime of operation.
Your goal in this project is to choose a suitable application, such as an adder, multiplier,
FFT module etc, and implement it using sub-threshold voltage logic.
In its simplest form, an option gives the purchaser the right to buy an object (which could be
a stock or a commodity; we'll assume a stock for simplicity) for a fixed price at a given time
in the future. More generally, options exist wherein the purchaser can buy the commodity
for a fixed price at any point up to a given time, or at the lowest price up to the given
time, etc.
When the purchase time is fixed, interest rates are constant, and the object price follows
Brownian motion, the Black-Scholes formula gives an analytical way to determine the fair
price of the option. This situation is rare, and analytical techniques do not exist for general
option pricing.
Monte Carlo simulation can be used to get an idea of the fair price; it is computationally
challenging, and the goal of this project is to use hardware acceleration for pricing. It is
most natural to use a finite time step for the simulation.
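A software reference model is useful before moving the simulation into hardware. The Python sketch below prices a European call by Monte Carlo with a finite time step and compares it against the closed-form Black-Scholes value. It assumes geometric Brownian motion under a constant risk-free rate, and all parameter names are illustrative.

```python
import math
import random


def black_scholes_call(spot, strike, rate, volatility, maturity):
    """Closed-form price of a European call, used here as a reference value."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * maturity) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


def monte_carlo_call(spot, strike, rate, volatility, maturity, steps, paths, seed=0):
    """Estimate the same price by simulating log-price paths with a fixed time step."""
    rng = random.Random(seed)
    dt = maturity / steps
    drift = (rate - 0.5 * volatility ** 2) * dt
    diffusion = volatility * math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(paths):
        log_price = math.log(spot)
        for _ in range(steps):
            log_price += drift + diffusion * rng.gauss(0.0, 1.0)
        payoff_sum += max(math.exp(log_price) - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / paths
```

The inner loop is a multiply-accumulate driven by a random number generator, which is the part a hardware accelerator would replicate across many parallel paths.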
One approach is to derive the exact distribution of the stock price. Given a distribution for a
discrete random variable X (the stock price), and a distribution for a discrete random
variable Y (its change),
the distribution for X+Y is derived by convolving the two distributions – direct
convolution can get expensive (quadratic in the range of the two variables), and FFT-based
convolution may be a good way to proceed. You may want to consider various distributions
for the increment, not just binomial, but something with a heavy tail.
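The convolution step can be written directly as a reference. This Python sketch implements the quadratic-time direct method mentioned above, not the FFT-based one, and represents each distribution as a list of probabilities indexed by offset from the variable's minimum value. That representation is an assumption made for illustration.

```python
def convolve(pmf_x, pmf_y):
    """Distribution of X + Y for independent discrete X and Y (direct method)."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py
    return out


def distribution_after(steps, increment_pmf):
    """Distribution of the sum of `steps` independent increments."""
    dist = [1.0]  # all probability mass at offset zero before the first step
    for _ in range(steps):
        dist = convolve(dist, increment_pmf)
    return dist
```

The nested loops make the quadratic cost visible. An FFT-based version would replace `convolve` with a transform, a pointwise product, and an inverse transform.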
The goal of this project is to study the cost and accuracy of on-chip delay characterization
structures. You should survey the state of the art, as well as perform your own
experiments.
For example, Dhar et al. introduce an adaptive voltage scaling controller that uses an
inexpensive ring oscillator to measure speed. There could be multiple ring oscillators placed
throughout the design. The gate delay would be approximated based on the delay of the
nearest ring oscillator.
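The estimate described above can be sketched in Python under the usual first-order model, in which a ring of N identical inverting stages oscillates at f = 1 / (2 N t_d). The dictionary layout of the oscillator records and the function names are hypothetical.

```python
def gate_delay_from_ring(frequency_hz, stages):
    """Per-stage delay implied by a ring oscillator frequency: t_d = 1 / (2 N f)."""
    if stages < 3 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number (>= 3) of inverting stages")
    return 1.0 / (2 * stages * frequency_hz)


def estimate_gate_delay(gate_xy, oscillators):
    """Approximate a gate's delay from the nearest ring oscillator on the die."""
    def squared_distance(osc):
        return (osc["x"] - gate_xy[0]) ** 2 + (osc["y"] - gate_xy[1]) ** 2

    nearest = min(oscillators, key=squared_distance)
    return gate_delay_from_ring(nearest["frequency_hz"], nearest["stages"])
```

How well the nearest oscillator tracks the actual gate is exactly the accuracy question the project asks you to study. This sketch only fixes the bookkeeping.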
CLB in successively faster clock cycles until there is a delay error. Additionally,
neighboring CLBs could perform the shadow latching and comparator logic needed for Razor
testing using existing CLB resources. The test literature (International Test Conference,
Fault-Tolerant Computing) would also be a good place to review.
References:
There are many kinds of circuit families used in digital systems. One of the most widely used
logic families is static CMOS. It has good noise margins, is fast, consumes relatively little
power, is insensitive to device variations, is easy to design, and is widely supported by CAD
tools. Other circuit families are also used for high-speed operation. For example,
dynamic circuit families have been used in high-performance processors.
With the development of VLSI technology, process dimensions scale down and the logic
becomes highly complex. In this situation, it is difficult to test the VLSI logic, because
test stimuli applied through the external pins of the chip cannot fully access its internal
logic.
Furthermore, the outputs of complex sequential logic depend on its current state. Thus, a
complete test of the chip through its external pins may be impossible. To solve these
problems, several testing methods have been suggested, such as ad hoc, scan, and BIST
(built-in self-test). BIST has an advantage over the others, especially for complex systems
on a chip. BIST is an inexpensive testing method with high fault coverage. The following
picture shows a typical BIST system.
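A common BIST arrangement pairs a linear feedback shift register (LFSR) that generates test patterns with a signature register that compresses the responses. The Python sketch below models that arrangement for a 4-bit circuit. The tap positions, the seed, and the example circuit with its injected fault are all hypothetical and chosen only for illustration.

```python
def lfsr_step(state):
    """Advance a 4-bit LFSR one step (feedback from the two most significant bits)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def test_patterns(seed=0b0001):
    """Generate the 15 non-zero patterns produced by the LFSR from the given seed."""
    patterns, state = [], seed
    for _ in range(15):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def signature(circuit, seed=0b0001):
    """Compress the circuit's responses to all patterns into a 4-bit signature."""
    sig = 0
    for pattern in test_patterns(seed):
        sig = lfsr_step(sig) ^ (circuit(pattern) & 0xF)
    return sig


def good_circuit(x):
    """Hypothetical circuit under test: a 4-bit add-three function."""
    return (x + 3) & 0xF


def faulty_circuit(x):
    """Same circuit with a fault that corrupts one output bit for input 9."""
    return good_circuit(x) ^ 0b0100 if x == 9 else good_circuit(x)
```

On chip, the signature of the fabricated circuit is compared against the signature of a known-good simulation, so only a few bits need to be observed from outside.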
You could start with the information on test in the textbook. Some other references are:
The complexity of embedded systems has increased dramatically, and they need a high-speed,
smart bus system. The reason is that in an embedded system, the communication
architecture such as the bus plays an important role in orchestrating data and control signal
transactions among the system components. Recently, the AMBA 3 bus protocol was
introduced by ARM Inc., and it provides several advanced features, such as multiple
outstanding requests, which accelerate its system performance by reducing the bus-waiting
time for each component.
The objective of this project is to design your own AMBA AXI interconnect. The
AMBA 3 AXI bus system needs several elements, such as a decoder, an arbiter, a slave
interface, and a master interface. It is very similar to the Wishbone bus in that it uses a
handshaking protocol. The following figure shows the interface and interconnection of
the bus system.
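Two of the ideas above can be prototyped in software before any RTL is written. In AXI, a transfer on a channel takes place only in a cycle where both VALID and READY are high. The arbitration policy, by contrast, is left to the interconnect designer, so the round-robin scheme in this Python sketch is an assumption, as are the function names.

```python
def handshake(valid_trace, ready_trace, data_trace):
    """Return the data beats transferred: one per cycle with VALID and READY both high."""
    return [data for valid, ready, data in zip(valid_trace, ready_trace, data_trace)
            if valid and ready]


def round_robin_grant(requests, last_granted):
    """Pick the next requesting master after the one granted last; None if idle."""
    count = len(requests)
    for offset in range(1, count + 1):
        candidate = (last_granted + offset) % count
        if requests[candidate]:
            return candidate
    return None
```

For example, with `requests = [True, False, True]` and master 0 granted last, the arbiter grants master 2 next and then returns to master 0, so neither requester is starved.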
Conclusion:
In conclusion, VLSI design is an essential component of modern electronics. It has revolutionized the
electronics industry by reducing the size of electronic devices and improving their functionality.