The document provides an overview of FPGA technology, including:
- FPGAs are becoming larger, faster, and able to implement entire systems on a single device through advances in fabrication processes.
- FPGA families include flagship devices for large systems and lower-cost devices for high-volume applications.
- FPGAs offer more flexibility and parallelism than DSP processors, resulting in significantly higher performance for many applications.
The document provides an overview of FPGA technology including trends, architectures, platforms and applications. It discusses features like larger capacity, faster speed, embedded processors and dedicated hardware blocks. It also compares FPGA performance and flexibility versus other technologies.
Cutting-edge FPGAs offer larger capacity, higher speed, embedded processors and dedicated hardware blocks such as DSP and transceiver blocks, which together support implementing entire systems on a single device.
FPGAs provide advantages over DSP processors and ASICs through infinite reconfigurability, the ability to trade area for speed, and the ability to perform operations in parallel as required. This same flexibility makes FPGA performance difficult to measure.
August 2005, University of Strathclyde, Scotland, UK For Academic Use Only
The DSP Primer 8
FPGA Technology

August 2005, For Academic Use Only, All Rights Reserved

Introduction 8.1
This module will give a top-down overview of FPGA Technology based on various Xilinx devices. At the end of the section, the following will have been covered:
- FPGA Technology Roadmap and the various devices available - how FPGAs are progressing and what might lie ahead;
- Performance and flexibility - how FPGAs compare to DSP Processors and ASICs and why FPGAs have the advantage;
- FPGA Structure - a top-down look at what an FPGA consists of, down to the low-level elements;
- Introduction to the FPGA design flow - an indication of the engineering process required to implement a design;
- How the digital logic of a design actually operates within the FPGA;
- Why pipelining and flip-flops/registers are free and are required for high clock rates;
- Memory available to designers within FPGAs and the different types/options available;
- How signals and clocks are effectively routed throughout the device;
- Input/Output interfacing capabilities of FPGAs;
- Dedicated arithmetic hardware.

FPGA Technology Trends 8.2
- General trend is bigger and faster;
- This is being achieved by increases in device density through ever smaller fabrication process technology;
- New generations of FPGAs are geared towards implementing entire systems on a single device;
- Features such as RAM, dedicated arithmetic hardware, clock management and transceivers are available in addition to the main programmable logic;
- FPGAs are also available with embedded processors (embedded in silicon or as cores within the programmable logic fabric).
Notes: FPGAs are being incorporated as central processing elements in many applications such as consumer electronics, automotive, image/video processing, military/aerospace, base-stations, networking/communications, supercomputing and wireless applications.
The inclusion of embedded (i.e. actually present in silicon, not as soft IP) PowerPC processors in recent Xilinx devices makes design partitioning and implementation much easier. Many low-speed algorithms that involve a lot of decision making and jumps in execution are more suited to implementation by microprocessor than FPGA. The inclusion of the PowerPC blocks by Xilinx is an acknowledgement of this and goes a long way to making the System on an FPGA goal possible.
Manufacturers may also provide embedded processors as soft IP cores. These cores are implemented on the main programmable logic fabric, and associated development kits allow designers to write code to be executed.
Features such as dedicated arithmetic hardware, clock management and multi-standard, high speed I/O blocks all assist the engineer in implementing a given design. Problems such as clock skew, which plague ASIC (Application Specific Integrated Circuit) designers, have been solved by the FPGA manufacturer and can be essentially ignored by the FPGA engineer.

FPGA Families 8.3
- Flagship FPGA families (e.g. Xilinx Virtex-4) are aimed at implementing large systems on a single device;
- Flagship families are the biggest and most expensive and are not aimed at high volume applications where cost is a primary factor;
- High volume applications (i.e. where an ASIC would traditionally have been used) are catered for by cheaper FPGA families (e.g. Xilinx Spartan-3);
- High volume devices often contain the same features offered by the flagship devices at a smaller scale to control costs;
- Within FPGA families, multiple device sizes are available at scaling costs with associated scaling of features such as logic fabric, RAM, I/O pins, arithmetic hardware etc.
Notes: Often, low cost, high volume FPGA families are derived directly from larger families, making the design process more familiar (e.g. Spartan-3 from Virtex-II, Spartan-II from Virtex). Each FPGA family comes in different sizes/packages and speed grades. The exact device required will depend on factors related to the requirements of the target design/application such as:
- Area;
- Data/sampling rates;
- Input/Outputs and associated data rates;
- Memory required;
- Requirement for embedded processor or not;
- Cost ($$$).

FPGA Performance and Flexibility (I) 8.4
- Performance of FPGAs is difficult to quantify because algorithms/systems can be flexibly implemented in many different ways;
- Multiply Accumulate (MAC) performance on flagship devices from Xilinx is in the region of hundreds of GMACs per second running at speeds of a few hundred MHz;
- FPGA manufacturers often give figures for maximum MAC/s using every piece of logic capable of multiplication - this of course does not reflect typical systems implemented on FPGAs;
- What is clear is that, due to parallelism, FPGAs easily outperform DSP Processors in terms of data/arithmetic throughput and flexibility;
- DSP Processors still have their place though - their design flow is better understood within the engineering community and some baseband algorithms do not yet map well to the FPGA fabric.
Notes: MIPS (Millions of Instructions Per Second or perhaps Meaningless Indicator Of Performance) is often used to compare DSP Processors but cannot be used to quantify overall FPGA performance. The problem is that FPGAs are flexible enough to implement algorithms in different ways to suit the requirements of a particular application. For example, an application that requires 10 MACs (Multiply Accumulates) can be implemented on an FPGA or a DSP processor. The FPGA could implement the hardware to perform the 10 MACs one after the other in serial, taking 10 clock cycles, or in parallel, taking 1 clock cycle. Indeed it is possible to perform the 10 MACs in 5 clock cycles, or 2 clock cycles - as required.
A DSP Processor does not have as much flexibility. Why is this flexibility useful? If the 10 MACs must be performed quickly, the FPGA can use a lot of area and perform them in parallel in 1 clock cycle; if the 10 MACs can be done slowly (as defined by the system performance requirements), the FPGA can perform them serially using a 10th of the area but taking 10 clock cycles. In other words, the FPGA hardware implementation can be tailored to the application and take advantage of the application requirements/specification. In this way, speed and area can be traded when implementing on FPGA - DSP processors do not have this option.
It should also be noted that it is very unlikely that anyone would ever implement an FPGA design that consisted only of multipliers! Figures given by manufacturers are merely intended to give an idea of the potential performance of these devices and by how far they outperform DSP Processors (considerably!).

FPGA Performance and Flexibility (II) 8.5
Notes: More on DSP Processors vs FPGAs. It must be remembered that an FPGA is still an ASIC. Xilinx are manufacturers of FPGAs, but these are still fully custom integrated circuits at the end of the day, even though they are a special case due to the fact they are highly programmable. DSP Processors are also ASICs, and as ASIC process technology improves and chips get faster, DSP Processors will get faster... but so will FPGAs, because they are ASICs too! FPGAs already hold a performance advantage over DSP Processors and this gap will not close as silicon processes get better.
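The 10-MAC example from the notes to slide 8.4 can be sketched numerically. This is a rough illustration only: the function name `mac_schedule` is made up for this sketch, and it assumes that area scales linearly with the number of MAC units, ignoring control logic, multiplexing and storage overheads that a real design would add.

```python
import math


def mac_schedule(total_macs: int, mac_units: int) -> dict:
    """Cycles and relative area when total_macs are shared across mac_units.

    Assumption (not from the slides): area is proportional to the number of
    MAC units instantiated, with the fully parallel design taken as 1.0.
    """
    if mac_units < 1 or mac_units > total_macs:
        raise ValueError("mac_units must be between 1 and total_macs")
    cycles = math.ceil(total_macs / mac_units)
    return {
        "mac_units": mac_units,
        "cycles": cycles,
        "relative_area": mac_units / total_macs,
    }


# Fully serial, two intermediate points, and fully parallel.
for units in (1, 2, 5, 10):
    print(mac_schedule(10, units))
```

With 1, 2, 5 and 10 MAC units the sketch reports 10, 5, 2 and 1 clock cycles respectively, which are the options listed in the notes. A DSP processor with a fixed number of MAC units corresponds to a single fixed row of this table.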
Diagram: FPGAs: DSP for Consumer Digital Video Applications, Xilinx, http://www.xilinx.com/esp/dvt/collateral/fpga_dsp_adv_in_dvt.pdf

FPGA Performance and Flexibility (III) 8.6
Notes: A rather hand-wavy diagram that gives an indication of where FPGAs lie in the grand scheme of things in relation to Custom ICs (ASICs) and DSP Processors. The surge in FPGA use by manufacturers of electronic systems does seem to indicate that this diagram is close to the mark, however. The costs and time involved in manufacturing ASICs are prohibitive (especially if bugs are found) when a designer can have a design running in hardware on an FPGA at their desk and iterate the design as many times as required with no expensive fabrication in sight!
Diagram: FPGAs: DSP for Consumer Digital Video Applications, Xilinx, http://www.xilinx.com/esp/dvt/collateral/fpga_dsp_adv_in_dvt.pdf

FPGA Design Flow 8.7
- This is a highly simplified overview of the Xilinx FPGA design flow;
- Numerous file format conversions occur between the many pieces of software;
- The engineer can control and influence all stages of the process via constraints and options;
- The FPGA market contains many companies that produce software tools for various stages of the flow;
- The final bitstream configures every part of the device required for the implemented design.
Notes: A more detailed design flow is given below - this doesn't even show all of the possible stages, although it does contain most! It may become clear why the FPGA design flow produces so many files and directories when you consider all of the processes below. Several stages are grouped/automated and can be run by the high-level software tools if desired. The engineer usually has the option of running each stage manually, however!
Flow diagrams: Xilinx Software Manuals, http://toolbox.xilinx.com/docsan/xilinx5/manuals.htm

Xilinx Virtex-II Pro FPGA Architecture 8.8
- High-level, generic view of the Xilinx Virtex-II Pro family;
- As device size increases, so does the amount of available resources such as embedded multipliers, processors and configurable logic;
- The CLBs (Configurable Logic Blocks) form the main programmable fabric of the device;
- DCMs (Digital Clock Managers) solve clock management issues such as skew, phase shifting and division;
- Larger devices also contain more user I/O pins and I/O functionality.
Notes: An FPGA is rather abstract looking and it may not appear obvious how a user design maps to the actual hardware. Luckily, the software tools can take care of a lot of the complexity of doing this once the user has defined their design. There is still a considerable amount of work for the engineer, however, and this is especially true when pushing the limits of the hardware - at this point the software tools may not do a good enough job and the engineer must get in and around the nuts and bolts themselves!
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II Configurable Logic Blocks 8.9
- One Xilinx Virtex-II CLB contains four slices (Virtex/Spartan series have two slices per CLB);
- Any digital logic design can be implemented within the slice logic housed by the CLBs;
- Slices are interconnected within their CLBs and via the switch matrix that links CLBs together;
- The Cin and Cout signals are significant because they are highly useful for implementing arithmetic functions.
- Two independent Cin/Cout columns exist per CLB column;
- One slice can implement a 2-bit full adder, so one CLB can implement two independent 4-bit full adders as part of a larger bit-width calculation with other CLBs as required.
Notes: Once the user has entered their design (via VHDL/Verilog for example), the Synthesis process takes the design and works out how to implement it on the elements of a specific FPGA. The engineer specifies exactly which device to target (i.e. manufacturer, family, size, package type, speed grade). The synthesis process is a complex one that can turn any synthesisable VHDL/Verilog into a form that can be taken to FPGA by further software tools. In the case of Xilinx, the Synthesis tool will decide how to perform the digital logic operations of the design using the slice logic available. The FPGA manufacturer tools then take the design through many more stages in order to produce a bitstream that can be downloaded to an FPGA to configure it.
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II Slices (I) 8.10
- The majority of user-design functionality will be implemented by the slices contained by the CLBs;
- For this reason, the primary measure of Xilinx FPGA device size is the number of slices present;
- Many interconnection possibilities exist between slice elements (connections and many elements not shown here);
- The Look Up Tables (LUTs) implement any 4-input boolean function - the majority of a user digital logic design will be implemented using the 4-input LUTs to perform the actual logic operations;
- LUTs can also be used as Shift-Registers or RAM - discussed later.
Notes: Xilinx slices are where the actual work that implements the user design happens.
The different elements can be interconnected in different ways as determined by the configuration bitstream. The number of slices available on a device essentially determines its capacity, since this is where it all happens!
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II Slices (II) 8.11
- The registers provide the means of implementing synchronous logic;
- Registers are vital when designing for high clock rates - failure to use them will not yield high speed performance;
- The multiplexers and CY components provide some of the routing possibilities for signals through the slice (shown in more detail later);
- The Arithmetic Logic AND gate at the bottom has been included to make implementing multiplication more efficient.
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II Slice (top half) 8.12
Notes: All of the interconnections and components are shown. The software tools will take care of configuring every required element/connection - the user can also do so manually if required! When the FPGA is configured with a bitstream (generated by the software tools), the contents of the LUTs and the routing between the slice elements are defined - forming the user design. The bitstream will also configure the connection between slices/CLBs etc.
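To make concrete the idea that the bitstream defines the contents of the LUTs, a 4-input LUT can be modelled as a 16-entry truth table indexed by its inputs. The sketch below is a simplified software model, not Xilinx tooling: the helper names `make_lut4` and `init_from_function` are hypothetical, and the convention that the first input is the most significant address bit is an assumption of this sketch.

```python
def make_lut4(init: int):
    """Return a function modelling a 4-input LUT holding the 16-bit value init.

    Bit n of init is the LUT output for input address n.
    """
    if not 0 <= init < (1 << 16):
        raise ValueError("init must fit in 16 bits")

    def lut(a3: int, a2: int, a1: int, a0: int) -> int:
        # The four inputs form an address into the 16 stored bits.
        address = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        return (init >> address) & 1

    return lut


def init_from_function(fn) -> int:
    """Tabulate any 4-input boolean function into 16 bits of LUT contents."""
    init = 0
    for address in range(16):
        bits = [(address >> i) & 1 for i in (3, 2, 1, 0)]
        if fn(*bits):
            init |= 1 << address
    return init


# Two different functions, same hardware: only the stored 16 bits differ.
and4_init = init_from_function(lambda a, b, c, d: a & b & c & d)
xor4_init = init_from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(and4_init)
xor4 = make_lut4(xor4_init)

print(hex(and4_init), hex(xor4_init))
print(and4(1, 1, 1, 1), xor4(1, 0, 0, 0))
```

The same 16 stored bits could equally be read and written as a 16x1 RAM, which is the basis of the distributed RAM mode described later in the module.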
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Registers and Pipelining 8.13
[Figure: chains of LUTs and registers (D/Q). Without pipelining: several LUTs between registers, slow clock. With pipelining: a register after each LUT, fast clock. The longest/critical path is marked.]
- Possible FPGA clock rate is limited by the longest path between registers because the signals must travel further through LUTs/wires;
- Using the free slice registers keeps the longest path as short as possible and hence the possible clock rate as high as possible.
Notes: This is one of the fundamental design principles of FPGA design and must be understood. On each clock edge, signals must travel through their data path via routing lines, LUTs, MUXes etc. before arriving at the next flip-flop. This happens to signals within a design all over the device on every clock edge. Some signals will have further to travel than others, and the longest (in time) path between two flip-flops/registers is known as the critical path. It should be noted that the flip-flops are essentially free because every LUT is paired with a flip-flop that can register the LUT output as required.
It is this critical path that will determine the maximum clock rate that the FPGA can be clocked at. Remember that the user can choose the clock rate arbitrarily as required. If the critical path is too long, the design may not be able to be clocked fast enough to meet the specification of the application! In this case, the engineer must return to the software tools/their design and try to make the design run faster. This may be achieved by, for example, pipelining, redesign, increasing the effort level of the software tools, adding/removing design constraints or manually editing the design in order to optimise the hardware and reduce the length of the critical path!
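The relationship between the critical path and the achievable clock rate can be illustrated with a small calculation. The delay figures below are invented purely for illustration (they are not Virtex-II timing data), the function name `max_clock_mhz` is hypothetical, and the model ignores register setup time and clock skew.

```python
def max_clock_mhz(path_delays_ns):
    """Maximum clock rate in MHz, set by the slowest register-to-register path."""
    critical_path_ns = max(path_delays_ns)
    return 1000.0 / critical_path_ns


# Hypothetical figure: 2 ns per LUT level including its routing.
LUT_LEVEL_NS = 2.0

# Without pipelining: three LUT levels sit between one pair of registers.
unpipelined_paths = [3 * LUT_LEVEL_NS]

# With pipelining: a register after every LUT level, so three short paths.
pipelined_paths = [LUT_LEVEL_NS, LUT_LEVEL_NS, LUT_LEVEL_NS]

print(round(max_clock_mhz(unpipelined_paths), 1), "MHz without pipelining")
print(round(max_clock_mhz(pipelined_paths), 1), "MHz with pipelining")
```

Under these assumed delays the pipelined version can be clocked three times faster, at the cost of a latency of three clock cycles instead of one. Because the registers already sit beside the LUTs, that extra latency is usually the only price paid.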
It should be noted that this is the most difficult part of FPGA design - what to do if a design does not meet timing! There are many options for the engineer to try, and knowing which one(s) to use (and how to use them) can be a bit of a black art...

Xilinx Virtex-II Block RAM 8.14
- Xilinx Virtex-II devices have dedicated 18 Kb (Kilo-bit) Block RAMs throughout the device;
- One of the largest Virtex-II Pro devices (XC2VP125) has 556 Block RAMs and so 556 * 18 = 10,008 Kb of Block RAM in total;
- Block RAM can be written at device configuration time or written/read during operation;
- Block RAM can be single or dual port - i.e. one address gives 2 pieces of data - excellent for DSP (a sample and a coefficient, for example).
Notes: Engineers specify how they want to use the RAM components from within their VHDL/Verilog code - the software tools then ensure that the actual hardware is made available to the design. An example of using Block RAM could be to store the numeric values required to modulate a signal by a sine wave.
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II Distributed RAM 8.15
- A LUT can store 16 bits and can be used as a 16x1 RAM;
- Two LUTs can form one 32x1 single-port RAM or one 16x1 dual-port RAM - i.e. the same address produces data from both RAMs;
- This flexibility allows several single/dual port RAM configurations of the 128 bits available within one CLB (4 slices * 2 LUTs * 16 bits = 128);
- A Virtex-II Pro with 55,616 slices therefore has 55,616 * 2 LUTs * 16 bits = 1,738 Kb of Distributed RAM;
- The ability to create small RAMs anywhere on the device is extremely useful - especially for DSP purposes.
Notes: An example of using a small distributed RAM could be a chipping sequence for use in a communications system.
The sequence would be stored where it is needed to chip data as it proceeds through the system. The ability to form larger single/dual port configurations from the smaller ones is further testament to FPGA flexibility - distributed RAMs need only be as large as required.
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Shift Registers 8.16
- Xilinx LUTs can implement a 16-bit shift register (called an SRL16) and, when combined with the register available to every LUT, 17 delays are possible in one half of a slice;
- Shift registers can be cascaded to form longer delays;
- The delay can be tapped at any point using the address lines to create delay lines of length less than the maximum.
[Figure: shift register with address lines A0-A3, CLK and D inputs and Q output, followed by a register (D, CLK, Q).]
Notes: The diagram opposite shows the SRL16s being cascaded to form a larger delay line. Note the flexibility of the Xilinx LUTs - this is the 3rd mode they can operate in, in addition to LUT/RAM.
Diagram opposite: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-4 DSP48 Slice 8.17
- The Xilinx Virtex-4 DSP48 slice offers custom DSP functionality;
- 500 MHz throughput;
- However, the Transposed/Systolic FIR structures map more effectively in this case;
- Summation feedback is also available for serial implementations.
Notes: The Virtex-4 DSP48 slice caters for two types of full-parallel FIR - Systolic and Transposed. The Systolic structure allows the highest performance due to maximum pipelining and no high input signal fanout. The Transposed structure has a fixed, low latency compared to the Systolic (whose latency increases with filter length) but the input signal fanout can limit performance, especially for large filters.
Both architectures can be entirely implemented within DSP48 slices with no external logic.
Diagrams (Full-Parallel Transposed FIR, Full-Parallel Systolic FIR): XtremeDSP Design Considerations User Guide, http://www.xilinx.com

Xilinx Virtex-II Embedded Multipliers 8.18
- Embedded multipliers are arranged in columns between CLBs;
- Multipliers are 18 x 18 bit and are associated with BlockRAM for easy access to data;
- Can be combinatorial or pipelined, running at over 300 MHz;
- Combining embedded multipliers with LUT-implemented accumulators allows MAC engines to be created (e.g. for use in filters);
- Cascade multipliers to implement larger width multiplications.
Notes: Each embedded multiplier is associated with an adjacent BlockRAM and hence these elements share interconnect. When the multiplier is being used without the associated BlockRAM, the BlockRAM can still be used but with only 18 bits. Again, multipliers can be implemented in the main fabric as required using purely slice logic or combining BlockRAM and slice-implemented multiplier blocks. This may be necessary if no embedded multipliers are available or the design timing requirements are tight.

Xilinx Virtex-II Routing 8.19
- Xilinx Virtex-II series contains a multitude of routing that connects the elements of the device together;
- The configurable routing between CLBs (via the switch matrices) is complemented by dedicated routing for clock signals, carry chains etc.
Notes: Routing signals around the device is usually left to the tools to implement. There is a massive number of possibilities to implement a design on an FPGA and the software tools may take many hours to actually produce a bitstream for a reasonable design.
The routing possibilities are described as being hierarchical because different routing options are available depending on how far a signal has to travel. Clearly, keeping signals to as short routing distances as possible is preferable to ensure high clock rates. The dedicated clock distribution lines are of special importance because, when combined with the DCM (Digital Clock Manager) blocks, they allow high speed clocks to be fed throughout the device with no skew.
Diagram: Virtex-II Pro Platform FPGA Complete Data Sheet, Xilinx, http://direct.xilinx.com/bvdocs/publications/ds083.pdf

Xilinx Virtex-II I/O 8.20
- FPGAs are capable of interfacing with backplanes, buses and other systems at a board/system level;
- A multitude of current and emerging serial/parallel I/O standards are supported;
- In Virtex-II, up to 24 RocketIO Serial Transceiver blocks are available, operating at full-duplex speeds of up to 3.125 Gb/s each;
- Also, in Virtex-II, user I/O pins support many single-ended and differential signalling standards up to 840 Mb/s LVDS (Low-Voltage Differential Signalling);
- Virtex-II Pro X family supports up to 20 channels at 10.3125 Gb/s.
Notes: Getting signals into and out of FPGAs requires high speed signals to be routed into and out of the device on some sort of board that houses the overall system and the FPGA(s). The usual board-level difficulties with signal cross-talk, inductance, resonance etc. still exist, but interfacing the FPGA to the board signals is quite achievable given the number of supported I/O standards. The Virtex-II devices have dedicated RocketIO blocks to deal with high speed I/O requirements and many more general Select I/O pins for other interfaces.
The specific formats supported by each are given below:
[Table of supported standards not reproduced.]
Supported standards from:
http://www.xilinx.com/products/virtex2pro/rocketio.htm
http://www.xilinx.com/products/virtex2pro/selectioultra.htm

Xilinx ASMBL Architecture 8.21
- Advanced Silicon Modular Block - basis of Virtex-4;
- Column based architecture with focused column types;
- Mixing column types in different ratios allows application domains with differing logic resource requirements to be more accurately targeted;
- Individual resource types (e.g. DSP/memory) can be scaled independently of the die size;
- Current FPGA architectures scale resource types primarily only with die size.
Notes: Trivia: ASMBL was renamed to Advanced Silicon Modular Block from Application Specific Modular Block. The diagram below further illustrates how logic resources/features can be scaled independently of die size compared to traditional FPGA architectures. Xilinx see ASMBL as the next stage in programmable logic evolution.
Diagrams: ASMBL Press Kit, Xilinx, http://www.xilinx.com/company/press/kits/asmbl.htm

Xilinx Virtex-4 Platforms 8.22
- Designers can select the most appropriate device according to feature requirements and cost;
- DSP is now a major focus industry-wide!

Conclusion 8.23
This module has presented an overview of FPGA technology to give a high-level understanding of:
- What features cutting-edge FPGAs contain and the general trend of larger, faster and more features to support entire systems being implemented on FPGAs (e.g. I/O Transceivers, DSP blocks);
- Why FPGAs provide performance and flexibility advantages over DSP Processors and ASICs due to infinite reconfigurability, trading area for speed and performing operations in parallel as required;
- Why FPGA performance is difficult to measure due to their inherent flexibility;
- How the FPGA structure is generally organised hierarchically into CLBs/LABs, slices/LEs and elements such as LUTs/RAMs/SRL16s, MUXes and flip-flops and how these elements are used/combined to implement a design;
- The memory available on FPGAs;
- Dedicated arithmetic hardware and the various configurations available;
- The hierarchical routing lines that connect blocks together across the device and provide clock routing;
- The complexity of the FPGA design flow and the number of software tools and processes that can be involved;
- The various I/O standards available to allow FPGAs to interface with high-speed signals via board signals/buses/backplanes etc.;
- Why flip-flops are free (they exist beside the LUTs anyway) and how they allow high clock rates.
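The memory totals quoted on the Block RAM and Distributed RAM slides (8.14 and 8.15) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and it assumes 1 Kb = 1,024 bits when converting the distributed RAM total, which is the convention under which the 1,738 Kb figure comes out.

```python
# Figures taken from the slides.
BLOCK_RAM_KBITS = 18   # one Block RAM holds 18 Kb
LUT_BITS = 16          # one LUT stores 16 bits
LUTS_PER_SLICE = 2
SLICES_PER_CLB = 4


def block_ram_total_kbits(num_block_rams: int) -> int:
    """Total Block RAM capacity in Kb."""
    return num_block_rams * BLOCK_RAM_KBITS


def distributed_ram_bits(num_slices: int) -> int:
    """Total distributed RAM in bits if every LUT were used as RAM."""
    return num_slices * LUTS_PER_SLICE * LUT_BITS


# 556 Block RAMs, as quoted for the XC2VP125.
print(block_ram_total_kbits(556), "Kb of Block RAM")

# One CLB: 4 slices * 2 LUTs * 16 bits.
print(distributed_ram_bits(SLICES_PER_CLB), "bits per CLB")

# 55,616 slices, converted to Kb assuming 1 Kb = 1,024 bits.
print(distributed_ram_bits(55_616) // 1024, "Kb of Distributed RAM")
```

The distributed RAM total is an upper bound: it assumes every LUT is used as memory, which would leave no LUTs for logic.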