
19

Uploaded by

BIPLAV SINGH
Copyright
© © All Rights Reserved
Reconfigurable Computing (CS G553)
Dr. A. Amalin Prince
BITS Pilani, K K Birla Goa Campus
Department of Electrical and Electronics Engineering

Lecture 19
Reconfigurable Computing Device: Altera Stratix II and Xilinx [illegible]

FPGA Market Share 201[…]
[Figure: market share chart; not recoverable]

VIRTEX VS STRATIX
We have some idea about the V5 architecture; let me include some Stratix II details, followed by V7 details.

STRATIX II Logic Fabric
Adaptive Logic Module (ALM) Block Diagram
[Figure: ALM block diagram with outputs regout(0), combout(0), regout(1), combout(1)]
> 8-input fracturable LUT
> Two adders
> Two registers

Configuration Description
[Table: ALM configurations and their descriptions; text not recoverable]

ALM Flexibility
[Figure: ALM outputs regout(0), combout(0), regout(1), combout(1)]

Comparing the Stratix II ALM and the Virtex-5 LUT-Flip-flop Pair
The ALM Advantage
[Figure: Output 1 and Output 2 of the ALM shown against the Virtex-5 LUT-flip-flop pair]

ALM vs. Virtex-5 LUT Flexibility
The ALM Advantage
[Figure: not recoverable]
[Figure: Implementing 5- and 3-Input Functions in Stratix II ALM and Virtex-5 LUT-Flip-flop Pair]

Outline
> Introduction to 7-Series FPGA
> Logic Resources
> Memory and DSP48 Resources
> I/O Resources
> XADC
> Clocking Resources
> Zynq SoC
> Summary

7-Series Architecture Alignment
> Common elements enable easy IP reuse for quick design portability across all 7-series families
  o Design scalability from low-cost to high-performance
  o Expanded eco-system support
  o Quickest time to market

Artix-7 Architecture Overview

Configurable Logic Block (CLB) in 7-Series FPGAs
> Primary resource for design in Xilinx FPGAs
  o Combinatorial functions
  o Flip-flops
> CLB contains two slices
> Connected to switch matrix for routing to other FPGA resources
  o Carry chain runs vertically in a column from one slice to the one above

Two Types of CLB Slices
> Two types of CLB slices
  o SLICEM: Full slice
    + LUT can be used for logic and memory/SRL
    + Has wide multiplexers and carry chain
  o SLICEL: Logic and arithmetic only
    + LUT can only be used for logic (not memory)
    + Has wide multiplexers and carry chain

Slice Resource
> Four six-input Look-Up Tables (LUT)
> Multiplexers
> Carry chains
> SRL
  o Cascade path is not shown
> Four flip-flops/latches
  o Four additional flip-flops
> The implementation tool will pack multiple slices in the same CLB if certain rules are followed

6-Input LUT with Dual Output
> LUTs can be two 5-input LUTs with common input
  o Minimal speed impact to a 6-input LUT
  o One or two outputs
> Any combinatorial function of six variables or two functions of five variables

Wide Multiplexers
> Each F7MUX combines the outputs of two LUTs together
  o Can implement an arbitrary 7-input function
  o Can implement an 8-1 multiplexer
> The F8MUX combines the outputs of the two F7MUXes
  o Can implement an arbitrary 8-input function
  o Can implement a 16-1 multiplexer
> MUX is controlled by the BX/CX/DX slice input
> MUX output can drive out combinatorially or to the flip-flop/latch

Carry Chain
> Carry chain implements fast arithmetic (addition and subtraction)
  o Carry out is propagated vertically through the four LUTs in a slice
  o The carry chain propagates from one slice to the slice in the same column in the CLB above
> Carry look-ahead
  o Combinatorial carry look-ahead over the four LUTs in a slice
  o Implements faster carry cascading from slice to slice

> Each slice has four flip-flop/latches (FF/L)
  o Can be configured as either flip-flops or latches
  o The D input can come from the O6 LUT output, the carry chain, the wide multiplexer, or the AX/BX/CX/DX slice input
> Each slice also has four flip-flops (FF)
  o The D input can come from the O5 output or the AX/BX/CX/DX input
  o These don't have access to the carry chain, wide multiplexers, or the slice inputs
> If any of the FF/L are configured as latches, the four FFs are not available

7-Series Block RAM and FIFO
> All members of the 7-series families have the same Block RAM/FIFO
> Fully synchronous operation
  o All operations are synchronous; all outputs are latched
> Optional internal pipeline register for higher frequency operation
> Two independent ports access common data
  o Individual address, clock, write enable, clock enable
  o Independent data widths for each port

7-Series DSP48E1 Slice
Why FPGA for Signal Processing? Communication?
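One way to approach the question "Why FPGA for signal processing?" is to count multiply-accumulate (MAC) operations per output sample of an N-tap FIR filter. The sketch below is only an illustration under stated assumptions: the function names `fir_output` and `max_sample_rate` are hypothetical, and the 500 MHz fabric clock with 256 parallel MAC units is an assumed figure chosen for comparison, not a number taken from the slides.

```python
import math


def fir_output(coeffs, window):
    """One FIR output sample: sum of coeffs[k] * window[k], with window[k] = x[n-k]."""
    if len(coeffs) != len(window):
        raise ValueError("coeffs and window must have the same length")
    return sum(c * x for c, x in zip(coeffs, window))


def max_sample_rate(clock_hz, taps, mac_units):
    """Output samples per second when `mac_units` MACs complete on every clock.

    Each output sample needs `taps` MACs, so it takes
    ceil(taps / mac_units) clock cycles to produce one sample.
    """
    if taps <= 0 or mac_units <= 0:
        raise ValueError("taps and mac_units must be positive")
    cycles_per_sample = math.ceil(taps / mac_units)
    return clock_hz / cycles_per_sample


if __name__ == "__main__":
    # Sequential processor: one MAC unit clocked at 1 GHz, 256 taps.
    print(max_sample_rate(1e9, 256, 1))
    # Assumed parallel fabric: 256 MAC units clocked at 500 MHz (hypothetical).
    print(max_sample_rate(500e6, 256, 256))
```

With one MAC unit the sample rate scales down with the number of taps, whereas a fully parallel arrangement is limited only by the clock; that difference is the usual argument for mapping filters onto many DSP slices.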
256-Tap Filter Example
1 GHz / 256 clock cycles ≈ 4 Msps
[Handwritten notes: filter output y(n) written as a sum of coefficient-weighted delayed samples x(n-k); details not recoverable]

7 Series Capability
> Industry's Lowest Power and First Unified Architecture
  o Spanning Low-Cost to Ultra High-End applications
> Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance
[Table: maximum capability per device family. Surviving row labels: Logic Cells, DSP Slices (one surviving value: 3,960), Max. Transceivers, Memory Performance, Max. SelectIO, SelectIO Voltages (3.3V and below; 1.8V and below). Column headers and remaining values not recoverable]

DSP Performance through the DSP48E1 Slice
Virtex-6, Artix-7, Kintex-7, Virtex-7
DSP48E1 Slice
> DSP48 Tile Interconnect
> 2 DSP48E1 Slices / Tile
> Column structure to avoid routing delay
> Pre-adder, 25x18 bit multiplier, accumulator
> Pattern detect, logic operation, convergent/symmetric rounding
> 638 MHz Fmax

Pre-Adder
DSP48E1 Slice
> Hardened pre-adder leverages filter symmetry to reduce logic, power and routing
> No restriction to coefficient table size
[Figure: filter symmetry exploited to pre-add tap delay values and reduce multiplies by 50%]

Greater Flexibility with Fully Independent Multipliers
> DSP48 Tile Interconnect
> Full, independent access to every multiplier
> One accumulator for each multiplier
> 5 interconnects support up to 50 bit multiplies per tile

25x18 Multiplier
DSP48E1 Slice
> Single DSP slice supports up to 25x18 multiplies
  o 50% fewer DSP resources required for high-precision multiplies
  o Efficient FFT implementations
  o Efficient single-precision floating-point implementations
> Single DSP tile supports up to 50x36 multiplies
> Delivers higher performance and lower power

Efficient Rounding Modes using Pattern Matching
> Only FPGA architecture that supports pattern detection
  o Pattern can be constant (set by attribute) or C input
> Efficient implementation of rounding modes
  o Symmetric
  o Convergent
  o Saturation

One Accumulator for each Multiplier
> DSP48E1 slice provides an accumulator for each multiplier
  o 2X more than competitive architectures
> Up to 48-bits accumulation per DSP slice
  o 25x18 multiply
> Up to 96-bits accumulation per DSP tile
  o 50x36 multiply

7-Series FPGA I/O
> Wide range of voltages
  o 1.2V to 3.3V operation
> Wide I/O standards support
  o Single ended and differential
  o Referenced voltage inputs
  o 3-state capability
> Very high performance
  o Up to 1600 Mbps LVDS
  o Up to 1866 Mbps single-ended for DDR
> Easy memory interfacing
  o Hardware support for QDRII+ and DDR3
> Digitally controlled impedance
> Power reduction features

XADC and AMS
> XADC is a high quality and flexible analog interface new to the 7-series
  o Dual 12-bit 1 Msps ADCs, on-chip sensors, 17 flexible analog inputs, and track & holds with programmable signal conditioning
  o 1V input range
  o 16-bit resolution conversion
  o Built-in digital gain and offset calibration
> Analog Mixed Signal (AMS)
  o Using the FPGA programmable logic to customize the XADC and replace other external analog functions; for example, linearization, calibration, filtering, and DC balancing to improve data conversion resolution
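To make the "12-bit ADC, 1V input range" figures above concrete, here is a minimal sketch of an idealized unipolar quantizer. It is an assumption-laden model, not the XADC's documented transfer function: the helper names `adc_code` and `code_to_volts` are hypothetical, and calibration, bipolar mode, and the register format are not modeled.

```python
def adc_code(v_in, v_range=1.0, bits=12):
    """Ideal unipolar quantizer: map 0..v_range volts onto codes 0..2**bits - 1.

    Inputs outside the range are clamped (an assumption of this sketch).
    """
    levels = 1 << bits                      # 4096 steps for 12 bits
    code = int(v_in * levels / v_range)     # truncate to the step below
    return max(0, min(levels - 1, code))    # clamp to the valid code range


def code_to_volts(code, v_range=1.0, bits=12):
    """Voltage at the lower edge of the step that `code` represents."""
    return code * v_range / (1 << bits)


if __name__ == "__main__":
    for volts in (0.0, 0.25, 0.5, 1.0):
        code = adc_code(volts)
        print(volts, code, code_to_volts(code))
```

In this model one step corresponds to 1 V / 4096, roughly 244 microvolts, which gives a feel for what digital filtering and calibration in the programmable logic would be working with.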
7-Series FPGAs Clock Management
> Global clock buffers
  o High fanout clock distribution buffer
> Low-skew clock distribution
  o Regional clock routing
> Clock regions
  o Each clock region is 50 CLBs high and spans half the device
> Clock management tile (CMT)
  o One Mixed-Mode Clock Manager (MMCM) and one Phase Locked Loop (PLL) in each CMT
  o Performs frequency synthesis, clock de-skew, and jitter-filtering
  o High input frequency range
> Simple design creation through the Clocking Wizard

Zynq-7000 Family Highlights
> Complete ARM®-based processing system
  o Application Processor Unit (APU)
    + Dual ARM Cortex™-A9 processors
    + Caches and support blocks
  o Fully integrated memory controllers
  o I/O peripherals
> Tightly integrated programmable logic
  o Used to extend the processing system
  o Scalable density and performance
> Flexible array of I/O
  o Wide range of external multi-standard I/O
  o High-performance integrated serial transceivers
  o Analog-to-digital converter inputs

The PS and the PL
> The Zynq-7000 AP SoC architecture consists of two major sections
  o PS: Processing system
    + Dual ARM Cortex-A9 processor based
    + Multiple peripherals
    + Hard silicon core
  o PL: Programmable logic
    + Shares the same 7-series programmable logic as
      - Artix™-based devices: Z-7010 and Z-7020 (high-range I/O banks only)
      - Kintex™-based devices: Z-7030 and Z-7100 (mix of high-range and high-performance I/O banks)

INTEL® AGILEX™ FPGAs AND SoCs
The Intel® Agilex™ FPGA family leverages heterogeneous 3D system-in-package (SiP) technology to integrate Intel's first FPGA fabric built on 10nm process technology and 2nd Gen Intel® Hyperflex™ FPGA Architecture to deliver up to 40% higher performance or up to 40% lower power for applications in Data Center, Networking, and Edge compute. Intel® Agilex™ SoC FPGAs also integrate the quad-core Arm* Cortex-A53 processor to provide high system integration.

Xilinx ACAP
Versal ACAP, a fully software-programmable, heterogeneous compute platform that combines Scalar Engines, Adaptable Engines, and Intelligent Engines to achieve dramatic performance improvements of up to 20X over today's fastest FPGA implementations and over 100X over today's fastest CPU implementations, for Data Center, wired network, 5G wireless, and automotive driver assist applications.
+ Scalar processing elements (e.g., CPUs) are very efficient at complex algorithms with diverse decision trees and a broad set of libraries, but are limited in performance scaling
+ Vector processing elements (e.g., DSPs, GPUs) are more efficient at a narrower set of parallelizable compute functions, but they experience latency and efficiency penalties […]
+ Programmable logic […] compute function, which makes […] at latency-
