SW Controlled Modular eFPGA
Background:
Flex Logix has developed embedded FPGA IP (EFLX® embedded FPGA, or eFPGA) that has
been licensed for use in many commercial, aerospace and defense programs. It has also
developed an edge inferencing accelerator, InferX®, that efficiently processes AI edge inferencing
workloads requiring high throughput with the least power and area. This paper describes how a host
processor can manage and dynamically program eFPGA designs through software, combining
patented, silicon-proven techniques developed for Flex Logix eFPGA and InferX.
FPGAs have significant advantages for accelerating workloads, but they are not easy to program,
and the pool of qualified FPGA programmers is much smaller than the pool of software programmers.
Segment an eFPGA fabric into modules or containers of smaller size and provide each of them
with direct access to DRAM memory and the processor.
It is easy to add a system interconnect (NoC or AXI bus) that gives every FPGA module/container
access to memory and the processor.
[Figure: eFPGA modules, each with an EFLX IP interface and system bus master/slave wrappers, connected to BRAM through a system interconnect/NoC]
Now use the scarce Verilog coders to write compute-intensive “subroutines” (aka FPGA code)
that are programmed into a container; provide the container with input data or pointers to data in
system memory; have the eFPGA execute; then deliver the results as output data or as a pointer
to data in system memory.
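The calling convention above can be sketched as a small host-side simulation. It assumes each container exposes a hypothetical set of memory-mapped registers (`SRC_ADDR`, `LEN`, `DST_ADDR`, `START`, `DONE`) on the system interconnect and reads and writes shared system memory directly as a bus master. The register names, the `Container` and `SystemMemory` classes, and the byte-reversal stand-in for the programmed logic are illustrative only, not the EFLX interface.

```python
from typing import Callable


class SystemMemory:
    """Flat byte-addressable model of shared DRAM (illustrative)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, addr: int, payload: bytes) -> None:
        if addr < 0 or addr + len(payload) > len(self.data):
            raise IndexError("write outside system memory")
        self.data[addr:addr + len(payload)] = payload

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.data[addr:addr + length])


class Container:
    """Hypothetical eFPGA container with a few memory-mapped control
    registers. `logic` stands in for the programmed "subroutine"."""

    def __init__(self, mem: SystemMemory, logic: Callable[[bytes], bytes]) -> None:
        self.mem = mem
        self.logic = logic
        self.regs = {"SRC_ADDR": 0, "LEN": 0, "DST_ADDR": 0, "DONE": 0}

    def write_reg(self, name: str, value: int) -> None:
        if name == "START":
            self._run()  # writing START triggers execution
        else:
            self.regs[name] = value

    def _run(self) -> None:
        # The container fetches its own input from system memory ...
        src = self.mem.read(self.regs["SRC_ADDR"], self.regs["LEN"])
        result = self.logic(src)
        # ... writes the result back and reports the output length in LEN.
        self.mem.write(self.regs["DST_ADDR"], result)
        self.regs["LEN"] = len(result)
        self.regs["DONE"] = 1


def call_subroutine(c: Container, src_addr: int, length: int, dst_addr: int) -> int:
    """Host side: pass pointers, start the container, wait, return output length."""
    c.write_reg("DONE", 0)
    c.write_reg("SRC_ADDR", src_addr)
    c.write_reg("LEN", length)
    c.write_reg("DST_ADDR", dst_addr)
    c.write_reg("START", 1)
    while not c.regs["DONE"]:  # poll DONE; a real driver might wait for an interrupt
        pass
    return c.regs["LEN"]


mem = SystemMemory(4096)
mem.write(0x100, b"eFPGA")
reverser = Container(mem, lambda b: b[::-1])  # stand-in for Verilog logic
n = call_subroutine(reverser, 0x100, 5, 0x200)
print(mem.read(0x200, n))  # b'AGPFe'
```

Passing pointers rather than copying data through the processor is what lets each container work directly against system memory once it has its own bus-master access.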
Store the “subroutines” in on-chip or off-chip memory, and have the C++ coders write code on the
processor that calls the subroutines when needed, using a function call or pragma in their code,
just as they do today for hard-wired co-processors or custom instructions.
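From the application programmer's side, the call can look like an ordinary function. The sketch below (in Python for brevity rather than C++) assumes a hypothetical library of stored configurations keyed by name and a small pool of containers; `efpga_call` loads a configuration into a container only when it is not already resident, evicting the least recently used one if needed. The names, the LRU policy, and the use of Python functions in place of real configuration images are all illustrative assumptions, not a Flex Logix API.

```python
import hashlib
import zlib
from collections import OrderedDict
from typing import Callable, Dict

# Hypothetical library of stored "subroutines": name -> behavior model.
# In hardware each entry would be an eFPGA configuration image in memory.
LIBRARY: Dict[str, Callable[[bytes], bytes]] = {
    "sha256": lambda b: hashlib.sha256(b).digest(),
    "compress": zlib.compress,
    "crc32": lambda b: zlib.crc32(b).to_bytes(4, "big"),
}


class ContainerPool:
    """Tracks which subroutine is resident in each of `n` containers and
    evicts the least recently used one when a new subroutine is needed."""

    def __init__(self, n: int) -> None:
        self.capacity = n
        self.resident: "OrderedDict[str, Callable[[bytes], bytes]]" = OrderedDict()
        self.reconfigurations = 0

    def acquire(self, name: str) -> Callable[[bytes], bytes]:
        if name in self.resident:
            self.resident.move_to_end(name)  # hit: mark as recently used
            return self.resident[name]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[name] = LIBRARY[name]  # "program" a container
        self.reconfigurations += 1
        return self.resident[name]


pool = ContainerPool(n=2)


def efpga_call(name: str, data: bytes) -> bytes:
    """What the application sees: a plain function call."""
    return pool.acquire(name)(data)


digest = efpga_call("sha256", b"abc")
packed = efpga_call("compress", b"a" * 100)
crc = efpga_call("crc32", b"abc")  # evicts "sha256"
digest2 = efpga_call("sha256", b"abc")  # reloaded into a container
print(pool.reconfigurations)  # 4
```

Keeping the load-on-demand logic behind a single call is what lets the application code stay unchanged whether or not the subroutine is already resident.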
Some algorithms are simpler and use fewer LUTs; others use more.
For example, Flex Logix’s flexible interconnect fabric makes it possible for containers
to be any rectangular size, up to the full size of the array. The array can be as small as 2 or 3 Flex
Logix eFPGA cores, or larger, as illustrated below.
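Because containers may be any rectangle up to the full array, placing them is a two-dimensional packing problem. The first-fit allocator below is a minimal sketch, assuming the array is a grid of equal-sized EFLX cores addressed by (row, column) and that a container always occupies whole cores. The grid size, the function names, and the first-fit strategy are illustrative choices, not Flex Logix's placement tool.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]  # None = free core, else container name


def make_array(rows: int, cols: int) -> Grid:
    return [[None] * cols for _ in range(rows)]


def fits(grid: Grid, r: int, c: int, h: int, w: int) -> bool:
    """True if the h x w rectangle with top-left core (r, c) is entirely free."""
    return all(grid[i][j] is None for i in range(r, r + h) for j in range(c, c + w))


def allocate(grid: Grid, name: str, h: int, w: int) -> Optional[Tuple[int, int]]:
    """First-fit: scan row-major for the first free h x w rectangle,
    claim it for `name`, and return its top-left corner (or None)."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if fits(grid, r, c, h, w):
                for i in range(r, r + h):
                    for j in range(c, c + w):
                        grid[i][j] = name
                return (r, c)
    return None


def release(grid: Grid, name: str) -> None:
    """Free every core held by container `name` so it can be reused."""
    for row in grid:
        for j, owner in enumerate(row):
            if owner == name:
                row[j] = None


array = make_array(3, 3)  # e.g. a 3 x 3 array of EFLX cores
print(allocate(array, "sha", 2, 2))  # (0, 0)
print(allocate(array, "aes", 3, 1))  # (0, 2)
print(allocate(array, "lzw", 1, 2))  # (2, 0)
print(allocate(array, "crc", 1, 1))  # None: array is full
```

First-fit is simple but can fragment the array; a production tool would likely use a smarter placement, which is outside the scope of this sketch.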
[Figure: an EFLX array, nine cores wide, partitioned into numbered rectangular containers of different sizes; each row of cores connects through an EFLX IP interface and system bus master/slave wrappers to a system interconnect/NoC with BRAM]
FPGAs have always been programmed from Flash memory, which takes seconds. This is very slow, so
it is generally done infrequently: at boot time or when an upgrade is required, much like updating
your iPhone.
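To see why reconfiguration time matters, compare the ~10 µs container reconfiguration shown in the figures with Flash programming. As assumptions for illustration only, take 1 s as a representative "seconds" value for Flash and a hypothetical task that runs for T = 1 ms in the container between reconfigurations; f is the fraction of total time spent reconfiguring.

```latex
\frac{t_{\text{Flash}}}{t_{\text{reconf}}} \approx \frac{1\ \text{s}}{10\ \mu\text{s}} = 10^{5},
\qquad
f = \frac{t_{\text{reconf}}}{t_{\text{reconf}} + T}

f_{\text{container}} = \frac{10\ \mu\text{s}}{10\ \mu\text{s} + 1\ \text{ms}} = \frac{10}{1010} \approx 1\%,
\qquad
f_{\text{Flash}} = \frac{1\ \text{s}}{1\ \text{s} + 1\ \text{ms}} = \frac{1000}{1001} \approx 99.9\%
```

Under these assumptions, microsecond-scale reconfiguration makes swapping subroutines per task practical, whereas Flash-speed reconfiguration would dominate the run time.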
[Figure: a processor with RAM/SRAM and an EFLX array whose containers run encryption and compression functions (AES, SHA, LZW); the array is then RECONFIGURED so its containers run LZW, a UDP/IP 1G/10G 4-channel function, SHA and AES]
With the flattening of Moore’s law, dedicated accelerators are increasingly relied on to provide
more compute power, whether they are implemented in ASIC gates or as FPGAs. From a chip
architecture point of view, they are co-processors residing on an internal system bus or accessed
through a chip’s high-speed interfaces. With an integrated eFPGA approach, power is reduced
by removing redundant FPGA SerDes, latency is greatly improved by eliminating chip-to-chip
data transfers, and costs are lowered through a reduced system chip count.
[Figure: a processor subsystem (DRAM controller/cache, SRAM) whose EFLX containers hold SHA, LZW, elliptic and AES encryption functions is reconfigured in ~10 µs so the containers hold a UDP/IP 1G/10G 4-channel network stack and a proprietary accelerator]
Flex Logix eFPGA is proven and available in TSMC 40ULP, 16FFC, 12FFC and N7 (in design),
GlobalFoundries GF12LP, GF12LP RHBD and GF22FDX (in design), and Sandia 180nm CMOS8.
Porting to a new process node takes 6 to 9 months; any size array up to 2M LUTs can then be
produced within a few weeks. The portability and scalability of container/module eFPGA-based
accelerators let customers use them on whatever process they choose today and on future chips
built on different process nodes.
The above approach is also processor agnostic, so companies can leverage their investment in
existing software frameworks, tools, and applications.
eFPGA will enable data center and communications customers to continue to benefit from the
parallel programmability of FPGA while lowering power, shrinking size, and taking software
control of FPGA to improve productivity and time to market. For all of these reasons, eFPGA
enables a paradigm shift in computing architecture, both improving compute density per
board or rack through integration and allowing the benefits of eFPGA to be enjoyed by the much
larger contingent of C++ programmers.
Geoff Tate is CEO & Cofounder, Flex Logix Technologies, Inc. His background includes:
BSc, Computer Science, University of Alberta. MBA, Harvard. MSEE (coursework), Santa Clara
University. 1979-1990: AMD, Senior VP, Microprocessors and Logic, with >500 direct reports.
1990-2005: joined two PhD founders as founding CEO of Rambus, growing it from 4 people
through IPO to a $2 billion market cap.