Using Advanced FPGA SoC Technologies For The Design of Industrial Control Applications
Abstract—Modern industrial control systems must offer performance, flexibility and reliability. At the same time, they need to reach the market as early as possible and at low cost. Finally, they need to operate as embedded devices with a low power budget. On top of that, the algorithms that they implement are becoming more sophisticated and demanding. To cope with all these diverse requirements, control system designers are moving quickly to the digital hardware design field and specifically to FPGAs, System-on-Chip architectures and productivity improving methodologies like High-Level Synthesis, which uses C/C++ as an abstract hardware description language. In this paper, using these tools, the implementation of 3 control algorithms is shown: the classical PID algorithm, a Fuzzy Logic Controller (FLC) and an Adaptive or Tuning Fuzzy Logic Controller (TFLC). The novelty of the proposed approach is that, through specific coding and compiler directives, the C/C++ input descriptions are automatically implemented as advanced multicore architectures (the 3 most advanced of them are put to extensive experimentation and compared), which execute up to 500K algorithm iterations in less than 1 sec, taking advantage of an embedded ARM family microcontroller and common memory blocks found in the underlying FPGA implementation device. This is a substantial performance improvement and a high productivity boost, with very promising future extension capabilities.

Keywords—High-Level Synthesis; Digital Control; Multicore Architectures; FPGAs; SoC;

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the operational program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: ARCHIMEDES III: Investing in knowledge society through the European Social Fund.

Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on December 04,2022 at 20:14:16 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Modern industrial control systems need to comply with different requirements to make a high and fast market impact. From the designer’s point of view, all requirements can be summarized into two key factors: improve quality (in terms of performance, resource usage, power dissipation, etc.) and reduce time-to-market.

The first step in achieving these goals is the adoption of digital over analog control methodologies, accompanied by efficient development environments [1]. Digital control can be performed with common microcontrollers, Digital Signal Processor (DSP) controllers, or Field Programmable Gate Arrays (FPGAs). While DSPs generally include special purpose computational hardware to improve performance, like floating point coprocessors or multiply and accumulate ALUs, and fewer peripherals than common microcontrollers, they can be considered to belong to the same family of development platforms. With this family, applications are written most of the time in C/C++ and pass through a number of powerful tools like cross-compilers, linkers, debuggers and simulators, to meet design constraints.

Recently, the technological advances in FPGA devices, offering hundreds of GFLOPs with maximum power efficiency, have established a second powerful development family. FPGAs have been proposed as an implementation platform between hardware and software. They consist of specially designed hardware modules connected with efficient circuit switching interconnections, offering hardware-like performance and software-like flexible, dynamic reconfiguration. FPGA programming is based on Hardware Description Languages (HDLs) like VHDL or Verilog. HDL programming, however, requires domain specific knowledge and can therefore keep non-expert designers away and have a negative impact on productivity.

To improve designer productivity and reduce time-to-market, modern design techniques like High-Level Synthesis (HLS), Electronic System Level (ESL) design or, in simpler terms, C based hardware design can be adopted. HLS, ESL and C based hardware design [2] all more or less involve the automatic translation of untimed C/C++ algorithmic descriptions into Register-Transfer Level (RTL) HDL architectural descriptions, ready for FPGA implementation. As a research topic it started more than 30 years ago, and can be divided into three generations [3]. The latest, third generation, starting in 2000 and lasting up to now, is more mature, starts from system level languages and mainly C/C++ (so the term C based design has prevailed), offers a different design paradigm separated from RTL and HDLs and, based on recent advances in FPGA technology, delivers highly improved quality of results.

Digital industrial control methodologies and implementation technologies, like microcontrollers, DSPs and FPGAs, have been gaining wider and wider acceptance during the last years. FPGAs in particular have introduced a variety of well established and efficient hardware design solutions into the industrial control arena. These include HDLs [4], C based design and
HLS [5], Programmable Logic Controller (PLC) code to HDL translator [6], SoC [7] and MultiProcessor SoC (MPSoC) [8] architectures, hardware/software codesign [9] and run-time reconfiguration [10]. Also, FPGAs have been used for the implementation of different controller types [11]. However, even though the opportunities seem to be more than what designers need, a cost-benefit approach should be followed in each case and Pareto optimal solutions should be investigated based on the constraints imposed by the specific industrial environment.

Following these past approaches, this paper presents a performance improving C based tool flow applied to a scalable, multicore FPGA-based System-on-Chip (SoC) architecture for digital controllers. This architecture combines a common RISC microcontroller and a number of special purpose coprocessors on the same FPGA device. The microcontroller is used for general purpose chores like communication with common devices (VGA, HDMI, TFT or LCD displays, buttons and switches, external memory cards, UART, ethernet, bluetooth, GSM), for which a lot of work is available (drivers, applications), either public or non-public domain. The microcontroller can even host Linux flavors, improving the usability and flexibility of the resulting device. On the other hand, the special purpose coprocessors are connected to the microcontroller bus with different architectural options, which are selected with proper coding guidelines and compiler directives. Coprocessors are used to implement demanding control applications, like the classical Proportional-Integral-Derivative (PID) algorithm, a Fuzzy Logic Controller (FLC) and an Adaptive or Tuning Fuzzy Logic Controller (TFLC). Without loss of generality, the implementation presented in this paper as well as the corresponding tool flow are those offered by Xilinx, because of their maturity at the current moment. However, other FPGA vendors are also preparing comparable solutions, so the corresponding reference architecture will probably be universally supported in the near future.

The advantages and novelties presented in this paper are: i) demanding control applications are performance enhanced by designing a special purpose hardware coprocessor, handling aggressive application and technology constraints, ii) fixed and floating point calculations are supported, through vendor supplied and optimized arithmetic IP cores [12], improving quality of results without special and time consuming designer effort, iii) the resulting embedded device offers advanced and flexible integration options, taking advantage of common peripherals connected to a RISC microcontroller (ARM, PowerPC, MicroBlaze) and Linux, and finally, iv) the whole design (both hardware and software) is done in C/C++, improving designer productivity and avoiding HDLs and other time consuming and error introducing procedures, without loss of performance.

These advantages are presented in the rest of this paper and justified with a set of experimental results, with 3 advanced microcontroller/coprocessor connection architectures within the overall proposed SoC architecture. With these experiments it is shown that the proposed environment and the corresponding tool flow form an efficient rapid prototyping development platform for digital control applications, reaching performance of 500K iterations in less than 1 sec, meeting modern design constraints and requirements, and offering promising future extension capabilities.

II. PROPOSED METHODOLOGY

The design methodology proposed in this paper consists of two steps: first, the generation of a high performance hardware accelerator and second, the integration of the generated accelerator together with a reference SoC architecture. In both steps, the underlying technologies aim at improving designer productivity, while maintaining quality of results.

The proposed tool flow (without loss of generality and due to maturity) is based on Xilinx Vivado Design Suite (or, for short, Vivado), which supports SoC architectures of a single or dual core embedded processor (ARM, PowerPC, MicroBlaze), playing the role of the AXI bus [13] master. Two Vivado tools are involved for hardware and software specification and design, IP Integrator (IPI) and Xilinx Software Development Kit (SDK). As their names imply, IPI is a GUI enriched hardware specification tool, with links to low level implementation tools for full, top-down hardware design, and SDK is an Eclipse based software development environment. For the proposed methodology, that is the design of high performance hardware accelerators, another tool is involved, Xilinx Vivado HLS. This tool is used for the design of an AXI slave, which implements the functionality of the accelerator and is connected to the SoC architecture through simple or streaming AXI interfaces (depending on throughput and latency requirements) as well as common on-chip memory blocks. Vivado HLS accepts untimed C/C++ algorithmic descriptions and, based on the dependencies described with the language constructs used (assignments, loops, conditionals) and user specified constraints and preferences (through appropriate compiler directives), generates optimized, technology aware RTL descriptions. The novel approach of this flow is that it accepts only C/C++ input and produces optimized final implementation bitfiles.

A simple run through the tool flow starts from Vivado HLS, where the C/C++ input specification of the hardware accelerator is given, and through simulation, HLS and verification, an optimized hardware component is generated. This component is then packed into a special type of IP, an IP-XACT, suitable for IPI, together with the selected AXI interfaces. Next, IPI is called to specify the SoC architecture, using an embedded processor, the high performance IP-XACT core and generic widely used components taken from an IP catalog. The whole system specification is synthesized into an FPGA specific netlist and then mapped into a feasible bitfile. Next, this bitfile is passed to SDK, where low level software components (OS, drivers, libraries) are automatically generated and application software components are developed. Finally, the generated application Executable and Linkable Format (ELF) file is sent back to IPI, where an implementation bitfile is generated, containing both hardware and software components, ready to program the FPGA device. A more detailed presentation of the above mentioned tools, as well as of the optimizations and flow iterations supported to satisfy design constraints, is outside the scope of this publication. These can be found in the accompanying user manuals.

III. PROPOSED ARCHITECTURES

While the material presented in the previous section is a reference to widely used software packages, this section presents novel work, aiming at the efficient and more cost
Fig. 1. Bus based architecture. Fig. 2. BRAM architecture.
Fig. 3. BRAM/LUT architecture.
Fig. 4. BRAM/LUT multicore architecture.
Fig. 5. The structure of the looper control system.

access LUT memory can be fully unrolled. This way all code that accesses local memory can be executed in parallel, limited only by data dependencies of the datapath. Also, the LUT memory initialization loop can be unrolled two times, because this is the maximum parallelization that can be achieved with a dual port BRAM memory. Finally, function calls within the datapath can be either inlined or pipelined, and the best result is chosen for implementation. Inlining increases the code fragments that can be accessed in parallel, at the cost of more complicated control hardware. Pipelining improves execution time by overlapping consecutive function calls. For each algorithmic description, either inlining or pipelining may give better results, so a trial and error approach is required.

In brief, a representative example of the directives of the third architecture is given below, where bram is the BRAM memory variable and lut is the LUT memory variable, initialization and computation are the labels of a LUT initialization loop and a heavy computational loop respectively, and datapath is a computationally intensive function call.
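Before the directives themselves, it may help to picture the kind of C function they annotate. The following is only a minimal sketch under stated assumptions: the function name foo, the variables bram and lut, the loop labels initialization and computation and the datapath call are the names used in the text, while the array size N, the argument x and the arithmetic inside datapath are hypothetical placeholders and not the authors' code.

```c
#include <stddef.h>

#define N 8  /* hypothetical table size, chosen only for illustration */

/* Hypothetical stand-in for the compute-intensive function called
 * from the datapath; the real body depends on the control algorithm. */
static float datapath(float x, float coeff) {
    return x * coeff + 1.0f;
}

/* Top-level accelerator function. "bram" is the array that would be
 * mapped to the shared block RAM; "lut" is the local copy that would
 * be partitioned into LUT-based registers. */
float foo(const float bram[N], float x) {
    float lut[N];
    float acc = 0.0f;

initialization:  /* copies the BRAM contents into local memory */
    for (size_t i = 0; i < N; i++) {
        lut[i] = bram[i];
    }

computation:     /* heavy loop that reads only the local copy */
    for (size_t i = 0; i < N; i++) {
        acc += datapath(x, lut[i]);
    }

    return acc;
}
```

Compiled as ordinary C the function simply runs sequentially; the labels only give the two loops names that tool directives can refer to.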
    set_directive_interface -mode ap_memory "foo" bram
    set_directive_array_partition -type complete "foo" lut
    set_directive_unroll -factor 2 "foo/initialization"
    set_directive_unroll "foo/computation"
    set_directive_pipeline "datapath"

Finally, the fourth SoC architecture, called LUT/i (explained later), is given in figure 4. The accelerator is the same as the one found in the third architecture; however, more than one accelerator is used, specifically i of them. This way, provided the algorithmic description contains computations that can be executed in parallel using the local LUT based memory copy, maximum parallelism can be achieved and the resulting performance can satisfy demanding, high throughput applications. For this architecture no extra algorithmic constructs or tool directives are required. The only difference from the third architecture is that the LUT memory in each accelerator is a part of the BRAM memory, whose length is determined by the number of accelerators used.

IV. EXPERIMENTAL RESULTS

The presented design methodology and corresponding tool flow have been tested with a number of control algorithms. For each algorithm, a number of FPGA implementations have been generated and performance and hardware usage measurements have been taken. Details about all experimental setups and all measurements are given below.

Specifically, 3 control algorithms have been implemented, starting from C behavioral specifications: the classical PID algorithm, an FLC algorithm used in the looper control of a rolling mill reported in [14] and the adaptive or tuning (TFLC) algorithm found in the same publication (shown in figure 5). All implementations used single precision floating point calculations natively supported by the latest versions of Vivado HLS and linked to vendor supplied, optimized implementations [12]. The PID algorithm requires 3 floating point multiplications and 5 floating point additions, the FLC algorithm requires 26 floating point multiplications, 10 floating point divisions and 22 floating point additions, and the TFLC algorithm requires 106 floating point multiplications, 57 floating point divisions and 71 floating point additions. For each algorithm, a C description was written upon which algorithmic and architectural optimizations were applied, resulting in deep optimization and design space exploration.

FPGA implementations were based on the latest development in FPGA technology, Xilinx’s 7 series All Programmable System-on-Chip, offering breakout performance, capacity, and system integration, while optimizing price/performance/watt, and specifically the Zynq XC7Z020 device (53200 Look-Up Table generators - LUTs, 106400 D-type Flip-Flops - DFFs and 220 special purpose DSP blocks), found in the Zedboard evaluation board. Of all the architectures presented in the previous section, implementations were taken for the 3 most advanced: the BRAM, the LUT and the LUT/i with i=4. These were selected because the first architecture, the BUS architecture, has been shown to perform much worse than the others [15] (more than 50X performance overhead, due to the bus bottleneck). For all 3 architectures and all 3 algorithms resource usage, performance for 1024-524288 or 1K-500K
Fig. 6. Performance of the PID algorithm.

TABLE I. BRAM ARCHITECTURE IMPLEMENTATION WITH THE ZYNQ ZEDBOARD.

             Area
Algorithm    LUTs              DFFs              DSPs
PID          1287 (2.42%)      841 (0.79%)       8 (3.64%)
FLC          13161 (24.74%)    6649 (6.25%)      21 (9.55%)
TFLC         53119 (99.85%)    30823 (28.97%)    168 (76.36%)

             Area
Algorithm    LUTs              DFFs              DSPs
PID          1895 (3.56%)      1244 (1.17%)      10 (4.55%)
FLC          13715 (25.78%)    7798 (7.33%)      24 (10.91%)
TFLC         52936 (99.50%)    31311 (29.43%)    168 (76.36%)
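To make the PID rows of the tables above more concrete, here is a minimal sketch of one textbook discrete PID step written in plain C. It is an assumption, not the authors' implementation: it was chosen only because it uses exactly 3 floating point multiplications and 5 floating point additions or subtractions, matching the operation count quoted in the text. The type pid_ctrl, the function pid_step and all field names are hypothetical.

```c
/* Hypothetical controller state and gains; names are illustrative. */
typedef struct {
    float kp;          /* proportional gain */
    float ki;          /* integral gain (sample time folded in) */
    float kd;          /* derivative gain (sample time folded in) */
    float integral;    /* accumulated integral term */
    float prev_error;  /* error from the previous iteration */
} pid_ctrl;

/* One controller iteration: 3 multiplications, 5 additions/subtractions. */
float pid_step(pid_ctrl *c, float setpoint, float measurement) {
    float error = setpoint - measurement;             /* add 1 */
    c->integral = c->integral + c->ki * error;        /* mul 1, add 2 */
    float deriv = c->kd * (error - c->prev_error);    /* add 3, mul 2 */
    float out = c->kp * error + c->integral + deriv;  /* mul 3, adds 4 and 5 */
    c->prev_error = error;
    return out;
}
```

With gains kp = 2, ki = 0.5, kd = 1 and a constant unit error, the first call returns 3.5 and the second 3.0, because the derivative term vanishes once the error stops changing.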
Fig. 8. Performance of the TFLC algorithm.

V. CONCLUSIONS

In this paper, a design environment and the corresponding tool flow have been presented that use C based hardware design for the development of digital control applications.
TABLE III. ZEDBOARD PERFORMANCE MEASUREMENTS (1/3).

Execution time (ms)

Iterations   1024                   2048                   4096                   8192
Algorithm    BRAM   LUT    LUT/4    BRAM   LUT    LUT/4    BRAM   LUT    LUT/4    BRAM   LUT    LUT/4
PID          0.27   0.19   0.05     0.53   0.39   0.10     1.06   0.78   0.19     2.13   1.56   0.39
FLC          1.41   1.34   0.34     2.83   2.68   0.67     5.65   5.37   1.34     11.30  10.73  2.68
TFLC         1.98   1.83   0.46     3.95   3.67   0.92     7.91   7.33   1.83     15.81  14.67  3.67
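The execution times in Table III can be turned into speedup and throughput figures with simple arithmetic. The helper functions below are a hypothetical sketch for that purpose (the names speedup and iterations_per_second are not from the paper); they only restate the table values in a different unit.

```c
/* Speedup of a faster architecture over a baseline, given two
 * execution times in the same unit (here: milliseconds). */
double speedup(double t_base_ms, double t_new_ms) {
    return t_base_ms / t_new_ms;
}

/* Algorithm iterations completed per second for a run of
 * `iterations` iterations that took `t_ms` milliseconds. */
double iterations_per_second(unsigned long iterations, double t_ms) {
    return (double)iterations / (t_ms / 1000.0);
}
```

For the PID row at 8192 iterations, speedup(2.13, 0.39) gives roughly 5.5 for LUT/4 over BRAM, and iterations_per_second(8192, 0.39) gives roughly 21 million iterations per second.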
Specifically, the presented tool flow supports C input algorithmic descriptions that pass through HLS and FPGA implementation tools to implement the selected algorithms as multicore, embedded designs, offering performance improvements and hardware utilization efficiency. Overall, the proposed methodology and underlying tool flow support a novel high productivity prototyping platform for digital control applications, offering performance, resource and power improvements compared to other implementation architectures. The use of local memory, through coding styles and compiler directives, offers the overall best solutions, showing that C based design can be tuned to offer both quality-of-results and reduced time-to-market.

REFERENCES

[1] E. Monmasson, L. Idkhajine, M. N. Cirstea, I. Bahri, A. Tisan, and M. W. Naouar, “FPGAs in industrial control applications,” IEEE Transactions on Industrial Informatics, vol. 7, no. 2, pp. 224–243, 2011.
[2] P. Coussy and A. Morawiec, High-level Synthesis: From Algorithm to Digital Circuit. Springer-Verlag, 2008.
[3] G. Martin and G. Smith, “High-level synthesis: Past, present, and future,” IEEE Design and Test of Computers, vol. 26, no. 4, pp. 18–25, 2009.
[4] S. Ghosh, R. K. Barai, S. Bhattarcharya, P. Bhattacharyya, S. Rudra, A. Dutta, and R. Pyne, “An FPGA based implementation of a flexible digital PID controller for a motion control system,” in International Conference on Computer Communication and Informatics. IEEE, 2013.
[5] D. Navarro, O. Lucia, L. A. Barragan, I. Urriza, and O. Jimenez, “High-level synthesis for accelerating the FPGA implementation of computationally-demanding control algorithms for power converters,” IEEE Transactions on Industrial Informatics, vol. 9, no. 3, pp. 1371–1379, 2013.
[6] S. Subbaraman, M. M. Patil, and P. S. Nilkund, “Novel integrated development environment for implementing PLC on FPGA by converting ladder diagram to synthesizable VHDL code,” in 11th International Conference on Control Automation Robotics and Vision. IEEE, 2010, pp. 1791–1795.
[7] A. Ben Said, M.and Hemdani, M. W. Naouar, E. Monmasson, and I. Slama-Belkhodja, “Standard FPGA-based or full cSoC controllers for three-phase PWM boost rectifier, two viable solutions,” in 15th International Power Electronics and Motion Control Conference. IEEE, 2012.
[8] S. Ben Othman, A. K. Ben Salem, H. Abdelkrim, and S. Ben Saoud, “MPSoC design approach of FPGA-based controller for induction motor drive,” in International Conference on Industrial Technology. IEEE, 2012, pp. 134–139.
[9] E. Monmasson, I. Bahri, L. Idkhajine, A. Maalouf, and W. M. Naouar, “Recent advancements in FPGA-based controllers for AC drives applications,” in 13th International Conference on Optimization of Electrical and Electronic Equipment. IEEE, 2012, pp. 8–15.
[10] M. W. Naouar, E. Monmasson, A. A. Naassani, and I. Slama-Belkhodja, “FPGA-based dynamic reconfiguration of sliding mode current controllers for synchronous machines,” IEEE Transactions on Industrial Informatics, vol. 9, no. 3, pp. 1262–1271, 2013.
[11] E. Monmasson and M. N. Cirstea, “FPGA design methodology for industrial control systems - a review,” IEEE Transactions on Industrial Electronics, vol. 54, no. 4, pp. 1824–1842, 2007.
[12] D. Bagni and D. Mackay, “Floating-point PID controller design with Vivado HLS and system generator for DSP,” Xilinx Application Note XAPP1163, 2013.
[13] ARM Ltd., AMBA AXI and ACE Protocol Specification, 2013.
[14] F. Janabi-Sharifi and J. Fan, “A learning fuzzy system for looper control in rolling mills,” in International Conference on Systems, Man, and Cybernetics. IEEE, 2000, pp. 3722–3727.
[15] C. Economakos, M. Tzamtzi, M. Skarpetis, and G. Economakos, “Performance improvements in a modern hardware design environment for control applications,” in International Conference on Industrial Technology. IEEE, 2015.