Entropy: Optimization For Software Implementation of Fractional Calculus Numerical Methods in An Embedded System
Article
Optimization for Software Implementation
of Fractional Calculus Numerical Methods
in an Embedded System
Mariusz Matusiak
Institute of Applied Computer Science, Lodz University of Technology, 90-924 Lodz, Poland;
[email protected]
Received: 11 March 2020; Accepted: 11 May 2020; Published: 18 May 2020
Abstract: In this article, some practical software optimization methods for implementations of the
fractional-order backward difference, sum, and differintegral operator based on the Grünwald–Letnikov
definition are presented. These numerical algorithms are of great interest for evaluating
fractional-order differential equations in embedded systems because their form, based on the discrete
convolution operation, is more convenient than the Caputo and Riemann–Liouville definitions or Laplace
transforms. A well-known difficulty relates to the non-locality of the operator, which implies a
continually increasing number of processed samples; this may reach the limits of available memory or
exceed the desired computation time. In the study presented here, several promising software
optimization techniques were analyzed and tested in the evaluation of the variable fractional-order
backward difference and derivative on two different Arm® Cortex®-M architectures. Reductions
in computation times of up to 75% and 87% were achieved compared to the initial implementation,
depending on the type of Arm® core.
1. Introduction
Fractional calculus is an increasingly valuable and efficient mathematical tool used in various fields of
science and engineering for the synthesis of highly accurate models of real dynamic systems, developing
precise control algorithms, and performing complex signal processing [1–3]. Novel applications
continue to appear across these fields. Apart from classic examples
like the modeling of viscoelastic materials [4] and the diffusion process [5,6], it was also found to be useful
in economics for modeling of financial systems and economic growth [7,8], in medicine and biomedical
engineering for disease analysis, drug modeling and administration, and signal acquisition [9–13],
in computer science [14] for the development of neural networks with memory effects, and other
applications with time-varying fractional orders [15]. In electrical engineering alone,
recent advances include mathematical descriptions of supercapacitors [16,17], lithium-ion batteries
with nonlinear capacities [18], and nonlinear coils in a ferroresonant circuit [19]. In control engineering,
the algorithms of fractional-order controllers have received much interest, in particular, the fractional-order
PID controller [20–22]. Because two additional degrees of freedom are available,
more robust control may, in principle, be applied, and the desired plant response characteristics
thereby achieved more rapidly [23]. The difficulty of this process, however, is twofold: firstly, the digital
implementation of the fractional-order system, associated with the problem of continuously, linearly
growing numbers of calculations; secondly, the proper selection of values for fractional differentiation and
integration orders. Finite memory available in a target embedded system and the fixed sampling frequency
in real-time applications require limiting the operations, making the implementation less accurate and more
error-prone. In order to resolve these issues, one should consider optimizing the fractional-calculus-based
algorithms, which is usually not essential for simulations performed in computational software such
as MATLAB. For the Grünwald–Letnikov definitions of backward difference/sum and differintegral,
a widely applied approach is the Short Memory Principle, introduced by Podlubny in [3]. Under specific
conditions, however (selected fractional order, range and frequency of the input signal, small buffer sizes),
truncation of past samples may produce significant numerical errors [24,25]. Therefore, some additional
optimization is required. This question is particularly crucial when even more complex algorithms, based
on fractional calculus, are considered. One of the examples may be a closed-loop control system with a
variable fractional-order PID controller with adapting differentiation and integration orders. There are
several types of remedies to this issue.
The first type is a group of various Short Memory methods. The equivalent form of the
Grünwald–Letnikov differintegral, based on the Horner scheme, was demonstrated, e.g., in [21]. In [26],
a simplification was proposed based on dividing a series of coefficients into l parts and substituting groups
of them with constant values. In [6], the adaptive time steps memory method was introduced, whereby
samples in the past are taken at consecutively longer intervals with higher weights.
The second type involves approximation in the frequency domain of the Laplace fractional-order
operator $G(s) = s^{\nu}$. Commonly applied techniques are the Oustaloup Recursive filter (ORA) [27–29] and
continued fraction expansion (CFE) [30,31], with later reduction of the polynomial order using the
balancing reduction technique [32,33].
In the present article, the focus lies on an additional level of optimization, which is software
optimization, universal from the perspective of any selected fractional-order numerical algorithm.
Depending on the selected target architecture, several different programming techniques for improving
algorithm efficiency may be implemented. The considered solutions involve the use of the Arm® CMSIS-DSP
library with intrinsic and Single Instruction Multiple Data (SIMD) functions, as well as other hardware
extensions. The research aimed to obtain the highest possible performance, while preserving the ease
of middle-level C programming, ensuring software portability and omitting CPU-specific assembly
code snippets. Several iterations of the tests were conducted using two 32-bit RISC Arm® Cortex®-M
microcontrollers manufactured by STMicroelectronics. First, the implementations of fractional-order
backward difference and derivative using constant single-precision floating-point $a_j^{\nu}$ binomial coefficients
for varying buffer sizes of $L_1 = 32$ and $L_2 = 256$ values were analyzed. The memory limitations of
the microcontrollers were also investigated. Next, the performance of initial algorithms for fractional
backward difference/sum and differentiator/integrator of variable orders was measured. The latter is
particularly useful for the realization of adaptive fractional-order $PI^{\mu(t)}D^{\nu(t)}$ controllers with variable
orders of I and D terms. The algorithms were then optimized using the described techniques. In the final
step, fixed-point arithmetic with the conversion of numbers to Qm.n notation [34] with m bits for the
integer part and n bits for the fractional part was applied.
The paper is organized as follows. Section 2 sets out mathematical preliminaries, including
the Grünwald–Letnikov definitions of fractional-order backward difference/sum and differintegral.
In Section 3, the characteristics of the hardware testing platform are presented. Section 4 provides a
detailed description of the algorithms and their implementation. In Section 5, optimization techniques
are proposed, and the results of the conducted experiments are presented. In Section 6, the alternative
approach using fixed-point arithmetic is discussed. Related software is published as the Supplementary
Materials to this paper.
Entropy 2020, 22, 566 3 of 14
2. Mathematical Preliminaries
Definition 1. Grünwald–Letnikov fractional-order backward difference/sum (GL-FOBD/S).
Let $n, j, k \in \mathbb{N}_0$, $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. The classic n-th order backward difference of a function f(t), defined at discrete
time instants t = kh, is given by the formula:

$$\nabla_k^{n} f(t) = \sum_{j=0}^{n} (-1)^j \binom{n}{j} f((k-j)h) \tag{1}$$

where $\binom{n}{j}$ is a binomial coefficient and $h \in \mathbb{R}_+$ is a finite, constant sampling period. Replacing the order n with an
arbitrary real one $\nu \in \mathbb{R}_+$ results in a definition of the Grünwald–Letnikov fractional-order backward difference [1,3]:

$$
\begin{aligned}
{}^{GL}\Delta_k^{\nu} f(t) &= \sum_{j=0}^{\infty} (-1)^j \binom{\nu}{j} f((k-j)h) = \sum_{j=0}^{\infty} (-1)^j \frac{\nu(\nu-1)\cdots(\nu-j+1)}{j!} f((k-j)h) \\
&= \sum_{j=0}^{\infty} (-1)^j \frac{\Gamma(\nu+1)}{j!\,\Gamma(\nu-j+1)} f((k-j)h) = \sum_{j=0}^{\infty} a_j^{\nu} f((k-j)h)
\end{aligned}
\tag{2}
$$

where Γ is Euler's gamma function and $a_j^{\nu}$ denotes discrete coefficients, also known as the oblivion function [35].
In practice, the function f(t) is defined for $t \in [t_0, +\infty)$, where $t_0$ is considered as a starting point. Hence, the
infinite sum in Equation (2) is limited by $N = \left\lceil \frac{t - t_0}{h} \right\rceil$. The recursive formula for $a_j^{\nu}$ is defined as:

$$
a_j^{\nu} = (-1)^j \binom{\nu}{j} =
\begin{cases}
1 & \text{for } j = 0 \\
a_{j-1}^{\nu} \left(1 - \frac{1+\nu}{j}\right) & \text{for } j = 1, 2, \ldots
\end{cases}
\tag{3}
$$

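The recursion in Equation (3) maps directly onto a short loop. The following is a minimal sketch in C, assuming single-precision storage; the function name `gl_coefficients` and its parameters are illustrative and are not taken from the paper's supplementary software.

```c
#include <stddef.h>

/* Fills a[0..len-1] with the coefficients a_j^nu of Equation (3):
 * a_0 = 1 and a_j = a_{j-1} * (1 - (1 + nu) / j) for j >= 1.
 * The name and signature are assumptions made for this sketch. */
void gl_coefficients(float nu, float *a, size_t len)
{
    if (len == 0) {
        return;
    }
    a[0] = 1.0f;
    for (size_t j = 1; j < len; ++j) {
        a[j] = a[j - 1] * (1.0f - (1.0f + nu) / (float)j);
    }
}
```

For example, with ν = 0.5 the first four coefficients are 1, −0.5, −0.125, and −0.0625. Each coefficient costs one division and one multiplication, which avoids evaluating the gamma function on the target.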
Fractional-order backward sum is obtained for negative values of the order $\mu \in \mathbb{R}_-$, e.g., $\mu = -\nu$.

$${}^{GL}_{t_0}\Delta_t^{[\nu(t)]} f(t) = \sum_{j=0}^{N} a_j^{[\nu(t)]} f((k-j)h) \tag{4}$$

$${}^{GL}_{t_0}D_t^{[\nu(t)]} f(t) = \lim_{h \to 0^+} \frac{{}^{GL}_{t_0}\Delta_t^{[\nu(t)]} f(t)}{h^{[\nu(t)]}} \tag{5}$$

For microcontroller implementations, based on the assumption that h is as small as possible, this may be
approximated to:

$${}^{GL}_{t_0}D_t^{[\nu(t)]} f(t) \approx \frac{{}^{GL}_{t_0}\Delta_t^{[\nu(t)]} f(t)}{h^{[\nu(t)]}} \tag{6}$$

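Equations (4) and (6) can be read as a dot product between the coefficient vector and the time-reversed sample history, followed by a division. The sketch below assumes that the coefficients for the current order are already stored in `a`, that `f` holds the samples from oldest (`f[0]`) to newest (`f[k]`), and that the factor h^ν is precomputed by the caller; the function names are illustrative only.

```c
#include <stddef.h>

/* Equation (4) at step k: sum over j = 0..k of a[j] * f[k - j].
 * f[0] is the oldest sample and f[k] the newest (assumed layout). */
float gl_difference(const float *a, const float *f, size_t k)
{
    float acc = 0.0f;
    for (size_t j = 0; j <= k; ++j) {
        acc += a[j] * f[k - j];
    }
    return acc;
}

/* Equation (6): the backward difference divided by h^nu.
 * h_pow_nu is passed in precomputed (e.g., via powf on the target). */
float gl_derivative(const float *a, const float *f, size_t k, float h_pow_nu)
{
    return gl_difference(a, f, k) / h_pow_nu;
}
```

As a sanity check, for ν = 1 the coefficients reduce to 1, −1, 0, ..., so the result is the classic first-order difference quotient (f(kh) − f((k−1)h))/h.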
The non-locality of the fractional operator leads to an increasing number of calculations required to evaluate
the result at each step. In the k-th step, kM instructions have to be performed by a microcontroller, where
M denotes the number of instructions for processing a single sample. At a certain step, either the time
required to complete the calculations exceeds the applied constant sampling period h ($t_{kM} > h$), or the
limit of the memory allocated for the buffers is reached. One of the Short Memory solutions is to limit
the number of samples to L, so that the calculation time $t_{LM}$ is always shorter than h. Thus, the oblivion
function is amended to [26]:
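
$$
a_j^{\nu} =
\begin{cases}
1 & \text{for } j = 0 \\
(-1)^j \frac{\nu(\nu-1)\cdots(\nu-j+1)}{j!} & \text{for } 0 < j \le L \\
(-1)^L \frac{\nu(\nu-1)\cdots(\nu-j+1)}{L!} & \text{for } j > L
\end{cases}
\tag{7}
$$
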
The accuracy of the second solution depends on the selected value of the fractional order; it may be applied
when the coefficients drop rapidly to zero (ν > 0). Then, one can approximate the result by limiting the
number of multiplications to L:

$$
a_j^{\nu} =
\begin{cases}
1 & \text{for } j = 0 \\
(-1)^j \frac{\nu(\nu-1)\cdots(\nu-j+1)}{j!} & \text{for } 0 < j \le L \\
0 & \text{for } j > L
\end{cases}
\tag{8}
$$

| Parameter Name | STM32L152RCT6 (Arm® Cortex®-M3) | STM32F746ZG (Arm® Cortex®-M7) |
| --- | --- | --- |
| CPU clock frequency ($F_{CPU}$) | up to 32 MHz | up to 216 MHz |
| Memory (Flash, SRAM) | 256 KB Flash + 32 KB SRAM + 8 KB EEPROM | 1024 KB Flash + 320 KB SRAM |
| Converters (ADC, DAC) | 12-bit 1 MSPS ADC, 12-bit DAC | 3× 12-bit 2.4 MSPS ADC, 2× 12-bit DAC |
| Power supply ($V_{DD}$) | 1.65–3.6 V | 1.8–3.6 V |
| Other features | ultra-low-power technology, LCD driver, touch sensor channels | floating-point unit, real-time accelerator, DSP instructions, LCD and camera interface |
The motivation for selecting these models was to test the performance of the evaluation of
fractional-order differential equations on platforms widely applied in industrial applications as an
alternative to expensive DSP processors. Cortex®-M microcontrollers usually do not reach the same computation
power, often due to a much lower CPU clock frequency (e.g., the TMS320C6678 DSP processor operates at
1.4 GHz). However, they have been equipped with numerous extensions for accelerating calculations,
including Single Instruction, Multiple Data (SIMD) operations, optimized multiply-accumulate (MAC)
and DSP instructions, direct memory access (DMA), and hardware floating-point units (FPU). A significant
advantage is the availability of basic peripherals, memories, communication interfaces, and power
regulators. This offers a low-cost alternative to multi-core systems, with each core dedicated to specific
tasks (e.g., primary DSP core to signal processing tasks and secondary core to an operating system,
communication with external peripherals and power management).
The STM32F746ZG MCU belongs to the High-Performance STM32F7 series. The efficiency of the
Cortex®-M7 core has been increased by a 6-stage dual-issue pipeline capable of processing two instructions
per clock cycle. At the fourth stage of the pipeline (Issue), processed instructions are split and further
executed by one of the separated dedicated blocks—an arithmetic logic unit with a SIMD extension,
MAC pipeline, single-precision floating-point pipeline, or branch prediction block.
The STM32L152RCT6 was designed using STM32™ ultra-low-power technology. The primary
feature of the microcontroller is the availability of several low-power modes dedicated to battery-powered
applications (in Standby mode, current consumption is reduced to only 0.29 µA). Maximum performance
is therefore limited to only 33 DMIPS at a clock frequency of 32 MHz. Due to the lack of a hardware
floating-point unit, all operations on real numbers are software-emulated, which strongly affects the
computation time.
The programs were built with the Arm® Embedded GCC compiler, distributed as part of the GNU Arm Embedded Toolchain v9.2. Several
available optimization levels were tested, starting with the default -O0 (no optimizations) flag. In that
mode, instructions are translated by the compiler line by line, and breakpoints can be placed and hit
anywhere in the executable code. This level is most suitable for the software development process,
providing the most accurate debugging experience and the possibility of reading and modifying variables
during a debug session. The second level was -O2, which is the highest standard-compliant optimization level
that does not introduce a trade-off between the size of the program and its execution speed. This option
is commonly enabled in the release build profiles of numerous open-source projects, including the Linux
kernel. The last level tested was -O3, in which more code optimizations are applied, usually
at the cost of an increased size of the output binary. This is an outcome of function inlining and loop
unrolling. Therefore, the program may not become faster in all cases. The Arm® GCC also supports the
more aggressive -Ofast optimization level, which replaces math operations with their fast variants.
However, due to the generation of non-standard-compliant code and potential software vulnerabilities,
this setting was not taken into consideration. A detailed description of all compiler optimization options
can be found in the GCC user manual [41].
1. TRCENA bit [24] in the Debug Exception and Monitor Control Register (DEMCR) set to 1 to enable
use of the trace and debug blocks.
2. CYCCNTENA bit [0] in the DWT Control Register (DWT_CTRL) set to 1 to enable the
CYCCNT counter.
3. Value of the DWT_CYCCNT register initialized to 0.
The addresses of the registers may differ depending on the selected microcontroller model, so verification
in the relevant user manual is recommended.
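The three steps above can be sketched in C as follows. To keep the example independent of any particular device header, the register locations are passed in as pointers; the structure and function names are assumptions made for this sketch. On Armv7-M cores the registers are typically located at 0xE000EDFC (DEMCR), 0xE0001000 (DWT_CTRL), and 0xE0001004 (DWT_CYCCNT), but these values should be confirmed in the reference manual of the selected device.

```c
#include <stdint.h>

#define DEMCR_TRCENA_BIT       (UINT32_C(1) << 24) /* TRCENA, bit 24 of DEMCR */
#define DWT_CTRL_CYCCNTENA_BIT (UINT32_C(1) << 0)  /* CYCCNTENA, bit 0 of DWT_CTRL */

/* Hypothetical handle bundling the three registers used for cycle counting. */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *dwt_ctrl;
    volatile uint32_t *dwt_cyccnt;
} cycle_counter_t;

/* Performs steps 1-3: enable trace, enable the counter, clear it. */
void cycle_counter_start(const cycle_counter_t *cc)
{
    *cc->demcr |= DEMCR_TRCENA_BIT;
    *cc->dwt_ctrl |= DWT_CTRL_CYCCNTENA_BIT;
    *cc->dwt_cyccnt = 0u;
}

/* Returns the number of CPU cycles counted since cycle_counter_start(). */
uint32_t cycle_counter_read(const cycle_counter_t *cc)
{
    return *cc->dwt_cyccnt;
}
```

On a target, the handle would be initialized with the device-specific addresses cast to `volatile uint32_t *`, the counter started before the measured routine, and read immediately after it returns.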
[Figure 1 placeholder. Panel (a), "MCUs performance for FOD algorithm": number of CPU cycles for optimization levels O0, O2, O3; series: STM32L1 and STM32F7 with 32 and 256 samples. Panel (b), "FOD (256), output binary size and time of compilation": binary size [B] and build time [s] versus optimization level for STM32L1 and STM32F7.]
Figure 1. (a) performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers
as the number of executed CPU cycles, realizing the fractional-order differintegral (Equation (6)) ($\nu_{const}(t) = 0.7$)
for different optimization levels O0, O2, O3, and buffer lengths $L_1$, $L_2$. Obtained improvement for
both microcontrollers (worst case vs best case, buffer length $L_1$): 22% and 63%, respectively; (b) sizes of
the output binaries (columns) and compilation times (polylines) for different optimization levels of the
program. Buffer length $L_2 = 256$.
In the next attempt, a variable fractional order ν(t) was introduced. The $a_j^{[\nu(t)]}$ coefficients were
recalculated before the backward difference and differentiator responses in the main program loop.
The value of the order ν was linearly incremented at each step by a small value $\Delta = +10^{-5}$. The results are
presented in Figure 2.
[Figure 2 placeholder. Panel (a), "MCUs performance for VFOD algorithm": number of CPU cycles for optimization levels O0, O2, O3; series: STM32L1 and STM32F7 with 32 and 256 samples. Panel (b), "VFOD (256), output binary size and time of compilation": binary size [B] and build time [s] versus optimization level for STM32L1 and STM32F7.]
Figure 2. (a) performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers
realizing the variable fractional-order differintegral (Equation (6)) for different optimization levels O0, O2,
O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers (buffer length $L_1$): 19% and
70%, respectively; (b) sizes of the output binaries (columns) and compilation times (polylines) for different
optimization levels of the program. Buffer length $L_2 = 256$.
The numbers of cycles and sizes of the output binaries in the second case were slightly increased
due to the additional evaluation of $a_j^{[\nu(t)]}$. The impact of software-emulated floating-point calculations
in the STM32L152RCT6 microcontroller is clearly noticeable. Flags -O2 and -O3 in all cases reduced
the computation time and program size, although in the case of the STM32F746ZG program processing
256 samples, the -O2 level gave better results than -O3. No significant differences in compilation times
were observed. The results from the second attempt served as a reference for further optimizations.
5. Optimization
5.4. Implementation
The guidelines described above were used to optimize the implementation of computing the $a_j^{[\nu(t)]}$
coefficients, variable fractional-order backward difference, sum, and derivative. Tests were performed
with the same set of parameters, as described in Section 4. The following modifications were applied:
1. The appropriate CMSIS-DSP library file was linked: arm_cortexM3l_math.lib for STM32L152RCT6 (little-endian)
and arm_cortexM7lfsp_math.lib for STM32F746ZG (little-endian, single-precision FPU). The required
macros were defined.
2. The implementation of the convolution from Equation (4) from the previous step was replaced by the
CMSIS arm_conv_partial_f32 function. In addition, the multiplication by the $1/h^{[\nu(t)]}$ factor in Equation (6) was
performed using the arm_scale_f32 function.
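To illustrate what the second modification computes, the sketch below gives a plain C, host-side reference of a partial convolution followed by scaling. It does not call CMSIS-DSP; `conv_partial_ref` and `scale_ref` are hypothetical stand-ins written for this example, and their argument order is an assumption rather than the library's interface. A partial convolution evaluates only a selected range of output samples, which matches the need to compute just the newest value of Equation (4) at each step.

```c
#include <stddef.h>

/* Hypothetical reference: writes y[first..first+count-1] of the linear
 * convolution of a (length a_len) and x (length x_len). */
void conv_partial_ref(const float *a, size_t a_len,
                      const float *x, size_t x_len,
                      float *y, size_t first, size_t count)
{
    for (size_t n = first; n < first + count; ++n) {
        float acc = 0.0f;
        for (size_t j = 0; j < a_len && j <= n; ++j) {
            if (n - j < x_len) {
                acc += a[j] * x[n - j];
            }
        }
        y[n] = acc;
    }
}

/* Hypothetical reference: dst[i] = src[i] * scale, e.g., scale = 1 / h^nu. */
void scale_ref(const float *src, float scale, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * scale;
    }
}
```

Requesting a single output sample at index k yields the sum of a[j]·x[k−j], i.e., Equation (4); scaling that sample by 1/h^ν then gives the approximation in Equation (6). On the target, library routines can perform the same work using the DSP and SIMD extensions where available.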
[Figure 3 placeholder. Panel (a), "MCUs performance for optimized VFOD algorithm": number of CPU cycles for optimization levels O0, O2, O3; series: STM32L1 and STM32F7 with 32 and 256 samples. Panel (b), "Optimized VFOD (256), output binary size and time of compilation": binary size [B] and build time [s] versus optimization level for STM32L1 and STM32F7.]
Figure 3. (a) performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers
realizing the modified implementation of variable fractional-order differintegral (Equation (6)) for different
optimization levels O0, O2, O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers
(buffer length $L_1$): 4% and 19%, respectively; (b) sizes of the output binaries (columns) and compilation
times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.
6. Fixed-Point Arithmetic
In the fixed-point arithmetic, the Qm.n notation proposed by Texas Instruments™ [34] is used to
represent real numbers, devoting a constant number of m bits to the integer part and a constant number of
n bits to the fractional part. In the convention used here, the sign bit is counted among the m integer bits,
and the position of the radix point is fixed. Numbers are stored in integer registers, and all calculations are
performed using the standard hardware arithmetic logic unit. The range of a Qm.n number is defined as
$[-2^{m-1}, 2^{m-1} - 2^{-n}]$ and the resolution equals $2^{-n}$. Fixed-point has been used most often for low-cost
or older microcontrollers without hardware floating-point units but is also implemented in many high-end DSP
applications to increase the overall performance of the software. The drawbacks of this approach include
potential issues with saturation, precision loss, or selecting an insufficient range of numbers. Thus,
a more complex implementation capable of handling normalization (bit-shifting) and bounds checking is
required. Additional scaling of real numbers may also be needed.
To compare the performance of the fixed-point arithmetic with the previously described floating-point
approach, the following assumptions were made: in order to avoid saturation, the 32-bit signed integer
q31_t type was used for storing numbers in Q11.21 format, giving a range of $[-1024, 1024 - 2^{-21}]$ and a
resolution of $2^{-21}$ (≈ 4.768e−7). This allowed Equation (2) to be implemented without scaling the maximum
number of coefficients (256) and provided a satisfactory resolution. It needs to be stressed, however,
that, for different applications, the above requirements will have to be adapted. As the CMSIS-DSP library
supports only Q1.31, Q1.15, and Q1.7 formats, the author's implementation of the fixed-point arithmetic
was introduced. The procedure was as follows:
1. The vector of the predefined floating-point input samples, initial fractional-order $\nu_0 = 0.7$, and the
sampling time h were converted to Q11.21 format by multiplying the values by $2^{21}$ and rounding to
the nearest integer.
2. The recursive function for calculating $a_j^{[\nu(t)]}$ and fractional differintegral algorithm were modified for
handling fixed-point arithmetic in Q11.21 notation.
3. In the main loop, the $\nu_{Q11.21}$ order was incremented by one each step and the vectors of the $a_{j,Q11.21}^{[\nu(t)]}$
coefficients, as well as the variable fractional-order backward difference and derivative responses,
were recalculated.
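The conversion and multiplication steps of this procedure can be sketched as follows. This is a minimal illustration in C under stated assumptions, not the author's implementation: the type name `q11_21_t` and the helper functions are hypothetical, no saturation or overflow checking is performed, and products are computed in 64 bits before being scaled back.

```c
#include <stdint.h>

#define Q_FRAC_BITS 21
#define Q_ONE ((int32_t)1 << Q_FRAC_BITS) /* 2^21, the value 1.0 in Q11.21 */

typedef int32_t q11_21_t; /* hypothetical type name for this sketch */

/* Converts a real value to Q11.21: multiply by 2^21 and round to nearest.
 * The caller must keep x within [-1024, 1024 - 2^-21]. */
q11_21_t q_from_double(double x)
{
    double scaled = x * (double)Q_ONE;
    return (q11_21_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Converts a Q11.21 value back to a real number. */
double q_to_double(q11_21_t q)
{
    return (double)q / (double)Q_ONE;
}

/* Multiplies two Q11.21 values using a 64-bit intermediate product.
 * The division truncates toward zero; overflow is not checked. */
q11_21_t q_mul(q11_21_t a, q11_21_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q11_21_t)(wide / Q_ONE);
}
```

For instance, the initial order 0.7 is stored as the integer 1468006, and incrementing that integer by one raises the order by one resolution step of 2^−21.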
Results were later converted back to float type and verified using the open-source Kibo toolbox [47].
This time a significant increase in efficiency can be observed in the Cortex®-M3 case (see Figure 4).
Calculation time was reduced by over 84% for both sizes of the buffers with -O2 or -O3 optimization
levels applied. Moreover, Cortex®-M3 was found to be faster (assuming the same CPU clock frequency)
than Cortex®-M7 at executing the same algorithm.
[Figure 4 placeholder. Panel (a), "MCUs performance for fixed-point VFOD algorithm": number of CPU cycles for optimization levels O0, O2, O3; series: STM32L1 and STM32F7 with 32 and 256 samples. Panel (b), "Fixed-point VFOD (256), output binary size and time of compilation": binary size [B] and build time [s] versus optimization level for STM32L1 and STM32F7.]
Figure 4. (a) performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers
realizing the fixed-point implementation of variable fractional-order differintegral (Equation (6)) for
different optimization levels O0, O2, O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both
microcontrollers (buffer length $L_1$): 67% and 65%, respectively; (b) sizes of the output binaries (columns)
and compilation times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.
7. Conclusions
In this paper, several approaches have been presented for optimizing microcontroller implementations
of variable fractional-order backward difference, sum, and differintegral. A notable improvement was
achieved for the STM32F746ZG microcontroller by using the CMSIS-DSP library, SIMD extensions,
and a hardware floating-point unit. The performance of the STM32L152RCT6 was limited mostly
by the software-emulated floating-point arithmetic; however, the conversion to the fixed-point Q11.21
format improved it significantly. It should be noted that fixed-point arithmetic involves a more complex
implementation and verification of the results, and requires additional conversion of variables when used in
real-time applications with ADC and DAC converters. The size of the program and the execution speed
were improved further by the proper configuration of the compiler optimization options. For the tested
implementations, the -O2 level provided the best results in most cases. By analyzing the worst-case and
the best-case scenarios, a conclusion can be drawn that, in the case of the STM32L152RCT6, the best combination
was the fixed-point implementation compiled with the -O2 flag (87% reduction of the computation time),
whereas, for the STM32F746ZG, it was the CMSIS and hardware FPU-based implementation, also compiled with
-O2 (75% reduction). As mentioned in Section 4.2, the -O3 level in many cases generated larger code,
resulting in longer execution times. With different algorithms and requirements, other levels should be
tested. The presented results may serve as a good starting point for further research on the implementation
of more complex fractional-calculus-based algorithms in embedded systems, including fractional-order
PID control with orders varying in time (VFOPID). In future work, optimizations of the numerical methods
will be investigated, including the parallel implementation on multicore architectures for applications in
closed-loop control systems.
References
1. Oldham, K.B.; Spanier, J. The Fractional Calculus - Theory and Applications of Differentiation and Integration
to Arbitrary Order. In Mathematics in Science and Engineering; Academic Press, Inc.: San Diego, CA, USA, 1974;
Volume 111, ISBN 978-0-12-525550-9. [CrossRef]
2. Miller, K.S.; Ross, B. An Introduction to the Fractional Calculus and Fractional Differential Equations, 1st ed.; John Wiley
& Sons: New York, NY, USA, 1993; ISBN 978-04-7158-884-9.
3. Podlubny, I. Fractional Differential Equations—An Introduction to Fractional Derivatives, Fractional Differential
Equations, to Methods of their Solution and some of their Applications. In Mathematics in Science and Engineering;
Academic Press, Inc.: San Diego, CA, USA, 1999; Volume 198, ISBN 978-01-2558-840-9. [CrossRef]
4. Parsa Moghaddam, B.; Dabiri, A.; Tenreiro Machado, J.A. Application of variable-order fractional calculus in
solid mechanics. In Handbook of Fractional Calculus with Applications. Applications in Engineering, Life and Social
Sciences, Part A; Bǎleanu, D., Mendes Lopes, A., Tenreiro Machado, J.A., Eds.; De Gruyter: Berlin, Germany, 2019;
Volume 7, pp. 207–224, ISBN 978-3-11-057091-5. [CrossRef]
5. Sierociuk, D.; Skovranek, T.; Macias, M.; Podlubny, I.; Petras, I.; Dzielinski, A.; Ziubinski, P. Diffusion process
modeling by using fractional-order models. Appl. Math. Comput. 2015, 257, 2–11. [CrossRef]
6. MacDonald, C.L.; Bhattacharya, N.; Sprouse, B.P.; Silva, G.A. Efficient computation of the Grünwald–Letnikov
fractional diffusion derivative using adaptive time step memory. J. Comput. Phys. 2015, 297, 221–236. [CrossRef]
7. Wang, S.; He, S.; Yousefpour, A.; Jahanshahi, H.; Repnik, R.; Perc, M. Chaos and complexity in a fractional-order
financial system with time delays. Chaos Solitons Fractals 2020, 131, 109521. [CrossRef]
8. Tejado, I.; Pérez, E.; Valério, D. Fractional Derivatives for Economic Growth Modelling of the Group of Twenty:
Application to Prediction. Mathematics 2020, 8, 50. [CrossRef]
9. Sopasakis, P.; Sarimveis, H. Controlled Drug Administration by a Fractional PID. IFAC Proc. Vol. 2014, 47,
8421–8426. [CrossRef]
10. Valentim, C.A.; Oliveira, N.A.; Rabi, J.A.; David, S.A. Can fractional calculus help improve tumor growth
models? J. Comput. Appl. Math. 2020, 379, 112964. [CrossRef]
11. Aliyu, A.I.; Alshomrani, A.S.; Li, Y.; Inc, M.; Baleanu, D. Existence theory and numerical simulation of HIV-I
cure model with new fractional derivative possessing a non-singular kernel. Adv. Differ. Equ. 2019, 2019, 408.
[CrossRef]
12. Al-Shamasneh, A.R.; Jalab, H.A.; Shivakumara, P.; Ibrahim, R.W.; Obaidellah, U.H. Kidney segmentation in MR
images using active contour model driven by fractional-based energy minimization. Signal Image Video Process.
2020, 1–8. [CrossRef]
13. Lv, T.; Tong, L.; Zhang, J.; Chen, Y. A real-time physiological signal acquisition and analyzing method based on
fractional calculus and stream computing. Soft Comput. 2020, 1–7. [CrossRef]
14. Huang, L.L.; Park, J.H.; Wu, G.C.; Mo, Z.W. Variable-order fractional discrete-time recurrent neural networks.
J. Comput. Appl. Math. 2020, 370, 112633. [CrossRef]
15. Patnaik, S.; Hollkamp, J.P.; Semperlotti, F. Applications of variable-order fractional operators: A review. Proc. R.
Soc. A Math. Phys. Eng. Sci. 2020, 476, 20190498. [CrossRef] [PubMed]
16. Freeborn, T.J.; Maundy, B.; Elwakil, A.S. Fractional-order models of supercapacitors, batteries and fuel cells:
A survey. Mater. Renew. Sustain. Energy 2015, 4, 9:1–9:7. [CrossRef]
17. Lewandowski, M.; Orzyłowski, M. Fractional-order models: The case study of the supercapacitor capacitance
measurement. Bull. Pol. Acad. Sci. Tech. Sci. 2017, 65, 449–457. [CrossRef]
18. Zhang, Q.; Li, Y.; Shang, Y.; Duan, B.; Cui, N.; Zhang, C. A Fractional-Order Kinetic Battery Model of Lithium-Ion
Batteries Considering a Nonlinear Capacity. Electronics 2019, 8, 394. [CrossRef]
19. Majka, L.; Klimas, M. Diagnostic approach in assessment of a ferroresonant circuit. Electr. Eng. 2019, 101,
149–164. [CrossRef]
20. Tepljakov, A.; Alagoz, B.B.; Yeroglu, C.; Gonzalez, E.; HosseinNia, S.H.; Petlenkov, E. FOPID Controllers and
Their Industrial Applications: A Survey of Recent Results. IFAC-PapersOnLine 2018, 51, 25–30. [CrossRef]
21. Ostalczyk, P.; Brzezinski, D.; Duch, P.; Łaski, M.; Sankowski, D. The variable, fractional-order discrete-time PD
controller in the IISv1.3 robot arm control. Cent. Eur. J. Phys. 2013, 11, 750–759. [CrossRef]
22. El-Khazali, R. Fractional-order PIλDµ controller design. Comput. Math. Appl. 2013, 66, 639–646. [CrossRef]
23. Petráš, I.; Vinagre, B.M. Practical application of digital fractional-order controller to temperature control.
Acta Montan. Slovaca 2002, 7, 131–137. Available online: https://actamont.tuke.sk/pdf/2002/n2/11petras.pdf
(accessed on 11 March 2020).
24. Brzeziński, D.W. Fractional Order Derivative and Integral Computation with a Small Number of Discrete Input
Values Using Grünwald–Letnikov Formula. Int. J. Comput. Methods 2019, 17, 1940006. [CrossRef]
25. Scherer, R.; Kalla, S.L.; Tang, Y.; Huang, J. The Grünwald–Letnikov method for fractional differential equations.
Comput. Math. Appl. 2011, 62, 902–917. [CrossRef]
26. Ostalczyk, P. On simplified forms of the fractional-order backward difference and related fractional-order linear
discrete-time system description. Bull. Pol. Acad. Sci. Tech. Sci. 2015, 63, 423–433. [CrossRef]
27. Oustaloup, A. La commande CRONE : Commande Robuste D’Ordre non Entier; Hermes Science Publications: Paris,
France, 1991; ISBN 978-28-6601-289-2.
28. Oprzȩdkiewicz, K.; Podsiadło, M.; Dziedzic, K. Integer order vs fractional order temperature models in the
forced air heating system. Przegla̧d Elektrotechniczny 2019, 95, 35–40. [CrossRef]
29. Baranowski, J.; Bauer, W.; Zagórowska, M.; Pia̧tek, P. On Digital Realizations of Non-integer Order Filters.
Circuits Syst. Signal Process. 2016, 35, 2083–2107. [CrossRef]
30. Monje, C.A.; Chen, Y.; Vinagre, B.M.; Xue, D.; Feliu, V. Fractional-order Systems and Controls. Fundamentals and
Applications; Advances in Industrial Control; Springer: London, UK, 2010; ISBN 978-1-84996-334-3. [CrossRef]
31. Dastjerdi, A.A.; Vinagre, B.M.; Chen, Y.; HosseinNia, S.H. Linear fractional order controllers; A survey in the
frequency domain. Annu. Rev. Control 2019, 47, 51–70. [CrossRef]
32. Caponetto, R.; Machado, J.T.; Murgano, E.; Xibilia, M.G. Model Order Reduction: A Comparison between Integer
and Non-Integer Order Systems Approaches. Entropy 2019, 21, 876. [CrossRef]
33. Tepljakov, A.; Petlenkov, E.; Belikov, J. Implementation and real-time simulation of a fractional-order controller
using a MATLAB based prototyping platform. In Proceedings of the 13th Biennial Baltic Electronics Conference,
Tallinn, Estonia, 3–5 October 2012; pp. 145–148. [CrossRef]
34. Pyeatt, L.D.; Ughetta, W. Non-Integral Mathematics. In Modern Assembly Language Programming with the ARM
Processor; Pyeatt, L.D., Ughetta, W., Eds.; Elsevier: Amsterdam, The Netherlands, 2016; Chapter 8, pp. 239–292;
ISBN 978-01-2819-221-4. [CrossRef]
35. Ostalczyk, P. Discrete Fractional Calculus: Applications in Control and Image Processing; World Scientific Publishing
Co., Inc.: Singapore, 2016; ISBN 978-98-1472-566-8.
36. Mozyrska, D.; Ostalczyk, P. Variable-, fractional-order Grünwald-Letnikov backward difference selected
properties. In Proceedings of the 39th International Conference on Telecommunications and Signal Processing
(TSP 2016), Vienna, Austria, 27–29 June 2016; pp. 634–637. [CrossRef]
37. STMicroelectronics. STM32L15xCC STM32L15xRC STM32L15xUC STM32L15xVC Ultra-low-power 32-bit MCU
ARM-based Cortex-M3, 256KB Flash, 32KB SRAM, 8KB EEPROM, LCD, USB, ADC, DAC. Datasheet—Production Data.
DocID022799 Rev 13. 2017. Available online: https://www.st.com/resource/en/datasheet/stm32l152rc.pdf
(accessed on 11 March 2020).
38. STMicroelectronics. STM32F745xx STM32F746xx ARM-based Cortex-M7 32b MCU+FPU, 462DMIPS up
to 1MB Flash/320+16+4KB RAM, USB OTG HS/FS, ethernet, 18TIMs, 3ADCs, 25 com itf, cam & LCD
Datasheet—Production Data. DocID027590 Rev 4. 2016. Available online:
https://doi.org/https://www.st.com/resource/en/datasheet/stm32f746zg.pdf (accessed on 11 March 2020).
39. STMicroelectronics. UM1079 User Manual. Discovery kits with STM32L152RCT6 and STM32L152RBT6
MCUs. 2017. Available online: http://www.st.com/resource/en/user_manual/dm00093903.pdf (accessed on
11 March 2020).
40. STMicroelectronics. UM1974 User Manual STM32 Nucleo-144 Boards. 2017. Available online:
http://www.st.com/content/ccc/resource/technical/document/user_manual/group0/26/49/90/2e/33/0d/4a/da/DM00244518/files/DM00244518.pdf/jcr:content/translations/en.DM00244518.pdf
(accessed on 11 March 2020).
41. Arm Ltd. Using Common Compiler Options. Selecting optimization options. In Arm® Compiler Version 6.12
User Guide; Arm Ltd.: Cambridge, UK, 2019; pp. 35–37. Available online:
https://developer.arm.com/docs/100748/0612 (accessed on 11 March 2020).
42. Arm Ltd. Data Watchpoint and Trace Unit. In Arm® Cortex®-M7 Processor Technical Reference Manual, r1p2 ed.;
Arm Ltd.: Cambridge, UK, 2018; pp. 139–143. Available online: https://developer.arm.com/docs/ddi0489/d
(accessed on 11 March 2020).
43. Arm Ltd. CMSIS-Core (Cortex-M) Intrinsic Functions for SIMD Instructions [only Cortex-M4 and Cortex-M7];
Arm Ltd.: Cambridge, UK, 2019. Available online:
https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__SIMD__gr.html (accessed on 11 March 2020).
44. Arm Ltd. CMSIS-DSP Software Library; Arm Ltd.: Cambridge, UK, 2019. Available online:
https://www.keil.com/pack/doc/CMSIS/DSP/html/index.html (accessed on 11 March 2020).
45. STMicroelectronics. AN4841 Application Note. Digital Signal Processing for STM32 Microcontrollers Using
CMSIS. Rev 2. 2018. Available online:
https://www.st.com/content/ccc/resource/technical/document/application_note/group0/c1/ee/18/7a/f9/45/45/3b/DM00273990/files/DM00273990.pdf/jcr:content/translations/en.DM00273990.pdf
(accessed on 11 March 2020).
46. ARM Ltd. Arm Cortex-M7 Processor Technical Reference Manual, r1p2 ed.; ARM Ltd.: Cambridge, UK,
2018. Available online: https://static.docs.arm.com/ddi0489/f/DDI0489F_cortex_m7_trm.pdf (accessed on
11 March 2020).
47. Noronha, D.H.; Leong, P.H.; Wilton, S.J. Kibo: An Open-Source Fixed-Point Tool-kit for Training and Inference
in FPGA-Based Deep Learning Networks. In Proceedings of the IEEE International Parallel and Distributed
Processing Symposium Workshops (IPDPSW 2018), Vancouver, BC, Canada, 21–25 May 2018; pp. 178–185.
[CrossRef]
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (http://creativecommons.org/licenses/by/4.0/).