Towards Intelligent Edge Computing
Article
1 School of Computer Science and Technology, North University of China, Taiyuan 030051, China;
[email protected]
2 School of Computer Science and Technology, Xidian University, Xi’an 710071, China;
[email protected]
3 School of Microelectronics, Northwestern Polytechnical University, Xi’an 710071, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-1500-343-9992
Abstract: Multi-FPGA systems can form larger and more powerful computing units
through high-speed interconnects between chips, and are beginning to be widely used by
various computing service providers, especially in edge computing. However, the new
computing architecture brings new challenges to efficient and reliable task scheduling.
In this context, we propose a resource- and reliability-aware hybrid scheduling method
on Multi-FPGA systems. First, a set of models is established based on the resource/time
requirements, communication overhead, and state conversion process of tasks to further
analyze the constraints of system scheduling. On this basis, the large task is divided into
subtasks based on the data dependency matrix, and the Maintenance Multiple Sequence
(MMS) algorithm is used to generate execution sequences for each subtask to the Multi-
FPGA systems to fully exploit resources and ensure reliable operation. Compared with
state-of-the-art scheduling methods, the proposed method can achieve an average increase
in resource utilization of 7%; in terms of reliability, it achieves good execution gains, with
an average task completion rate of 98.3% and a mean time to failure of 15.7 years.
multiple FPGAs serving as servers to execute tasks and return processing results to the
client. The FPGAs are connected to the central client, and each link is built using a
dual-backup method.
While Multi-FPGA system architectures and Dynamic Partial Reconfiguration (DPR) [5]
techniques offer great flexibility, the overheads incurred during system scheduling manage-
ment and reconfiguration must be carefully considered, as they can easily jeopardize the
performance gains achieved through hardware acceleration [6,7]. Like a single FPGA chip,
the Multi-FPGA architecture is a 'black box' invisible to the user. Still, several issues
must be taken into account when implementing applications on it: scheduling order,
deployment regions, resource constraints, and complex inter-task communication overheads.
When DPR technology is applied, additional requirements arise for partitioning and
recombining large tasks.
Multi-FPGA systems have multiple FPGA resources, and each FPGA can be divided
into different reconfigurable blocks for fine-grained resource control. Deploying tasks
with different resource requirements and execution times to the corresponding blocks
helps to efficiently utilize resources for further deployment of more hardware accelerators
working simultaneously. Task scheduling implements resource resilience policies as well
as management of task configuration and execution order [8–10]. The DPR technique can
reuse the resources of each FPGA in a time-shared manner, providing more options and
optimizations for resource management and task deployment. Therefore, it is a challenge
to efficiently utilize multiple FPGAs and the resources on each FPGA for more efficient
task execution.
Overall, the task scheduling problem is NP-hard because it is a more general optimization
version of the Resource-Constrained Scheduling Problem (RCSP), which is NP-complete. At
the same time, existing resource-scheduling methods suffer from low resource utilization,
a lack of reliable-operation guarantees, and inapplicability to Multi-FPGA systems. In
this work, we propose a resource- and reliability-aware hybrid scheduling method for a
Multi-FPGA system. The proposed method is verified experimentally, and the effectiveness
of its management and scheduling is demonstrated. Our contributions are summarized
as follows.
Electronics 2025, 14, 82
(1) A resource-aware scheduler ensures that the final system can use the available FPGA
resources more efficiently. When scheduling tasks, it takes providing the minimum
resources for each task as its service objective, i.e., it allocates as few on-chip
resources as possible to all tasks received within a time unit so as to improve
resource utilization.
(2) A reliability-aware scheduler ensures the successful execution of tasks and improves
the expected lifetime of the Multi-FPGA system. The scheduler implements multi-task
scheduling across multiple FPGAs, balancing makespan and reliability by placing tasks in
a redundant and load-balanced manner while maintaining execution efficiency.
(3) The experimental results show that the resource management method is able to
effectively combine and place the tasks, with an average increase in resource utilization
of 7%; the reliability-aware scheduling method achieves good execution gains, with an
average task completion rate of 98.3% and a system lifetime of 15.7 years.
The paper is structured as follows. Section 2 discusses related work on FPGA task
scheduling. In Section 3, we provide a detailed description of the resource- and reliability-
aware task scheduler. In Section 4, we evaluate the performance of our method. Section 5
concludes the paper and proposes future work.
2. Related Work
Many previous studies have focused on task planning on a single FPGA. A few other
studies in the literature specifically focused on task scheduling in Multi-FPGA systems.
Research on task scheduling for Multi-FPGA systems mainly concerns the task model, the
task sequence, and task state management. In practice, scheduling algorithms differ
mainly in whether the task execution mode is fixed in advance and whether tasks have
deadlines. Their scheduling goals are mostly shorter task execution time or lower
resource consumption.
Tang, Y. et al. proposed a method to build task queues based on task deadline and
task slack time and to build a task model including required resources and task execution
time [11]. Deiana, E. A. et al. proposed an improved method in which resources are
requested as late as possible to reduce the amount of resources [12]. Sun, Z. et al. proposed
a two-phase task scheduling approach to optimize task execution efficiency in a multi-
FPGA system but did not consider load balancing [4]. Unlike software strategies, a simple
hardware real-time scheduler can directly configure the tasks in the ready queue
according to priority and achieve fast scheduling [13], but it cannot control global
indicators such as load balancing.
Some scheduling algorithms are based on shape modeling: tasks and schedulable regions are
modeled as rectangular blocks for task-resource mapping [14–16]. Task scheduling has also
been modeled with Mixed-Integer Linear Programming (MILP) formulations over an
architecture with FPGA resources [17]. The grey correlation degree indicates the degree
of correlation between two parameters in a system [18]. Using grey relation theory, a
task model including task size, task shape, execution time, and configuration time can be
established to generate the scheduling sequence with the shortest execution time [19].
When the task model considers task shape and computation time [20], a task implementation
that executes slowly but fits the currently available resources is selected during
scheduling. The on-chip resources can then satisfy the requirements of more tasks,
improving resource utilization and reducing task waiting time.
In the above-mentioned studies on task scheduling for the Multi-FPGA system, most
of them focus on improving the efficiency of task execution with little consideration of
reliability assurance. In this work, we propose a reliability-aware, multi-task multi-device
scheduling approach that aims to ensure fault tolerance during task execution and to
balance the task load among multiple FPGAs to extend the system lifetime.
3.1. Model
3.1.1. Resource Model of the Task
The task model specifies the input and output of the algorithm, and each task must be
represented in this format to generate a task vector. The task model is divided into a
pre-scheduling model m_pre and a post-scheduling model m_pos, which describe,
respectively, the resource requirements of a task and the resource allocation obtained by
the task. The m_pre includes the width W_i and height H_i of the resources required by
the task, the processing time of the data in the current task time_exe, the specified
deadline Deadline_i, the input data Data_in, the output data Data_out, and the task's
partial bitstream bit_i. The m_pre is as follows:
m_pre = {W_i, H_i, time_exe, Deadline_i, Data_in, Data_out, bit_i}. The m_pos includes
the number F_i of the FPGA chip on which the task is placed, the coordinate range
(x_i, y_i) of the on-chip resources, the start time s_i and end time e_i of the on-chip
resources, the required storage resources Data_max, and the time B_i occupied by the
bandwidth. The m_pos is as follows: m_pos = {F_i, x_i, y_i, s_i, e_i, Data_max, B_i}.
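As a concrete illustration, the two models can be written as plain data records. The Python classes below are a sketch that mirrors the notation above; the field names are ours and not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class PreScheduledModel:
    """m_pre: resource requirements of a task before scheduling."""
    width: int          # W_i, width of the required on-chip region
    height: int         # H_i, height of the required on-chip region
    time_exe: float     # processing time of the data in the task
    deadline: float     # Deadline_i, specified deadline
    data_in: bytes      # Data_in, input data
    data_out: bytes     # Data_out, output data
    bitstream: bytes    # bit_i, the task's partial BIT file

@dataclass
class PostScheduledModel:
    """m_pos: resource description obtained by the task after scheduling."""
    fpga_id: int                 # F_i, FPGA chip number
    x_range: tuple[int, int]     # x_i, coordinate range of on-chip resources
    y_range: tuple[int, int]     # y_i
    start_time: float            # s_i, start time of the on-chip resources
    end_time: float              # e_i, end time of the on-chip resources
    data_max: int                # Data_max, required storage resources
    bandwidth_time: float        # B_i, time occupied by the bandwidth
```

A scheduler would consume a PreScheduledModel vector per task and emit the corresponding PostScheduledModel once a placement is chosen.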
The data dependency matrix corresponding to the task topology is as follows, assuming
that the amount of data transferred between tasks is 1:

    [ 0 1 1 0 0 ]
    [ 0 0 0 1 0 ]
    [ 0 0 0 1 1 ]    (3)
    [ 0 0 0 0 1 ]
    [ 0 0 0 0 0 ]
By analyzing the data dependency matrix together with the task topology, it can be seen
that if the column vector corresponding to a subtask is a zero vector, the subtask has no
unconfigured predecessor; if the row vector of a subtask is a zero vector, the subtask
has no successors and is therefore an end task of the large task. By decomposing the
matrix, all feasible task sequences can be obtained; the decomposition proceeds
as follows:
    [ 0 1 1 0 0 ]
    [ 0 0 0 1 0 ]                       [ 0 0 1 0 ]
    [ 0 0 0 1 1 ]  →  [ 0 1 1 0 0 ]  +  [ 0 0 1 1 ]    (4)
    [ 0 0 0 0 1 ]                       [ 0 0 0 1 ]
    [ 0 0 0 0 0 ]                       [ 0 0 0 0 ]
The start task has no data dependencies and can be executed directly. First, the row and
column vectors corresponding to the start task are decomposed. From the decomposed row
vector, it can be seen that the successors of task A are tasks B and C, and the column
vectors corresponding to tasks B and C become zero vectors after the row and column of
task A are removed. The first column (task B) is removed first:
    [ 0 0 1 0 ]
    [ 0 0 1 1 ]                     [ 0 1 1 ]
    [ 0 0 0 1 ]  →  [ 0 0 1 0 ]  +  [ 0 0 1 ]    (5)
    [ 0 0 0 0 ]                     [ 0 0 0 ]
The second column (task C) is removed next:

    [ 0 1 1 ]
    [ 0 0 1 ]  →  [ 0 1 1 ]  +  [ 0 1 ]    (6)
    [ 0 0 0 ]                   [ 0 0 ]
After task D is removed in the same way, only task E is left in the matrix. Once E enters
the configuration queue, all tasks are in the configuration state, and an effective
sequence has been obtained through matrix decomposition: A → B → C → D → E. Generating
task sequences in strict accordance with matrix decomposition is a safe and effective
configuration method.
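The decomposition walked through above amounts to repeatedly selecting a task whose column vector is zero among the not-yet-configured tasks, appending it to the sequence, and dropping its row and column. The following is a minimal Python sketch of this zero-column rule (our illustration, not the paper's MMS implementation), applied to the matrix of Equation (3):

```python
def decompose(dep):
    """Generate a configuration sequence from a data dependency matrix.

    dep[i][j] == 1 means task i sends data to task j. A task is ready
    when its column is all zeros restricted to the unconfigured tasks.
    """
    remaining = set(range(len(dep)))
    sequence = []
    while remaining:
        # ready tasks: zero column vector among the remaining tasks
        ready = [j for j in sorted(remaining)
                 if all(dep[i][j] == 0 for i in remaining)]
        t = ready[0]            # remove the leftmost zero column first
        sequence.append(t)
        remaining.remove(t)     # drops task t's row and column
    return sequence

# Dependency matrix of Equation (3): A->B, A->C, B->D, C->D, C->E, D->E
M = [[0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
print(["ABCDE"[i] for i in decompose(M)])  # ['A', 'B', 'C', 'D', 'E']
```

Picking a different ready task at each step (e.g., C before B) would enumerate the other feasible sequences mentioned above.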
timeframe that does not affect other tasks. Since the goal of the scheduling algorithm is
to ensure that each task completes before its deadline and that the same task is executed
on different FPGAs in dual-mode hot standby for fault tolerance, the time distance (DT)
between the current time and the deadline, together with the resource usage, is first
calculated for each configuration sequence. As shown in Figure 5, subtasks request
devices through back-to-front dynamic allocation. After comparing the deadlines, the
largest deadline is used as the starting point for generating the configuration sequence.
All subtasks form a Device Application Sequence and then enter the Device Configuration
Sequence. Generation of the device application sequence starts with the last element of
each task sequence, i.e., the end node in the task topology. The device application
sequence is sorted in descending order of time distance. The task with the largest DT
selects a device first and enters the device configuration sequence. After it is
configured, the next task in the same sequence is added to the Device Application
Sequence, which is then re-sorted. Iterating these rules dynamically generates the final
configuration sequence. The last action before the final configuration sequence is
produced is to advance the configuration sequence: check whether there is any downtime
once all tasks are scheduled; if there is, the configuration times of all tasks are moved
forward to eliminate the downtime.
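The generation rule above can be sketched in a few lines: seed a pool with each chain's end node, always let the pooled task with the largest DT claim a device, then admit its predecessor into the pool. The code below is an illustrative sketch under these assumptions; device selection itself is abstracted away, and all names are ours:

```python
import heapq

def build_configuration_sequence(task_chains, now):
    """Back-to-front dynamic allocation sketch.

    task_chains: list of task sequences, each ordered start -> end;
    each task is a dict with 'name' and 'deadline' keys. The pool is
    seeded with the end node of every chain; the task with the largest
    time distance (DT = deadline - now) selects a device first, after
    which its predecessor in the same chain enters the pool.
    """
    # max-heap keyed on DT, seeded with each chain's last element
    pool = [(-(chain[-1]['deadline'] - now), idx, len(chain) - 1)
            for idx, chain in enumerate(task_chains)]
    heapq.heapify(pool)
    config_sequence = []
    while pool:
        neg_dt, idx, pos = heapq.heappop(pool)
        config_sequence.append(task_chains[idx][pos]['name'])
        if pos > 0:  # predecessor joins the application sequence
            prev = task_chains[idx][pos - 1]
            heapq.heappush(pool, (-(prev['deadline'] - now), idx, pos - 1))
    return config_sequence
```

For example, two chains A→D and B→E with deadlines 5, 10, 3, 8 yield the configuration order D, E, A, B: the end nodes claim devices first, in descending DT.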
4. Experimental Evaluation
In this section, we use the Task Graphs for Free (TGFF) tool [21] to generate task DAGs.
The scheduler based on the Multi-FPGA system was tested to verify its performance and
fault tolerance when scheduling multiple large tasks.
Average Resource Vote (ARV): indicates the resource usage of the tasks and measures how
well the scheduling algorithm saves resources. The expression is as follows:

    ARV = (1 / (size_total × T_total)) × Σ_{i=0}^{N} (size_i × T_i)    (9)
Task Completion Rate (TCR): indicates the percentage of successfully executed tasks
compared to all scheduled tasks.
Mean Time to Failure (MTTF): indicates the average time before failure. In systems
where hard fault repair cannot be achieved, it is equivalent to the system lifetime.
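For clarity, the first two metrics can be computed directly from their definitions; the helper functions below are an illustrative sketch, not the authors' evaluation code:

```python
def average_resource_vote(tasks, size_total, t_total):
    """ARV of Equation (9): the fraction of the total area-time product
    (size_total * T_total) actually occupied by the scheduled tasks,
    where tasks is a list of (size_i, T_i) pairs."""
    return sum(size * t for size, t in tasks) / (size_total * t_total)

def task_completion_rate(completed, scheduled):
    """TCR: successfully executed tasks over all scheduled tasks."""
    return completed / scheduled

# e.g., two tasks of size 40 and 60 units running for 10 and 5 time
# units on a fabric of 200 units observed over 20 time units
arv = average_resource_vote([(40, 10), (60, 5)], size_total=200, t_total=20)
print(f"ARV = {arv:.1%}")   # ARV = 17.5%
```

MTTF, in contrast, requires the aging/stress model of reference [23] and is not reproduced here.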
Table 2. The influence of the task hybrid algorithm and load balancing strategy on the
scheduling effect.

    Task Sequence Maintenance Approach    |        ARV        | Max. Resource Utilization Rate |  Execution Time
                                          |   1     2     3   |     1      2      3            |   1     2     3
    Single-FPGA Task Continuous Placement | 15.1% 16.2% 14.9% |  34.4%  35.1%  31.1%           | 1141  1389  1312
    Single-FPGA Task Mix Placement        | 42.6% 45.1% 43.0% |  68.3%  74.9%  68.0%           |  858  1138  1140
    Multi-FPGA Task Mix Placement         | 29.6% 30.4% 30.2% |  34.3%  35.6%  34.9%           |  524   689   627
The reasons are as follows. The task continuous placement strategy lengthens the overall
execution time when handling a bursty task set, and the large amount of resources left
idle shows that the FPGA's parallel computing capability is not exploited, so tasks near
the end of the queue cannot be guaranteed to finish before their deadlines. When three
tasks jointly apply for the resources of a single FPGA, they form a mixed task set: the
overall execution time decreases and resource utilization rises because, compared with
sequential placement, mixed placement configures more tasks in the FPGA and exploits its
parallel nature. When three tasks are configured on two FPGAs through load balancing and
task mixing, the residual resource requirement on a single FPGA is reduced compared with
mixing all three tasks on one FPGA, and the total execution time is reduced through joint
scheduling of the two FPGA devices, so the tasks can be completed before the deadline.
Similarly, we use the above scenarios and the three scheduling strategies to verify the
reliability of the proposed method. The method described in this paper also aims to
extend the MTTF through load balancing, alleviating long-term stress accumulation on each
FPGA device. We therefore used the method described in Section 3.3 of reference [23] to
measure MTTF. The TCR and MTTF results are shown in Table 3. As the table shows, the
reliability-aware scheduling method proposed in this paper achieves an average TCR of
98.3% and an average MTTF of 15.7 years, much higher than the other two
scheduling methods.
These gains arise because (1) simultaneous execution of redundant tasks placed on
multiple FPGAs keeps task execution uninterrupted after a failure, and (2) load-balanced
mixed placement of tasks across the Multi-FPGA system reduces peak stress on any single
FPGA, extending the system lifetime. The Single-FPGA Task Mix Placement strategy
outperforms the Single-FPGA Task Continuous Placement strategy because mixed placement
avoids fault accumulation and stress accumulation to some extent.
Table 3. The influence of the task hybrid algorithm and load balancing strategy on the
scheduling effect.

    Task Sequence Maintenance Approach    |          TCR          |  MTTF (years)
                                          |    1      2      3    |   1     2     3
    Single-FPGA Task Continuous Placement |  66.7%  62.5%  72.5%  |  5.2   6.0   5.8
    Single-FPGA Task Mix Placement        |  85.0%  82.5%  90.0%  |  6.3   6.9   6.8
    Multi-FPGA Task Mix Placement         | 100.0%  95.0% 100.0%  | 15.2  16.0  15.8
Task Continuous Placement, and Single-FPGA Task Mix Placement proposed in this paper.
The task groups are then randomly combined from the tasks in Table 1.
From Table 4, it can be seen that as the number of FPGAs increases, the number of
schedulable task groups increases for all scheduling policies, but the Multi-FPGA Task Mix
Placement policy grows at a much higher rate than the other two policies. The reason is that
the Single-FPGA Task Continuous Placement strategy can only schedule tasks sequentially
and cannot execute tasks simultaneously on the chip. The Single-FPGA Task Mix Placement
strategy can schedule different tasks to execute together on a single chip, so it can
deploy tasks flexibly. The Multi-FPGA Task Mix Placement strategy cannot support more
task groups when the number of FPGAs is small because it must deploy tasks redundantly,
but as the number of FPGAs increases, its scheduling strategy supports progressively
more tasks.
Moreover, as the number of FPGAs increases, the scheduling strategy will make better use
of the resources of each FPGA, and thus, the number of tasks completed can be significantly
increased. The above experiments also demonstrate that the scheduling method described
in this paper has good scalability on Multi-FPGA systems, which improves the adaptability
of the system in large-scale edge computing scenarios.
Table 4. The number of task groups that can be scheduled as the number of FPGAs increases
under different scheduling strategies.
Author Contributions: Conceptualization, Z.L. and Y.H.; methodology, Z.L. and H.G.; software,
Y.H. and H.G.; validation, Z.L. and J.Z.; formal analysis, Y.H. and J.Z.; investigation, H.G. and J.Z.;
resources, Z.L.; data curation, Y.H. and H.G.; writing—original draft preparation, Z.L. and Y.H.;
writing—review and editing, Z.L. and J.Z.; funding acquisition, Z.L. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported in part by the Research Start-up Fund in Shanxi Province under
Grant 980020066, in part by the Management Fund of North University of China: 2310700037HX, and
in part by the Fundamental Research Program of Shanxi Province: 202403021212165.
Data Availability Statement: Data are unavailable due to privacy or ethical restrictions.
Acknowledgments: The authors would like to thank the editors and reviewers for their contributions
to our manuscript.
References
1. Bolchini, C.; Sandionigi, C. Design of hardened embedded systems on multi-FPGA platforms. ACM Trans. Des. Autom. Electron.
Syst. (TODAES) 2014, 20, 1–26. [CrossRef]
2. Shan, J.; Lazarescu, M.T.; Cortadella, J.; Lavagno, L.; Casu, M.R. CNN-on-AWS: Efficient allocation of multikernel applications on
multi-FPGA platforms. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2020, 40, 301–314. [CrossRef]
3. Wang, Y.; Liao, Y.; Yang, J.; Wang, H.; Zhao, Y.; Zhang, C.; Xiao, B.; Xu, F.; Gao, Y.; Xu, M.; et al. An FPGA-based online
reconfigurable CNN edge computing device for object detection. Microelectron. J. 2023, 137, 105805. [CrossRef]
4. Sun, Z.; Zhang, H.; Zhang, Z. Resource-Aware Task Scheduling and Placement in Multi-FPGA System. IEEE Access 2019,
7, 163851–163863. [CrossRef]
5. Wang, Z.; Tang, Q.; Guo, B.; Wei, J.B.; Wang, L. Resource Partitioning and Application Scheduling with Module Merging on
Dynamically and Partially Reconfigurable FPGAs. Electronics 2020, 9, 1461. [CrossRef]
6. Najem, M.; Bollengier, T.; Le Lann, J.C.; Lagadec, L. A cost-effective approach for efficient time-sharing of reconfigurable
architectures. In Proceedings of the IEEE 2017 International Conference on FPGA Reconfiguration for General-Purpose
Computing (FPGA4GPC), Hamburg, Germany, 9–10 May 2017; pp. 7–12.
7. Iordache, A.; Pierre, G.; Sanders, P.; de F. Coutinho, J.G.; Stillwell, M. High performance in the cloud with FPGA groups. In
Proceedings of the 9th International Conference on Utility and Cloud Computing, Shanghai, China, 6–9 December 2016; pp. 1–10.
8. Tianyang, L.; Fan, Z.; Wei, G.; Mingqian, S.; Li, C. A Survey: FPGA-Based Dynamic Scheduling of Hardware Tasks. Chin. J.
Electron. 2021, 30, 991–1007. [CrossRef]
9. Ding, B.; Huang, J.; Xu, Q.; Wang, J.; Chen, S.; Kang, Y. Memory-aware Partitioning, Scheduling, and Floorplanning for Partially
Dynamically Reconfigurable Systems. ACM Trans. Des. Autom. Electron. Syst. 2023, 28, 1–21. [CrossRef]
10. Ramezani, R. Dynamic Scheduling of Task Graphs in Multi-FPGA Systems Using Critical Path. J. Supercomput. 2021, 77, 597–618.
[CrossRef]
11. Tang, Y.; Bergmann, N.W. A Hardware Scheduler Based on Task Queues for FPGA-based Embedded Real-time Systems. IEEE
Trans. Comput. 2014, 64, 1254–1267. [CrossRef]
12. Deiana, E.A.; Rabozzi, M.; Cattaneo, R.; Santambrogio, M.D. A multiobjective reconfiguration-aware scheduler for FPGA-based
heterogeneous architectures. In Proceedings of the IEEE 2015 International Conference on ReConFigurable Computing and
FPGAs (ReConFig), Riviera Maya, Mexico, 7–9 December 2015; pp. 1–6.
13. Purgato, A.; Tantillo, D.; Rabozzi, M.; Sciuto, D.; Santambrogio, M.D. Resource-efficient scheduling for partially-reconfigurable
FPGA-based systems. In Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops
(IPDPSW), Chicago, IL, USA, 23–27 May 2016; pp. 189–197.
14. Clemente, J.A.; Resano, J.; Mozos, D. An approach to manage reconfigurations and reduce area cost in hard real-time reconfig-
urable systems. ACM Trans. Embed. Comput. Syst. (TECS) 2014, 13, 90. [CrossRef]
15. Wang, G.; Liu, S.; Nie, J.; Wang, F.; Arslan, T. An online task placement algorithm based on maximum empty rectangles in
dynamic partial reconfigurable systems. In Proceedings of the IEEE 2017 NASA/ESA Conference on Adaptive Hardware and
Systems (AHS), Pasadena, CA, USA, 24–27 July 2017; pp. 180–185.
16. Kao, C.C. Performance-Oriented Partitioning for Task Scheduling of Parallel Reconfigurable Architectures. IEEE Trans. Parallel
Distrib. Syst. 2015, 26, 858–867. [CrossRef]
17. Resano, J.; Mozos, D.; Catthoor, F. A Hybrid Prefetch Scheduling Heuristic to Minimize at Run-time the Reconfiguration Overhead
of Dynamically Reconfigurable Hardware. In Proceedings of the IEEE Design, Automation and Test in Europe, Munich, Germany,
7–11 March 2005; pp. 106–111.
18. Liu, S.; Forrest, J.; Vallee, R. Emergence and development of grey systems theory. Kybernetes 2009, 38, 1246–1256. [CrossRef]
19. Wu, J.O.; Wang, S.F.; Fan, Y.H.; Chien, W. The scheduling and placement strategies for FPGA dynamic reconfigurable system. In
Proceedings of the IEEE 2016 International Conference on Applied System Innovation (ICASI), Okinawa, Japan, 26–30 May 2016;
pp. 1–2.
20. Wassi, G.; Benkhelifa, M.E.A.; Lawday, G.; Verdier, F.; Garcia, S. Multi-shape tasks scheduling for online multitasking on FPGAs.
In Proceedings of the IEEE 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip
(ReCoSoC), Montpellier, France, 26–28 May 2014; pp. 1–7.
21. Dick, R.P.; Rhodes, D.L.; Wolf, W. TGFF: Task graphs for free. In Proceedings of the IEEE Sixth International Workshop on
Hardware/Software Codesign (CODES/CASHE’98), Seattle, WA, USA, 15–18 March 1998; pp. 97–101.
22. Zhang, H.; Bauer, L.; Kochte, M.A.; Schneider, E.; Wunderlich, H.J.; Henkel, J. Aging resilience and fault tolerance in runtime
reconfigurable architectures. IEEE Trans. Comput. 2016, 66, 957–970. [CrossRef]
23. Li, Z.; Huang, Z.; Wang, Q.; Wang, J. AMROFloor: An Efficient Aging Mitigation and Resource Optimization Floorplanner for
Virtual Coarse-Grained Runtime Reconfigurable FPGAs. Electronics 2022, 11, 273. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.