FPGA-Based Real-Time Object Detection and Classification System Using YOLO for Edge Computing
ABSTRACT Research progress in real-time object detection and classification has been dramatically boosted by the inclusion of Embedded Artificial Intelligence (EAI) and Deep Learning (DL). Real-time object detection and classification with deep learning demand substantial resources and computational power, which makes it difficult to use deep learning methods on edge devices. This paper proposes a new, highly efficient Field Programmable Gate Array (FPGA) based real-time object detection and classification system using You Only Look Once (YOLO) v3 Tiny for edge computing. The proposed system has been instantiated with Advanced Driving Assistance Systems (ADAS) for evaluation, as traffic light detection and classification are crucial in ADAS to ensure drivers' safety. The proposed system uses a camera connected to the Kria KV260 FPGA development board to detect and classify traffic lights. The Bosch Small Traffic Light Dataset (BSTLD) has been used to train the YOLO model, and Xilinx Vitis AI has been used to quantize and compile the YOLO model. The proposed system can detect and classify traffic light signals from high-definition (HD) video streams at 15 frames per second (FPS) with 99% accuracy. In addition, it consumes only 3.5W of power, demonstrating its suitability for edge devices. The on-road experimental results demonstrate fast, precise, and reliable detection and classification of traffic lights by the proposed system. Overall, this paper presents a low-cost and highly efficient FPGA-based system for real-time object detection and classification.
INDEX TERMS FPGAs, object detection and classification, YOLO, edge computing.
offer superior performance in both localization and recognition accuracy [9]. However, a significant drawback is their demand for substantial amounts of annotated data for training. Another algorithm, the Single Shot MultiBox Detector (SSD) [10], exhibits relatively poor performance in handling small objects [7]. The most noteworthy algorithm in this category is You Only Look Once (YOLO) [11], which has several versions. Compact iterations of the YOLO algorithm have been developed to ensure efficient execution on hardware-constrained devices [8]. Wu et al. [12] and Abraham et al. [13] have presented traffic light detection systems using YOLO. In addition, Fernando et al. [14] and Huy et al. [15] have also demonstrated traffic light detection and classification using YOLO. Furthermore, various methods based on lightweight models [16], PID controllers [17], salience-sensitive loss [18], vehicular ad-hoc networks [19], support vector machines [20], and adaptive background suppression filters [21] have been reported for traffic light detection and classification.

Edge computing and object detection are two critical components of modern technology, working hand in hand to provide efficient and real-time solutions in various applications using DL frameworks. However, DL models require high computational power and network bandwidth. Accelerating the DL model (e.g., YOLO) on edge devices can improve system performance. Therefore, using field-programmable gate arrays (FPGAs) in object detection and classification systems enhances operational security and real-time computing ability while reducing cost.

FPGAs have proven capable of maintaining safety-critical systems with their remarkable computational ability. FPGAs are experiencing rapid growth in the domain of Artificial Intelligence (AI) acceleration, driven by their capacity for parallel processing and architectural optimizations [22]. FPGA-based implementations of deep learning models yield higher speed and accuracy with lower power consumption than software and image-based systems. To address the limitations associated with the low flexibility and accuracy of software and image-based algorithms, as well as their high power consumption, this paper presents a real-time and lightweight system for the detection and classification of traffic lights in autonomous vehicles. The YOLO v3 Tiny algorithm proves its suitability, particularly when deployed on FPGA boards. The newly introduced Xilinx Kria KV260 FPGA development board demonstrates outstanding computational ability for accelerating deep learning algorithms [23], [24], [25], [26], [27], [28]. The synergistic pairing of the Xilinx Kria KV260 development board with the YOLO v3 algorithm is a compelling choice for real-time traffic light detection and classification on resource-constrained devices. In tandem with the detection and classification of traffic signals, the system also includes a critical module for speed control in compliance with the classification results.

The main contributions of this paper are summarized as follows:
1. This study presents a new system on the Xilinx Kria KV260 FPGA board, enhancing real-time object detection and classification with an optimized YOLO v3 Tiny deep learning model.
2. It instantiates the proposed object detection system to effectively identify and classify traffic light signals with precision and accuracy.
3. It evaluates the performance of the proposed system by comparing it with state-of-the-art object detection systems.

II. MOTIVATION AND RELATED WORK
Implementing YOLO-based object detection on FPGA brings forth both practical and theoretical implications. From a practical standpoint, it enables real-time processing of object detection tasks with minimal latency, rendering it suitable for applications necessitating swift decision-making, such as autonomous vehicles and surveillance systems. The FPGA's customizable hardware architecture facilitates optimized performance and resource efficiency, resulting in reduced power consumption and cost-effective solutions, particularly beneficial for embedded systems and IoT devices. On a theoretical level, this implementation underscores the potential synergy between deep learning algorithms like YOLO and hardware acceleration techniques such as FPGAs, laying the groundwork for further advancements in edge computing, where computational resources are constrained yet real-time processing is imperative. Furthermore, delving into the optimization of deep learning models for FPGA architectures can drive progress in hardware-software co-design methodologies, ultimately fostering the development of efficient and scalable AI systems across diverse applications.

Numerous methods have endeavored to develop real-time object detection systems tailored for edge devices. This section explores contemporary state-of-the-art object detection methodologies specifically designed for deployment on edge devices. Table 1 encapsulates a comprehensive summary of these object detection systems, providing a comparative overview of their respective features and performance metrics.

Several studies have proposed real-time object detection systems for edge devices utilizing Graphics Processing Unit (GPU)-based architectures. However, the significant power consumption associated with GPU-based systems presents a considerable bottleneck for their practical deployment [6], [12], [13], [16]. Notably, Abraham et al. [13] achieved the lowest power consumption among GPU-based traffic light detection systems. Their study focused on a traffic light detection system utilizing the YOLO architecture and implemented on an Nvidia Tesla T4 GPU. The research introduces a modified YOLO model tailored for detecting traffic lights and signs. This model, based on a modified cross-stage partial YOLO v4 architecture, processes images captured by a camera sensor, leveraging a dataset comprising 1360 training images and
340 testing images, encompassing six types of traffic lights and 39 types of traffic signs. The network, implemented using the Darknet framework, attains a mean average precision of 79.77% at a processing speed of 29 FPS, accompanied by a power consumption of 70W.

Alternatively, CNN-based object detection serves as another viable approach, characterized by notably low power consumption and high throughput [9], [29], [30]. However, a key bottleneck of CNN-based real-time object detection systems lies in their requirement for low input image sizes, necessitating substantial computational power for processing larger images. Addressing this challenge, Wang et al. [29] proposed a highly efficient CNN-based object detection system. Their work introduces an adaptive CNN edge computing platform tailored for target detection tasks, leveraging FPGA technology. This research capitalizes on the inherent pipeline architecture of FPGAs to expedite network computations, utilizing off-chip memory for storing network models and thereby obviating the need for resource-intensive tiling techniques. Moreover, an innovative online reconfigurable design is introduced, enabling real-time adjustments to network structure and parameters to accommodate diverse target recognition objectives. Implemented on a Spartan-6 FPGA platform, the system undergoes evaluation through pedestrian and vehicle classification tasks, achieving a detection speed of 16 frames per second (FPS) and a power consumption of 0.79 W, while attaining a classification accuracy of 96%. However, it is noteworthy that the system's evaluation was conducted with very low image sizes, resulting in comparatively lower accuracy.

YOLO-based object detection on FPGA offers a multitude of advantages, including real-time processing, tailored optimization for the FPGA architecture, low power consumption, high throughput, low latency, flexibility for various YOLO models, and resource efficiency; accordingly, numerous YOLO-based object detection systems have emerged in recent years. To discern the state of the art among these systems, several notable implementations warrant attention. Heller et al. [31] introduced an object detection system utilizing deep learning on embedded edge devices, focusing on maritime object detection with the Xilinx Kria KV260 Vision AI Kit. Their study involved training and evaluating multiple YOLO neural networks of varying sizes and architectural specifications, incorporating structured pruning techniques such as sparsifying to reduce network size while preserving detection performance. The proposed deployments showcased promising outcomes, achieving an inference speed of 90 FPS with only a marginal 2.4% degradation in mean average precision for high-definition input images. Nonetheless, while exhibiting enhanced throughput and efficiency, the accuracy of this method necessitates further refinement.

Corcoran et al. [32] proposed a streaming architecture toolflow to accelerate YOLO models on FPGA devices, employing a deeply pipelined on-chip design for YOLO accelerators. These accelerators, generated using an
automated toolflow, incorporate novel hardware components supporting YOLO operations in a dataflow manner, along with off-chip memory buffering to mitigate on-chip memory limitations. Their approach achieves a throughput of 69 FPS for input images of 416 × 416, consuming 15.4W of power. However, the power consumption of this study remains relatively high, and accuracy metrics are not provided, posing potential limitations.

Nguyen et al. [33] present an FPGA-based design for the YOLOv4 network tailored for flying-object detection, addressing challenges of limited floating-point resources while aiming to maintain accuracy, real-time performance, and energy efficiency. They curated a suitable dataset of flying objects for network implementation, training, and fine-tuning, adapting YOLO networks for FPGA deployment. Evaluating their design on the ZCU104 FPGA kit, they achieved 125 FPS for HD input images while consuming 26.4W of power. However, the high throughput is accompanied by elevated power consumption and potentially compromised accuracy, suggesting potential bottlenecks.

Valadanzoj et al. [34] introduced a high-speed YOLO hardware accelerator tailored for self-driving automotive applications on FPGA. Their approach involved utilizing 8-bit and 5-bit fixed-point formats for data and weights to conserve resources and memory. To address accuracy concerns, a Genetic Algorithm was employed to optimize the decimal point positions across different network layers. Furthermore, a technique enabling simultaneous multiplications with distinct operands using a single DSP block was presented, enhancing network execution speed. Evaluation on the Xilinx Zynq ZC706 FPGA platform yielded a throughput of 55 FPS with 79% accuracy for a given input image, consuming 13.6W of power. Nonetheless, similar to previous studies, the high throughput was accompanied by elevated power consumption and compromised accuracy, potentially posing a limitation.

Besides these large NN-based architectures, several alternative methods have been reported for real-time object detection, including lightweight approaches [16] and combinations of Blob analysis, Histogram of Oriented Gradients (HOG), and Support Vector Machine (SVM) [20]. However, the performance of these methods still requires improvement compared to other state-of-the-art object detection systems.

The majority of prior research has predominantly focused on throughput and accuracy metrics, often neglecting the critical aspects of power consumption and cost, which are pivotal considerations for edge computing systems. Moreover, some studies have utilized highly expensive FPGA boards, limiting their practical applicability, particularly for large-scale systems or ADAS. Unfortunately, current state-of-the-art object detection systems still require improvement across all performance metrics, including throughput, accuracy, power consumption, and cost-effectiveness. To address these constraints, this paper introduces a novel object detection system for FPGA utilizing YOLO. Leveraging a $250 FPGA board (Xilinx Kria KV260) with minimal power consumption, the proposed system aims to maintain accuracy and throughput while operating within cost constraints. However, achieving a trade-off among these performance evaluation metrics with a low-cost FPGA board presents significant challenges. The Kria KV260, being a newly launched FPGA board with limited memory and computational power, may pose a bottleneck for deploying large neural network models. Thus, the proposed system optimizes the YOLO model to fit within the FPGA board's constraints. Additionally, evaluating the proposed system through on-road tests and using street photos and videos is a crucial aspect of this study, providing real-world validation of its effectiveness.

III. IMPLEMENTATION
The implementation process commences with the establishment of the host machine environment and the FPGA development board. This procedure encompasses multiple sequential stages, including environment configuration, dataset preparation, model training, model conversion, and deployment of the trained model onto the FPGA board, all aimed at creating a robust and dependable object detection and classification system. The overall architecture of the proposed system is illustrated in Figure 1.

A. DATASET
The dataset has been prepared according to the YOLO format to train a YOLO model in the Darknet framework. The images from the collected dataset were annotated with bounding boxes surrounding the traffic lights that must be detected. The annotations consist of the traffic light class, the coordinates of the box center on the X-axis and Y-axis, and the height and width of the box. A class name was allocated to every annotation according to the traffic light class (Red, Green, Yellow, Left Green, Right Green, Left Red, Right Red, and No Light). The pixel values of the bounding box coordinates (x, y, width, height) were transformed to normalized values between 0 and 1 relative to the dimensions of the traffic light image to improve the model's convergence and robustness. To train the YOLO model, a corresponding text file has been generated for each traffic light image in the BSTLD dataset. The information about the traffic light in the text file maintains a predefined sequence so that, during training, the Darknet framework can extract the information correctly and train the YOLO model with accurate data.

The Bosch Small Traffic Light Dataset (BSTLD) [42] contains a total of 13,427 camera photos, each with a size of 1280 × 720 pixels. Additionally, the collection includes around 24,000 annotated traffic light signals. The images in the training set depict a variety of obstacles that may be encountered when driving in urban environments. In this study, 5094 images from the BSTLD were used for training purposes, and among those images, 1019 images were split off to form a test set.
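To make the label layout concrete, the following minimal Python sketch converts a single pixel-space bounding box into the normalized YOLO text-file line described above. It illustrates the format rather than reproducing the study's actual tooling; the function name and the corner-coordinate input convention are assumptions (BSTLD ships its annotations in a YAML file whose field names may differ).

```python
# Minimal sketch (not the paper's tooling): convert one pixel-space box from a
# BSTLD-style annotation into the normalized YOLO label line described above.

CLASSES = ["Red", "Green", "Yellow", "Left Green", "Right Green",
           "Left Red", "Right Red", "No Light"]

def to_yolo_line(label, x_min, y_min, x_max, y_max, img_w=1280, img_h=720):
    """Return '<class_id> <x_center> <y_center> <width> <height>', normalized to [0, 1]."""
    class_id = CLASSES.index(label)
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a red light spanning pixels (630, 280)-(650, 330) in a 1280 x 720 frame.
print(to_yolo_line("Red", 630, 280, 650, 330))
# -> "0 0.500000 0.423611 0.015625 0.069444"
```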
B. MODEL PREPARATION AND FPGA DEPLOYMENT
1) MODEL TRAINING THROUGH DARKNET
The Darknet framework serves the dual purpose of training neural networks and subsequently processing images or video frames using deep neural networks. To train the YOLO model, this study adopts the Darknet framework due to its real-time object detection architecture. Darknet mainly employs 53 convolutional layers with 3×3 and 1×1 convolutional filters to extract features and reduce output size [43]. Besides, it makes predictions by utilizing max pooling. The YOLO framework comes with a configuration file that needs to be modified according to project-specific requirements. These configurations play a crucial role in determining the detection capabilities of the model. The YOLOv3 configuration files have been changed for traffic light detection and classification, including the number of classes, anchor boxes, and other hyperparameters. Initially, both the batch and subdivisions were set to one. To train efficiently, the batch was set to 64 and the subdivisions to 8, considering the volume of images in the BSTLD and the capacity of the host machine.

The proposed YOLOv3 Tiny model utilizes 3 × 3 convolution layers with a stride of 1 to extract features through a feedforward structure. The initial convolution layer takes as input an image of size 416 × 416, and each convolution layer incorporates a padding of 1. Furthermore, it employs max pooling layers to downsample data between the convolution layers. Bounding box predictions are made at two distinct feature map scales, the second of which combines an upsampled 13 × 13 feature map. YOLOv3 Tiny predicts 3 boxes per grid cell, each described by 5 bounding box parameters. Since this study considers 8 traffic light classes, the number of filters in the detection-head convolutional layers has been set to 39 in the configuration file. The YOLO model was trained with weights from a pre-trained Darknet model to maximize efficiency. This step is followed by fine-tuning the YOLOv3 Tiny network with the dataset, running the dataset through the neural network and adjusting the parameters to minimize the loss. Table 2 represents the optimized YOLO v3 Tiny model architecture for the proposed system.
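The configuration values quoted above (8 classes, 3 boxes per grid cell, 39 filters, batch 64 with subdivisions 8) can be sanity-checked with the standard YOLO head-sizing rule. The short Python sketch below is a worked check of that arithmetic, not project code:

```python
# Worked check of the YOLOv3 Tiny configuration values quoted above.
num_classes = 8      # Red, Green, Yellow, Left/Right Green, Left/Right Red, No Light
boxes_per_cell = 3   # anchor boxes predicted per grid cell
box_params = 5       # x, y, w, h, objectness

# YOLO head-sizing rule: filters = boxes_per_cell * (box_params + classes).
filters = boxes_per_cell * (box_params + num_classes)
assert filters == 39  # the value set before each detection layer in the cfg file

# batch=64 with subdivisions=8 means each 64-image batch is processed in
# 8 chunks, so only 8 images occupy GPU memory at any one time.
batch, subdivisions = 64, 8
assert batch // subdivisions == 8
```

The same rule explains why the filter count must be edited whenever the number of classes changes.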
2) GENERATION OF FROZEN GRAPH
Freezing the graph is a process that combines the architecture of the model and its weights into a single file called a frozen graph. The Darknet binary weights were converted into TensorFlow variables, and the YOLOv3 Tiny architecture was rebuilt in TensorFlow with the proper configuration file and the corresponding weight file generated from the Darknet framework.

The architectural framework of the model has been replicated with input and output nodes along with other associated parameters. In this process, TensorFlow builds the YOLO model and generates the frozen graph, capturing the neural network layers, the activation functions, and the input-output configuration from the configuration and weight files. In addition, the frozen graph has been optimized with the TensorFlow API. The procedure for generating a frozen graph is illustrated in Fig. 2.
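As a concrete illustration of this step, the snippet below freezes a TensorFlow 1.x-style checkpoint such as one produced by a Darknet-to-TensorFlow converter. It is a minimal sketch under that assumption; the checkpoint path and output node names are placeholders, not the project's actual identifiers.

```python
# Hedged sketch: fold TensorFlow variables into constants to produce a frozen graph.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

def freeze(ckpt_prefix, output_node_names, out_path="yolov3_tiny_frozen.pb"):
    with tf.compat.v1.Session() as sess:
        # Rebuild the converted YOLOv3 Tiny graph and restore its weights.
        saver = tf.compat.v1.train.import_meta_graph(ckpt_prefix + ".meta")
        saver.restore(sess, ckpt_prefix)
        # Replace every variable with a constant holding the trained value.
        frozen_def = tf.compat.v1.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names)
    with tf.io.gfile.GFile(out_path, "wb") as f:
        f.write(frozen_def.SerializeToString())

# Placeholder node names; the real ones are read from the converted graph.
freeze("./ckpt/yolov3-tiny", ["detector/head_13x13", "detector/head_26x26"])
```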
3) QUANTIZATION
The Vitis-AI quantizer converts the 32-bit floating-point weights and activations to lower-precision formats to decrease computational complexity while maintaining prediction accuracy. The frozen graph has been inspected with Vitis-AI to determine the number and names of the input and output nodes of the trained YOLO model. To calibrate the trained YOLO model before quantization, 250 random images were selected from the test set. The model has been quantized into the INT8 fixed-point format, which requires less memory bandwidth, improving power efficiency and computing speed.
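The Vitis AI TensorFlow quantizer consumes calibration images through a user-supplied input function that it calls once per calibration iteration. The following is a minimal sketch of such a function under stated assumptions: the image directory, the preprocessing (resize to 416 × 416 and scale to [0, 1]), and the input node name "inputs" are all illustrative rather than the study's actual values.

```python
# Hedged sketch of a calibration input function in the style expected by the
# Vitis AI TensorFlow quantizer; it is called once per calibration iteration
# and must return a feed dict keyed by the graph's input node name.
import glob
import random

import cv2
import numpy as np

calib_images = random.sample(glob.glob("calib_images/*.png"), 250)
BATCH = 10  # 250 images / 10 per batch -> 25 calibration iterations

def calib_input(iter_num):
    batch = []
    for path in calib_images[iter_num * BATCH:(iter_num + 1) * BATCH]:
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (416, 416)).astype(np.float32) / 255.0
        batch.append(img)
    return {"inputs": np.stack(batch)}  # "inputs" = assumed placeholder node name
```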
4) COMPILATION AND DEPLOYMENT
A 32-bit memory-mapped AXI interface has been employed to establish the links between the processor and the DPU. In the compilation process, the proper fingerprint of the DPU architecture has been specified to make the compiled model compatible with the DPU. The compiled model has been deployed to the Kria KV260 FPGA board to detect and classify the traffic lights.
FIGURE 3. Interconnection between Xilinx DPU and the Kria KV260 processor.
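On the board, a compiled .xmodel is commonly driven from Python through the Vitis AI Runtime (VART). The sketch below shows that pattern under assumptions: the model file name is hypothetical, and YOLO pre-processing, output de-quantization, box decoding, and non-maximum suppression are omitted for brevity.

```python
# Hedged sketch: run the compiled .xmodel on the Kria KV260 via VART.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("yolov3_tiny_bstld.xmodel")  # hypothetical file name
# The DPU-executable part of the compiled model lives in a child subgraph.
dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                 if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]
runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")

in_tensor = runner.get_input_tensors()[0]   # e.g. (1, 416, 416, 3), INT8
out_tensors = runner.get_output_tensors()   # the two YOLO detection heads

frame = np.zeros(tuple(in_tensor.dims), dtype=np.int8)  # stand-in for a camera frame
outputs = [np.empty(tuple(t.dims), dtype=np.int8) for t in out_tensors]

job_id = runner.execute_async([frame], outputs)
runner.wait(job_id)
# 'outputs' now holds the raw head activations, to be de-quantized and decoded
# into class/box predictions for the eight traffic light classes.
```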
TABLE 3. Comparison of the proposed system with different FPGA-based object detection systems.
FIGURE 7. Performance evaluation of the proposed system.

Overall, the proposed system demonstrates satisfactory performance compared to other systems. The robustness and reliability of the YOLOv3 Tiny model in real-time scenarios are underscored by its deployment on the Kria KV260 and efficient utilization of the Xilinx DPU. However, isolated misdetections were encountered during the testing phase, albeit their incidence was minimal relative to the total number of objects tested.

V. CONCLUSION
An FPGA-based general YOLO model, tailored for object detection and classification in edge computing devices, has been introduced. This model has been carefully optimized to meet the stringent requirements of edge systems, encompassing limitations in hardware resources, the need for high accuracy, and real-time processing speed. In light of the system's inherent resource constraints and its overall performance, it is justifiable to conclude that both the model and the entire system have demonstrated their efficiency, reliability, and speed, particularly within the domain of Advanced Driver Assistance Systems (ADAS) and edge computing.

One noteworthy feature of the developed system is its versatility. It possesses the capability to detect not only traffic lights but also a wide range of other object types. This flexibility arises from the system's adaptability through the training of the YOLO model with suitable datasets. Consequently, ample room remains for further exploration and enhancement of the YOLO model. Architectural modifications and optimization efforts can be undertaken to achieve improved performance.
Additionally, it is worth considering the integration of a fault tolerance mechanism within the system to handle unexpected issues and ensure robustness. A valuable avenue for future work is the assessment of system reliability through long-term operation, observing its behavior and patterns across various real-world scenarios. Such investigations will contribute to the continuous development and refinement of the system's performance and reliability.
REFERENCES
[1] X. Xu, X. Zhang, B. Yu, X. S. Hu, C. Rowen, J. Hu, and Y. Shi, "DAC-SDC low power object detection challenge for UAV applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 392-403, Feb. 2021.
[2] H. Xu, M. Guo, N. Nedjah, J. Zhang, and P. Li, "Vehicle and pedestrian detection algorithm based on lightweight YOLOv3-promote and semi-precision acceleration," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19760-19771, Oct. 2022.
[3] L. Fang, Q. Jiang, J. Shi, and B. Zhou, "TPNet: Trajectory proposal network for motion prediction," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 6796-6805.
[4] X. Chang, H. Pan, W. Sun, and H. Gao, "YolTrack: Multitask learning based real-time multiobject tracking and segmentation for autonomous vehicles," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5323-5333, Dec. 2021.
[5] Z. Ouyang, J. Niu, Y. Liu, and M. Guizani, "Deep CNN-based real-time traffic light detector for self-driving vehicles," IEEE Trans. Mobile Comput., vol. 19, no. 2, pp. 300-313, Feb. 2020.
[6] T. H. P. Tran and J. W. Jeon, "Accurate real-time traffic light detection using YOLOv4," in Proc. IEEE Int. Conf. Consum. Electron. Asia (ICCE-Asia), Nov. 2020, pp. 1-4.
[7] H. Naimi, "Traffic sign and light detection using deep learning for automotive applications," Ph.D. thesis, Dept. Elect. Comput. Eng., Univ. Windsor, Windsor, ON, Canada, 2021.
[8] P. Adarsh, P. Rathi, and M. Kumar, "YOLO v3-tiny: Object detection and recognition using one stage improved model," in Proc. 6th Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Mar. 2020, pp. 687-694.
[9] J. Zhang, F. Zhang, M. Xie, X. Liu, and T. Feng, "Design and implementation of CNN traffic lights classification based on FPGA," in Proc. IEEE 4th Int. Conf. Electron. Inf. Commun. Technol. (ICEICT), Aug. 2021, pp. 445-449.
[10] W. Liu et al., "SSD: Single shot multibox detector," in Proc. 14th Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands: Springer, Oct. 2016, pp. 21-37.
[11] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.
[12] S. Wu, N. Amenta, J. Zhou, S. Papais, and J. Kelly, "AUToLights: A robust multi-camera traffic light detection and tracking system," 2023, arXiv:2305.08673.
[13] A. Abraham, D. Purwanto, and H. Kusuma, "Traffic lights and traffic signs detection system using modified you only look once," in Proc. Int. Seminar Intell. Technol. Appl. (ISITIA), Surabaya, Indonesia, 2021, pp. 141-146.
[14] W. H. D. Fernando and S. Sotheeswaran, "Automatic road traffic signs detection and recognition using 'you only look once' version 4 (YOLOv4)," in Proc. Int. Res. Conf. Smart Comput. Syst. Eng. (SCSE), vol. 4, Sep. 2021, pp. 38-43.
[15] H. K. Hua, K. H. Nguyen, L.-D. Quach, and H. N. Tran, "Traffic lights detection and recognition method using deep learning with improved YOLOv5 for autonomous vehicle in ROS2," in Proc. 8th Int. Conf. Intell. Inf. Technol., Feb. 2023, pp. 117-122.
[16] Z. Yao, Q. Liu, Q. Xie, and Q. Li, "TL-detector: Lightweight based real-time traffic light detection model for intelligent vehicles," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 9, pp. 9736-9750, Sep. 2023.
[17] S. Shrivastava, A. Somthankar, V. Pandya, and M. Patil, "Implementation of a PID controller for autonomous vehicles with traffic light detection in CARLA," in Intelligent Computing and Networking. Cham, Switzerland: Springer, 2023.
[18] R. Greer, A. Gopalkrishnan, J. Landgren, L. Rakla, A. Gopalan, and M. Trivedi, "Robust traffic light detection using salience-sensitive loss: Computational framework and evaluations," 2023, arXiv:2305.04516.
[19] E. Al-Ezaly, H. M. El-Bakry, A. Abo-Elfetoh, and S. Elhishi, "An innovative traffic light recognition method using vehicular ad-hoc networks," Sci. Rep., vol. 13, no. 1, p. 4009, Mar. 2023.
[20] Y. Zhou, Z. Chen, and X. Huang, "A system-on-chip FPGA design for real-time traffic signal recognition system," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2016, pp. 1778-1781.
[21] Z. Shi, Z. Zou, and C. Zhang, "Real-time traffic light detection with adaptive background suppression filter," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 3, pp. 690-700, Mar. 2016.
[22] K. Sruthi and R. Nandakumar, "AI/ML-based object detection on FPGA SoC," in Proc. Int. Conf. Commun., Electron. Digit. Technol. Cham, Switzerland: Springer, 2023, pp. 479-487.
[23] S. Kalapothas, G. Flamis, and P. Kitsos, "Efficient edge-AI application deployment for FPGAs," Information, vol. 13, no. 6, p. 279, May 2022.
[24] K. Shi, M. Wang, X. Tan, Q. Li, and T. Lei, "Efficient dynamic reconfigurable CNN accelerator for edge intelligence computing on FPGA," Information, vol. 14, no. 3, p. 194, Mar. 2023.
[25] G. Rathod, P. Shah, R. Gajjar, M. I. Patel, and N. Gajjar, "Implementation of real-time object detection on FPGA," in Proc. 7th Int. Conf. Trends Electron. Informat. (ICOEI), Apr. 2023, pp. 235-240.
[26] P. Dhilleswararao, S. Boppu, M. S. Manikandan, and L. R. Cenkeramaddi, "Efficient hardware architectures for accelerating deep neural networks: Survey," IEEE Access, vol. 10, pp. 131788-131828, 2022.
[27] S. C. Magalhães, F. N. dos Santos, P. Machado, A. P. Moreira, and J. Dias, "Benchmarking edge computing devices for grape bunches and trunks detection using accelerated object detection single shot multibox deep learning models," Eng. Appl. Artif. Intell., vol. 117, Jan. 2023, Art. no. 105604.
[28] M. Baczmanski, M. Wasala, and T. Kryjak, "Implementation of a perception system for autonomous vehicles using a detection-segmentation network in SoC FPGA," in Proc. Int. Symp. Appl. Reconfigurable Comput. Cham, Switzerland: Springer, 2023, pp. 200-211.
[29] Y. Wang, Y. Liao, J. Yang, H. Wang, Y. Zhao, C. Zhang, B. Xiao, F. Xu, Y. Gao, M. Xu, and J. Zheng, "An FPGA-based online reconfigurable CNN edge computing device for object detection," Microelectron. J., vol. 137, Jul. 2023, Art. no. 105805.
[30] Z. Zhang, M. A. P. Mahmud, and A. Z. Kouzani, "FitNN: A low-resource FPGA-based CNN accelerator for drones," IEEE Internet Things J., vol. 9, no. 21, pp. 21357-21369, Nov. 2022.
[31] D. Heller, M. Rizk, R. Douguet, A. Baghdadi, and J.-P. Diguet, "Marine objects detection using deep learning on embedded edge devices," in Proc. IEEE Int. Workshop Rapid Syst. Prototyping (RSP), Shanghai, China, Oct. 2022, pp. 1-7.
[32] A. Montgomerie-Corcoran, P. Toupas, Z. Yu, and C.-S. Bouganis, "SATAY: A streaming architecture toolflow for accelerating YOLO models on FPGA devices," in Proc. Int. Conf. Field Program. Technol. (ICFPT), Dec. 2023, pp. 179-187.
[33] D.-D. Nguyen, D.-T. Nguyen, M.-T. Le, and Q.-C. Nguyen, "FPGA-SoC implementation of YOLOv4 for flying-object detection," J. Real-Time Image Process., vol. 21, no. 3, p. 63, May 2024.
[34] Z. Valadanzoj, H. Daryanavard, and A. Harifi, "High-speed YOLOv4-tiny hardware accelerator for self-driving automotive," J. Supercomput., vol. 80, no. 5, pp. 6699-6724, Mar. 2024.
[35] E. Rzaev, A. Khanaev, and A. Amerikanov, "Neural network for real-time object detection on FPGA," in Proc. Int. Conf. Ind. Eng., Appl. Manuf. (ICIEAM), Sochi, Russia, May 2021, pp. 719-723.
[36] G. Wang, H. Ding, B. Li, R. Nie, and Y. Zhao, "Trident-YOLO: Improving the precision and speed of mobile device object detection," IET Image Process., vol. 16, no. 1, pp. 145-157, Jan. 2022.
[37] M. Liu, S. Luo, K. Han, B. Yuan, R. F. DeMara, and Y. Bai, "An efficient real-time object detection framework on resource-constricted hardware devices via software and hardware co-design," in Proc. IEEE 32nd Int. Conf. Application-Specific Syst., Archit. Processors (ASAP), Jul. 2021, pp. 77-84.
[38] D. Pestana, P. R. Miranda, J. D. Lopes, R. P. Duarte, M. P. Véstias, H. C. Neto, and J. T. De Sousa, "A full featured configurable accelerator for object detection with YOLO," IEEE Access, vol. 9, pp. 75864-75877, 2021.
[39] J. Zhang, L. Cheng, C. Li, Y. Li, G. He, N. Xu, and Y. Lian, "A low-latency FPGA implementation for real-time object detection," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1-5.
[40] F. Esen, A. Degirmenci, and O. Karal, "Implementation of the object detection algorithm (YOLOV3) on FPGA," in Proc. Innov. Intell. Syst. Appl. Conf. (ASYU), Elazig, Turkey, Oct. 2021, pp. 1-6.
[41] J. Xin, M. Cha, L. Shi, C. Long, H. Li, F. Wang, and P. Wang, "Lightweight convolutional neural network of YOLOv3-tiny algorithm on FPGA for target detection," in Proc. IEEE Int. Conf. Comput. Sci., Artif. Intell. Electron. Eng. (CSAIEE), Aug. 2021, pp. 65-70.
[42] K. Behrendt, L. Novak, and R. Botros, "A deep learning approach to traffic lights: Detection, tracking, and classification," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, pp. 1370-1377.
[43] T. Diwan, G. Anirudh, and J. V. Tembhurne, "Object detection using YOLO: Challenges, architectural successors, datasets and applications," Multimedia Tools Appl., vol. 82, no. 6, pp. 9243-9275, Mar. 2023.