IEEE ITOEC(ISSN: 2693-289X)

Design and implementation of embedded PCB defect detection system based on FPGA
2023 IEEE 7th Information Technology and Mechatronics Engineering Conference (ITOEC) | 979-8-3503-3421-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/ITOEC57671.2023.10292026

Hongze Yu1, Qingsong Lin1, Cunling Liu1


1. College of Information Engineering, Henan University of Science and Technology, Luoyang, China
[email protected], [email protected], [email protected]
Corresponding Author: Hongze Yu Email: [email protected]

Abstract—Convolutional neural networks commonly used for PCB defect detection are difficult to deploy on embedded devices with limited resources. To address this problem, an FPGA embedded system is designed to realize the hardware deployment of the YOLOv3 neural network for PCB defect detection. The system is designed from two aspects. On the hardware side, a deep learning processing unit (DPU) is designed to quickly process the convolution calculations of the neural network, and the system software is configured for it. On the algorithm side, the model is compressed by quantization to reduce its computational complexity and to generate a DPU-deployable model. The experimental results show that the designed system maintains an accuracy of 0.789 in PCB defect detection. At the same time, a convolution throughput of 2.44TOPS and a detection delay of 97.59ms per frame are achieved under low power consumption, which makes the system suitable for industrial PCB defect detection.

Keywords—FPGA, YOLOv3, embedded systems, PCB defect detection, hardware deployment

I. INTRODUCTION
As the most basic component of electronic products, the PCB (Printed Circuit Board) directly affects how well those products work. Although the PCB production process goes through layers of checks, it inevitably produces some defective circuit boards. The resulting PCB defects mainly have the following characteristics: (1) the defect area is relatively small; (2) the types of defects are complex; (3) the PCB layout is easily confused with some defect features. At present, the defect detection methods commonly used in industrial PCB production include manual detection, electrical testing and detection methods based on deep learning. Traditional detection is mainly manual. Although manual detection is flexible, long working hours cause visual fatigue, which affects PCB quality control and leads to wrong or missed detections. Electrical testing may cause irreversible damage to the circuit due to unstable current, and as a contact method it may also introduce defects in the PCB through improper operation. By contrast, PCB defect detection based on deep learning can work for a long time without fatigue, with fast detection speed and stable quality control, and has therefore attracted wide attention.

Among deep learning methods, the convolutional neural network[1] identifies the types of defects by extracting image features, and the YOLO[2] (You Only Look Once) algorithm can mark the location of defects. Many improved YOLO detectors[3] have been developed for surface defect detection in areas such as steel, transportation, construction and fabric production. Many researchers and practitioners have also introduced this mechanism into PCB defect detection. Adibhatla[4] improved the efficiency of PCB defect detection by optimizing the YOLO network structure. Although the performance of the algorithms is constantly improving, most programs run on the GPU platform, and their fast operation depends on the high power consumption of the GPU. To achieve fast PCB defect detection under low power consumption, how to deploy deep learning methods on embedded devices is an urgent problem to be solved.

For this reason, this paper designs an embedded system based on FPGA (Field Programmable Gate Array) and realizes the hardware deployment of the YOLOv3 network. From the underlying hardware to the application, the hardware platform and the software environment are each configured according to the characteristics of the Linux operating system and the YOLOv3 network structure to form a complete system. Following software and hardware co-design, the algorithm is compressed on the software side[5], the hardware platform is built on the hardware side, and the system finally passes the on-board test. The results show that the system has certain speed and power consumption advantages.

II. TARGET DETECTION SYSTEM COMPOSITION

A. YOLOv3 target detection algorithm

The YOLOv3[6] convolutional neural network is the third generation of the YOLO series of one-stage object detection algorithms. Its detection performance is far ahead of both earlier versions in terms of speed and accuracy, and it is widely used in industrial and academic fields. The main architecture of YOLOv3 is based on Darknet-53. Taking an input with an image resolution of 416*416 as an example, the 52 convolutional layers in Darknet-53 contain a total of more than 40M parameters and a calculation amount of about 24.5G, as shown in Table 1.
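The dependence of the calculation amount on input resolution can be checked with a short sketch. It assumes (this is not stated in the paper) that every convolutional layer's cost grows in proportion to the number of input pixels while the parameter count stays fixed, which is the usual behaviour of a fully convolutional network. The reference point and the comparison values are the ones listed in Table 1; the function and constant names are illustrative only.

```python
# Reference point from Table 1: a 416*416 input costs 24.52 G operations.
REF_SIDE = 416
REF_GOPS = 24.52

# Computation values listed in Table 1, keyed by the side of the square input.
TABLE_1 = {288: 11.75, 352: 17.55, 416: 24.52, 480: 32.64, 544: 41.92}


def scaled_gops(side: int) -> float:
    """Estimate the computation (G operations) for a square input.

    Assumption: cost scales with the pixel count, i.e. with side squared.
    """
    return REF_GOPS * (side / REF_SIDE) ** 2


if __name__ == "__main__":
    for side, listed in TABLE_1.items():
        estimate = scaled_gops(side)
        print(f"{side}*{side}: estimated {estimate:.2f} G, listed {listed:.2f} G")
```

Under this assumption the estimates agree with the listed values to within rounding, which suggests that the single parameter figure in Table 1 applies to every resolution and only the computation column changes.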

Authorized licensed use limited to: COLLEGE OF ENGINEERING - Pune. Downloaded on August 05,2024 at 10:10:40 UTC from IEEE Xplore. Restrictions apply.
TABLE I. MODEL CALCULATION
Input Resolution    Parameter (M)    Computation (G)
288*288             40.58            11.75
352*352             40.58            17.55
416*416             40.58            24.52
480*480             40.58            32.64
544*544             40.58            41.92

The network structure of YOLOv3 is shown in Fig.1. Its main architecture is based on Darknet-53, including 75 convolutional layers, 72 BN layers and 72 LeakyReLU activation function layers to extract input image features[7], followed by 30 yolo layers to predict objects of different sizes and shapes. The two concat layers expand the dimension of the feature map through tensor concatenation, which is used for target prediction of larger receptive fields. The add layers in the 23 Residual Blocks deepen the network according to the idea of ResNet[8]. The network finally has three receptive field output channels, each with three prediction boxes, which can more accurately distinguish objects of different shapes. Through model training, the network is finally fixed as a 32-bit floating-point model, which is also an important reason why neural networks are difficult to deploy in hardware. The subsequent design reduces the amount of model calculation through quantization.

Fig. 1. YOLOv3 network structure

In the inference process of YOLOv3 target detection, the input image is divided into 13*13, 26*26 and 52*52 grids after preprocessing. Each grid point is responsible for the detection of a region. The image features are obtained iteratively through the feature extraction process. The yolo layer finally has three types of receptive field output. Therefore, YOLOv3 has a total of nine prediction boxes, each of which outputs a one-dimensional vector y (formula 1) containing 5+n elements as the preliminary prediction result.

$$ y = [b_x, b_y, b_w, b_h, C, c_1, c_2, c_3, \ldots, c_n] \tag{1} $$

Finally, NMS (Non-Maximum Suppression) applies a threshold: the prediction boxes that exceed the threshold are selected, their IOU (formula 2) is calculated, and the prediction box with the maximum probability is kept as the final calibrated bounding box of YOLOv3. Since PCB defects are small targets whose features are not obvious, setting the threshold to 0.3 gives a better detection effect.

$$ \mathrm{IOU} = \frac{\mathrm{target} \cap \mathrm{prediction}}{\mathrm{target} \cup \mathrm{prediction}} \tag{2} $$

In addition, the calculation of IOU is involved in the evaluation of detection accuracy, and mAP (formulas 3, 4, 5 and 6) is often used to evaluate the detection effect of YOLO series algorithms.

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{3} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4} $$

$$ \mathrm{AP} = \int_{0}^{1} \mathrm{precision}(\mathrm{recall}) \, d\,\mathrm{recall} \tag{5} $$

$$ \mathrm{mAP} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{AP}_i \tag{6} $$

B. FPGA embedded hardware architecture

FPGA[9] is known for its programming flexibility and computing parallelism, and its reconfigurability is often used for dedicated chip verification in industry. Using the reconfigurability of FPGA to design a DPU dedicated to the convolution calculations of a convolutional neural network can therefore further improve the processing speed under low power consumption.

The hardware layer of the system designed in this paper takes CPU+DPU as the main processing unit, and the operating system sits above the hardware layer. The embedded Linux operating system is configured on top of the hardware layer, as shown in Fig.2. Hardware and software are connected through the system software, which includes the hardware abstraction layer, the operating system layer and the middleware layer. The hardware abstraction layer encapsulates the hardware into drivers, which are the basis for running application programs, and the boot loader provides the software conditions for starting the system. The operating system layer contains the Linux operating system kernel and the files necessary for system operation, and the root file system is at this layer. The middleware is a common service with standard program interfaces and protocols, which can realize the interconnection between systems and improve the operating efficiency of software applications. The top layer is the application layer. Users write code to generate applications according to their requirements, and set up dynamic link libraries to extend application functions.

[Figure: layer stack with the labels Application Layer; Middleware Layer; System Software Layer; Operating system kernel, Root file system, ...; Boot loader, Hardware driver, ...; Hardware Layer]
Fig. 2. Embedded Linux operating system architecture

Following the traditional embedded system architecture, the system designed in this paper also connects the algorithm program and the hardware circuit by configuring the system software. The YOLO target detection algorithm is located at the application layer of the architecture. The instructions of the algorithm are issued and executed through fast communication with the CPU and DPU. In the design process, we realize the system software by configuring the hardware system and building the hardware platform, and realize the deployment and operation of the algorithm by quantization, compilation and cross-compilation. The specific design process is described in the next part.

III. HARDWARE DEPLOYMENT DESIGN

A. Network model compression

The original YOLOv3 neural network is usually fixed as a floating-point model, and the use of 32-bit floating-point weights[10] and activation values often gives it good test performance on the GPU side.

[Figure: flow diagram with the labels Float Model, Calibration Dataset, Preprocess, Quantizer, Generate DPU Model, DPU architecture, Compiler, Runtime, evaluate, DPU Deployment]
Fig. 3. Quantitative compilation process

Through quantization, the 32-bit floating-point numbers are quantized into 8-bit integer data, which reduces the complexity of model calculation without losing detection accuracy, thus making the model more friendly to hardware deployment[11].

Model quantization for hardware deployment, as shown in Fig.3, takes the floating-point model, the calibration data set and its input function as the input of quantization. The floating-point model provides a YOLOv3 inference network structure that does not include data preprocessing or loss function calculation. The calibration data and its input function are used to standardize the data conversion rules of the quantization process and to perform the data preprocessing. Quantization converts the floating-point model into a fixed-point model. After that, compilation parses the topology of the model and generates a DPU-deployable model by constructing a DPU-operable control flow and data flow. Computing-node fusion is set up to ensure program parallelism and data reuse. The final result of compilation is a DPU-deployable binary program file. The runtime prepares the necessary conditions for running on the board. Through cross-compilation of the runtime, a dynamic link library is generated to connect the software programs and the hardware structures. Together with the quantization and compilation results, the test data set and the application drivers, it constitutes the four necessary conditions for running the neural network model on the board.

B. Build hardware platform
The DPU (Deep Learning Processing Unit) based on FPGA is a computing unit dedicated to deep learning.

[Figure: block diagram of the DPU with the labels High Performance Scheduler; Hybrid Computing Array (PE 1, PE 2, PE 3, ..., PE n); APU; Instruction Fetch Unit; Global Memory Pool; High Speed Data Pipe; External DDR RAM]
Fig. 4. DPU internal architecture

Specifically, as shown in Fig.4, the convolution computing unit contains multiple dedicated processing units, which mainly process the convolution operations of the convolutional neural network and improve the overall operation speed by running multiple convolution processing units in parallel. The instruction fetch unit delivers the application instructions to the convolution processing units. The scheduler manages how fully the parallel convolution processing units are used. The global memory caches the most recently calculated data, providing a hardware basis for data reuse. Externally, the DPU interacts with the ARM processor and the external storage unit through the high-speed AXI data channel, and processes the instructions sent by the front-end application through the application processing unit.

After the DPU configuration is completed, its parameter information is written into the hardware description file, and its resource utilization is shown in Table 2. Taking the LUT as an example, the resource consumption ratio of the system is 39.34%, indicating that the hardware structure of the system is not complicated. On the basis of this hardware circuit, the operating system is configured to support on-board operation and the inference process of the neural network algorithm. Using the PetaLinux tool, the project is created from the hardware xsa file with the petalinux-create instruction, then the root file system rootfs is configured with the petalinux-config instruction, and the operating system kernel is also configured with the petalinux-config instruction. Finally, the petalinux-build instruction is used to compile the project with this configuration, and the system software, such as the root file system and the boot file of the Linux system, is generated.

TABLE II. DPU DESIGN RESOURCE USAGE
Resource    Utilization    Available    Utilization (%)
LUT         46073          117120       39.34
LUTRAM      3429           57600        5.59
FF          66334          234240       28.32
BRAM        132            144          91.32
DSP         326            1248         26.12
BUFG        3              352          0.85
PLL         1              8            12.50

C. Hardware-software co-design
In the design process, in addition to the separate treatment of the software and hardware described above, it is also necessary to consider the collaborative design of software and hardware[12]. Considering the different data processing methods of the CPU and DPU, the hardware design is divided into two parts: PS (processing system) and PL (programmable logic). As shown in Fig.5, the PS part is dominated by the CPU and memory. It mainly handles data interaction such as data input and output, runs the software-side code, and allocates the convolution calculation part of the program to the PL side. The PL part is mainly composed of the DPU designed on the FPGA. The FPGA forms an IP by fixing the configuration of its distributed logic units, and then connects to the PS side to form a dedicated processor, the DPU, for convolution calculation. As the central processor, the CPU has a higher clock frequency. The main program of the code is placed on the PS side, which can respond to input instructions faster. At the same time, the CPU can call different IP cores on the PL side to handle different computing tasks. Usually, the parallel computing of the PL accelerates tasks with repeated computation or data reuse, which has certain advantages for the calculation of the convolutional layers and fully connected layers of a convolutional neural network. In addition, the CPU and DPU are connected by the AXI bus to accelerate data transmission.

[Figure: block diagram with the labels PS (CPU); AXI; PL (IP 1, IP 2, IP 3, ..., IP n); Memory Control Unit; Control Bus]
Fig. 5. Hardware-software co-design

Through the previous work, the DPU and CPU together constitute the processor system running on the board, the root file system and the operating system kernel support the Linux operating system on the board, and the Python driver is configured to provide a software environment for the application to run. The original algorithm is then quantized and compiled into a DPU-deployable model, so that the hardware deployment of the YOLOv3 neural network can be realized.

IV. EXPERIMENT AND RESULT

A. Experimental environment
The system designed in this paper takes the AXU5EV-P development board as the hardware platform. The experimental environment is shown in Fig.6. The CPU part integrates four ARM Cortex-A53 processors and two ARM Cortex-R5 processors. The FPGA part uses the XCZU5EV-2SFVC784I industrial FPGA chip produced by Xilinx. The system development environment provided by Xilinx is used in the design process, including the Vitis AI quantizer, Vitis AI compiler and Vitis AI runtime for algorithm design, and Vivado, PetaLinux, Vitis and other tools for hardware design.

Fig. 6. Experimental environment composed of AXU5EV-P development board
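The conversion from 32-bit floating-point values to 8-bit integers described in Section III.A can be illustrated with a minimal sketch. It is not the Vitis AI quantizer's actual algorithm: it assumes a symmetric, per-tensor scheme with a power-of-two scale, and it derives the scale from the values themselves rather than from a calibration data set. All function names are hypothetical.

```python
import math


def choose_frac_bits(values, bits=8):
    """Pick the number of fractional bits for a power-of-two scale.

    Assumption: the largest magnitude must still fit into the signed
    range of the target integer type after scaling.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    if peak == 0:
        return 0
    return math.floor(math.log2(qmax / peak))


def quantize(values, frac_bits, bits=8):
    """Scale by 2**frac_bits, round, and clamp to the signed integer range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** frac_bits
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to approximate floating-point values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.42, -1.30, 0.05, 0.87]  # made-up example weights
    frac_bits = choose_frac_bits(weights)
    codes = quantize(weights, frac_bits)
    restored = dequantize(codes, frac_bits)
    print("fractional bits:", frac_bits)
    print("int8 codes:", codes)
    print("restored:", restored)
```

In this scheme the rounding error of each value is bounded by half a quantization step, which is one way to see why accuracy can stay close to the floating-point model when the value range is well matched to the scale. In a calibrated flow, the ranges of activations would typically be estimated from the calibration data instead.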

To verify the effectiveness and practicability of the designed system, the DeepPCB data set is used as the experimental data set, and 200 images are randomly selected as the test data set. The data set covers six defect classes for target detection: open, short, mouse-bite, spur, copper and pin-hole. The images are 640*640 in size and each contains about 3 to 12 defects. The data set is commonly used for experimental verification of target detection algorithms, which makes it convincing for verifying a detection system intended for industrial PCB defect detection.
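Evaluating detections on such a test set relies on the IOU of formula 2 and on non-maximum suppression. The following minimal sketch shows both under stated assumptions: boxes are given as corner coordinates (x1, y1, x2, y2) rather than the centre-based form of formula 1, and the 0.3 threshold mentioned in Section II is interpreted here as an IOU threshold for suppression. The function names and the example boxes are illustrative and do not come from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it.

    Returns the indices of the kept boxes in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two overlapping candidates for one defect and one separate defect.
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print("kept indices:", nms(boxes, scores))
```

With a low threshold such as 0.3, even moderately overlapping candidates are merged into one detection, which is a reasonable choice when defects are small and rarely overlap each other.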

B. Data analysis
The overall detection effect of the system is shown in Fig.7. As the figure shows, the system accurately detects the six types of PCB defect, frames the location of each defect and marks the defect type.

Fig. 7. PCB defect detection renderings

Power consumption calculation: The overall power consumption of the FPGA-based embedded platform is shown in Fig.8. The overall power consumption of the system is 7.108W, which is much lower than the power consumption of a CPU (65W for the Intel Core i9-11900) or a GPU (170W for the NVIDIA GeForce RTX 3060).

Fig. 8. System power consumption

Speed calculation: The DPU performs well in the detection process. Fig.9 shows the total processing time of a single image and the share taken by convolution processing. The average delay per image is 97.59ms, and the average delay of convolution calculation is 10.03ms, accounting for 10.28%. At the same time, according to the amount of Darknet calculation in Table 1, the throughput (formula 7) of the DPU in the detection process is 2.44TOPS.

$$ \mathrm{throughput}_{DPU} = \frac{\mathrm{computation}}{\mathrm{time}_{conv}} \tag{7} $$

Fig. 9. Inference delay distribution

Accuracy calculation: After collecting the experimental data, the AP of each defect type in the PCB detection process and the overall mAP are obtained. Compared with the accuracy obtained on the GPU (mAP = 0.795), the accuracy of the quantized YOLOv3 model is reduced by less than 2%, which is consistent with the small loss expected from model compression.

TABLE III. PCB DEFECT DETECTION MAP CALCULATION
Defect name    open     short    mouse-bite    spur     copper    pin-hole
Number         208      165      165           137      175       233
AP             0.843    0.857    0.700         0.681    0.823     0.830
mAP            0.789
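The reported figures can be cross-checked against formula 6 and formula 7 with a few lines of arithmetic. The sketch below uses only numbers quoted in the text and tables; it assumes that the 24.52 G entry of Table 1 (416*416 input) is the computation used in formula 7, which the paper implies but does not state explicitly. The constant and function names are illustrative.

```python
# Values quoted in the paper.
CONV_GOPS = 24.52        # Table 1, assumed to be the 416*416 entry
CONV_DELAY_MS = 10.03    # average convolution delay per image
FRAME_DELAY_MS = 97.59   # average total delay per image
SYSTEM_POWER_W = 7.108   # overall power consumption of the platform
AP_BY_CLASS = {
    "open": 0.843, "short": 0.857, "mouse-bite": 0.700,
    "spur": 0.681, "copper": 0.823, "pin-hole": 0.830,
}


def throughput_tops(gops: float, delay_ms: float) -> float:
    """Formula 7: operations divided by convolution time, in TOPS."""
    return (gops * 1e9) / (delay_ms * 1e-3) / 1e12


def mean_ap(ap_by_class: dict) -> float:
    """Formula 6: arithmetic mean of the per-class AP values."""
    return sum(ap_by_class.values()) / len(ap_by_class)


if __name__ == "__main__":
    print(f"throughput: {throughput_tops(CONV_GOPS, CONV_DELAY_MS):.2f} TOPS")
    print(f"convolution share: {100 * CONV_DELAY_MS / FRAME_DELAY_MS:.2f} %")
    print(f"mAP: {mean_ap(AP_BY_CLASS):.3f}")
    print(f"GPU / system power ratio: {170 / SYSTEM_POWER_W:.1f}")
```

Under that assumption the computed throughput, convolution share and mAP reproduce the reported 2.44TOPS, 10.28% and 0.789.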

C. Experimental conclusion
From the above data, it can be concluded that when the system detects PCB defects, its overall power consumption is 7.108W and its single-frame detection delay is 97.59ms, while an accuracy of 0.789 is maintained. It is worth mentioning that the DPU shows excellent performance in processing convolution operations, with a throughput of 2.44TOPS and a convolution processing delay of 10.03ms.

V. CONCLUSION
To realize low-power PCB defect detection equipment and to address the hardware deployment problem of convolutional neural networks, this paper designs an embedded system based on FPGA and successfully deploys the YOLOv3 neural network on it, so that it can be used for industrial PCB defect detection. First, industrial PCB defect detection methods and the state of academic research are reviewed to introduce the content of the design. Second, the operation process of the YOLOv3 algorithm and the architecture of the FPGA embedded system are introduced. Third, the hardware deployment of the YOLOv3 neural network is completed by designing the software, the hardware and the hardware-software co-design of the FPGA embedded system. Finally, the experimental tests show that the system has good speed and power consumption performance when used for industrial PCB defect detection, which basically meets industrial needs. The design in this paper therefore not only helps to promote the deployment and application of deep learning methods in industrial PCB defect detection, but also has certain reference significance for the subsequent hardware deployment of other higher-performance deep learning algorithms.

REFERENCES
[1] Zeiler, M. D., & Fergus, R. "Visualizing and understanding convolutional networks." ECCV, 2014.
[2] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[3] Kou, X., Liu, S., Cheng, K., & Qian, Y. "Development of a YOLO-V3-based model for detecting defects on steel strip surface." Measurement 182 (2021): 109454.
[4] Adibhatla, V. A., Chih, H. C., Hsu, C. C., Cheng, J., Abbod, M. F., & Shieh, J. S. "Applying deep learning to defect detection in printed circuit boards via a newest model of you-only-look-once." (2021).
[5] Cai, Y., Yao, Z., Dong, Z., Gholami, A., Mahoney, M. W., & Keutzer, K. "Zeroq: A novel zero shot quantization framework." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[6] Redmon, J., & Farhadi, A. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
[7] Zhang, H., Wang, J., Sun, Z., Zurada, J. M., & Pal, N. R. "Feature selection for neural networks using group lasso regularization." IEEE Transactions on Knowledge and Data Engineering 32.4 (2019): 659-673.
[8] He, K., Zhang, X., Ren, S., & Sun, J. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[9] Nurvitadhi, E., Venkatesh, G., Sim, J., Marr, D., Huang, R., Ong Gee Hock, J., ... & Boudoukh, G. "Can FPGAs beat GPUs in accelerating next-generation deep neural networks?" Proceedings of the 2017 ACM/SIGDA international symposium on field-programmable gate arrays. 2017.
[10] Wang, J., Ye, Z., Gao, W., & Zurada, J. M. "Boundedness and convergence analysis of weight elimination for cyclic training of neural networks." Neural Networks 82 (2016): 49-61.
[11] Xie, X., Zhang, H., Wang, J., Chang, Q., Wang, J., & Pal, N. R. "Learning optimized structure of neural networks by hidden node pruning with $L_{1}$ regularization." IEEE Transactions on Cybernetics 50.3 (2019): 1333-1346.
[12] Yu, Q., Wang, C., Ma, X., Li, X., & Zhou, X. "A deep learning prediction process accelerator based FPGA." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 2015.
