Article
YOLO-MBBi: PCB Surface Defect Detection Method Based on
Enhanced YOLOv5
Bowei Du †, Fang Wan †, Guangbo Lei, Li Xu *, Chengzhi Xu and Ying Xiong
Abstract: Printed circuit boards (PCBs) are extensively used to assemble electronic equipment.
Currently, PCBs are an integral part of almost all electronic products. However, various surface
defects can still occur during mass production. An enhanced YOLOv5s network named YOLO-MBBi
is proposed to detect surface defects on PCBs to address the shortcomings of the existing PCB surface
defect detection methods, such as their low accuracy and poor real-time performance. YOLO-MBBi
uses MBConv (mobile inverted residual bottleneck block) modules, CBAM attention, BiFPN, and
depth-wise convolutions to substitute layers in the YOLOv5s network and replace the CIoU loss
function with the SIoU loss function during training. Two publicly available datasets were selected for
this experiment. The experimental results showed that the mAP50 and recall values of YOLO-MBBi
were 95.3% and 94.6%, which were 3.6% and 2.6% higher than those of YOLOv5s, respectively, and
the FLOPs were 12.8, which was much smaller than YOLOv7’s 103.2. The FPS value reached 48.9.
Additionally, after using another dataset, the YOLO-MBBi metrics also achieved satisfactory accuracy
and met the needs of industrial production.
Keywords: printed circuit board (PCB); defect detection; deep learning; YOLOv5
1. Introduction
PCB stands for printed circuit board. The bare board of a PCB is usually rectangular, and the front and back sides are usually sprayed with a green or other coloured protective varnish. Only the bare, round, hole-shaped pads are left exposed for soldering the electronic components. The front of the PCB is also printed with markings that indicate the wiring connections between the pads. For ease of assembly, the PCB surface is additionally marked with the name and model number of the electronic component to be assembled in each position. Instead of relying on intricate wires to connect tiny components, PCBs carry the necessary electronic components and offer the advantages of densification, high reliability, designability, and producibility. In the electronic information manufacturing industry, soldering electronic components to PCBs can reduce product size and production costs and increase production efficiency, which is important as this increases output and product quality. Both household appliances and military weapon systems use PCBs as the medium for electrical interconnection, and today, PCBs are used in almost all electronic products.
Considering the wide range of PCB applications, which includes advanced computers, automated driving, and aerospace and other highly sophisticated fields, PCB quality can directly affect the operation of devices, so increasing PCB quality is of high importance to the electronic information industry. Detecting surface defects in PCBs is a key element in increasing the quality of electronic products. To ensure high-quality products, manufacturers screen out PCBs with surface defects so that poorly assembled products are not used. The various defects present on PCBs can be broadly divided into soldering and non-soldering defects [1], which can be subdivided into wiring trace faults, polarity errors, missing components, and component misalignments [2,3]. The object of study for this experiment was the PCB bare board surface defects belonging to wiring trace faults, which comprise six types of defects: missing holes, mouse bites, open circuits, shorts, spurs, and spurious copper, as shown in Figure 1.
The traditional methods used to detect PCB surface defects include naked-eye inspection and probe detection. The former relies on inspectors observing PCB surface defects directly, but it requires considerable experience, and performing these observations for long periods degrades both the accuracy and the efficiency of the detection. The latter uses a bed-of-nails or flying-probe tester together with a computer that runs the test program, brings the probes into contact with the PCB, and determines whether surface defects exist from the probe measurements; however, this method requires more expensive equipment and higher learning costs, and the contact involved in the detection process can damage the PCB. Therefore, ensuring efficiency and accuracy in a batch inspection environment is difficult when using traditional inspection methods; thus, a method whereby users can more accurately and efficiently detect surface defects on PCBs is needed.
Applying noncontact automatic optical inspection methods to detect PCB surface
defects is now a major research focus, and high-resolution optical sensors and image-
processing algorithms are used to detect surface defects on PCBs [4], both via traditional
image-processing and deep-learning-based detection methods.
Regarding the traditional image-processing detection methods, Mukesh Kumar et al.
proposed a method to detect PCB bare board defects by combining image enhancement
techniques and a standard template generation particle analysis [5]. Khlong Luang and
Pathum Thani proposed a PCB defect classification method by using arithmetic and logical
operations, a circular Hough transform (CHT), morphological reconstruction (MR), and
connected component labelling (CCL) [6]. Liu and Qu used a hybrid recognition method
of mathematical morphology and pattern recognition to process PCB images and used
an image aberration detection algorithm to mark images of PCB defects for defect iden-
tification [7]. Gaidhane et al. introduced the concept of linear algebra to determine the
presence of surface defects in PCBs by calculating the rank of the matrix corresponding to
the image [8]. Although these methods can be used to achieve satisfactory accuracy, the
detection rate is slow, which is not conducive to fast detection in practical applications.
Moreover, they are dependent on a priori knowledge or require a large number of standard
images to be stored, which is not conducive to generalization to different scenarios [9].
The deep learning approach to detection relies on a trained neural network model to detect defects. First, the object-detection weights are trained, which involves building and labelling the dataset, designing the neural network structure, and setting the training parameters and strategy. During training, the weight parameters are updated through backpropagation, with the model's loss values tending to decrease and its accuracy values tending to increase; the performance of the resulting weights can then be evaluated on a validation or test set after training is complete (a minimal training-loop sketch is given at the end of this paragraph). The detection of
surface defects on PCBs by using deep learning object detection methods has been shown
to be feasible by the results of several studies. Ding et al. proposed a tiny defect detection
network, TDD-Net, which exploits the inherent multiscale and pyramidal hierarchy of deep
convolutional networks when constructing feature pyramids with high portability [10]. Hu
et al. modified a Faster-RCNN by using ResNet50 as the backbone network and fusing the
GARPN and ShuffleNetV2 residual units [11]. Liao et al. enhanced the YOLOv4 network to
obtain YOLOv4-MN3, replaced the CSPDarkNet53 backbone network with MobileNetV3,
added the SENet attention to the neck network, optimized the activation and loss functions,
and constructed a dataset by capturing 2008 images containing PCB surface defects of
4024 × 3036 pixels with an industrial camera; the dataset contained six types of surface
defects: bumpy or broken lines, clutter, scratches, line-repair damage, hole loss, and over
oil filling. The weight was used to detect these six types of PCB surface defects with a mAP
of 98.64% and FPS of 56.98 [9]. Zheng et al. proposed an enhanced full convolutional neural
network based on MobileNet-V2 by incorporating atrous convolution and enhancing the
skip layers, and the average recognition accuracy of this network when detecting four PCB
defects was 92.86% [12]. However, the networks proposed in the literature [10,11,13] all
suffer from slow detection rates and a lack of efficiency.
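As a concrete illustration of the training workflow just described, the following minimal PyTorch sketch shows a generic train-then-validate loop; the model, loss criterion, datasets, and hyperparameters are placeholders rather than the actual YOLO-MBBi training setup.

```python
import torch
from torch.utils.data import DataLoader

def train_detector(model, criterion, train_set, val_set, epochs=100, lr=0.01):
    """Generic train-then-validate loop; `model` and `criterion` are placeholders
    for the detection network and its loss function."""
    train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=16)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            loss = criterion(model(images), targets)   # forward pass + loss
            optimizer.zero_grad()
            loss.backward()                            # backpropagation
            optimizer.step()                           # weight update

        model.eval()                                   # evaluate the current weights
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
        print(f"epoch {epoch}: validation loss = {val_loss:.4f}")
    return model
```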
To solve the current problems of insufficient accuracy, efficiency, and stability in the PCB surface defect detection methods used in industrial production, we introduce deep learning and computer vision techniques: a neural-network-based object-detection method built on the YOLOv5 framework is used to automatically identify PCB surface defects, and we propose YOLO-MBBi based on YOLOv5. YOLO-MBBi contains the following enhancements to the YOLOv5 network: the backbone network of YOLOv5s is rebuilt with MBConv (mobile inverted residual bottleneck block) modules, the main building block of EfficientNet-B0, whose baseline network EfficientNet [14] offers a superior inference rate and accuracy; CBAM [15] attention is added to enhance the feature learning of objects in the backbone network; BiFPN [16] is added to the feature fusion network of the head detector; some of the convolutions in the detection head network are replaced with depth-wise convolutions [17]; and the CIoU loss function originally used in YOLOv5 is replaced with the SIoU loss function to enhance learning [18]. Finally, an optimized weight is obtained by training with the YOLO-MBBi network structure, which is used to detect PCB surface defects and to compare YOLO-MBBi with other neural networks, including the original YOLOv5s.
2. Related Works
YOLO, which stands for "you only look once" [19–22], is a one-stage object-detection network. It differs from two-stage object-detection networks such as R-CNN and Faster R-CNN [23,24] in that the YOLO series only needs to scan the image once to output the detection result, performing object localization and classification in a single pass, whereas two-stage object-detection networks process the image in two stages (region proposal followed by classification) to complete object localization and detection. Therefore, although the detection accuracy of the YOLO series networks may be slightly inferior to that of two-stage object-detection networks, YOLO series networks are faster in terms of their detection rate and still achieve excellent accuracy, which gives them high theoretical and application value; additionally, many scholars have obtained object-detection results by enhancing YOLO series networks [25–28].
YOLOv5 is the fifth iteration of the YOLO series network and is now divided into five versions whose weights differ, from small to large, in network depth and width. The network depth increases according to the number of modules connected in series, and increasing
the network depth allows for a larger receptive field to help capture more pixel-like features
and increase the expressiveness of the model. Increasing the number of input and output
channels between layers increases the network width, and increasing the network width
allows for finer granularity and richer features and increases the parallelism of the model,
which thereby increases the training speed. As a result, the number of modules used for
smaller weights is overall smaller than the number used for larger weights, so smaller
weights are faster but less accurate, whereas larger weights are slower but more accurate,
and the training cost rises as the depth and width of the weight increases. The YOLOv5s
v6.0 network was chosen as the basis for enhancement and for testing PCB surface defect detection because we needed to ensure both the detection efficiency and the accuracy of the network; these factors are the most important for practical application scenarios. The YOLOv5s versions mentioned below are all v6.0.
The YOLOv5s network structure is shown in Figure 2. The structure consists of three
main parts, namely the backbone network, the neck network, and the head detector. The backbone network consists of a convolution module named CBS, composed of a convolution layer, a batch normalization layer, and a SiLU activation function [29], connected in series with C3_1 modules that contain a residual structure. The role of the convolution module is to obtain the feature
map in the original image by downsampling, and the residual structure in the C3_1 module
can enhance the gradient value during backpropagation, which effectively prevents the
gradient from disappearing when the network deepens and reduces the loss in feature
extraction. The SPP (spatial pyramid pooling) [30] module is the last layer of the old
YOLOv5s backbone network. The current version of YOLOv5s uses the SPPF module to
replace the SPP module used in the previous version, which is optimized on the basis of SPP
and reduces the computational effort and increases the computational speed of the model
without changing the original algorithm. The module pools the last layer of the feature map with three different-sized max-pooling layers and fuses the three pooled features to extract spatial features of different sizes and to increase the robustness of the
model. The next subnetwork connected to the backbone network is the neck network,
which mainly consists of convolution, upsampling, feature fusion, and C3_2 modules with
the residual structure removed. The feature fusion module can concatenate feature maps
of the same size in both layers of the network to increase the number and granularity of
the features in the feature map. C3_2 reuses the C3_1 module from the backbone network,
which reduces the computational effort, enhances the feature fusion capability, and retains
richer feature information. Lastly, the output section of YOLOv5s uses CIoU as the loss
function and provides three different feature map sizes for detection.
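To make the preceding description concrete, a minimal PyTorch sketch of the CBS block and the SPPF layer is given below; it is a simplified reconstruction in the spirit of the public YOLOv5 code, and the channel and kernel sizes are illustrative assumptions rather than the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic downsampling block of the backbone."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Spatial pyramid pooling (fast): repeated max-pooling, outputs concatenated."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = CBS(c_in, c_hidden, k=1, s=1)
        self.cv2 = CBS(c_hidden * 4, c_out, k=1, s=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        # Concatenating the pooled features captures spatial context at several scales.
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```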
Regarding the main method by which CBAM attention processes the feature map, the feature map is first max pooled and average pooled by the channel attention module to produce two tensors with only the channel dimension; these are passed through a shared fully connected layer, summed, and activated by a sigmoid function to obtain the channel attention tensor. The channel attention structure is shown in Figure 5, and its expression is shown in Equations (1) and (2).
$M_C(F) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$  (2)
The new feature map produced by the channel attention module is then passed through the spatial attention module. The spatial attention module applies maximum and average pooling along the channel dimension and concatenates the two pooled tensors in the channel dimension. The spatial attention tensor is obtained by convolving the concatenated feature map down to a single channel and then activating it with the sigmoid function. The structure of the spatial attention is shown in Figure 6, and its expression is shown in Equation (3).
CBAM adds only a small number of parameters and little computational complexity to the model. The addition of CBAM attention can enhance the performance and computational efficiency of neural networks, which makes them more applicable in various application scenarios. CBAM was therefore added to YOLO-MBBi.
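For illustration, a minimal PyTorch sketch of the CBAM attention described above (Equations (1)–(3)) follows; the reduction ratio of 16 and the 7 × 7 spatial kernel are assumptions taken from the original CBAM design rather than values reported in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP (W0, W1 in Equation (2)) applied to both pooled vectors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                  # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))                   # max-pooled descriptor
        return torch.sigmoid(avg + mx).view(b, c, 1, 1) * x

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                   # average pool over channels
        mx = x.amax(dim=1, keepdim=True)                    # max pool over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return attn * x

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```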
3.3. BiFPN
FPN [31] stands for feature pyramid networks, which are fundamental components of
recognition systems that are used to detect objects at different scales. FPNs connect top-
down high-level features with low-level features, which enhances the semantic information
of the features at all scales. However, the problem with FPNs is that a limitation regarding
unidirectional information flow exists.
BiFPN [16] is a bidirectional weighted feature pyramid network whose structure is
shown in Figure 7, and it is capable of simply and quickly fusing multiscale features. BiFPN
adds cross-scale connections that are not found in other FPNs and removes nodes that have only one input and contribute little to feature fusion; additionally, it adds skip connections between input and output nodes at the same scale so that the output nodes can use both the original input features and the fused features.
Traditional feature fusion usually ignores the differences in importance between features at different scales and fuses them directly. BiFPN instead treats the weights of features at different scales as learnable parameters; this feature fusion method is known as fast normalized fusion, and its expression is shown in Equation (4):
$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$  (4)

where $I_i$ and $O$ denote the features before and after fusion, respectively; $w_i$ and $w_j$ denote the learnable weights of the features; and $\epsilon$ is a small value, much less than one, that ensures numerical stability.
BiFPN also employs learnable weights that adaptively adjust the degree of information transfer and fusion, which increases the efficiency of information utilization and allows features to be up- and downsampled together rather than in separate passes, as some traditional methods require, thereby avoiding wasted computation and memory. Additionally, BiFPN can be used in feature pyramid
networks, which enables efficient multiscale feature fusion and can increase the accuracy
with which a model performs object-detection tasks through more comprehensive feature
interaction and information fusion. The results of experiments on a number of publicly
available datasets have shown that models incorporating BiFPN can achieve higher accu-
racy compared with models without BiFPN. Therefore, YOLO-MBBi uses BiFPN instead of
the original feature fusion.
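As a sketch, the fast normalized fusion of Equation (4) could be implemented as follows; the use of a ReLU to keep the learnable weights non-negative follows the BiFPN paper and is an assumption here, not a detail taken from YOLO-MBBi itself.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Weighted fusion of same-sized feature maps (Equation (4))."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # learnable w_i
        self.eps = eps

    def forward(self, inputs):
        # ReLU keeps each w_i >= 0 so the normalized weights sum to at most one.
        w = torch.relu(self.weights)
        w = w / (self.eps + w.sum())
        return sum(wi * x for wi, x in zip(w, inputs))

# Hypothetical usage with two same-sized feature maps:
# fuse = FastNormalizedFusion(2); out = fuse([p4_td, p4_in])
```

Because the normalization divides by a plain sum rather than using a softmax, the fusion stays cheap while still letting the network learn how much each input scale should contribute.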
The IoU measures the overlap between the predicted box $C_1$ and the ground-truth box $C_2$, as shown in Equation (5):

$IoU = \frac{C_1 \cap C_2}{C_1 \cup C_2}$  (5)
To address the problem of objects at different size scales, enhanced loss functions such as GIoU, DIoU, and CIoU [13] were developed by optimizing the original IoU. On the basis of these IoUs, SIoU [18] additionally considers the angle between the centres of the predicted and ground-truth rectangles. Here, $\varphi$ is the acute angle formed by the line connecting the centres of the two rectangles and is used to compute the angle loss $\Lambda$, as shown in Equations (6) and (7). SIoU also redefines the distance loss $\Delta$, which incorporates the angle loss $\Lambda$, as shown in Equations (8) and (9):

$\varphi = \arcsin\left(\frac{\min(d_w, d_h)}{\rho(b, b^{gt})}\right)$  (6)

$\Lambda = 1 - 2\sin^2\left(\varphi - \frac{\pi}{4}\right)$  (7)

$\rho_x = \left(\frac{d_w}{c_w}\right)^2, \quad \rho_y = \left(\frac{d_h}{c_h}\right)^2$  (8)

$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \quad \gamma = 2 - \Lambda$  (9)

where $d_w$ and $d_h$ are the horizontal and vertical distances between the two box centres, $\rho(b, b^{gt})$ is the Euclidean distance between the centres, and $c_w$ and $c_h$ are the width and height of the smallest enclosing box. The shape loss $\Omega$ is defined in Equation (10), where $\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}$, $\omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$, and $\theta$ is a hyperparameter. After calculating $\Delta$ and $\Omega$, the formula for SIoU is provided in Equation (11):

$\Omega = (1 - e^{-\omega_w})^\theta + (1 - e^{-\omega_h})^\theta$  (10)

$SIoU = 1 - IoU + \frac{\Delta + \Omega}{2}$  (11)
Compared with the CIoU used in the YOLOv5 network, SIoU redefines the penalty calculation, takes into account the angle of the vector along which the regression should proceed, and produces more accurate loss values, which helps to increase the accuracy and efficiency of the bounding-box regression during training. Therefore, the CIoU loss function used in YOLOv5s is replaced with the SIoU loss function as the localization loss for the bounding-box regression.
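A hedged PyTorch sketch of Equations (5)–(11), for boxes given in (x1, y1, x2, y2) format, is shown below; the value θ = 4 and the small ε terms are assumptions for illustration and numerical stability, not settings taken from this paper.

```python
import torch

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU loss sketch following Equations (5)-(11); boxes are (x1, y1, x2, y2)."""
    # Widths, heights, and centres of the predicted and ground-truth boxes.
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # IoU (Equation (5)).
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Angle loss (Equations (6) and (7)).
    dw, dh = (cx2 - cx1).abs(), (cy2 - cy1).abs()
    rho = torch.sqrt(dw ** 2 + dh ** 2) + eps
    phi = torch.arcsin(torch.min(dw, dh) / rho)
    angle = 1 - 2 * torch.sin(phi - torch.pi / 4) ** 2

    # Distance loss (Equations (8) and (9)), using the enclosing box size cw, ch.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0]) + eps
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1]) + eps
    gamma = 2 - angle
    dist = (1 - torch.exp(-gamma * (dw / cw) ** 2)) + (1 - torch.exp(-gamma * (dh / ch) ** 2))

    # Shape loss (Equation (10)).
    ww = (w1 - w2).abs() / (torch.max(w1, w2) + eps)
    wh = (h1 - h2).abs() / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    # Equation (11): per-box loss; in practice this would be averaged over assigned anchors.
    return 1 - iou + (dist + shape) / 2
```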
Figure 8. The depth-wise convolution structure. For the input feature map, the features of each
channel are separately convolved to obtain a feature map with the same number of channels, and
then the number of channels in the feature map is expanded by using point-by-point convolution [17].
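The depth-wise and point-wise convolution pair described in Figure 8 can be sketched in PyTorch as follows; the kernel size and the added BatchNorm/SiLU layers are illustrative assumptions.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Per-channel (depth-wise) convolution followed by a 1x1 (point-wise) convolution."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        # groups=c_in convolves each input channel separately, keeping the channel count.
        self.depthwise = nn.Conv2d(c_in, c_in, k, s, k // 2, groups=c_in, bias=False)
        # The 1x1 convolution then expands and mixes the channels.
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```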
Figure 9. Four sample images from PKU-Market-PCB dataset. (a,b) Images of a normally placed
PCB, (c,d) images after random rotation.
DeepPCB is a PCB surface defect dataset from GitHub that contains 1500 images of
PCBs with six types of surface defects: openings, shorts, mouse bites, spurs, spurious
copper, and pinholes. It also contains 1500 corresponding images that do not show defects
for comparison and reference purposes. All the images in this dataset were obtained from
a linear scan CCD at a resolution of around 48 pixels per 1 millimetre. Each original image
had 16k × 16k pixels. These original images were cropped into many sub-images with
a size of 640 × 640 and were aligned by using template matching techniques. Figure 10
displays the normal PCB images and PCB images with surface defects in this dataset. A
total of 1200 images were present in the training set, 150 images were present in each of the validation and testing sets, and each image showed different types of defects.
Considering the small size of the two datasets chosen, the datasets were randomly
divided three times, and each model was trained once on each of the three divided datasets.
For each model’s metrics, the average of their corresponding metrics in these three training
sessions was taken as the final result.
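A rough sketch of this three-fold random splitting and metric averaging is given below; the split ratios, the seeds, and the `train_and_evaluate` helper are hypothetical placeholders rather than details from the paper.

```python
import random

def three_way_splits(image_ids, seeds=(0, 1, 2), ratios=(0.8, 0.1, 0.1)):
    """Randomly divide the dataset three times into train/val/test subsets."""
    splits = []
    for seed in seeds:
        ids = image_ids[:]
        random.Random(seed).shuffle(ids)        # reproducible shuffle per split
        n_train = int(len(ids) * ratios[0])
        n_val = int(len(ids) * ratios[1])
        splits.append((ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]))
    return splits

# Hypothetical usage: train each model once per split and average its metrics.
# results = [train_and_evaluate(model_cfg, split) for split in three_way_splits(all_ids)]
# final_map50 = sum(r["mAP50"] for r in results) / len(results)
```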
Figure 10. Two sample images from DeepPCB dataset. (a) PCB images without surface defects,
(b) PCB images with surface defects.
Precision (P) and recall (R) were calculated as shown in Equation (12), where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively:

$P = \frac{TP}{TP + FP} \times 100\%, \quad R = \frac{TP}{TP + FN} \times 100\%$  (12)
Additionally, mAP50 could be calculated from the recognition accuracy of each category, as shown in Equation (13). The number 50 refers to the IoU threshold, which is 0.5; $N_{cls}$ is the total number of categories; and $\int_0^1 P_i(R_i)\,dR$ is the average detection precision of the target object in category $i$:

$mAP50 = \frac{\sum_{i=1}^{N_{cls}} \int_0^1 P_i(R_i)\,dR}{N_{cls}} \times 100\%$  (13)
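As a small worked example of Equations (12) and (13), the following sketch computes precision, recall, and mAP50 from detection counts and per-class average precisions; the counts in the usage comments are made up for demonstration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Equation (12): precision and recall as percentages."""
    precision = tp / (tp + fp) * 100.0
    recall = tp / (tp + fn) * 100.0
    return precision, recall

def map50(ap_per_class: list[float]) -> float:
    """Equation (13): mean of the per-class average precisions at IoU 0.5."""
    return sum(ap_per_class) / len(ap_per_class) * 100.0

# Made-up counts: 90 true positives, 5 false positives, 8 false negatives.
p, r = precision_recall(90, 5, 8)           # p ≈ 94.7%, r ≈ 91.8%
m = map50([0.95, 0.93, 0.96, 0.94, 0.92, 0.97])  # mean over six defect classes
```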
The FPS is the number of frames per second and represents the number of images
detected per second. The faster the inference rate of the model, the more images are
detected per unit of time.
FLOPs refer to the number of floating-point operations performed by a neural network and are often used to measure its computational complexity, to compare the computational efficiency of different neural networks, and to evaluate the complexity and speed of a model. In general, a higher FLOPs value means that the weight must perform more computations, requires more computational resources, and runs more slowly, whereas a model with lower FLOPs can run correspondingly faster.
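A simple, hedged way to estimate the FPS figure discussed above is to time repeated forward passes, as in the sketch below; the batch size of one, the 640-pixel input, the warm-up count, and the CUDA device are assumptions rather than the paper's exact measurement protocol.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, img_size=640, runs=100, warmup=10, device="cuda"):
    """Estimate frames per second by timing single-image forward passes."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    for _ in range(warmup):              # warm-up iterations are excluded from timing
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # make sure queued GPU work has finished
    start = time.time()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return runs / (time.time() - start)
```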