Abstract—For the purpose of ensuring public security, automatic inspection with X-ray scanners has been deployed at the entry points of many public places to detect dangerous objects. However, current surveillance systems cannot function without human supervision and intervention. In this paper, we propose an effective method using deep convolutional neural networks to detect objects during X-ray baggage inspection. First, a large amount of training data is generated by a specific data augmentation technique. Second, a feature enhancement module is used to improve feature extraction capabilities, and focal loss is adopted to address the foreground-background imbalance in the region proposal network. Third, a multi-scale fused region of interest (RoI) is utilized to obtain more robust proposals. Finally, soft non-maximum suppression (NMS) is adopted to alleviate overlaps in baggage detection. Compared with existing algorithms, the proposed method proves more accurate and robust when dealing with densely cluttered backgrounds during X-ray baggage inspection.

Index Terms—Convolutional Neural Networks, Baggage Inspection, Baggage Detection, X-Ray Images for Security Applications

I. INTRODUCTION

… crime [1]. Security screening using X-ray scanners is widely used in public places [2]. These scans are visually inspected by a specifically trained human inspector to ensure there are no dangers. Manually performing this task is extremely tedious, since any piece of baggage might actually be dangerous [3]. During rush hour, an inspector has only a few seconds to determine whether a piece of baggage contains any dangerous item [4]. Since each employee has to check a large amount of baggage, the possibility of human error over a long shift is considerable, even with specialized training [5]. Automated X-ray analysis therefore remains a crucial issue in baggage inspection.

X-ray imaging is quite different from natural optical imaging in several aspects. The main difference is that an X-ray image is formed by irradiating the object with X-rays, whereas a natural optical image is formed by light reflection, which gives information about the surface of the objects [6], [7]. Thus, an X-ray image consists of shadows from overlapping transparent layers, and the transparency of the image is determined by the material density along the X-ray path. The visibility of objects in X-ray images depends on the object's density: high-density objects (e.g. thick metal) appear substantially opaque and occlude all other overlapping objects, while very low-density objects (e.g. clothes) are barely visible.
Dual-energy X-ray imaging [8], [9] is widely used for X-ray baggage inspection.

In recent years, convolutional neural networks (CNNs) have been widely used in image analysis and interpretation. Methods based on deep learning have achieved state-of-the-art detection performance in many computer vision tasks [10]–[12], such as face recognition and automatic driving. However, few efforts have been dedicated to investigating object detection in X-ray baggage inspection, owing to several limitations. For lack of training data, most existing methods fine-tune pre-trained networks [13] to achieve good performance, but this is not feasible in X-ray baggage inspection: direct adoption of a pre-trained network leaves little flexibility to adjust the structure, and bias may be introduced in the learning process. A good solution to these critical issues is to train the models from scratch. However, with numerous parameters, inefficient training strategies and limited training data, previous approaches are difficult to converge [14].

To address these issues, in this paper we propose an effective approach for object detection in X-ray baggage inspection. Compared with detection methods designed for object surfaces, such as the Faster Region-based Convolutional Neural Network (Faster R-CNN) [15] and the Feature Pyramid Network (FPN) [16], our method has clear advantages for X-ray baggage inspection, where the interior character of objects matters. The main contributions of the proposed method are as follows. First, a specific data augmentation pipeline is designed to accommodate the varied data. Second, an effective feature enhancement module is added to improve feature extraction capabilities, and focal loss is adopted to address the foreground-background imbalance. Third, a multi-scale fused RoI is adopted to obtain more accurate region proposals. Finally, soft NMS is used to reduce errors when detecting adjacent objects. Two new datasets are built for X-ray baggage object detection. To evaluate the method, a list of representative CNN-based methods is investigated on the task of object detection during X-ray baggage inspection, and the results are reported as a useful performance baseline. The proposed method outperforms the existing ones.

The rest of this paper is organized as follows. Related work is reviewed in Section II. The proposed method is presented in detail in Section III. The evaluation of all methods considered in this work is reported in Section IV. Finally, a conclusion is given in Section V.

II. RELATED WORK

In this section, we briefly introduce traditional object detection methods in X-ray baggage inspection and CNN-based models for object detection.

A. Traditional Object Detection in X-ray Baggage Inspection

Some approaches attempted to perform object detection in X-ray baggage images from a single view at a single energy. The adapted implicit shape model (AISM), based on visual codebooks, was proposed in [17]. This method used a visual vocabulary and appearance structures generated from a training dataset of representative X-ray images of the target object. Domingo Mery and colleagues [18] used adaptive sparse representations [19], [20] to automatically detect objects under less restrictive conditions, including some contrast, pose, intra-class variability and focal distance. The work presented in [21] considered a bag-of-visual-words (BoVW) model with several hand-crafted feature representations and achieved an average precision of 57%. Thorsten Franzel and colleagues [22] studied the applicability and efficiency of sparse local features for X-ray baggage object detection; their work investigated how the material information given by multi-view X-ray imaging affects detection performance. As can be seen, these methods are mostly based on hand-crafted features, and the advances in automated baggage inspection remain minimal and very limited compared with what is required for X-ray inspection systems that rely less on human inspectors.

B. Convolutional Neural Networks for Object Detection

Deep convolutional neural networks have made huge strides in object detection in recent years. State-of-the-art deep CNN-based object detection methods can be divided into two groups: two-stage methods and single-stage methods. 1) Two-stage methods, such as R-CNN [23], Fast R-CNN [24], Faster R-CNN [15], R-FCN [25] and FPN [16], achieve detection in two steps: the first step generates a set of candidate region proposals, and the second step classifies them into the target object categories. To date, two-stage methods have achieved the highest accuracy among object detection methods. 2) Single-stage methods, such as YOLO [26]–[28] and SSD [29], use a single feed-forward convolutional network to directly predict classes and bounding boxes. Although these methods have been tuned for speed, their accuracy is lower than that of two-stage methods.

Object detection during X-ray baggage inspection is a more challenging task than in natural optical images. To the best of our knowledge, most previous work used networks pre-trained on the ImageNet classification dataset. The study presented in [30] compared a BoVW approach with a CNN approach, exploring transfer learning by fine-tuning the weights of different layers transferred from a network trained on a different task; experiments show that CNN-based methods outperform BoVW methods. Samet Akcay et al. [31] explored several frameworks for X-ray baggage image classification and detection; their results also showed that CNN-based methods outperform hand-crafted ones.

III. PROPOSED METHOD

Our method is inspired by the design principles of two-stage methods and thus inherits the accuracy advantages of region-proposal-based methods. Fig. 2 illustrates the architecture of the proposed method, which can mainly be divided into two parts: the X-ray proposal network (XPN) and the X-ray discriminative network (XDN). XPN takes an image as input and outputs predicted region boxes. XDN is added after XPN: it takes the coarse region boxes as input and outputs the refined category and position simultaneously.
Fig. 2. The architecture of the proposed method. Part A represents the XPN. Part E represents the XDN.
In XPN, data augmentation is used to accommodate the diversity of the input image, and the feature enhancement module that follows is utilized to make information easier to propagate. In XDN, the fused RoI layer allows each proposal to access information from all levels; bounding box regression and class prediction are then processed. After XDN, soft NMS is used to alleviate object overlaps in the model.
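For illustration, a minimal NumPy sketch of the soft NMS step [38] is given below; the linear decay variant, the function name and the threshold values are assumptions of the sketch, not details taken from this paper.

```python
import numpy as np

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Linear soft NMS sketch: decay, rather than drop, overlapping boxes.

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) confidence scores
    Returns the indices of the kept boxes, highest-scoring first."""
    boxes = boxes.astype(np.float64)
    scores = scores.astype(np.float64).copy()
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    idxs = np.arange(len(scores))
    keep = []
    while idxs.size > 0:
        top = int(np.argmax(scores[idxs]))
        best = int(idxs[top])
        keep.append(best)
        idxs = np.delete(idxs, top)
        if idxs.size == 0:
            break
        # IoU between the selected box and the remaining candidates
        x1 = np.maximum(boxes[best, 0], boxes[idxs, 0])
        y1 = np.maximum(boxes[best, 1], boxes[idxs, 1])
        x2 = np.minimum(boxes[best, 2], boxes[idxs, 2])
        y2 = np.minimum(boxes[best, 3], boxes[idxs, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[idxs] - inter)
        # linear decay for boxes that overlap more than the threshold
        scores[idxs] *= np.where(iou > iou_thresh, 1.0 - iou, 1.0)
        idxs = idxs[scores[idxs] > score_thresh]
    return keep
```

Unlike hard NMS, overlapping detections are down-weighted instead of discarded, which is what makes the step suitable for adjacent objects in densely packed baggage.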
A. Data Augmentation
Data augmentation plays an important role in increasing the network's robustness against the normal changes that may appear in X-ray images, such as density changes or changes in object orientation. Additionally, it can be used to achieve better generalization and to simulate different X-ray object conditions, thus overcoming one of the main weaknesses of CNNs: their heavy reliance on previous training data. X-ray images are quite different from natural images since they undergo severe geometric transformations and are densely cluttered; this makes it difficult to cover most situations in the training set. In this paper, we use an online augmentation approach that provides a virtually infinite dataset without requiring extra storage space on disk. Many applications use basic geometric transformations for data augmentation, such as mirroring and flipping; affine transformations are performed to change the position of the objects. Besides the basic data augmentation, we design an effective pipeline for X-ray image inspection that handles the problem of densely cluttered objects in X-ray images. Fig. 3(a) shows the details of this specific technique. We select two random images A and B from the database, where image A belongs to the data containing target objects and image B belongs to the data containing no target objects. We cut the part containing the target object from image A, namely patch A. Then we apply basic data augmentation to patch A. Finally, we combine image B with patch A to build the augmented data used for training. We define the combined operation as

C = inv(λ × inv(op(A)) + (1 − λ) × inv(B))   (1)

where C represents the composited image, λ represents the combination ratio, which is sampled from the uniform distribution (0, 1), inv(·) represents the complement of the image, and op(·) represents the basic augmentation operation. In this method, we use the following basic data augmentations: affine transformations, mirroring and flipping, cropping, and perspective transformations. Fig. 3(b) shows an example of our data augmentation: a gun patch is cut from image A, some basic augmentation is applied to the gun patch, and the patch is finally pasted onto another image B for training. This technique makes full use of the data that contain no target object.

Fig. 3. Description of the proposed data augmentation technique.
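A minimal NumPy sketch of the composition in Eq. (1) is given below, assuming grayscale images scaled to [0, 1] and a patch already padded to the size of image B; the helper names (`invert`, `compose`) are illustrative only.

```python
import numpy as np

def invert(img):
    """inv(.) in Eq. (1): the complement of an image with values in [0, 1]."""
    return 1.0 - img

def compose(patch_a, image_b, op, rng=None):
    """Blend an augmented target patch into a target-free image via Eq. (1):
        C = inv(lambda * inv(op(A)) + (1 - lambda) * inv(B)).
    `op` is one of the basic augmentations (affine, mirror/flip, crop,
    perspective). `patch_a` is assumed to already match image_b's shape,
    so the two arrays broadcast directly."""
    rng = rng or np.random.default_rng()
    lam = rng.uniform(0.0, 1.0)  # combination ratio sampled from U(0, 1)
    return invert(lam * invert(op(patch_a)) + (1.0 - lam) * invert(image_b))

# e.g., paste a horizontally flipped gun patch into a clean bag image
# (gun_canvas and bag are hypothetical (H, W) float arrays in [0, 1]):
# C = compose(gun_canvas, bag, op=lambda x: x[:, ::-1])
```

Because inv(·) maps the empty (white) X-ray background to zero, regions without content contribute nothing to the weighted sum, so the pasted object blends into image B instead of overwriting it.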
B. Feature Enhancement Module

The feature enhancement module is utilized to enhance the information flow, which is not trivial for traditional CNN-based detectors, especially with densely cluttered backgrounds and small objects. It has been found in [32] that low layers contain less semantic information than high layers but have a higher localization accuracy. Since object detection requires both accurate positions and precise categories, the fusion of multiple layers is required. Several recent models for object detection utilize different layers in a network. Some …
N_{i+1} = N_i ⊕ (C_{i+1} ⊕ M_{i+1})   (2)

where ⊕ is a concatenation operation. A 1 × 1 convolution is applied to the previous layers (C_{i+1} and M_{i+1}); together with the down-sampled layer N_i, the three components are combined to produce the fused layer N_{i+1}. With this design, the current layers can take full advantage of prior information to extract more discriminative representations. Fig. 4 shows the structure of the last bottom-up path {N_1, N_2, N_3, N_4}.

Fig. 4. The detail of the last bottom-up path, which brings more combinations for the feature maps.
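A minimal PyTorch sketch of one fusion step in Eq. (2) follows; the channel widths, the stride-2 convolution used to down-sample N_i, and the final 1 × 1 merge convolution are assumptions of the sketch (only the 1 × 1 convolutions on C_{i+1} and M_{i+1} and the combination itself are specified above).

```python
import torch
import torch.nn as nn

class BottomUpFusion(nn.Module):
    """One fusion step of the bottom-up path, Eq. (2):
    N_{i+1} = N_i (+) (C_{i+1} (+) M_{i+1}). A sketch, not released code."""

    def __init__(self, c_in, m_in, n_in, out_ch):
        super().__init__()
        # 1x1 convolutions applied to the previous layers C_{i+1}, M_{i+1}
        self.reduce_c = nn.Conv2d(c_in, out_ch, kernel_size=1)
        self.reduce_m = nn.Conv2d(m_in, out_ch, kernel_size=1)
        # stride-2 convolution standing in for the down-sampling of N_i
        self.down = nn.Conv2d(n_in, out_ch, kernel_size=3, stride=2, padding=1)
        # merge the three concatenated components into one fused map
        self.merge = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

    def forward(self, n_i, c_next, m_next):
        # assumes c_next and m_next already share the spatial size of level i+1
        fused = torch.cat([self.down(n_i),
                           self.reduce_c(c_next),
                           self.reduce_m(m_next)], dim=1)
        return self.merge(fused)
```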
Dilated convolution layers have achieved progressive improvements in semantic segmentation, where they provide context information [33]–[35]. In this work, dilated convolution layers are utilized to enhance the feature maps for region proposal, thus making them more discriminable and robust. Fig. 5 shows the details of our context enhancement module (CEM). It takes the feature maps N as input and outputs the feature maps P. The module contains one convolution layer and several dilated convolution layers with different dilation rates; we then concatenate the output feature maps of the different convolution layers. More details are shown in part D of Fig. 2.
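The CEM can be sketched as parallel convolution branches over N whose outputs are concatenated into P; the concrete dilation rates (here 2 and 4) and the branch width are assumptions, since only the use of several different rates is specified.

```python
import torch
import torch.nn as nn

class ContextEnhancementModule(nn.Module):
    """CEM sketch: one plain 3x3 convolution plus parallel dilated 3x3
    convolutions; their outputs are concatenated along the channel axis."""

    def __init__(self, in_ch, branch_ch, rates=(2, 4)):
        super().__init__()
        branches = [nn.Conv2d(in_ch, branch_ch, 3, padding=1)]
        for r in rates:
            # dilation enlarges the receptive field without extra parameters
            branches.append(nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r))
        self.branches = nn.ModuleList(branches)

    def forward(self, n):
        # feature maps N in, context-enhanced feature maps P out
        return torch.cat([branch(n) for branch in self.branches], dim=1)
```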
Focal loss [36] is utilized to alleviate the class imbalance problem by down-weighting the losses of the vast number of easy samples during the training of XPN. The basic region proposal network generates a large number of regions, with many more negative regions than positive ones. To compensate for this imbalance, sampling strategies such as random sampling and hard negative mining are adopted in most models, where only a fixed number of anchors with a fixed ratio are sampled. However, the resulting sampled positives cannot fully represent the objects. In this paper, we use focal loss to take all regions into account for training.

With the traditional definitions, the training regions of the n-th proposal layer are defined as S^n = {(p_i^*, b_i^*)}, where p_i^* and b_i^* are the corresponding label and ground-truth coordinates, respectively. Similar to most CNN-based detectors, the loss of the i-th sample in the n-th detection layer is a combination of classification and bounding box regression, defined as follows:

L^n(p_i, b_i | W) = L_cls(p_i, p_i^*) + λ L_reg(b_i, b_i^*)   (3)

where W represents the parameters of the region proposal network, p_i is the probability distribution over background and foreground object calculated by a softmax layer, λ is the balancing parameter, and b_i stands for the regressed bounding box. For regions that are positively labeled, the bounding box b_i^* is regressed from the corresponding region box b_i. The regression loss denotes a smooth L1 loss, defined in the standard way as

smooth_L1(x) = 0.5 x², if |x| < 1;  |x| − 0.5, otherwise.
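For the classification term L_cls, a binary focal loss over all anchor objectness scores can be sketched as follows; the α and γ defaults follow [36], while the sigmoid/binary formulation and the sum reduction are simplifying assumptions of the sketch (the paper computes p_i with a softmax layer).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss [36] over all anchors (sketch).
    logits:  (N,) raw objectness scores
    targets: (N,) float labels in {0., 1.}
    Easy examples get (1 - p_t)^gamma ~ 0, so the vast number of easy
    negatives no longer dominates the loss and no sampling is needed."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).sum()
```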
TABLE I
AVERAGE PRECISION (AP) AND MEAN AVERAGE PRECISION (MAP) ON THE TESTING SET OF THE XDB1 DATASET

model                    mAP    scissors  bottle  metal cup  kitchen knife  knife  battery  scissors
Faster R-CNN (resnet50)  0.837  0.808     0.879   0.891      0.844          0.730  0.835    0.872
R-FCN (resnet50)         0.847  0.827     0.861   0.895      0.853          0.752  0.848    0.896
FPN (resnet50)           0.922  0.895     0.951   0.962      0.936          0.823  0.938    0.952
SSD (vgg)                0.823  0.788     0.873   0.866      0.801          0.714  0.827    0.893
YOLOv3 (darknet53)       0.870  0.842     0.897   0.906      0.872          0.785  0.856    0.930
Ours                     0.954  0.938     0.981   0.989      0.963          0.880  0.951    0.974
TABLE II
AVERAGE PRECISION (AP) AND MEAN AVERAGE PRECISION (MAP) ON THE TESTING SET OF THE XDB2 DATASET

model                    mAP    bottles  metal cup  knives  scissors  gun    battery  laptop  umbrella  lighter  pressure cans
Faster R-CNN (resnet50)  0.706  0.808    0.809      0.528   0.614     0.710  0.683    0.818   0.781     0.557    0.749
R-FCN (resnet50)         0.706  0.817    0.801      0.525   0.623     0.722  0.688    0.815   0.762     0.551    0.752
FPN (resnet50)           0.797  0.860    0.858      0.677   0.690     0.834  0.766    0.927   0.853     0.669    0.831
SSD (vgg)                0.694  0.803    0.810      0.521   0.604     0.704  0.679    0.792   0.754     0.548    0.727
YOLOv3 (darknet53)       0.713  0.826    0.822      0.532   0.612     0.731  0.696    0.811   0.795     0.552    0.755
Ours                     0.835  0.898    0.886      0.737   0.743     0.864  0.816    0.938   0.885     0.706    0.872

Fig. 10. Performance with different proportions on the Xdb2 dataset.

D. Analysis of the Proposed Modules

To examine the effectiveness and contributions of the different modules used in the proposed method, we conduct additional ablation experiments, listed in Table IV and Table V. We mainly analyze the data augmentation module, the feature enhancement module, the focal loss module and the soft NMS module. The configurations are listed in Table III, where FPN represents our modified feature pyramid network, V1 represents the basic FPN with our data augmentation module, V2 represents V1 with our feature enhancement module, V3 represents V2 with our focal loss module, V4 represents V3 with our RoI fusion module, and Ours represents V4 with the soft NMS module. For the Xdb1 dataset, the data augmentation module improves over our modified FPN by 1.5%, the feature enhancement module improves over V1 by 0.8%, the focal loss module improves over V2 by 0.1%, the RoI module improves over V3 by 0.6%, the soft NMS module improves over V4 by 0.3%, and the complete proposed method improves over our modified FPN by 3.4%. For the Xdb2 dataset, the data augmentation module improves over our modified FPN by 2.3%, the feature enhancement module improves over V1 by 1.3%, the focal loss module improves over V2 by 0.2%, the RoI module improves over V3 by 0.6%, the soft NMS module improves over V4 by 0.3%, and the complete proposed method improves over our modified FPN by 4.7%. The results of this comparison clearly reveal
the advantages of our method. The data augmentation module contributes the largest mAP improvement: it can generate diverse training data, which makes the detector more robust when fed with new data. The feature enhancement module is also effective, thanks to the combination of enhanced multi-scale feature maps, which is helpful for small objects such as knives. The focal loss module and the soft NMS module bring slight improvements that are still useful for detection. The RoI fusion module is also effective, as it fuses all feature maps to generate more robust results.
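As an illustration of this fusion, the sketch below pools each proposal from every pyramid level with RoI Align [37] and fuses the pooled features element-wise; the use of summation as the fusion operator, the single-image batch and the function name are simplifying assumptions of the sketch.

```python
import torch
from torchvision.ops import roi_align

def fused_roi_features(feature_maps, boxes, strides, out_size=7):
    """Pool one image's proposals from all pyramid levels and fuse them.
    feature_maps: list of (1, C, H_l, W_l) tensors, one per level
    boxes:        (N, 4) proposals in image coordinates [x1, y1, x2, y2]
    strides:      down-sampling factor of each level w.r.t. the input image"""
    pooled = [
        roi_align(feat, [boxes], output_size=out_size,
                  spatial_scale=1.0 / stride, aligned=True)
        for feat, stride in zip(feature_maps, strides)
    ]
    # element-level fusion across levels -> (N, C, out_size, out_size)
    return torch.stack(pooled, dim=0).sum(dim=0)
```

In contrast to an FPN-style assignment, where each proposal is pooled from a single level chosen by its scale, every proposal here sees information from all levels.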
TABLE III
ADDITIONAL EXPERIMENTS WITH DIFFERENT MODULES OF OUR PROPOSED METHOD

model  Pyramid Structure  Data Augmentation  Feature Enhancement  Focal Loss  RoI Fusion  Soft NMS
FPN    ✓                  ✗                  ✗                    ✗           ✗           ✗
V1     ✓                  ✓                  ✗                    ✗           ✗           ✗
V2     ✓                  ✓                  ✓                    ✗           ✗           ✗
V3     ✓                  ✓                  ✓                    ✓           ✗           ✗
V4     ✓                  ✓                  ✓                    ✓           ✓           ✗
Ours   ✓                  ✓                  ✓                    ✓           ✓           ✓
TABLE IV
AVERAGE PRECISION (AP) AND MEAN AVERAGE PRECISION (MAP) ON THE TESTING SET OF THE XDB1 DATASET

model  mAP  scissors  bottle  metal cup  kitchen knife  knife  battery  scissors
TABLE V
AVERAGE PRECISION (AP) AND MEAN AVERAGE PRECISION (MAP) ON THE TESTING SET OF THE XDB2 DATASET

model  mAP    bottles  metal cup  knives  scissors  gun    battery  laptop  umbrella  lighter  pressure cans
FPN    0.797  0.860    0.858      0.677   0.690     0.834  0.766    0.927   0.853     0.669    0.831
V1     0.816  0.879    0.873      0.718   0.719     0.846  0.785    0.931   0.869     0.684    0.858
V2     0.827  0.887    0.879      0.727   0.735     0.857  0.805    0.935   0.877     0.699    0.864
V3     0.829  0.890    0.881      0.730   0.737     0.859  0.809    0.935   0.880     0.701    0.867
V4     0.833  0.895    0.885      0.735   0.743     0.862  0.813    0.938   0.883     0.705    0.870
Ours   0.835  0.898    0.886      0.737   0.743     0.864  0.816    0.938   0.885     0.706    0.872
E. False Alarm and Miss Alarm

Although the proposed algorithm outperforms the relevant methods on object detection during X-ray baggage inspection, some targets are still missed or misreported. This section briefly analyzes these situations. Test results show that most errors occur in situations like the ones shown in Fig. 11 and Fig. 12. Due to the impact of object cluttering, some objects are misreported during testing. The left parts of Fig. 11 and Fig. 12 show some missed samples, marked in red rectangles. The misreported object may suffer heavily from overlapping or similar objects: due to the impact of different views, some objects have a shape and appearance similar to those of other objects. These reasons make it difficult for the model to correctly distinguish the targets. The right parts of Fig. 11 and Fig. 12 show some misreported samples, marked in red rectangles. In the right part of Fig. 11, a kitchen knife is recognized as a dagger; seen from a different view, this sample has a shape similar to a dagger's. The right part of Fig. 12 shows knives recognized as a lighter; due to their small size, they may have the same shape in this view. This situation may be alleviated by multi-view detection.

Fig. 12. Detection errors on the Xdb2 dataset: (left) miss alarms and (right) false alarms.

In order to further show the effect of the proposed method, we also validate the amount of baggage without target objects that is detected as containing a target object. For the Xdb1 dataset, 0.4% of the baggage without target objects is detected as containing a target object; for the Xdb2 dataset, the figure is 0.2%. These very low percentages indicate that our method can be used in practice.

V. CONCLUSIONS

In this paper, an effective approach is proposed to build a deep object detector and train it from scratch for X-ray image inspection. The novelties that distinguish the proposed work from previous works lie in two major aspects. First, instead of fine-tuning ImageNet pre-trained models, our method trains the deep detector from scratch; this provides the freedom to adjust or redesign the structures. Second, in order to improve the detection performance for cluttered objects, we adopt focal loss to address the foreground-background imbalance and predict multi-scale object proposals from several enhanced intermediate layers to improve the accuracy.
The proposed regions are scaled using RoI Align, followed by element-level fusion and soft NMS post-processing. The quantitative comparison results on the Xdb1 and Xdb2 datasets show that the proposed method achieves better performance than the comparative methods and is more effective than existing algorithms at detecting small and densely cluttered X-ray objects. However, as stated above, our method still produces some false alarms and omissions in severe situations. Hence, in future studies we will focus on discriminating the false alarms and learning the structure of the network adaptively. In addition, we will improve the transferability of our model using domain adaptation methods.
REFERENCES

[1] G. Zentai, "X-ray imaging for homeland security," in 2008 IEEE International Workshop on Imaging Systems and Techniques, pp. 1–6. IEEE, 2008.
[2] European Parliament, "Aviation security with a special focus on security scanners," European Parliament Resolution (2010/2154 (INI)), pp. 1–10, 2012.
[3] A. Schwaninger, A. Bolfing, T. Halbherr, S. Helman, A. Belyavin, and L. Hay, "The impact of image based factors and training on threat detection performance in x-ray screening," 2008.
[4] G. Blalock, V. Kadiyali, and D. H. Simon, "The impact of post-9/11 airport security measures on the demand for air travel," The Journal of Law and Economics, vol. 50, no. 4, pp. 731–755, 2007.
[5] S. Michel, S. M. Koller, J. C. de Ruiter, R. Moerland, M. Hogervorst, and A. Schwaninger, "Computer-based training increases efficiency in x-ray image interpretation by aviation security screeners," in 2007 41st Annual IEEE International Carnahan Conference on Security Technology, pp. 201–206. IEEE, 2007.
[6] Z. Chen, Y. Zheng, B. R. Abidi, D. L. Page, and M. A. Abidi, "A combinational approach to the fusion, de-noising and enhancement of dual-energy x-ray luggage images," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops, pp. 2–2. IEEE, 2005.
[7] D. Mery, Computer Vision for X-Ray Testing. Cham, Switzerland: Springer International Publishing, 2015.
[8] V. Rebuffel and J.-M. Dinten, "Dual-energy x-ray imaging: benefits and limits," Insight - Non-Destructive Testing and Condition Monitoring, vol. 49, no. 10, pp. 589–594, 2007.
[9] D. Mery, E. Svec, M. Arias, V. Riffo, J. M. Saavedra, and S. Banerjee, "Modern computer vision techniques for x-ray testing in baggage inspection," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 4, pp. 682–692, 2016.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[11] Y. Zhu and S. Newsam, "Densenet for dense flow," in 2017 IEEE International Conference on Image Processing (ICIP), pp. 790–794. IEEE, 2017.
[12] A. Kamilaris and F. X. Prenafeta-Boldú, "Deep learning in agriculture: A survey," Computers and Electronics in Agriculture, vol. 147, pp. 70–90, 2018.
[13] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[14] M. Baştan, "Multi-view object detection in dual-energy x-ray images," Machine Vision and Applications, vol. 26, no. 7-8, pp. 1045–1060, 2015.
[15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster r-cnn: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[16] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, 2017.
[17] V. Riffo and D. Mery, "Automated detection of threat objects using adapted implicit shape model," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 4, pp. 472–482, 2015.
[18] D. Mery, E. Svec, and M. Arias, "Object recognition in baggage inspection using adaptive sparse representations of x-ray images," in Image and Video Technology, pp. 709–720. Springer, 2015.
[19] J. Liu, Y. Hu, J. Yang, Y. Chen, H. Shu, L. Luo, Q. Feng, Z. Gui, and G. Coatrieux, "3d feature constrained reconstruction for low-dose ct imaging," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 5, pp. 1232–1247, 2016.
[20] J. Liu, J. Ma, Y. Zhang, Y. Chen, J. Yang, H. Shu, L. Luo, G. Coatrieux, W. Yang, Q. Feng et al., "Discriminative feature representation to improve projection data inconsistency for low dose ct imaging," IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2499–2509, 2017.
[21] M. Baştan, M. R. Yousefi, and T. M. Breuel, "Visual words on baggage x-ray images," in International Conference on Computer Analysis of Images and Patterns, pp. 360–368. Springer, 2011.
[22] D. Mery, V. Riffo, I. Zuccar, and C. Pieringer, "Automated x-ray object recognition using an efficient search algorithm in multiple views," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 368–374, 2013.
[23] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2015.
[24] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, 2015.
[25] J. Dai, Y. Li, K. He, and J. Sun, "R-fcn: Object detection via region-based fully convolutional networks," in Advances in Neural Information Processing Systems, pp. 379–387, 2016.
[26] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
[27] J. Redmon and A. Farhadi, "Yolo9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271, 2017.
[28] J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[29] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "Ssd: Single shot multibox detector," in European Conference on Computer Vision, pp. 21–37. Springer, 2016.
[30] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, "Transfer learning using convolutional neural networks for object classification within x-ray baggage security imagery," in 2016 IEEE International Conference on Image Processing (ICIP), pp. 1057–1061. IEEE, 2016.
[31] S. Akcay, M. E. Kundegorski, C. G. Willcocks, and T. P. Breckon, "Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery," IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2203–2215, 2018.
[32] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, "Understanding convolution for semantic segmentation," in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460. IEEE, 2018.
[33] F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 472–480, 2017.
[34] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[35] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[36] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.
[37] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask r-cnn," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, 2017.
[38] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, "Soft-nms: improving object detection with one line of code," in Proceedings of the IEEE International Conference on Computer Vision, pp. 5561–5569, 2017.
[39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, pp. 8024–8035, 2019.
[40] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes (voc) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.