978-1-7281-9835-4/23/$31.00 ©2023 IEEE    ICIP 2023
LOCALIZATION
ABSTRACT
Recent advances in deep learning have facilitated significant progress in the autonomous detection of concealed security threats from baggage X-ray scans, a plausible solution to overcome the pitfalls of manual screening. However, these data-hungry schemes rely on extensive instance-level annotations that involve strenuous skilled labor. Hence, this paper proposes a context-aware transformer for weakly supervised baggage threat localization, exploiting the inherent capacity of transformers to learn long-range semantic relations to capture the object-level context of the illegal items. Unlike conventional single-class token transformers, the proposed dual-token architecture can generalize well to different threat categories by learning the threat-specific semantics from the token-wise attention to generate context maps. The framework has been evaluated on two public datasets, Compass-XP and SIXray, and surpassed other SOTA approaches.

Index Terms— Baggage security, X-ray Imagery, Weakly Supervised Localization, Threat Recognition, Transformer.

Fig. 1. Visualization of baggage threat localization. The top row showcases the result of the proposed approach, with (A) the threat-aware context map extracted from the proposed CGM (dimension 14 × 14), (B) the context map interpolated and overlaid on the input scan, capturing global features of the threat items, and (C) the final threat localization result. The bottom row shows the comparative results with different approaches: Grad-CAM [9], Ablation CAM [10] and TS-CAM [11].
Authorized licensed use limited to: Georgia Institute of Technology. Downloaded on November 24,2023 at 08:42:29 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. The proposed Context-aware dual token transformer model translates the input scan into a sequence of patch tokens to
which learnable dual-class tokens are affixed to capture the global context of both the threat and benign categories. Position
embeddings are also added before passing the tokens through the encoder layers. CGM captures the global context of the
concealed threats by leveraging the threat-specific class token CT and the patch tokens. The context map is then refined using
patch-wise attention which is passed to PSM to expose complete threat objects from cluttered and occluded baggage scans.
gradient-free approaches [10, 15].

However, these approaches, based on CNNs [16, 17], are constrained to localized interactions (see Fig. 1). Alternatively, vision transformers [18] have gained attention due to their ability to model global features by leveraging long-range semantics, which is crucial in localizing the object of interest. Gao et al. [11] incorporated CAMs with transformers to emphasize the distinctive local features while diverting attention from the irrelevant parts. Meanwhile, Su et al. [19] proposed token-prioritizing to comprehend the objects precisely.

Despite the progress, WSOL has not yet been investigated in security threat recognition, primarily due to additional challenges: a) Occlusion: threat items may be impeded by other high-density benign materials, rendering them indistinguishable; b) Heavily cluttered background: precise localization is challenging due to noisy activation maps. Towards this goal, we explore weakly supervised baggage threat localization using transformers to exploit their ability to model long-range spatial correlations. Furthermore, transformers are ideal for X-ray baggage threat localization, as they favor shape over texture and are robust to occlusion [20].

Even though the multi-headed attention mechanism enables transformers to focus on several semantic regions, the attentions are not class-specific [11]. Further, the class token captures interactions between different classes and the background, which can yield very noisy activation maps and unsatisfactory localization results, especially in compactly packed baggage scans where it is difficult to distinguish between overlapping threats and normal items. Hence, unlike conventional single-class-token transformers, we propose a dual-token transformer architecture to capture the object-level context of concealed security threats and to generalize well to different threat categories by localizing them with only binary labels (Threat vs. Benign). A class-specific training strategy is employed to associate the class tokens with the specific object category (detailed in Section 2). We have also designed a Context map Generation Module (CGM) to capture the global semantics of the threat items. Further, we have integrated a Patch Scoring Module (PSM) to expose additional relevant occluded object regions.

2. PROPOSED METHOD

This section provides an overview of the proposed context-aware transformer (Fig. 2), along with detailed explanations of the CGM, the PSM, and the implemented training strategy.

Context-aware Transformer architecture: The input baggage X-ray image x of resolution W × H is initially divided into M patches, where each patch x_Pn ∈ R^(s×s×3), n = 1, 2, ..., M, does not overlap with the adjacent patches, such that M = N × N and N = W/s. The patches are then vectorized and linearly projected (represented by F(·) in Eq. 1) into M patch embeddings x_n ∈ R^(M×D), to which class tokens x_CL ∈ R^(2×D) are affixed, where D denotes the embedding dimension and x_CL = [x_CT; x_CB] comprises x_CT and x_CB ∈ R^(1×D). It is to be noted that, unlike the standard transformer design where a single class token is employed, the proposed framework has a dual-token architecture to capture the context of both the threat and benign categories by learning discriminative representations for each. The tokens are then updated using position embeddings x_pos ∈ R^((2+M)×D), yielding the input token embeddings x_in ∈ R^((2+M)×D), which are then passed through L stacked encoder blocks.

x_in = [x_CT; x_CB; F(x_P1); F(x_P2); ...; F(x_PM)] ⊕ x_pos    (1)
     = [C_T; C_B; P_1; P_2; ...; P_M]    (2)

Each of these encoder blocks comprises a multi-headed attention layer with k heads and a multilayer perceptron. As the tokens pass through multiple encoder blocks, C_T captures the contextual information of threat items from the scans.

Context map Generation: The proposed CGM is responsible for extracting the global context of concealed security threats by leveraging the long-range inter-dependencies between the tokens learned by the self-attention blocks within the encoder. More specifically, the input tokens x_in are transformed into queries Q, keys K, and values V for computing the attention (Eq. 3). The token-wise similarity map A_T (Eq. 4) is then obtained by fusing the attention across the k heads.

Attention(Q, K, V) = softmax(QK^T / √D_k) V    (3)

A_T = softmax(QK^T / √D_k)    (4)

where Q, K, V ∈ R^((2+M)×D_k) and D_k = D/k. The attention map A_T ∈ R^((2+M)×(2+M)) captures the pair-wise attention between the input tokens, as shown in Fig. 2. The orange-colored columns represent the attention between the class tokens and patch tokens, from which we can extract the threat-specific context map A_CT ∈ R^(1×N×N). A_CT is obtained by leveraging and reshaping the attention scores between the threat-specific class token C_T and the patch tokens (P_1, P_2, ..., P_M). In this work, we have only used the final encoder block in our implementation, because the low-level semantics learned by the early layers can lead to noisy activation that can hinder threat localization.

The context map A_CT is then refined using patch-wise attention leveraged from A_T, which is straightforward in contrast to prior works [21]. The blue-colored columns (see Fig. 2) represent the attention scores between the patch tokens, which are averaged across the k attention heads, given by A_P ∈ R^(M×M), and utilized to refine the threat context map A_CT:

A_CTref(j) = Σ_{n=1}^{M} A_P(j, n) · A_CT(n)    (5)

where A_CTref is later reshaped into a 2D tensor to yield the refined map (A_CTref ∈ R^(N×N)). It can be observed from Section 3 that A_P enhances localization continuity.

Patch Scoring Module: The PSM employs a perturbation-based strategy to expose additional relevant and occluded object regions. It reveals more salient parts while retaining the regions captured by the CGM. The technique adapts Score-CAM [15] for transformer models. Score-CAM was initially proposed to grasp the significance of the activation maps of CNNs. However, as discussed in Section 3, employing Score-CAM on patch tokens can activate unwanted backgrounds. In the proposed PSM, the patch embeddings from the final encoder block {P_1, P_2, ..., P_M} are first reshaped and transposed into feature maps P_F, where each feature map P_F^d, d ∈ {1, 2, ..., D}, highlights different semantically related regions. However, this might also add unwanted parts to the localization results. Hence, the refined context map A_CTref is added to the feature maps to suppress the background:

P_FT = A_CTref ⊕ P_F    (6)

where P_FT ∈ R^(N×N×D) is then upsampled and normalized:

P̂_FT = (P_FT − min(P_FT)) / (max(P_FT) − min(P_FT))    (7)

The feature maps are superimposed over the input scan x to generate scans with partial masking. These masked images are then fed to the trained transformer model to yield target scores, which are utilized as weights to linearly combine the respective feature maps into the final threat localization map. The bounding boxes are then drawn using the technique in [12].

Dual Token Training Strategy: To capture the contextual information of the threats from the scans, it is essential to build a one-to-one association between each class token and the respective ground-truth label. This is attained by modifying the head of the proposed framework, where the final MLP head used for classification in standard transformer models is replaced with an average pooling layer. The dual output tokens from the final layer (C_Tok = [C_T, C_B], C_Tok ∈ R^(2×D)) are averaged along the embedding dimension to obtain the scores corresponding to the threat and benign classes, which are supervised by the one-hot encoded class labels, constrained via binary cross-entropy loss.

y(c) = (1/D) Σ_{l=1}^{D} C_Tok(c, l),  c ∈ {0, 1}    (8)

where C_Tok(c, l) is the l-th element along the embedding dimension of the c-th token. The proposed training strategy enables each of the dual tokens to model distinctive global semantic correlations specific to the two classes.

3. EXPERIMENTAL ANALYSIS AND RESULTS

The proposed context-aware baggage threat localization approach was evaluated on Compass-XP [22] and SIXray [3]. Compass-XP, released in 2019, comprises 11,568 scans (with different representations such as low and high energy, grayscale, color, and density variants), from which 80% were used for training, per the protocol. The SIXray dataset, on the other hand, consists of five threat categories (guns, pliers, scissors, wrenches, knives) and is very unbalanced and occluded.
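To make the tokenization step of Section 2 concrete, here is a minimal NumPy sketch of the token construction in Eq. 1. The 224 × 224 input (giving N = 14, matching the 14 × 14 context map of Fig. 1), the patch size s = 16, the embedding dimension D = 192, and the randomly initialized projection and tokens are illustrative assumptions standing in for the model's learned parameters.

```python
import numpy as np

def build_input_tokens(x, s=16, D=192, seed=0):
    """Sketch of Eq. 1: x_in = [x_CT; x_CB; F(x_P1); ...; F(x_PM)] + x_pos."""
    rng = np.random.default_rng(seed)
    H, W, C = x.shape
    N = W // s                      # N = W / s, so M = N * N patches
    M = N * N
    # Split into M non-overlapping s x s x C patches and vectorize them.
    patches = (x.reshape(N, s, N, s, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(M, s * s * C))
    # Linear projection F(.) into D-dim patch embeddings (random weights
    # stand in for the learned projection).
    F = rng.normal(0.0, 0.02, size=(s * s * C, D))
    patch_emb = patches @ F
    # Dual class tokens x_CT (threat) and x_CB (benign), randomly initialized.
    x_CL = rng.normal(0.0, 0.02, size=(2, D))
    tokens = np.concatenate([x_CL, patch_emb], axis=0)   # (2 + M, D)
    # Position embeddings x_pos are added before the encoder blocks.
    x_pos = rng.normal(0.0, 0.02, size=(2 + M, D))
    return tokens + x_pos

x = np.random.rand(224, 224, 3)
x_in = build_input_tokens(x)        # shape (2 + 14*14, 192) = (198, 192)
```

The resulting (2 + M) × D token matrix is what the L encoder blocks consume.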
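The CGM computation of Eqs. 3-5 reduces to a few matrix operations. The sketch below uses a single attention head and random queries/keys purely for illustration; the actual model fuses the attention across k heads and takes Q and K from the final encoder block.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cgm_context_map(Q, K, N):
    """Context map Generation Module (single-head sketch).

    Q, K: (2 + M, Dk) queries/keys, token order [C_T, C_B, P_1, ..., P_M],
    with M = N * N. Returns the refined threat context map of Eq. 5.
    """
    Dk = Q.shape[-1]
    # Token-wise similarity map A_T (Eq. 4).
    A_T = softmax(Q @ K.T / np.sqrt(Dk))
    # Threat-specific context map: attention of class token C_T over patches.
    A_CT = A_T[0, 2:]                      # (M,)
    # Patch-wise attention A_P between the patch tokens.
    A_P = A_T[2:, 2:]                      # (M, M)
    # Refinement (Eq. 5): A_CTref(j) = sum_n A_P(j, n) * A_CT(n).
    A_CTref = A_P @ A_CT                   # (M,)
    return A_CTref.reshape(N, N)

rng = np.random.default_rng(0)
M, Dk, N = 196, 64, 14
Q = rng.normal(size=(2 + M, Dk))
K = rng.normal(size=(2 + M, Dk))
A_CTref = cgm_context_map(Q, K, N)         # (14, 14) threat context map
```

The refinement in Eq. 5 is simply a matrix-vector product between the patch-patch attention and the class-patch attention, which is what smooths the map and improves localization continuity.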
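The PSM pipeline of Eqs. 6-7 can be sketched as follows. The nearest-neighbour upsampling, the global min-max normalization, the single-channel scan, and the toy score_fn are simplifying assumptions that stand in for the trained transformer's per-mask threat scores.

```python
import numpy as np

def psm_localization_map(P_F, A_CTref, x, score_fn):
    """Patch Scoring Module (sketch).

    P_F: (N, N, D) feature maps reshaped from the final patch embeddings.
    A_CTref: (N, N) refined threat context map (suppresses background).
    x: (H, W) input scan (single channel for brevity).
    score_fn: callable returning a threat score for a (masked) scan;
              stands in for the trained transformer model.
    """
    N, _, D = P_F.shape
    H, W = x.shape
    # Eq. 6: add the refined context map to every feature map.
    P_FT = P_F + A_CTref[..., None]
    # Upsample to scan resolution (nearest-neighbour for simplicity).
    up = np.repeat(np.repeat(P_FT, H // N, axis=0), W // N, axis=1)
    # Eq. 7: min-max normalization.
    up = (up - up.min()) / (up.max() - up.min() + 1e-8)
    # Mask the scan with each map, score it, and linearly combine the maps.
    weights = np.array([score_fn(x * up[..., d]) for d in range(D)])
    return (up * weights).sum(axis=-1)

rng = np.random.default_rng(0)
N, D, H, W = 14, 8, 224, 224
P_F = rng.random((N, N, D))
A_CTref = rng.random((N, N))
x = rng.random((H, W))
loc = psm_localization_map(P_F, A_CTref, x, score_fn=lambda m: m.mean())
```

Each of the D feature maps thus contributes to the final localization map in proportion to how strongly its masked scan still triggers the threat class, mirroring the Score-CAM idea adapted here to patch tokens.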
Table 1. Performance on COMPASS-XP [22] and SIXray [3].
                       Compass-XP          SIXray
Methods                Top-1   GT-Known    Top-1   GT-Known   Loc.
Grad-CAM (ResNet)      36.5    39.2        22.6    30.1       -
Ablation-CAM           35.9    38.3        21.9    28.8       -
TS-CAM                 43.4    45.1        33.8    35.2       -
CHR [3]                -       -           -       -          54.8
Ours                   55.3    58.2        37.6    38.3       82.9
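For reference, the GT-Known scores in Table 1 follow the conventional weakly supervised localization protocol: a scan counts as correct when the predicted box overlaps the ground-truth box with IoU ≥ 0.5, given the ground-truth class. The 0.5 threshold and the (x1, y1, x2, y2) box format in this sketch are assumptions, as the paper does not restate the protocol details here.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def gt_known_acc(pred_boxes, gt_boxes, thresh=0.5):
    """Fraction of scans whose predicted box reaches IoU >= thresh
    against the ground truth, given the ground-truth class (GT-Known)."""
    hits = sum(box_iou(p, g) >= thresh for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)

acc = gt_known_acc([(0, 0, 10, 10), (5, 5, 15, 15)],
                   [(0, 0, 10, 10), (20, 20, 30, 30)])   # 1 hit of 2 -> 0.5
```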
datasets demonstrate the superiority of the approach.

5. REFERENCES

[1] Divya Velayudhan, Taimur Hassan, Ernesto Damiani, and Naoufel Werghi, "Recent advances in baggage threat detection: A comprehensive and systematic survey," ACM Computing Surveys (CSUR), 2022.

[2] Divya Velayudhan, Taimur Hassan, Abdelfatah Hassan Ahmed, Ernesto Damiani, and Naoufel Werghi, "Baggage threat recognition using deep low-rank broad learning detector," in 2022 IEEE 21st Mediterranean Electrotechnical Conference (MELECON), 2022, pp. 966–971.

[3] C. Miao, L. Xie, F. Wan, C. Su, H. Liu, J. Jiao, and Q. Ye, "SIXray: A large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images," in IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[4] Abdelfatah Ahmed, Ahmad Obeid, Divya Velayudhan, Taimur Hassan, Ernesto Damiani, and Naoufel Werghi, "Balanced affinity loss for highly imbalanced baggage threat contour-driven instance segmentation," in 2022 IEEE International Conference on Image Processing (ICIP), 2022, pp. 981–985.

[5] Taimur Hassan, Samet Akçay, Mohammed Bennamoun, Salman Khan, and Naoufel Werghi, "Unsupervised anomaly instance segmentation for baggage threat recognition," Journal of Ambient Intelligence and Humanized Computing, pp. 1–12, 2021.

[6] Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, and Naoufel Werghi, "Tensor pooling-driven instance segmentation framework for baggage threat recognition," Neural Computing and Applications, vol. 34, no. 2, pp. 1239–1250, 2022.

[7] Renshuai Tao, Yanlu Wei, Xiangjian Jiang, Hainan Li, Haotong Qin, Jiakai Wang, Yuqing Ma, Libo Zhang, and Xianglong Liu, "Towards real-world x-ray security inspection: A high-quality benchmark and lateral inhibition module for prohibited items detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10923–10932.

[8] Boying Wang, Libo Zhang, Longyin Wen, Xianglong Liu, and Yanjun Wu, "Towards real-world prohibited item detection: A large-scale x-ray benchmark," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5412–5421.

[9] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

[10] Harish Guruprasad Ramaswamy et al., "Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 983–991.

[11] Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, and Qixiang Ye, "TS-CAM: Token semantic coupled attention map for weakly supervised object localization," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2886–2895.

[12] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.

[13] Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas S Huang, "Adversarial complementary learning for weakly supervised object localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1325–1334.

[14] Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, and Eric Horvitz, "Understanding failures of deep networks via robust feature extraction," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12853–12862.

[15] Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, and Xia Hu, "Score-CAM: Score-weighted visual explanations for convolutional neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 24–25.

[16] Bilal Hassan, Shiyin Qin, Taimur Hassan, Ramsha Ahmed, and Naoufel Werghi, "Joint segmentation and quantification of chorioretinal biomarkers in optical coherence tomography scans: A deep learning approach," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–17, 2021.

[17] E. A. Hadhrami, M. A. Mufti, B. Taha, and N. Werghi, "Transfer learning with convolutional neural networks for moving target classification with micro-doppler radar spectrograms," in 2018 International Conference on Artificial Intelligence and Big Data, ICAIBD 2018, 2018, pp. 148–154.

[18] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou, "Training data-efficient image transformers & distillation through attention," in International Conference on Machine Learning. PMLR, 2021, pp. 10347–10357.

[19] Hui Su, Yue Ye, Zhiwei Chen, Mingli Song, and Lechao Cheng, "Re-attention transformer for weakly supervised object localization," arXiv preprint arXiv:2208.01838, 2022.

[20] Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang, "Intriguing properties of vision transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 23296–23308, 2021.

[21] Jiwoon Ahn and Suha Kwak, "Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4981–4990.

[22] Matthew Caldwell and Lewis D Griffin, "Limits on transfer learning from photographic image data to x-ray threat detection," Journal of X-ray Science and Technology, vol. 27, no. 6, pp. 1007–1020, 2019.