and Correlative-Attention modules, LaoNet exploits the correlation among objects of a novel category with high accuracy and efficiency.

• We propose a Scale Aggregation mechanism to extract more comprehensive features and fuse multi-scale information from the supporting box.

• Experimental results show that our model achieves state-of-the-art results with significant improvements on the FSC-147 [4] and COCO [6] datasets under the one-shot setting, without fine-tuning.

2. RELATED WORKS

Object counting methods can be broadly divided into two types. Detection based methods [7] count objects by exhaustively detecting every target in an image, but they rely on complex labels such as bounding boxes. Regression based methods [1, 2] learn to count by predicting a density map, in which each value represents the density of target objects at the corresponding location. The predicted count equals the sum over the density map.

Nevertheless, most counting methods are category specific, e.g. for human crowds [1, 2, 8, 9, 10, 11], cars [3, 12], plants [13] or cells [14, 15]. They focus on a single category and lose their original performance when transferred to other categories. Moreover, most traditional approaches rely on tens of thousands of instances to train a counting model [2, 8, 9, 11, 3, 12].

To considerably reduce the number of samples needed to train a counting model for a particular category, the few-shot counting task has recently been developed. The key lies in the generalization ability of the model to handle novel categories given only a few labeled examples. The study [16] proposes a Generic Matching Network (GMN) for class-agnostic counting; however, it still needs several dozens to hundreds of examples of a novel category for adaptation and good performance. CFOCNet is introduced to match and utilize the similarity between objects within the same category [5]. The work [4] presents a Few-Shot Adaptation and Matching Network (FamNet) to learn feature correlations and few-shot adaptation, and also introduces a few-shot counting dataset named FSC-147.

When the number of labeled examples decreases to one, the task evolves into one-shot counting. In other visual tasks, researchers have developed methods for one-shot segmentation [17] and one-shot object detection [18, 19]. Compared to the few-shot setting, which usually uses at least three instances for each object [4], the one-shot setting, where only one instance is available, is clearly more challenging.

It is worth mentioning that detection based approaches [20, 21, 22] are inferior for the tasks of few-shot and one-shot counting. One main reason is that they require extra and costly bounding-box annotations of all instances in the training stage, whereas the one-shot counting approach we focus on depends only on dot annotations and a single supporting box. To illustrate this point further, we perform experiments in Section 4.3 to compare with detection based approaches and validate the proposed network for one-shot counting.

3. APPROACH

3.1. Problem Definition

One-shot object counting involves a training set (I_t, s_t, y_t) ∈ T and a query set (I_q, s_q) ∈ Q, whose categories are mutually exclusive. Each input to the model contains an image I and a supporting bounding box s annotating one object of the desired category. In the training set, abundant
point annotations y_t are available to supervise the model. In the inference stage, the model is expected to count the novel objects in I_q given a supporting instance of the target category sampled from s_q.
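For concreteness, a training sample and a query sample could be represented roughly as follows. This is a minimal sketch: the field names and tensor shapes are illustrative assumptions, not the actual format of FSC-147.

```python
import torch

# Minimal sketch of one-shot counting samples (field names and shapes are illustrative).
train_sample = {
    "image": torch.rand(3, 384, 512),                          # I_t: image of a base category
    "support_box": torch.tensor([120.0, 80.0, 160.0, 130.0]),  # s_t: one exemplar box (x1, y1, x2, y2)
    "points": torch.rand(57, 2),                               # y_t: dot annotations of all instances
}
query_sample = {
    "image": torch.rand(3, 384, 512),                          # I_q: image of a novel category
    "support_box": torch.tensor([30.0, 40.0, 70.0, 95.0]),     # s_q: a single exemplar box, no point labels
}
```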
3.2. Feature Correlation

As the model is required to learn to count from only one supporting object, capturing the correlation between features with high efficiency is crucial. Therefore, we build the feature correlation model in our one-shot network upon Self-Attention and Correlative-Attention modules, which learn inner-relations and inter-relations respectively.

As illustrated in Figure 1 (violet block), our Self-Attention module consists of a Multi-head Attention (MA) and a layer normalization (LN). We first introduce the definition of attention [23], given the query Q, key K and value vector V:

A(Q, K, V | W) = S( (Q W^Q)(K W^K)^T / √d + PE ) (V W^V),   (1)

where S is the softmax function and 1/√d is a scaling factor based on the vector dimension d. W: W^Q, W^K, W^V ∈ R^{d×d} are projection weight matrices and PE is the position embedding.

To leverage more representation subspaces, we adopt the extended form with multiple attention heads:

MA(Q, K, V) = Concat(head_1, ..., head_h) W^O,  where  head_i = A(Q, K, V | W_i).   (2)

The representation dimensions are divided across the parallel attention heads, where the parameter matrices W_i: W_i^Q, W_i^K, W_i^V ∈ R^{d×d/h} and W^O ∈ R^{d×d}.
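As a minimal sketch of Eqs. (1)–(2): the single-head function below follows Eq. (1) literally (with PE as an additive term inside the softmax, as written above), while the multi-head form of Eq. (2) is shown via PyTorch's standard multi-head attention as a stand-in; all names and default dimensions here are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention(q, k, v, wq, wk, wv, pe_bias=None):
    """Eq. (1): softmax((Q Wq)(K Wk)^T / sqrt(d) + PE) (V Wv).

    q, k, v: (batch, seq, d); wq, wk, wv: (d, d) projection matrices.
    pe_bias: optional additive term inside the softmax, as written in Eq. (1).
    """
    d = q.size(-1)
    scores = (q @ wq) @ (k @ wk).transpose(-2, -1) / d ** 0.5
    if pe_bias is not None:
        scores = scores + pe_bias
    return F.softmax(scores, dim=-1) @ (v @ wv)

# Eq. (2): h parallel heads of width d/h, concatenated and projected by W^O.
# PyTorch's nn.MultiheadAttention performs exactly this head splitting and output projection.
multi_head_attention = nn.MultiheadAttention(embed_dim=512, num_heads=4, batch_first=True)
```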
One challenging problem in the counting task is the presence of many complex interfering objects. To efficiently weaken the negative influence of such irrelevant background, we apply Multi-head Self-Attention to the image features to learn inner-relations and encourage the model to focus on the repetitive objects that can be counted.

We denote the feature sequences of the query image and the supporting box region as X and S, with sizes X ∈ R^{HW×C} and S ∈ R^{hw×C}. The refined query feature is calculated by:

X̃ = LN(MA(X_Q, X_K, X_V) + X).   (3)

A layer normalization (LN) is adopted to balance the value scales.

Meanwhile, as there is only one supporting object in the one-shot counting problem, refining the salient features within that object is necessary and helpful for counting efficiency and accuracy. Therefore we apply another Self-Attention module to the supporting features and obtain the refined S̃.
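A minimal sketch of the Self-Attention module of Eq. (3), using PyTorch's multi-head attention as the MA block; the embedding width and head count below are placeholder defaults, not prescribed values.

```python
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """Eq. (3): X_tilde = LN(MA(X, X, X) + X)."""
    def __init__(self, d=512, h=4):
        super().__init__()
        self.ma = nn.MultiheadAttention(d, h, batch_first=True)
        self.ln = nn.LayerNorm(d)

    def forward(self, x):                 # x: (batch, HW, C) flattened feature sequence
        attn_out, _ = self.ma(x, x, x)    # query, key and value all come from x
        return self.ln(attn_out + x)      # residual connection followed by layer normalization
```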
Previous few-shot counting methods [4, 5] usually adopt a convolution operation in which the supporting features act as kernels to match similarities for the target category. However, the results then depend heavily on the quality of the supporting features and on the consistency of object properties such as rotation and scale.

To this end, we propose a Correlative-Attention module to learn inter-relations between query and supporting features and to alleviate the constraints of such irrelevant properties.

Specifically, we extend the MA by learning correlations between different feature sequences and add a feed-forward network (FFN) to fuse the features, i.e.,

X* = Corr(X̃, S̃) = G(MA(X̃_Q, S̃_K, S̃_V) + X̃),   (4)

where G includes two LNs and an FFN in residual form (light blue block in Figure 1). Finally, X* and S̃ are fed into the next cycle as new feature sequences, where each cycle consists of two Self-Attention modules and one Correlative-Attention module.
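A sketch of the Correlative-Attention module of Eq. (4). The exact placement of the two layer normalizations around the FFN inside G is our assumption about the "light blue block" of Figure 1, and the FFN width is a placeholder.

```python
import torch.nn as nn

class CorrelativeAttentionBlock(nn.Module):
    """Eq. (4): X* = G(MA(X_tilde as query, S_tilde as key/value) + X_tilde)."""
    def __init__(self, d=512, h=4, ffn_dim=2048):
        super().__init__()
        self.ma = nn.MultiheadAttention(d, h, batch_first=True)
        self.ln1 = nn.LayerNorm(d)
        self.ln2 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, d))

    def forward(self, x, s):               # x: (B, HW, C) query sequence; s: (B, hw, C) supporting sequence
        cross, _ = self.ma(x, s, s)        # the query sequence attends to the supporting sequence
        y = self.ln1(cross + x)            # first LN on the residual cross-attention output
        return self.ln2(y + self.ffn(y))   # FFN in residual form, then the second LN
```

In one correlation cycle, two Self-Attention blocks (one for X̃ and one for S̃) would then be followed by a Correlative-Attention block, and the cycle repeated T times.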
3.3. Feature Extraction and Scale Aggregation

To extract feature sequences from images, we use VGG-19 as our backbone. For the query image, the output of the final level is directly flattened and passed into the Self-Attention module. For the supporting box, as there are uncontrollable scale variations among instances due to perspective, we propose a Scale Aggregation mechanism to fuse information from different scales.

Given l as the number of layers in the CNN, we aggregate the feature maps across different scales:

S = Concat(F^l(s), F^{l−1}(s), ..., F^{l+1−δ}(s)),   (5)

where F^i represents the feature map at the i-th level and δ ∈ [1, l] decides the number of layers taken for aggregation.
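A sketch of the Scale Aggregation of Eq. (5) on top of VGG-19, under two assumptions of ours: the "levels" are taken to be the VGG-19 stages ending at each max-pooling layer, and the multi-scale maps are flattened and concatenated along the token dimension (which requires matching channel widths; the last two VGG-19 stages both have 512 channels, compatible with δ = 2).

```python
import torch
import torch.nn as nn
import torchvision

class ScaleAggregation(nn.Module):
    """Eq. (5): concatenate supporting-box feature maps from the top `delta` backbone levels."""
    def __init__(self, delta=2):
        super().__init__()
        backbone = torchvision.models.vgg19(weights=None).features  # weights=None keeps the sketch self-contained
        stages, current = [], []
        for layer in backbone:                     # split the VGG-19 features at max-pool boundaries
            current.append(layer)
            if isinstance(layer, nn.MaxPool2d):
                stages.append(nn.Sequential(*current))
                current = []
        self.stages = nn.ModuleList(stages)
        self.delta = delta

    def forward(self, s):                          # s: (B, 3, H, W) crop of the supporting box
        feats, x = [], s
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # keep the top `delta` levels, flatten each to (B, h*w, C) and concatenate as one sequence
        tokens = [f.flatten(2).transpose(1, 2) for f in feats[-self.delta:]]
        return torch.cat(tokens, dim=1)
```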
Meanwhile, we leverage the position embedding to help the attention model distinguish the integrated scale information. By adopting the fixed sinusoidal absolute position embedding [23], feature sequences from different scales can still maintain consistency between positions, i.e.,

PE_(pos_j, 2i)   = sin(pos_j / 10000^{2i/d}),
PE_(pos_j, 2i+1) = cos(pos_j / 10000^{2i/d}),   (6)

where i is the dimension index and pos_j is the position in the j-th feature map.
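A minimal sketch of the fixed sinusoidal embedding of Eq. (6); it assumes an even embedding width d.

```python
import torch

def sinusoidal_position_embedding(num_positions, d):
    """Eq. (6): fixed sinusoidal absolute position embedding [23]; assumes d is even."""
    pos = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)   # (P, 1)
    i = torch.arange(0, d, 2, dtype=torch.float32)                        # even dimension indices
    angles = pos / torch.pow(torch.tensor(10000.0), i / d)                # (P, d/2)
    pe = torch.zeros(num_positions, d)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe                                                             # (P, d)
```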
3.4. Training Loss

We use the Euclidean distance to measure the difference between the estimated density map and the ground-truth density map, which is generated from the annotated points following [1]. The loss is defined as follows:

L_E = ||D^gt − D||_2^2,   (7)
where D is the estimated density map and D^gt is the ground-truth density map. To improve local pattern consistency, we also adopt an SSIM loss following the calculation in [8]. Integrating the two loss terms, we have

L = L_E + λ L_SSIM,   (8)

where λ is the balancing weight.
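A sketch of Eqs. (7)–(8). The SSIM term below is a simplified local SSIM computed with a uniform window rather than the Gaussian-weighted version of [8], and taking L_SSIM = 1 − mean SSIM is our assumption.

```python
import torch
import torch.nn.functional as F

def ssim_map(d, d_gt, window=11, c1=1e-4, c2=9e-4):
    """Local SSIM with a uniform window (simplification of the Gaussian-windowed SSIM in [8])."""
    pad = window // 2
    mu_x = F.avg_pool2d(d, window, 1, pad)
    mu_y = F.avg_pool2d(d_gt, window, 1, pad)
    sigma_x = F.avg_pool2d(d * d, window, 1, pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(d_gt * d_gt, window, 1, pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(d * d_gt, window, 1, pad) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))

def counting_loss(d, d_gt, lam=1e-4):
    """Eq. (8): L = L_E + lambda * L_SSIM, with L_E the squared Euclidean distance of Eq. (7)."""
    l_e = F.mse_loss(d, d_gt, reduction="sum")     # ||D_gt - D||_2^2 over density maps of shape (B, 1, H, W)
    l_ssim = 1.0 - ssim_map(d, d_gt).mean()        # assumed form of the SSIM loss
    return l_e + lam * l_ssim
```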
4. EXPERIMENTS

4.1. Implementation Details and Evaluation Metrics

We build the density regressor from an upsampling layer and three convolution layers with ReLU activation. The kernel sizes of the first two layers are 3 × 3 and that of the last is 1 × 1. Random scaling and flipping are applied to each training image. Adam [24] with a learning rate of 0.5 × 10^−5 is used to optimize the parameters. We set the number of attention heads h to 4, the number of correlation cycles T to 2, the number of aggregated layers δ to 2, and the loss balance weight λ to 10^−4.
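A sketch of the density regressor as described: one upsampling layer followed by three convolutions with ReLU, using 3 × 3, 3 × 3 and 1 × 1 kernels. The channel widths and the upsampling factor are not stated above and are assumptions here.

```python
import torch.nn as nn

# Density regressor sketch: upsample, then 3x3 conv, 3x3 conv, 1x1 conv, each with ReLU.
# Channel widths (512 -> 256 -> 128 -> 1) and the x2 upsampling factor are assumed values.
density_regressor = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(512, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 1, kernel_size=1), nn.ReLU(inplace=True),
)
```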
Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to measure the performance of our method. They are defined by:

MAE  = (1/M) Σ_{i=1}^{M} |N_i^gt − N_i|,
RMSE = √( (1/M) Σ_{i=1}^{M} (N_i^gt − N_i)^2 ),   (9)

where M is the number of test images and N_i^gt and N_i denote the ground-truth and estimated counts of the i-th image.
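A direct implementation sketch of Eq. (9); the function name and the use of plain Python lists as input are our own choices.

```python
import torch

def mae_rmse(pred_counts, gt_counts):
    """Eq. (9): MAE and RMSE over M test images."""
    pred = torch.as_tensor(pred_counts, dtype=torch.float32)
    gt = torch.as_tensor(gt_counts, dtype=torch.float32)
    diff = gt - pred
    return diff.abs().mean().item(), diff.pow(2).mean().sqrt().item()

# Example: mae, rmse = mae_rmse([31.2, 15.7, 36.0], [33, 14, 35])
```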
[Qualitative results figure; ground-truth counts of the shown examples: 33, 14, 35.]

                         Val               Test
Methods               MAE     RMSE     MAE     RMSE
3-shot
Mean                  53.38   124.53   47.55   147.67
Median                48.68   129.70   47.73   152.46
FR detector [25]      45.45   112.53   41.64   141.04
FSOD detector [26]    36.36   115.00   32.53   140.65
GMN [16]              29.66    89.81   26.52   124.57
MAML [27]             25.54    79.44   24.90   112.68
FamNet [4]            23.75    69.07   22.08    99.54
1-shot
CFOCNet [5]           27.82    71.99   28.60   123.96
FamNet [4]            26.55    77.01   26.76   110.95
LaoNet (Ours)         17.11    56.81   15.78    97.15

Table 1. Comparisons with previous state-of-the-art few-shot methods on FSC-147. The upper part of the table presents results in the 3-shot setting, while the lower part presents 1-shot results. FamNet [4] uses the adaptation strategy during testing. It is worth noticing that our one-shot LaoNet outperforms all previous methods, even those in the 3-shot setting, without any fine-tuning strategy.

Table 2. Results on each of the four folds of COCO val2017. Methods with † follow the experiment setting in [5]. Our method achieves high accuracy without any fine-tuning on the testing categories.
Table 3. Ablation study of the different terms. X stands for the feature sequence of the query image and S stands for that of the supporting box region. Experiments are performed on the FSC-147 val and test sets.

Table 4. Comparisons with pre-trained object detectors on the FSC147-COCO splits of FSC-147, which contain images of COCO categories. Even though they are pre-trained with thousands of annotated examples on the MS-COCO dataset, these object detectors still yield unsatisfactory accuracy on the counting task.