Guangjian Zheng¹, Min Tan¹, Jun Yu¹, Qing Wu¹, Jianping Fan²
¹Key Laboratory of Complex Systems Modeling and Simulation,
School of Computer Science and Technology, Hangzhou Dianzi University
²Department of Computer Science, University of North Carolina at Charlotte
978-1-5090-6067-2/17/$31.00 ©2017 IEEE ICME 2017
• We present an efficient optimization to iteratively learn both the C-BCNN model and image reliability. Image reliability (weight) is optimized by solving a softmax-loss-based quadratic program.

2. OUR METHOD

We present a novel weakly supervised user click data guided bilinear CNN (W-C-BCNN), in which an image is represented by a combined deep visual and semantic feature. We first review the classical BCNN model proposed in [5], and then describe our model, including its structure and the weakly supervised learning procedure.

2.1. Review of the Classical BCNN Model

A BCNN model [5] consists of two CNN feature extractors whose outputs are multiplied using the outer product at each location of an image to form the image representation. Fig. 2(a) shows the structure of a classical BCNN model. It is particularly useful for fine-grained categorization since it can model local pairwise feature interactions in a translation-invariant manner. When used for a classification task, the BCNN model B is defined as a quadruple B = (fA, fB, P, C), where fA and fB are two CNN feature functions, P is a pooling function, and C is a classification function. The BCNN extracts the deep visual feature φ for an image I as:

φ(I) = Σ_{l∈L} bilinear(l, I, fA, fB),    (1)

where bilinear(l, I, fA, fB) = fA(l, I)^T fB(l, I) is the bilinear combination of fA and fB at location l ∈ L. The mapping function f : I × L → R^{c×D} outputs a feature vector of size c × D for image I at locations L. For classification tasks, the function C is trained on the image features φ. Note that φ is a high-dimensional feature vector: when fA and fB extract features of size C × M and C × N respectively, φ has M × N dimensions, and the classification function C is trained on the reshaped feature of size MN × 1.

To improve performance, the signed square-root step y ← sign(x)·√|x| and the ℓ2 normalization z ← y/‖y‖₂ are applied to x = φ(I), and z is used as the input to the softmax classification layer. An end-to-end training is then applied to learn the BCNN model [5].

2.2. Our Method

The BCNN model distinguishes objects by visual features only, so the subtle visual differences among categories remain a big challenge. We design a novel C-BCNN model to simultaneously extract the deep visual feature zi and the semantic feature ui for image xi.

We employ user click data to construct a semantic feature ui for each image based on its clicked queries. As images/queries obtained from user click data are extremely noisy, we are interested in the following question: how do we determine whether an image/query is reliable for learning the image recognition model, given that many images/queries are noisy? Intuitively, images of higher quality should be more reliable and contribute more to training than those of lower quality. To address these issues, we introduce a variable characterizing each image's reliability, and propose a method to iteratively learn both the C-BCNN model and the image reliability. Fig. 1 illustrates our pipeline for fine-grained image recognition. In the following sections, we present our model and its optimization in detail.

2.2.1. C-BCNN Structure

Our C-BCNN is constructed based on [5]. Fig. 2 illustrates the structures of the classical BCNN and our C-BCNN, respectively. The main difference lies in the feature concatenating layer behind the ℓ2 normalization layer, which is designed to integrate the CNN feature with the semantic feature. More specifically, the normalized BCNN vector z is passed through a feature concatenating layer, generating a combined feature vector oi ← [zi, τ·ui]. Here, zi and ui are the deep visual and semantic features for image i, and τ denotes the weight of the click feature in the combined feature.

We employ user click data to construct the semantic feature ui, representing each image as a click feature vector by concatenating the click count for each query. As the query set obtained from the large-scale click data is extremely large and redundant, we merge queries with similar semantics, and represent each image as a click feature vector over query clusters instead of the original queries:

ui = (Σ_{j∈G1} ci,j, Σ_{j∈G2} ci,j, ..., Σ_{j∈GL} ci,j),    (2)

where Gj is the index set of the j-th query cluster.

2.2.2. The Weakly Supervised Learning of C-BCNN

Given n training samples (Ii, yi), where yi ∈ {1, 2, ..., N} denotes the category label, the parameters θ of the C-BCNN model B are learned by solving the following weakly supervised C-BCNN learning (W-C-BCNN) problem:

(θ*, w*) = argmin_{θ,w} (1/2)‖θ‖² + (C/n) Σ_{i=1}^{n} wi·ℓ(yi, oi) + α·P(w) + β·S(G, w)    (3)
s.t. Σ_{i=1}^{n} wi = n,
     ℓ(yi, oi) = −log(e^{o_{yi}} / Σ_{j=1}^{N} e^{o_j}),
     wi > 0, ∀i,

where wi represents the reliability of sample i, ℓ(yi, oi) is the softmax loss for Ii, and oi is the BCNN feature zi combined with the semantic feature ui.
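The pipeline of Eqs. (1)–(2) — bilinear pooling over locations, the signed square-root and ℓ2 normalization of Section 2.1, and concatenation with a cluster-level click feature — can be sketched in NumPy. All shapes, toy values, and helper names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def bilinear_feature(fa, fb):
    """Eqs. (1) and the normalization steps: sum of outer products
    fA(l)^T fB(l) over locations l, then signed sqrt and l2 norm.
    fa: (L, M) and fb: (L, N) feature maps flattened over L locations."""
    phi = fa.T @ fb                          # (M, N): equals sum_l outer(fa[l], fb[l])
    x = phi.reshape(-1)                      # reshape to an MN-dimensional vector
    y = np.sign(x) * np.sqrt(np.abs(x))      # signed square-root step
    return y / (np.linalg.norm(y) + 1e-12)   # l2 normalization -> z

def click_feature(counts, clusters):
    """Eq. (2): sum the click counts within each query cluster."""
    return np.array([counts[idx].sum() for idx in clusters])

# toy example
rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(49, 8)), rng.normal(size=(49, 8))
z = bilinear_feature(fa, fb)                 # 64-dimensional normalized BCNN vector

counts = np.array([3., 0., 5., 1., 2.])      # clicks per query for one image
clusters = [np.array([0, 1]), np.array([2, 3, 4])]  # two query clusters G1, G2
u = click_feature(counts, clusters)          # cluster-level click feature u_i

tau = 0.1                                    # click-feature weight
o = np.concatenate([z, tau * u])             # combined feature o_i = [z_i, tau*u_i]
```

Here `fa.T @ fb` equals the sum over locations of the outer products fA(l)^T fB(l), which is why a single matrix product implements the pooling in Eq. (1).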
[Fig. 1 block diagram: training samples are re-weighted and fed into deep C-BCNN learning; the sample losses, a similarity graph, and the weight prior drive sample-weight learning; at test time, testing samples are recognized by the learned C-BCNN model to produce the final result.]

Fig. 1. Pipeline of image recognition via the click data guided BCNN (C-BCNN) model with the weakly supervised training procedure. In the training phase, it iteratively learns a C-BCNN model and the image weights. During testing, we recognize each testing image using the learned C-BCNN model.

[Fig. 2 structure diagrams: input image → bilinear vector → square root → ℓ2 normalization → softmax for the original BCNN; the proposed C-BCNN inserts a filter-concatenation layer merging the normalized bilinear vector with the click feature, followed by dropout and softmax.]

Fig. 2. Visualization of the two BCNN models. The biggest difference (plotted in red) between our C-BCNN and the original BCNN lies in the combination of the BCNN feature with the user click feature. (a) Original [5]. (b) Proposed.

Since the reliability variable w is unknown in advance and must be estimated during training, we employ a weakly supervised training method [10] to solve (3), wherein the reliability variables and the C-BCNN model are learned alternately in each iteration. The sample weight model is constructed from a click-data-based weight prior and a visual-consistency-based smoothness constraint.

Weight Prior. The weight prior term P(w) imposes a regularization constraint. Recently, Tan et al. proposed to discriminatively train a binary SVM classifier to estimate weights, but that reliability classifier is heuristically trained [11]. In this paper, we utilize click data to model the weight prior. Intuitively, an image with a larger user click count should be more reliable and contribute more to training. For each image, we use the total click count to estimate the weight prior as follows:

P(w) = ‖w − wc‖²₂,    (4)

where wc is the normalized click vector defined as:

wc ← wc/‖wc‖,  wc = T(u),    (5)

where T(·) is a transformation scaling the range of u to handle the large imbalance of click counts among images.

Smoothness Assumption. The smoothness term is constructed based on visual consistency: it is assumed that visually similar images should be assigned similar weights. Based on this assumption, we impose a graph-regularization-based smoothness term [12] using a similarity-based adjacency graph G. The BCNN visual feature z is used to measure similarity and construct G as follows:

S(G, w) = Σ_{∀i,j∈χk} gi,j (wi − wj)² / 2,
gi,j = sim(xi, xj) = exp(−‖zi − zj‖).    (6)

2.2.3. Optimization

As (3) is a complicated non-convex problem, it is hard to find a convergent globally optimal solution. We use alternation to reach a local optimum, solving for one variable in each turn with the others fixed. We iterate over two steps: 1) fix each wi and solve a weighted C-BCNN problem to obtain θ; 2) fix θ and solve for wi using quadratic programming. This procedure is shown in Fig. 1.

Learning θ. Similar to [5], θ can be trained by back-propagating the gradients of the classification loss (e.g. conditional log-likelihood). Let dℓ/dx be the gradient of the loss function with respect to x; then, by the chain rule of gradients, we update the two deep networks A and B by:

dℓ/dA = B (dℓ/dx)^T,   dℓ/dB = A (dℓ/dx),    (7)

where dℓ/dx = (dℓ/do)(do/dz)(dz/dy)(dy/dx).

Learning w. With the BCNN model θ fixed, we re-construct G by (6) and solve for w via the following problem:

w* = argmin_w (C/n) Σ_{i=1}^{n} wi·li + α‖w − wc‖²₂ + (β/2) Σ_{∀s,t} Σ_{∀i,j∈As,t} gi,j (wi − wj)²    (8)
s.t. I^T w = n,  0 ≤ wi ≤ n, ∀i,

where I is the all-ones vector. Based on the Laplacian, we rewrite (8) as follows:

w* = argmin_w (1/2) w^T (2β·Llap + 2α·E) w + ((C/n)·l − 2α·wc)^T w    (9)
s.t. I^T w = n,  0 ≤ wi ≤ n, ∀i,

where E is an identity matrix, and Llap is the Laplacian matrix of graph G, defined as:
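The chain-rule identities of Eq. (7) for the bilinear combination x = A^T B can be checked numerically. The sketch below uses a toy quadratic loss (chosen so that dℓ/dx = x) and arbitrary shapes — both are assumptions for illustration, not the paper's softmax loss:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # f_A outputs at 5 locations, 3 channels
B = rng.normal(size=(5, 4))   # f_B outputs at 5 locations, 4 channels

def loss(A, B):
    x = A.T @ B                    # bilinear combination, as in Eq. (1)
    return 0.5 * np.sum(x ** 2)    # toy loss: dl/dx = x

x = A.T @ B
dldx = x                           # gradient of the toy loss w.r.t. x
dldA = B @ dldx.T                  # Eq. (7): dl/dA = B (dl/dx)^T
dldB = A @ dldx                    # Eq. (7): dl/dB = A (dl/dx)

# finite-difference check on one entry of A
eps = 1e-6
Ap = A.copy()
Ap[2, 1] += eps
num = (loss(Ap, B) - loss(A, B)) / eps
```

The analytic gradient `dldA[2, 1]` agrees with the finite difference `num` to within the step-size error, confirming the shapes and transposes in Eq. (7).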
Llap = D − G,
D = diag(Σ_j g1,j, Σ_j g2,j, ..., Σ_j gn,j).    (10)

By rewriting (8), we have:

w* = argmin_w (1/2) w^T (2β·Llap + 2α·E) w + ((C/n)·l − 2α·wc)^T w    (11)
s.t. I^T w = n,  0 ≤ wi ≤ n, ∀i.

We use an interior-point algorithm¹ to solve the quadratic programming problem (11).

2.3. Extensions for W-C-BCNN

We discuss two extensions of our method. One is an improved per-category weight learning method; the other is constructing the weight prior from other kinds of data.

2.3.1. Per-Category Weight Learning

We propose a per-category weight optimization for datasets that are very unbalanced. Denoting μj as the weight vector for the samples in category j, we conduct a per-category weight optimization to obtain w* = {μ*1, ..., μ*N}. Each μ*j is obtained by solving the following problem:

3. EXPERIMENTS

Both the combined feature and the weakly supervised training procedure are evaluated. The performance is evaluated by recognition accuracy. Extensive experiments are conducted. First, we present the experimental settings, including the datasets used; second, we evaluate our combined BCNN and click feature; finally, we show the effect of the weakly supervised training procedure, wherein both the click-data-based prior and the visual-consistency-based graph regularization are evaluated.

3.1. Experimental Settings

With no publicly available training/testing split for either dataset, similar to [13] we randomly split each of the two datasets into three parts: 50% for training, 30% for validation, and 20% for testing.

3.1.1. The Clickture-Dog Dataset

Clickture-Dog consists of dog images from 344 categories. To ensure a valid training/testing split, we filter out categories containing fewer than 3 images. We also randomly select 300 samples from categories with more than 300 samples to avoid imbalance among categories. Altogether, we obtain a dog-breed dataset with 30,568 dog images from 283 categories. For each image, the clicked query set and the corresponding counts are collected from Clickture-Full [9] (refer to [13]). Unlike [14], we did not conduct any further preprocessing, e.g. data cleaning.
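Once Llap is assembled from Eqs. (6) and (10), solving (11) is a small convex QP with one equality constraint and box bounds. The sketch below substitutes SciPy's SLSQP solver for the interior-point algorithm used in the paper; the data and the values of C, α, β are made-up toy inputs:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 6
z = rng.normal(size=(n, 10))           # bilinear features z_i
l = rng.uniform(0.1, 2.0, size=n)      # per-sample softmax losses l_i
wc = np.full(n, 1.0)                   # normalized click prior w^c (toy)
C, alpha, beta = 1.0, 0.5, 0.1         # illustrative hyper-parameters

# Eq. (6): similarity graph g_ij = exp(-||z_i - z_j||)
dist = np.linalg.norm(z[:, None] - z[None, :], axis=2)
G = np.exp(-dist)
np.fill_diagonal(G, 0.0)

# Eq. (10): L_lap = D - G, with D the diagonal of row sums
Llap = np.diag(G.sum(axis=1)) - G

# Eq. (11): min_w (1/2) w^T (2*beta*Llap + 2*alpha*E) w + (C/n * l - 2*alpha*wc)^T w
H = 2 * beta * Llap + 2 * alpha * np.eye(n)
q = C / n * l - 2 * alpha * wc
obj = lambda w: 0.5 * w @ H @ w + q @ w
grad = lambda w: H @ w + q

res = minimize(obj, x0=np.ones(n), jac=grad, method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - n}],
               bounds=[(0.0, n)] * n)   # I^T w = n and 0 <= w_i <= n
w = res.x                               # learned sample reliabilities
```

Since Llap is positive semi-definite and α > 0, the Hessian 2βLlap + 2αE is positive definite, so the local solution returned here is the global optimum of (11), whichever constrained solver is used.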
[Training-error curves (final values): C-BCNN train 8.7%, C-BCNN val 26.7%; BCNN train 25.6%, BCNN val 55%.]

Table 1. Comparison of recognition accuracy (%) of C-BCNN with BCNN on the Clickture-Dog and WCE datasets.

                 BCNN    C-BCNN
Clickture-Dog    33.20   51.20
WCE              97.10   98.40
Table 2. Comparison of recognition accuracy (%) between C-BCNN learning and W-C-BCNN with different weight transformations. "OUR" denotes the proposed W-C-BCNN method.

                 C-BCNN   OUR     OUR(T)
Clickture-Dog    50.60    52.90   52.90
WCE              98.40    99.40   99.50

5. ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (No. 61602136, No. 61622205, and No. 61601158), and the Zhejiang Provincial Natural Science Foundation of China under Grant LR15F020002.

6. REFERENCES

[1] T. Berg, Jiongxin Liu, Seung Woo Lee, M. L. Alexander, D. W. Jacobs, and P. N. Belhumeur, "Birdsnap: Large-scale fine-grained visual categorization of birds," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2019–2026.

[2] A. Iscen, G. Tolias, P. H. Gosselin, and H. Jegou, "A comparison of dense region detectors for image search and fine-grained classification," IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2369–2381, 2015.

[3] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei, "Novel dataset for fine-grained image categorization," in IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.

[4] Shenghua Gao, Ivor Wai-Hung Tsang, and Yi Ma, "Learning category-specific dictionary and shared dictionary for fine-grained image categorization," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 623–634, Feb. 2014.

[5] Tsung-Yu Lin, Aruni RoyChowdhury, and Subhransu Maji, "Bilinear CNN models for fine-grained visual recognition," in IEEE International Conference on Computer Vision, 2015.

[6] Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, and Lubomir Bourdev, "PANDA: Pose aligned networks for deep attribute modeling," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1637–1644.

[7] Hong Shao, Shuang Chen, Jie-yi Zhao, Wen-cheng Cui, and Tian-shu Yu, "Face recognition based on subset selection via metric learning on manifold," Frontiers of Information Technology & Electronic Engineering, vol. 16, no. 12, pp. 1046–1058, 2015.

[8] A. Vedaldi, S. Mahendran, S. Tsogkas, S. Maji, R. Girshick, J. Kannala, E. Rahtu, I. Kokkinos, M. B. Blaschko, D. Weiss, B. Taskar, K. Simonyan, N. Saphra, and S. Mohamed, "Understanding objects in detail with fine-grained attributes," in IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[9] Xian-Sheng Hua, Linjun Yang, Jingdong Wang, Jing Wang, Ming Ye, Kuansan Wang, Yong Rui, and Jin Li, "Clickage: Towards bridging semantic and intent gaps via mining click logs of search engines," in ACM International Conference on Multimedia, 2013, pp. 243–252.

[10] M. Tan, B. Wang, Z. Wu, J. Wang, and G. Pan, "Weakly supervised metric learning for traffic sign recognition in a lidar-equipped vehicle," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 5, pp. 1415–1427, May 2016.

[11] Min Tan, Zhenfang Hu, Baoyuan Wang, Jieyi Zhao, and Yueming Wang, "Robust object recognition via weakly supervised metric and template learning," Neurocomputing, vol. 101, pp. 96–107, 2016.

[12] Xuelong Li, Guosheng Cui, and Yongsheng Dong, "Graph regularized non-negative low-rank matrix factorization for image clustering," IEEE Transactions on Cybernetics, 2016.

[13] Min Tan, Jun Yu, Guangjian Zheng, Weichen Wu, and Kejia Sun, "Deep neural network boosted large scale image recognition using user click data," in International Conference on Internet Multimedia Computing and Service, 2016, pp. 118–121.

[14] Chenghua Li, Qiang Song, Yuhang Wang, Hang Song, Qi Kang, Jian Cheng, and Hanqing Lu, "Learning to recognition from Bing Clickture data," in IEEE International Conference on Multimedia and Expo, 2016, pp. 1–4.

[15] Min Tan, Gang Pan, Yueming Wang, Yuting Zhang, and Zhaohui Wu, "L1-norm latent SVM for compact features in object detection," Neurocomputing, vol. 139, pp. 56–64, 2014.

[16] Q. Zheng, A. Kumar, and G. Pan, "A 3D feature descriptor recovered from a single 2D palmprint image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 6, pp. 1272–1279, 2016.

[17] L. Chen, D. Zhang, X. Ma, and L. Wang, "Container port performance measurement and comparison leveraging ship GPS traces and maritime open data," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 2, pp. 1–16, 2016.