
Guangjian Zheng, Min Tan, Jun Yu, Qing Wu, Jianping Fan


Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017 10-14 July 2017

FINE-GRAINED IMAGE RECOGNITION VIA WEAKLY SUPERVISED CLICK DATA GUIDED BILINEAR CNN MODEL

Guangjian Zheng¹, Min Tan¹, Jun Yu¹, Qing Wu¹, Jianping Fan²
¹ Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University
² Department of Computer Science, University of North Carolina at Charlotte

Min Tan is the corresponding author. Email: [email protected].
978-1-5090-6067-2/17/$31.00 ©2017 IEEE ICME 2017

ABSTRACT

The bilinear convolutional neural network (BCNN) model, the state of the art in fine-grained image recognition, fails to distinguish categories with subtle visual differences. We design a novel BCNN model guided by user click data (C-BCNN) that improves performance by capturing both the visual and the semantic content of images. Specifically, to deal with the heavy noise in large-scale click data, we propose a weakly supervised learning approach for the C-BCNN, namely W-C-BCNN, which automatically weights the training images based on their reliability. Extensive experiments are conducted on the public Clickture-Dog dataset. They show that: (1) integrating the CNN with the click feature largely improves performance; (2) both the click data and visual consistency help to model image reliability. Moreover, the method can be easily customized to medical image recognition. Our model performs much better than conventional BCNN models on both the Clickture-Dog dataset and a medical image dataset.

Index Terms: Fine-grained Image Recognition, User Click Data, Bilinear CNN, Weakly Supervised Learning

1. INTRODUCTION

Fine-grained image recognition aims to distinguish objects in subordinate classes, e.g. bird species [1, 2], dog breeds [3], and plants [4]. It is very challenging because of the small visual differences among categories. Even deep learning models, the state of the art in vision systems, fail in this task. Recently, Lin et al. [5] proposed bilinear convolutional neural networks (BCNN), which use two separate convolutional neural network (CNN) feature extractors to jointly represent fine-grained images. BCNN has proved to be one of the best fine-grained recognition models owing to its local pairwise feature interactions. Although BCNN achieves high accuracy, the semantic gap remains a challenge.

Designing a powerful semantic feature is a good way to help bridge this gap. Many researchers propose to learn attribute-related image features, where the attributes are either distinctive object parts learned from visual features [6, 7] or properties extracted from human annotations [8]. This approach is impractical since it requires time-consuming human annotation. To deal with this problem, Microsoft released a large-scale real-world image click dataset [9] to the public, obtained from the click log of commercial search engines. It consists of three parts: queries, clicked images, and the corresponding click counts. With click data, each image can be represented as a click count vector over its clicked query set, namely the query-click feature. Compared with visual features, the click feature not only describes image semantics easily but is also invariant to changes in imaging conditions. Owing to these advantages, we propose to combine the click feature with the BCNN feature to jointly represent images.

Despite its powerful representation ability, the click feature brings new challenges for image recognition: 1) how to integrate the click feature with the sophisticated BCNN model; 2) how to determine whether an image/query is reliable for learning the image recognition model, given that there are many noisy images/queries in user click data. In this paper, we propose a click data guided BCNN model for fine-grained image recognition. Specifically, to deal with the heavy noise, we introduce a variable to represent image reliability and present a weakly supervised learning method that iteratively learns the reliability variable and the BCNN model. Images with higher reliability contribute more to the training. We conduct extensive comparisons and validations on both the public Clickture-Dog dataset and a medical image dataset, which demonstrate the advantages of our method.

The contributions of this work are threefold:

• This paper is the first to construct a deep network (C-BCNN) with the combined visual and click feature. Compared with the traditional BCNN, C-BCNN greatly improves fine-grained image recognition.

• We propose a weakly supervised model, W-C-BCNN, to automatically weight training images when learning the C-BCNN model. Image reliability and the C-BCNN model are learned iteratively. Our model is capable of dealing with large-scale images with heavy noise.

• We present an efficient optimization that iteratively learns both the C-BCNN model and image reliability. Image reliability (weight) is optimized by solving a softmax-loss-based quadratic program.

2. OUR METHOD

We present a novel Weakly supervised user Click data guided Bilinear CNN (W-C-BCNN), in which an image is represented by the combined deep visual and semantic feature. We first review the classical BCNN model proposed in [5], and then describe our model, including its structure and the weakly supervised learning procedure.

2.1. Review of Classical BCNN Model

A BCNN model [5] consists of two CNN feature extractors whose outputs are multiplied using the outer product at each location of an image to form the image representation. Fig. 2(a) shows the structure of a classical BCNN model. It is particularly useful for fine-grained categorization since it can model local pairwise feature interactions in a translation-invariant manner. When it is used for a classification task, the BCNN model $B$ is defined as a quadruple $B = (f_A, f_B, P, C)$, where $f_A$ and $f_B$ are two CNN feature functions, $P$ is a pooling function, and $C$ is a classification function. The BCNN extracts the deep visual feature $\phi$ of an image $I$ as follows:

$$\phi(I) = \sum_{l \in L} \mathrm{bilinear}(l, I, f_A, f_B), \tag{1}$$

where $\mathrm{bilinear}(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I)$ is the bilinear feature combination of $f_A$ and $f_B$ at each location $l \in L$. The mapping function $f : I \times L \rightarrow R^{c \times D}$ outputs a feature of size $c \times D$ for image $I$ at locations $L$. For classification tasks, the function $C$ is trained using the image features $\phi$. Note that $\phi$ is a high-dimensional feature vector. For example, when $f_A$ and $f_B$ extract features of size $C \times M$ and $C \times N$ respectively, $\phi$ is a feature with $M \times N$ dimensions. The classification function $C$ is then trained on the reshaped feature of size $MN \times 1$.

To obtain improved performance, the signed square-root step $y \leftarrow \mathrm{sign}(x)\sqrt{|x|}$ and the $\ell_2$ normalization $z \leftarrow y / \|y\|_2$ are applied to $x = \phi(I)$, and the result is used as the input of the softmax classification layer. Afterwards, end-to-end training is applied to learn the BCNN model [5].

2.2. Our Method

The BCNN model distinguishes objects only by visual features, so the subtle visual differences among categories remain a big challenge. We design a novel C-BCNN model to simultaneously extract the deep visual feature $z_i$ and the semantic feature $u_i$ for image $x_i$.

We employ the user click data to construct a semantic feature $u_i$ for each image based on its clicked queries. As images/queries obtained from user click data are extremely noisy, we are interested in the question: how to determine whether an image/query is reliable for learning the image recognition model, given that there are many noisy images/queries? Intuitively, images of higher quality should be more reliable and contribute more to the training than those of lower quality. To address these issues, we introduce a variable to characterize each image's reliability, and propose a method that iteratively learns both the C-BCNN model and image reliability. Fig. 1 illustrates our pipeline for fine-grained image recognition. In the following sections, we present our model and its optimization in detail.

2.2.1. C-BCNN Structure

Our C-BCNN is built on [5]. Fig. 2 illustrates the structures of the classical BCNN and our C-BCNN respectively. The main difference lies in the feature concatenating layer behind the $\ell_2$ normalization layer, which is designed to integrate the CNN feature with the semantic feature. More specifically, the normalized BCNN vector $z$ is passed through a feature concatenating layer, generating a combined feature vector $o_i \leftarrow [z_i, \tau u_i]$. Here, $z_i$ and $u_i$ are the deep visual and semantic features of image $i$, and $\tau$ denotes the weight of the click feature in the combined feature.

We employ the user click data to construct the semantic feature $u_i$: each image is represented as a click feature vector by concatenating the click counts $c_{i,j}$ for each query $j$. As the query set obtained from the large-scale click data is extremely large and redundant, we merge queries with similar semantics, and represent each image as a click feature vector based on query clusters instead of the original queries:

$$u_i = \Big(\sum_{j \in G_1} c_{i,j},\ \sum_{j \in G_2} c_{i,j},\ \ldots,\ \sum_{j \in G_K} c_{i,j}\Big), \tag{2}$$

where $G_k$ is the index set of the $k$-th query cluster.

2.2.2. The Weakly Supervised Learning of C-BCNN

Given $n$ training data $(I_i, y_i)$, where $y_i \in [1, 2, \ldots, N]$ denotes the category label, the parameters $\theta$ of the C-BCNN model $B$ are learned by solving the following weakly supervised C-BCNN learning (W-C-BCNN) problem:

$$
\begin{aligned}
(\theta^*, w^*) = \arg\min_{\theta, w}\ & \frac{1}{2}\|\theta\|_2^2 + \frac{C}{n}\sum_{i=1}^{n} w_i\,\ell(y_i, o_i) + \alpha P(w) + \beta S(G, w) \\
\text{s.t.}\ & \sum_{i=1}^{n} w_i = n, \\
& \ell(y_i, o_i) = -\log\Big(e^{o_{y_i}} \Big/ \sum_{j=1}^{N} e^{o_j}\Big), \\
& w_i > 0,\ \forall i,
\end{aligned} \tag{3}
$$

where $w_i$ represents the reliability of sample $i$, $\ell(y_i, o_i)$ is the softmax loss for $I_i$, and $o_i$ is the combination of the BCNN feature $z_i$ with the semantic feature $u_i$.

[Figure 1: pipeline diagram. Block labels: training samples, sample re-weighting, deep C-BCNN learning, C-BCNN model, sample weight learning, sample loss, similarity graph, weight prior, testing samples, final result.]

Fig. 1. Pipeline of image recognition via the click data guided BCNN (C-BCNN) model with the weakly supervised training procedure. In the training phase, it iteratively learns a C-BCNN model and the image weights. During testing, we recognize each testing image using the learned C-BCNN model.

[Figure 2: network diagrams. (a) Original [5]: Input image, Bilinear vector, Square root, ℓ2 normalization, Softmax. (b) Proposed: Input image, Bilinear vector, Square root, ℓ2 normalization, Click feature, Filter concat, Dropout (Keep), Softmax.]

Fig. 2. Visualization of the two BCNN models. The biggest difference (plotted in red) between our C-BCNN and the original BCNN lies in the combination of the BCNN feature with the user click feature.

Since the reliability variable $w$ is unknown in advance and must be estimated during training, we employ a weakly supervised training method [10] to solve (3), wherein both the reliability variables and the C-BCNN model are learned in each iteration. The sample weight model is constructed from a click data based weight prior and a visual consistency based smoothness constraint.

Weight Prior. The weight prior term $P(w)$ acts as a regularization constraint. Recently, Tan et al. proposed to discriminatively train a binary SVM classifier to estimate the weight, but the reliability classifier is heuristically trained [11]. In this paper, we use click data to model the weight prior. Intuitively, an image with a larger user click count should be more reliable and contribute more to the training. For each image, we use the total click count to estimate the weight prior as follows:

$$P(w) = \|w - w_c\|_2^2, \tag{4}$$

where $w_c$ is the normalized click vector defined as:

$$w_c = \tilde{w}_c / \|\tilde{w}_c\|, \qquad \tilde{w}_c = T(u), \tag{5}$$

where $T(\cdot)$ is a transformation that rescales the range of $u$ to handle the large imbalance of click counts among images.

Smoothness Assumption. The smoothness term is constructed based on visual consistency. It is assumed that visually similar images should be assigned similar weights. Based on this assumption, we adopt a graph regularization based [12] smoothness term using a similarity based adjacency graph $G$. The BCNN visual feature $z$ is used to measure the similarity and construct $G$ as follows:

$$
\begin{aligned}
S(G, w) &= \sum_{\forall i,j \in \chi_k} g_{i,j}\,(w_i - w_j)^2 / 2, \\
g_{i,j} &= \mathrm{sim}(x_i, x_j) = \exp(-\|z_i - z_j\|).
\end{aligned} \tag{6}
$$

2.2.3. Optimization

As (3) is a complicated non-convex problem, it is hard to find the global optimum. We use alternating optimization to reach a local optimum, solving for one variable in each turn with the others fixed. We iterate two steps: 1) fix each $w_i$, and solve a weighted C-BCNN problem to obtain $\theta$; 2) fix $\theta$, and solve for $w$ using quadratic programming. This procedure is shown in Fig. 1.

Learning θ. Similar to [5], $\theta$ can be trained by back-propagating the gradients of the classification loss (e.g. conditional log-likelihood). Let $d\ell/dx$ be the gradient of the loss function $\ell$ with respect to $x$; then by the chain rule we train the two deep networks $A$ and $B$ by:

$$\frac{d\ell}{dA} = B\Big(\frac{d\ell}{dx}\Big)^T, \qquad \frac{d\ell}{dB} = A\,\frac{d\ell}{dx}, \tag{7}$$

where $\frac{d\ell}{dx} = \frac{d\ell}{do}\frac{do}{dz}\frac{dz}{dy}\frac{dy}{dx}$.

Learning w. With the BCNN model $\theta$ fixed, we re-construct $G$ by (6) and solve for $w$ via the following problem:

$$
\begin{aligned}
w^* = \arg\min_{w}\ & \frac{C}{n}\sum_{i=1}^{n} w_i l_i + \alpha\|w - w_c\|_2^2 + \frac{1}{2}\beta \sum_{\forall s,t}\ \sum_{\forall i,j \in A_{s,t}} g_{i,j}\,(w_i - w_j)^2 \\
\text{s.t.}\ & I^T w = n, \\
& 0 \le w_i \le n,\ \forall i,
\end{aligned} \tag{8}
$$

where $I$ is a vector of ones. Based on the graph Laplacian, we re-write (8) as follows:

$$
\begin{aligned}
w^* = \arg\min_{w}\ & \frac{1}{2} w^T (2\beta L_{lap} + 2\alpha E)\, w + \Big(\frac{C}{n}\, l - 2\alpha w_c\Big)^T w \\
\text{s.t.}\ & I^T w = n, \\
& 0 \le w_i \le n,\ \forall i,
\end{aligned} \tag{9}
$$

where $E$ is an identity matrix, and $L_{lap}$ is the Laplacian matrix of graph $G$, defined as:

$$
\begin{aligned}
L_{lap} &= D - G, \\
D &= \mathrm{diag}\Big(\sum_j g_{1,j},\ \sum_j g_{2,j},\ \ldots,\ \sum_j g_{n,j}\Big).
\end{aligned} \tag{10}
$$

By re-writing (8), we have:

$$
\begin{aligned}
w^* = \arg\min_{w}\ & \frac{1}{2} w^T (2\beta L_{lap} + 2\alpha E)\, w + \Big(\frac{C}{n}\, l - 2\alpha w_c\Big)^T w \\
\text{s.t.}\ & I^T w = n, \\
& 0 \le w_i \le n,\ \forall i.
\end{aligned} \tag{11}
$$

We use an interior point algorithm¹ to solve the quadratic programming problem (11).

¹ http://cn.mathworks.com/help/optim/ug/quadprog.html

2.3. Extensions for W-C-BCNN

We discuss two extensions of our method. One is an improved per-category weight learning method; the other constructs the weight prior from other kinds of data.

2.3.1. Per-Category Weight Learning

We propose a per-category weight optimization for datasets that are very unbalanced. Denoting by $\mu_j$ the weight vector for the samples in category $j$, we conduct a per-category weight optimization to obtain $w^* = \{\mu_1^*, \ldots, \mu_N^*\}$. Each $\mu_j^*$ is obtained by solving the following problem:

$$
\begin{aligned}
\mu_j^* = \arg\min_{\mu}\ & \frac{1}{2} \mu^T (2\beta L_{lap}^j + 2\alpha E)\, \mu + \Big(\frac{C}{n}\, l^j - 2\alpha w_c^j\Big)^T \mu \\
\text{s.t.}\ & I^T \mu = |\chi_j|, \\
& 0 \le \mu_i \le |\chi_j|,\ \forall i,
\end{aligned} \tag{12}
$$

where $L_{lap}^j$ is a Laplacian matrix representing the pairwise image similarity for category $j$.

2.3.2. Recognition Tasks without Click Data

As most datasets do not contain click data, we propose to adopt image quality features to generate the semantic image feature and to model the sample weight prior. For example, in medical image recognition, we suppose that each sample's reliability (weight) can be estimated from its quality, and that the quality feature can represent the semantic content of images.

3. EXPERIMENTS

We evaluate the performance of C-BCNN on both the public Clickture-Dog dataset [9] and a wireless capsule endoscopy (WCE) dataset that contains a large number of images of the inside of the intestine. The two components of our approach, i.e. the combined feature extracted by C-BCNN and the weakly supervised training procedure, are evaluated. Performance is measured by recognition accuracy.

Extensive experiments are conducted. First, we describe the experimental settings, including the datasets used; second, we evaluate our combined BCNN and click feature; finally, we show the effect of the weakly supervised training procedure, wherein both the click data based prior and the visual consistency based graph regularization are evaluated.

3.1. Experimental Settings

As no training/testing split is publicly available for either dataset, similar to [13], we randomly split each dataset into three parts: 50% for training, 30% for validation, and 20% for testing.

3.1.1. The Clickture-Dog dataset

The dataset consists of dog images of 344 categories. To ensure a valid training/testing split, we filter out the categories that contain fewer than 3 images. Also, we randomly select 300 samples from each category with more than 300 samples to avoid imbalance among categories. Altogether, we obtain a dog-breed dataset with 30,568 dog images of 283 categories. For each image, the clicked query set and the corresponding counts are collected from Clickture-Full [9] (refer to [13]). Different from [14], we did not conduct any further preprocessing, e.g. data cleaning.

3.1.2. The WCE Dataset

We adopted 12,090 original WCE images from different patients at different times. Specifically, 390 images show hemorrhage, and the remaining 11,700 images are normal. Due to the large imbalance between hemorrhage and normal images, we augmented the hemorrhage images by 30 times. Occlusion and image cropping are employed for training and validation/testing images respectively.

3.2. The Combined BCNN and Click Feature

Considering the high dimensionality of the query-click feature, we use query merging to construct a compact query-cluster-click feature representing each image. We first use click data to merge semantically similar queries; with the merged queries, we construct a 4,318-dimensional compact click feature based on (2). We compared the recognition accuracy of the proposed C-BCNN with the original BCNN models. To better demonstrate the advantage of integrating the semantic feature, we treat samples equally in this experiment. Fig. 3 plots the training/validation errors in each iteration for the original BCNN and C-BCNN respectively. We see that the C-BCNN obtains a large improvement over the conventional BCNN on both the training and validation sets.
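The experiment above weights all samples equally, whereas the weakly supervised variant evaluated in Section 3.3 re-estimates the weights by solving the quadratic program (11). For intuition, a minimal sketch of that weight-learning step is given here. It is not the authors' implementation: the paper uses an interior point solver, while this sketch swaps in projected gradient descent so that it needs only the standard library. The function names and argument layout are hypothetical, and it assumes α > 0 so that the step size is well defined.

```python
def project(v, total, upper, iters=60):
    """Euclidean projection of v onto {w : sum(w) = total, 0 <= w_i <= upper}.

    The projection has the form clip(v - shift, 0, upper); the shift is
    found by bisection because the clipped sum decreases as shift grows.
    """
    lo, hi = min(v) - upper, max(v)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        s = sum(min(max(x - mid, 0.0), upper) for x in v)
        if s > total:
            lo = mid
        else:
            hi = mid
    shift = 0.5 * (lo + hi)
    return [min(max(x - shift, 0.0), upper) for x in v]

def solve_weights(losses, w_prior, G, alpha, beta, C, steps=500):
    """Approximately solve (11) by projected gradient descent (assumed solver)."""
    n = len(losses)
    # H = 2*beta*L_lap + 2*alpha*E with L_lap = D - G, as in (10) and (11).
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(G[i])
        for j in range(n):
            lap = (degree if i == j else 0.0) - G[i][j]
            H[i][j] = 2.0 * beta * lap + (2.0 * alpha if i == j else 0.0)
    # Linear term f = (C / n) * l - 2 * alpha * w_c.
    f = [C / n * losses[i] - 2.0 * alpha * w_prior[i] for i in range(n)]
    # The largest absolute row sum bounds the top eigenvalue of H (Gershgorin).
    step = 1.0 / max(sum(abs(h) for h in row) for row in H)
    w = [1.0] * n
    for _ in range(steps):
        grad = [sum(H[i][j] * w[j] for j in range(n)) + f[i] for i in range(n)]
        w = project([w[i] - step * grad[i] for i in range(n)], float(n), float(n))
    return w

# Toy usage with made-up numbers: no graph term, third sample has a high loss.
w = solve_weights(losses=[0.0, 0.0, 3.0], w_prior=[1.0, 1.0, 1.0],
                  G=[[0.0] * 3 for _ in range(3)], alpha=1.0, beta=0.0, C=3.0)
```

In this toy case the high-loss sample is driven towards zero weight and its mass is shared by the other two samples, while the weights still sum to n, as the constraints of (11) require.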

Table 1. Comparison of recognition accuracy (%) for C-BCNN with BCNN on the Clickture-Dog and WCE datasets.

                 BCNN     C-BCNN
Clickture-Dog    33.20    51.20
WCE              97.10    98.40

[Figure 3: line plot. y-axis: Training error (%), 0 to 100; x-axis: Epoch, 0 to 100. Legend: C-BCNN train: 8.7; C-BCNN val: 26.7; BCNN train: 25.6; BCNN val: 55.]

Fig. 3. Visualization of the top-1 error of our C-BCNN model compared with the original BCNN model during training.

(a) Original

β \ α    α=0.01   α=0.1   α=1    α=10
0.001    45.8     46.2    44.9   45.1
0.01     47.1     46.5    46.7   46.8
0.1      47.1     47.1    46.5   47.9
1        47.8     48.9    47.9   46.6
10       48.3     47.4    46.6   47.9

(b) Truncation

β \ α    α=0.01   α=0.1   α=1    α=10
0.001    45       46.8    44.7   44.6
0.01     46.6     47.9    46.2   44.2
0.1      46.7     47.9    46.6   47.1
1        48.1     48.9    47.1   48.3
10       48.3     47.4    46.6   47.5

Fig. 4. Recognition accuracies (%) with different weight transformations under different combinations of α and β.

3.3. The Weakly Supervised Training

As mentioned above, our weakly supervised training procedure helps to iteratively select better training samples for learning the deep model, where better samples are assigned larger weights. Fig. 5 shows some training images listed by weight in descending order, where images are sampled at a fixed interval from one category of the Clickture-Dog dataset. Images assigned smaller weights are mostly those of lower quality, e.g. with cluttered background or without the object. This implies the advantage of our method in sample selection, especially for a noisy and unbalanced database. We use the weighted samples for the subsequent recognition task and obtain better performance.

To improve computational efficiency, we use the C-BCNN feature reduced by a max-pooling strategy (with stride 4 × 4) as the input of the Multi-SVM classification. We first test the performance with different combinations of α and β; the results are shown in Fig. 4. Using the optimal α and β, we compare the proposed method with the original C-BCNN; the results are shown in Table 2. Note that the pooling operation makes the results in Table 2 differ from those in Table 1. Both "W-C-BCNN" and "W-C-BCNN(T)" are methods with the weakly supervised training scheme; their difference is that the weight obtained by (11) is used directly in "W-C-BCNN", while in "W-C-BCNN(T)" a simple truncation² function is applied to the optimal weight vector.

² Elements larger than 2 are assigned equal weight.

Fig. 5. Some training samples listed with weight in descending order from top to bottom and left to right.

4. CONCLUSION

We present a novel weakly supervised click data guided bilinear CNN model (W-C-BCNN) for fine-grained image recognition tasks, in which an image is represented by the combined BCNN and user click feature. Compared with the original BCNN model, we employ an additional user click feature to capture the semantic content of images. Considering the heavy noise in click data, a weakly supervised method is proposed to learn this model, where images with higher reliability contribute more to the training. In W-C-BCNN, we iteratively learn both the C-BCNN model and the sample weight that determines each image's reliability. Moreover, we extend our method to deal with common fine-grained image recognition datasets that do not contain user click data. Extensive validations are conducted on the public Clickture-Dog dataset and the WCE medical image dataset, which demonstrate the advantages of our method.

Future work will concentrate on several open problems: 1) rather than combining the off-the-shelf user click feature, we will consider designing an end-to-end training structure to learn a deep click feature; 2) considering the slow convergence of the bilinear model, we will improve the optimization to speed up the training procedure, e.g. by learning a sparse bilinear model [15]; 3) with limited samples containing user click information, we will consider studying C-BCNN training in the one-shot learning scenario [16]; 4) owing to its advantages in dealing with large-scale and noisy datasets, we will consider extending its application to other fields, e.g. transportation systems [17].

                 C-BCNN   OUR      OUR(T)
Clickture-Dog    50.60    52.90    52.90
WCE              98.40    99.40    99.50

Table 2. Comparison of recognition accuracy (%) between C-BCNN learning and W-C-BCNN with different weight transformations. "Our" denotes the proposed W-C-BCNN method.

5. ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (No. 61602136, No. 61622205, and No. 61601158), and the Zhejiang Provincial Natural Science Foundation of China under Grant LR15F020002.

6. REFERENCES

[1] T. Berg, Jiongxin Liu, Seung Woo Lee, M. L. Alexander, D. W. Jacobs, and P. N. Belhumeur, "Birdsnap: Large-scale fine-grained visual categorization of birds," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2019–2026.

[2] A. Iscen, G. Tolias, P. H. Gosselin, and H. Jegou, "A comparison of dense region detectors for image search and fine-grained classification," IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2369–81, 2015.

[3] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei, "Novel dataset for fine-grained image categorization," in IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.

[4] Shenghua Gao, Ivor Wai-Hung Tsang, and Yi Ma, "Learning category-specific dictionary and shared dictionary for fine-grained image categorization," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 623–634, Feb. 2014.

[5] Tsung-Yu Lin, Aruni RoyChowdhury, and Subhransu Maji, "Bilinear CNN Models for Fine-grained Visual Recognition," in IEEE International Conference on Computer Vision, 2015.

[6] Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, and Lubomir Bourdev, "Panda: Pose aligned networks for deep attribute modeling," in IEEE Computer Vision and Pattern Recognition, 2014, pp. 1637–1644.

[7] Hong Shao, Shuang Chen, Jie-yi Zhao, Wen-cheng Cui, and Tian-shu Yu, "Face recognition based on subset selection via metric learning on manifold," Frontiers of Information Technology & Electronic Engineering, vol. 16, no. 12, pp. 1046–1058, 2015.

[8] A. Vedaldi, S. Mahendran, S. Tsogkas, S. Maji, R. Girshick, J. Kannala, E. Rahtu, I. Kokkinos, M. B. Blaschko, D. Weiss, B. Taskar, K. Simonyan, N. Saphra, and S. Mohamed, "Understanding objects in detail with fine-grained attributes," in IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[9] Xian-Sheng Hua, Linjun Yang, Jingdong Wang, Jing Wang, Ming Ye, Kuansan Wang, Yong Rui, and Jin Li, "Clickage: Towards bridging semantic and intent gaps via mining click logs of search engines," in ACM International Conference on Multimedia. ACM, 2013, pp. 243–252.

[10] M. Tan, B. Wang, Z. Wu, J. Wang, and G. Pan, "Weakly supervised metric learning for traffic sign recognition in a lidar-equipped vehicle," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 5, pp. 1415–1427, May 2016.

[11] Min Tan, Zhenfang Hu, Baoyuan Wang, Jieyi Zhao, and Yueming Wang, "Robust object recognition via weakly supervised metric and template learning," Neurocomputing, vol. 101, pp. 96–107, 2016.

[12] Xuelong Li, Guosheng Cui, and Yongsheng Dong, "Graph regularized non-negative low-rank matrix factorization for image clustering," IEEE Transactions on Cybernetics, 2016.

[13] Min Tan, Jun Yu, Guangjian Zheng, Weichen Wu, and Kejia Sun, "Deep neural network boosted large scale image recognition using user click data," in International Conference on Internet Multimedia Computing and Service, 2016, pp. 118–121.

[14] Chenghua Li, Qiang Song, Yuhang Wang, Hang Song, Qi Kang, Jian Cheng, and Hanqing Lu, "Learning to recognition from bing clickture data," in IEEE International Conference on Multimedia and Expo, 2016, pp. 1–4.

[15] Min Tan, Gang Pan, Yueming Wang, Yuting Zhang, and Zhaohui Wu, "L1-norm latent SVM for compact features in object detection," Neurocomputing, vol. 139, no. 0, pp. 56–64, 2014.

[16] Q. Zheng, A. Kumar, and G. Pan, "A 3d feature descriptor recovered from a single 2d palmprint image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 6, pp. 1272–1279, 2016.

[17] L. Chen, D. Zhang, X. Ma, and L. Wang, "Container port performance measurement and comparison leveraging ship gps traces and maritime open data," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 2, pp. 1–16, 2016.
