
Convolutional Neural Networks with Generalized Attentional Pooling for Action Recognition


Yunfeng Wang∗, Wengang Zhou†, Qilin Zhang‡ and Houqiang Li§
∗†§ Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
‡ Highly Automated Driving, HERE Technologies, Chicago, Illinois, USA

Email: [email protected], † [email protected], ‡ [email protected], § [email protected]


Abstract—Inspired by the recent advances in attentional pooling techniques for image classification and action recognition tasks, we propose the Generalized Attentional Pooling (GAP) based Convolutional Neural Network (CNN) algorithm for action recognition in still images. The proposed GAP-CNN can be formulated as a new approximation of the second-order/bilinear pooling techniques widely used in fine-grained image classification. Unlike the existing rank-1 approximation, a generalized factoring (with non-linear functions) is introduced to exploit the intrinsic structural information of the sample covariance matrices of convolutional layer outputs. Without requiring preprocessing steps such as object (e.g., human body) bounding box detection, the proposed GAP-CNN automatically focuses on the most informative parts of still images. With the additional guidance of human pose keypoints, the proposed GAP-CNN algorithm achieves state-of-the-art action recognition accuracy on the large-scale MPII still image dataset.

Index Terms—Action Recognition, Generalized Attentional Pooling, Convolutional Neural Network

I. INTRODUCTION

Human action recognition is a fundamental and well explored research area in computer vision, owing to its widespread applications in human-computer interaction, surveillance and game control. Traditional methods are based on handcrafted features, such as dense trajectories [1], object detection [2] or context mining [3] in image content. Recent convolutional neural network (CNN) based approaches have achieved impressive performance in action recognition with both still images and videos. Among them, multi-stream CNN methods such as "Two-Stream" [4] and its derivatives [5], [6] are among the top performers on the UCF101 [7] and HMDB51 [8] video action recognition datasets. Currently, the ResNet-101 based attentional pooling method [9] holds the record for the highest action recognition accuracy in still images on the MPII [10] dataset.

Previously, in still image action recognition, it was the norm to feed entire images to a CNN for classification. Later, the hard attention concept was introduced: fine-grained features are extracted around human bounding boxes or human pose keypoints and subsequently fed to CNNs for action classification [3]. Despite their performance advantages over standard full-image based CNNs, hard attention based CNNs suffer from significantly higher computational complexity due to the extra human bounding box detection step. Worse still, the required manual labeling of such bounding boxes in training data is prohibitively time-consuming and potentially expensive.

The pooling layer is an indispensable component of a modern CNN. Popular pooling algorithms include mean pooling and max pooling, both of which are first-order pooling (pooling operates on the feature map/matrix itself). Alternatively, second-order pooling (pooling operates on the sample covariance matrix of the feature map/matrix) is advocated in [11], especially in applications such as semantic segmentation and fine-grained image classification. In [9], an evolved variant of second-order pooling is proposed, with a low-rank approximation and a reformulation as attentional pooling. However, it assumes a rank-1 approximation of the weight matrix, which is arguably too restrictive and could potentially lead to performance penalties.

Inspired by [9], we propose a generalized factoring scheme (with additional non-linear functions) of the weight matrix, to exploit the intrinsic structural information of the sample covariance matrices of convolutional layer outputs. With the proposed factoring scheme, the weight matrix of a pooling layer is approximated by a top-down vector, a bottom-up vector and multiple bottom-up matrices. Parameters such as the optimal number of bottom-up matrices are empirically determined via cross validation. By incorporating extra supervision in the form of human pose keypoints, our proposed Generalized Attentional Pooling (GAP) based CNN+Pose (GAP-CNN+Pose) method achieves even better results than the original attentional pooling [9] on the large-scale MPII still image action recognition dataset, indicating that GAP is complementary to hard attention.

The primary contribution of this paper is a new, generalized factoring/approximation of the weight matrix in the second-order pooling layer of a CNN, with an action recognition application on the large-scale MPII still image dataset.

This work was supported in part to Dr. Houqiang Li by the 973 Program under contract No. 2015CB351803 and NSFC under contract No. 61390514, and in part to Dr. Wengang Zhou by NSFC under contracts No. 61472378 and No. 61632019, the Fundamental Research Funds for the Central Universities, and the Young Elite Scientists Sponsorship Program by CAST (2016QNRC001).
Fig. 1. Overview of the proposed GAP-CNN+Pose algorithm. Input images are fed into a ResNet-101 CNN (with the last pooling layer removed) to generate the feature map/matrix X. Subsequently, two types of attention are imposed on the feature map, following [9]. The top branch denotes the top-down attention (i.e., class-specific attention), which is constructed by multiplying the feature map/matrix X with a list of class-dependent vectors a_1, a_2, ..., a_K. On the bottom branch of the architecture, a series of T class-agnostic matrices U_1, ..., U_T are multiplied after nonlinear transformations f(·), e.g., the rectified linear unit (ReLU), followed by a class-agnostic vector c, to represent the bottom-up attention, i.e., the saliency-based attention. The additional human pose information is incorporated via pose heatmaps and an ℓ2 regression loss.

II. RELATED WORK

Visual recognition has been widely studied in recent years, with both still image datasets and video datasets [1], [3], [12]–[18]. For large-scale still image action recognition datasets such as MPII [10] and HICO [19], the performance of popular baseline methods is unimpressive, e.g., about 30% mAP on the MPII dataset. Owing to the extremely large number of classes (393 and 600 classes for MPII and HICO, respectively) as well as their high diversity¹, it is highly challenging to achieve high recognition accuracy on such datasets. In contrast, popular video based action recognition datasets like UCF101 [7] and HMDB51 [8] are comparatively much smaller, with only 101 and 51 categories, respectively.

¹ In addition, it can be ambiguous to determine an action class from a still image without temporal cues, e.g., "sit down" versus "stand up".

In this paper we focus on action recognition with still images. R*CNN [3] is a recent work in this field, in which R-CNN is adapted to include one primary region and multiple proposal regions. The proposal region with the highest score is selected to cooperate with the primary region to recognize the action in an image. Assisted with bounding boxes of the subject (e.g., human), R*CNN achieves good results on the MPII dataset [10].

The most related work is [9], in which a rank-1 approximation of the weight matrix is proposed and attentional pooling is reformulated as low-rank second-order pooling. In [9], the attentional pooling serves as a drop-in replacement for the popular mean pooling or max pooling near the end of a CNN. In contrast with [9], the proposed GAP extends the rank-1 approximation to a series of generalized non-linear factorings, and GAP can be incorporated after any layer in a CNN.

III. FORMULATION

The proposed GAP architecture is illustrated in Fig. 1. Let X ∈ R^{n×f} denote the reshaped output feature of a given layer, where n is the total number of spatial elements in the feature map, i.e., the product of the width and height of the feature map, and f is the number of channels. Conventional first-order mean pooling and binary classification score computation can be formulated as

    score^{bin}_{order1}(X) = (1/n) 1^T X w,    (1)

with (1/n) X^T 1 being the mean-pooled feature and w being an f × 1 scoring weight vector.

Correspondingly, let the matrix W ∈ R^{f×f} denote the scoring weight matrix after a second-order pooling layer [11]. Following [9], the binary classification score is obtained by

    score^{bin}_{order2}(X) = Tr(X^T X W),    (2)

where X ∈ R^{n×f} and Σ := X^T X is traditionally termed the sample covariance matrix². Substitution of Σ into Eq. (2) yields

    score^{bin}_{order2}(X) = Tr(ΣW) = ∑_{i,j} Σ_{i,j} W_{i,j},    (3)

where Σ, W ∈ R^{f×f}. From Eq. (3), the matrix W can be interpreted as the element-wise weights of the sample covariance matrix Σ.

² Sometimes the sample mean values are subtracted before computing such a sample covariance matrix. A constant factor of 1/(n−1) can also be included in the definition of Σ.
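For concreteness, the following sketch contrasts the first-order score of Eq. (1) with the second-order score of Eqs. (2)–(3) on a random feature map; the dimensions and variable names are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f = 49, 8                      # illustrative sizes: 7x7 spatial grid, 8 channels
X = rng.normal(size=(n, f))       # reshaped feature map, one row per spatial location

# First-order (mean) pooling score, Eq. (1)
w = rng.normal(size=(f, 1))                          # f x 1 scoring weight vector
score_order1 = (np.ones((n, 1)).T @ X @ w / n).item()

# Second-order pooling score, Eqs. (2)-(3)
W = rng.normal(size=(f, f))                          # f x f element-wise weights on Sigma
Sigma = X.T @ X                                      # sample covariance matrix (no centering)
score_order2 = np.trace(Sigma @ W)

# Eq. (3): since Sigma is symmetric, the trace equals the element-wise weighted sum
assert np.isclose(score_order2, np.sum(Sigma * W))
print(score_order1, score_order2)
```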
Unlike the highly restrictive rank-1 approximation of W (W := ab^T) in [9], we propose a gentler regularization by setting

    W := a f(V c)^T,    a ∈ R^{f×1}, c ∈ R^{r×1}, V ∈ R^{f×r},    (4)

where f(·) is an element-wise nonlinear transform function that keeps the output dimension equal to the input dimension, e.g., the rectified linear unit (ReLU). In addition, V can be further factorized into T matrices as

    V = ∏_{t=1}^{T} f(U_t) = f(U_1) f(U_2) · · · f(U_T),    (5)

where U_1 ∈ R^{f×r_1}, U_2 ∈ R^{r_1×r_2}, · · · , U_T ∈ R^{r_{T−1}×r}.
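As an illustration of Eqs. (4)–(5), the sketch below assembles W from a top-down vector a, bottom-up matrices U_1, ..., U_T and a bottom-up vector c, using ReLU as f(·); the sizes (f = 8, T = 3) and variable names are our own assumptions for the example.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)   # the element-wise nonlinearity f(.)

rng = np.random.default_rng(1)
f_dim, r1, r2, r = 8, 16, 8, 4        # illustrative channel/bottleneck sizes, T = 3

# Bottom-up matrices U_1, ..., U_T and bottom-up vector c (class-agnostic)
U1 = rng.normal(size=(f_dim, r1))
U2 = rng.normal(size=(r1, r2))
U3 = rng.normal(size=(r2, r))
c  = rng.normal(size=(r, 1))

# Top-down (class-specific) vector a
a = rng.normal(size=(f_dim, 1))

# Eq. (5): V is the product of nonlinearly transformed bottom-up matrices
V = relu(U1) @ relu(U2) @ relu(U3)    # shape (f_dim, r)

# Eq. (4): the generalized factoring of the scoring weight matrix
W = a @ relu(V @ c).T                 # shape (f_dim, f_dim)
print(W.shape)                        # (8, 8)
```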
By the introduction of the matrix factorizations and non-linear functions in Eq. (4)–(5), more structural information in the sample covariance matrix Σ could potentially be exploited. Practically, such factorization and nonlinearity are implemented as convolutional and ReLU layers, respectively. The optimal value of T is empirically determined to balance performance and model complexity (more details are presented in Section IV). Substituting Eq. (4)–(5) into Eq. (2) yields a reformulation as the attentional score,

    score^{bin}_{att}(X) = Tr(X^T X f(V c) a^T)    (6)
                         = (X a)^T (X f(V c)).    (7)

Eq. (7) indicates that the score can be seen as the inner product of two attentional heatmaps.
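A quick numerical check of the identity between Eqs. (6) and (7), continuing the illustrative setup above (variable names and sizes are again our own assumptions):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
rng = np.random.default_rng(2)

n, f_dim, r = 49, 8, 4
X = rng.normal(size=(n, f_dim))       # reshaped feature map
a = rng.normal(size=(f_dim, 1))       # top-down (class-specific) vector
V = rng.normal(size=(f_dim, r))       # bottom-up matrix (or a product as in Eq. (5))
c = rng.normal(size=(r, 1))           # bottom-up vector

# Eq. (6): trace form with W := a f(Vc)^T
score_trace = np.trace(X.T @ X @ relu(V @ c) @ a.T)

# Eq. (7): inner product of a top-down and a bottom-up attention heatmap
top_down  = X @ a                     # n x 1 class-specific heatmap
bottom_up = X @ relu(V @ c)           # n x 1 saliency-based heatmap
score_attn = (top_down.T @ bottom_up).item()

assert np.isclose(score_trace, score_attn)
```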
Similarly, such derivations can be extended to a K-class (K ≥ 3) classifier. Let W_k be the class-specific weight matrix for class k, k = 1, · · · , K. Eq. (2) can then be rewritten as

    score^{Kclass}_{order2}(X, k) = Tr(X^T X W_k),    (8)

with W_k ∈ R^{f×f}. Parallel to Eq. (6)–(7), letting W_k := a_k f(V c)^T (note that V and c are bottom-up parameters and thus class-agnostic), the class-specific attentional pooling and scoring is obtained as

    score^{Kclass}_{att}(X, k) = (X a_k)^T (X f(V c)).    (9)

In Eq. (9), the former term X a_k represents the class-specific, top-down attentional feature map, while the latter term X f(V c) denotes the saliency-based, class-agnostic bottom-up attentional feature map. As advocated in [20] and [9], the fusion of top-down and bottom-up attention maps is biologically motivated, and it is beneficial to modulate saliency maps with class-specific top-down information.

Following [9], human pose regularization can contribute to the action recognition accuracy. Therefore, we incorporate human body keypoint heatmaps and use them as a regularization term alongside the cross-entropy loss in Fig. 1. Specifically, two additional convolutional layers are added after the last layer of the ResNet-101 CNN, together with a 16-channel regression layer to predict the pose keypoints. An ℓ2 loss is used to calculate the cost between the predicted heatmaps and the ground-truth heatmaps. The overall loss is the weighted sum of this ℓ2 loss and a cross-entropy loss, making it possible to optimize the entire GAP-CNN+Pose network in an end-to-end manner.
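A minimal sketch of how such a combined objective could be wired up in PyTorch; the loss weight of 10^−6 is the value selected later in Section IV, while the tensor shapes, function name and the use of mean squared error as the ℓ2 regression term are our own assumptions rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def gap_cnn_pose_loss(class_logits, labels, pred_heatmaps, gt_heatmaps,
                      pose_weight=1e-6):
    # class_logits: (B, K) action scores; labels: (B,) ground-truth class indices
    # pred_heatmaps / gt_heatmaps: (B, 16, H, W) keypoint heatmaps
    ce_loss = F.cross_entropy(class_logits, labels)       # action classification loss
    pose_l2 = F.mse_loss(pred_heatmaps, gt_heatmaps)      # L2 regression on pose heatmaps
    return ce_loss + pose_weight * pose_l2                # weighted sum, trained end-to-end

# Toy usage with random tensors (393 MPII action classes, 16 keypoint channels)
B, K, H, W = 4, 393, 14, 14
logits = torch.randn(B, K, requires_grad=True)
loss = gap_cnn_pose_loss(logits, torch.randint(0, K, (B,)),
                         torch.randn(B, 16, H, W), torch.randn(B, 16, H, W))
loss.backward()
```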
IV. EXPERIMENTS

Dataset. In this section, experiments are conducted on a challenging large-scale action recognition dataset, the MPII still image dataset [10]. The MPII human pose dataset contains 15205 images in 393 action classes, grouped into a train split, a validation split and a test split with 8218, 6987 and 5708 images, respectively. The dataset is also annotated with ground-truth human body keypoints. We use mean average precision (mAP) and classification accuracy as the criteria to evaluate the performance of competing methods.

Weight of Pose Regularization. Cross validation experiments are conducted to empirically determine the optimal weight of the ℓ2 regularization loss from pose keypoints. Without loss of generality, the weight of the cross-entropy loss is fixed at the constant value 1, and the weight of the pose regularization loss is varied from 1 to 10^−8, as shown in Fig. 2. From Fig. 2, we observe that the mAP is insensitive to the choice of weight value for the pose regularization loss. The highest mAP is achieved with the weight at approximately 10^−6, thus 10^−6 is chosen and fixed throughout the rest of the paper.

Fig. 2. Illustration of mAP with respect to varying weights for the pose ℓ2 regularization loss, based on the validation split of the MPII dataset. The x-axis is on an inverted logarithmic scale while the y-axis is on a linear scale.

Number of Bottom-up Matrices. In this part we show the experiments designed to determine the optimal number of bottom-up matrices, i.e., T in Eq. (5). Since convolution operations in CNNs are implemented by matrix multiplication, we take advantage of the existing convolution layers to implement the matrix multiplication operations. We set r_1 = 4096 and determine the remaining values by induction as r_{i+1} = r_i/2, i = 1, · · · , T − 2. We use the convolutional layer C_i with input r_{i−1} and output r_i to represent U_i. ReLU layers are added between such convolution layers. Recognition accuracy and mAP are used as criteria in the choice of T, based on the validation split of the MPII dataset, as shown in Fig. 3. We observe that both criteria reach a plateau for T over 3. To keep the number of such convolutional layers as small as possible (for computational efficiency), T is fixed at 3 in the rest of this paper.

Fig. 3. Illustration of recognition accuracies and mAP with different T values, based on the validation split of the MPII dataset. Both mAP and accuracy reach a plateau for T over 3; T = 3 is the choice that maximizes mAP and accuracy.
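The bottom-up branch described above could be realized roughly as follows in PyTorch, using 1×1 convolutions for the U_i with ReLU in between and a final 1×1 convolution playing the role of c. The class name, the input channel count of 2048 (ResNet-101's last feature map) and the exact channel schedule are our assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class BottomUpAttention(nn.Module):
    """Sketch of the bottom-up branch: T 1x1 convolutions (the U_i) with ReLU
    between them, followed by a 1x1 convolution to one channel (the vector c)."""
    def __init__(self, in_channels=2048, r1=4096, T=3):
        super().__init__()
        layers, channels = [], in_channels
        widths = [r1 // (2 ** i) for i in range(T)]   # r1, r1/2, r1/4, ...
        for width in widths:
            layers += [nn.Conv2d(channels, width, kernel_size=1), nn.ReLU(inplace=True)]
            channels = width
        layers += [nn.Conv2d(channels, 1, kernel_size=1)]   # class-agnostic vector c
        self.branch = nn.Sequential(*layers)

    def forward(self, feat):           # feat: (B, in_channels, H, W)
        return self.branch(feat)       # (B, 1, H, W) bottom-up attention map

# Example: a dummy ResNet-101 style feature map of size 14x14
attn = BottomUpAttention()(torch.randn(2, 2048, 14, 14))
print(attn.shape)   # torch.Size([2, 1, 14, 14])
```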
Attention Visualization. Figure 4 shows several typical examples of the GAP-CNN predicted attention heatmaps imposed on input images. We observe that the most informative parts of such input images are mostly highlighted in the corresponding heatmaps.

Fig. 4. Examples of merged attentions on training images. All input images are color images with RGB values, shown here in grayscale to facilitate the visualization of the heatmaps. Our method focuses on the important parts of the images.

Comparison. Because the ground-truth labels for the test split of the MPII dataset are not publicly available, the validation split is used for evaluation. The comparison of the proposed GAP-CNN method with competing algorithms (without pose information) is summarized in the top half of Table I. Our proposed GAP-CNN method achieves both the highest mAP and the highest recognition accuracy.
In addition, our proposed GAP-CNN+Pose algorithm also outperforms the pose-enhanced version of attentional pooling [9], supporting our speculation that the proposed GAP model could be complementary to hard attention.

TABLE I
PERFORMANCE COMPARISON ON THE VALIDATION SET OF MPII.

Method                   mAP     Accuracy
VGG16, R-CNN [3]         16.5%   -
VGG16, R*CNN [3]         21.7%   -
ResNet-101 [9]           26.2%   -
Attn. Pool. [9]          30.3%   35.3%
Proposed GAP-CNN         30.6%   36.0%
Attn. Pool.+Pose [9]     30.6%   35.7%
Proposed GAP-CNN+Pose    31.6%   36.9%

V. CONCLUSION

In this paper, the Generalized Attentional Pooling based Convolutional Neural Network (GAP-CNN) algorithm is proposed for action recognition in still images. Empirical experiments are carried out to determine the practically optimal number of bottom-up pooling matrices. In addition, extra supervision such as human pose keypoints is exploited. With the practically optimal number of bottom-up attentional pooling matrices and a single top-down pooling vector, the proposed GAP-CNN algorithm outperforms four competing algorithms, including the original attentional pooling method [9]. With the further incorporation of human pose keypoint information, the proposed GAP-CNN+Pose algorithm achieves state-of-the-art action recognition performance on the large-scale MPII still image dataset.
REFERENCES
[1] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, “Action recognition by
dense trajectories,” in CVPR, 2011.
[2] B. Yao and L. Fei-Fei, “Modeling mutual context of object and human
pose in human-object interaction activities,” in CVPR, 2010.
[3] G. Gkioxari, R. Girshick, and J. Malik, "Contextual action recognition with R*CNN," in ICCV, 2015.
[4] K. Simonyan and A. Zisserman, “Two-stream convolutional networks
for action recognition in videos,” in NIPS, 2014.
[5] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream
network fusion for video action recognition,” in CVPR, 2016.
[6] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool,
“Temporal segment networks: towards good practices for deep action
recognition,” in ECCV, 2016.
[7] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," arXiv preprint arXiv:1212.0402, 2012.
[8] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB: a
large video database for human motion recognition,” in ICCV, 2011.
[9] R. Girdhar and D. Ramanan, “Attentional pooling for action recogni-
tion,” in NIPS, 2017.
[10] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, "2D human pose estimation: New benchmark and state of the art analysis," in CVPR, 2014.
[11] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic
segmentation with second-order pooling,” in ECCV, 2012.
[12] H. Wang and C. Schmid, "Action recognition with improved trajectories," in ICCV, 2013.
[13] Q. Zhang and G. Hua, "Multi-view visual recognition of imperfect testing data," in ACM MM, 2015.
[14] Y. Wang, W. Zhou, Q. Zhang, X. Zhu, and H. Li, "Low-latency human action recognition with weighted multi-region convolutional neural network," arXiv preprint arXiv:1805.02877, 2018.
[15] X. Lv, L. Wang, Q. Zhang, Z. Niu, N. Zheng, and G. Hua, "Video object co-segmentation from noisy videos by a multi-level hypergraph model," in ICIP, 2018.
[16] J. Zang, L. Wang, Z. Liu, Q. Zhang, Z. Niu, G. Hua, and N. Zheng, "Attention-based temporal weighted convolutional neural network for action recognition," in AIAI, 2018.
[17] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, "Video-based sign language recognition without temporal segmentation," in AAAI, 2018.
[18] Q. Zhang, G. Hua, W. Liu, Z. Liu, and Z. Zhang, "Auxiliary training information assisted visual recognition," IPSJ Trans. Comput. Vis. and Appl., vol. 7, pp. 138–150, 2015.
[19] Y.-W. Chao, Z. Wang, Y. He, J. Wang, and J. Deng, "HICO: A benchmark for recognizing human-object interactions in images," in ICCV, 2015.
[20] V. Navalpakkam and L. Itti, "An integrated model of top-down and bottom-up attention for optimizing detection speed," in CVPR, 2006.
