GaitPT: Skeletons Are All You Need For Gait Recognition
Table 1. GaitPT comparison to other skeleton-based architectures on CASIA-B. We report the average recognition accuracy for individual
probe angles excluding identical-view cases. For our model, we show the mean and standard deviation computed across 3 runs. GaitPT
obtains an average improvement of over 6% mean accuracy compared to the previous state-of-the-art.
of-the-art, GaitGraph [47], which utilizes graph convolutions for spatio-temporal feature extraction. GaitFormer [10] lags behind other graph-based methods, showing that spatial attention is a crucial component in gait recognition. Our results demonstrate the effectiveness of a hierarchical approach to motion understanding in the context of gait recognition in controlled scenarios.

4.2. Evaluation In the Wild

As the GaitPT architecture achieves strong recognition performance in controlled settings, we study its capabilities in more difficult real-world scenarios.

GREW [63] is one of the largest benchmarks for gait recognition in the wild, containing 26,000 unique identities and over 128,000 gait sequences. The videos for this dataset were obtained from 882 cameras in public spaces, and the labelling was done by 20 annotators over a period of 3 months. GREW releases silhouettes, optical flow, 3D skeletons, and 2D skeletons for the recorded walking sequences.

We train the GaitPT architecture on the provided 2D skeletons, which are normalized based on the dimensions of the image. To account for the relevant walking sequences of the dataset, we utilize a sequence length of 30 for both training and testing, in line with other works in gait recognition [62]. As the GREW authors do not release the labels for the testing set, we report the results obtained on the public leaderboard1 for the GREW Competition.

Table 2 displays the comparison between GaitPT and other methods, including both skeleton-based and appearance-based approaches, on the GREW test set in terms of Rank-1, Rank-5, Rank-10, and Rank-20 accuracy. Rank-K accuracy is the percentage of samples for which the correct label is among the top K predictions made by the model. The results of the other models are taken from the GREW [63] paper. GaitPT outperforms skeleton-based approaches such as PoseGait [24] and GaitGraph [47] by over 50% in Rank-1 accuracy. Moreover, it outperforms appearance-based state-of-the-art methods such as GaitSet [7] and GaitPart [14] by approximately 6% and 8% in Rank-1 accuracy, respectively. These results demonstrate the ability of GaitPT to generalize to unconstrained settings and show that skeleton-based data can be advantageous for in-the-wild scenarios.

Method          | R-1 Acc. (%) | R-5 Acc. (%) | R-10 Acc. (%) | R-20 Acc. (%)
GEINet [40]     | 6.82         | 13.42        | 16.97         | 21.01
TS-CNN [53]     | 13.55        | 24.55        | 30.15         | 37.01
GaitSet [7]     | 46.28        | 63.58        | 70.26         | 76.82
GaitPart [14]   | 44.01        | 60.68        | 67.25         | 73.47
PoseGait [24]   | 0.23         | 1.05         | 2.23          | 4.28
GaitGraph [47]  | 1.31         | 3.46         | 5.08          | 7.51
GaitPT (Ours)   | 52.16 ± 0.5  | 68.44 ± 0.6  | 74.07 ± 0.5   | 78.33 ± 0.4

Table 2. Comparison between GaitPT and other methods on the GREW benchmark in terms of Rank-1, Rank-5, Rank-10, and Rank-20 recognition accuracy. For our model, we show the mean and standard deviation computed across 3 runs. GaitPT outperforms both skeleton-based and appearance-based state-of-the-art methods by a significant margin in in-the-wild scenarios. Table adapted from [63].

Gait3D [62] is a large-scale dataset obtained in unconstrained settings, consisting of 4,000 unique subjects and over 25,000 walking sequences. The dataset includes 3D meshes, 3D skeletons, 2D skeletons, and silhouette images obtained from all recorded sequences.

We train our architecture using the provided 2D skeletons obtained through HRNet [44], following the same evaluation protocol as Zheng et al. [62], in which 3,000 subjects are placed in the training set and the remaining 1,000 in the gallery-probe sets. Similarly to the evaluation on CASIA-B, we normalize the 2D skeletons based on the dimensions of the image. In line with the methodology employed by the authors of Gait3D, we utilize a sequence length of 30

1 https://codalab.lisn.upsaclay.fr/competitions/
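The Rank-K metric used in Table 2 can be computed from a probe-gallery similarity matrix, as in the following minimal sketch. The function name, the matrix layout, and the toy inputs are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def rank_k_accuracy(similarity, probe_labels, gallery_labels, k):
    """Fraction of probes whose correct identity appears among the
    k gallery samples with the highest similarity scores."""
    hits = 0
    for i, label in enumerate(probe_labels):
        # Indices of the k most similar gallery samples for this probe.
        top_k = np.argsort(similarity[i])[::-1][:k]
        if label in gallery_labels[top_k]:
            hits += 1
    return hits / len(probe_labels)

# Toy example: 3 probes scored against 4 gallery samples.
sim = np.array([[0.9, 0.1, 0.3, 0.2],
                [0.2, 0.8, 0.1, 0.4],
                [0.1, 0.2, 0.3, 0.9]])
probes = np.array([0, 1, 2])
gallery = np.array([0, 1, 2, 3])
print(rank_k_accuracy(sim, probes, gallery, 1))  # 2 of 3 probes match at rank 1
print(rank_k_accuracy(sim, probes, gallery, 2))  # all probes recovered by rank 2
```

Rank-1 corresponds to plain nearest-neighbour retrieval; the larger K values in Table 2 measure how quickly accuracy recovers when more candidates are allowed.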
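The skeleton preprocessing described above (normalizing 2D joint coordinates by the image dimensions and fixing the sequence length at 30 frames) can be sketched as follows. The crop/pad-by-repetition strategy for shorter sequences and the function name are illustrative assumptions; the paper does not specify how sequences shorter than 30 frames are handled.

```python
import numpy as np

def preprocess_sequence(keypoints, width, height, seq_len=30):
    """Normalize 2D joints by the image dimensions and fix the sequence length.

    keypoints: array of shape (T, J, 2) holding pixel coordinates.
    Returns an array of shape (seq_len, J, 2) with coordinates in [0, 1].
    """
    # Scale x by image width and y by image height.
    norm = keypoints / np.array([width, height], dtype=np.float32)
    num_frames = norm.shape[0]
    if num_frames >= seq_len:
        # Crop longer sequences to the first seq_len frames.
        return norm[:seq_len]
    # Assumed strategy: pad shorter sequences by repeating the last frame.
    pad = np.repeat(norm[-1:], seq_len - num_frames, axis=0)
    return np.concatenate([norm, pad], axis=0)

# 42 frames of 17 COCO-style joints in a 1920x1080 image.
seq = preprocess_sequence(np.random.rand(42, 17, 2) * [1920, 1080], 1920, 1080)
print(seq.shape)  # (30, 17, 2)
```

Normalizing by the image dimensions removes the dependence on camera resolution and subject distance, which is what lets the same model transfer across CASIA-B, GREW, and Gait3D recordings.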