Video Based Action Recognition RM
Balgynbek Dikhan
School of Engineering and Digital Sciences
Data Science
[email protected]
Computer Vision and IEEE Transactions on Pattern Analysis and Machine Intelligence, which are esteemed for their rigorous peer-reviewed articles that span the depth and breadth of computer vision and machine learning techniques. These journals have been pivotal in disseminating cutting-edge research, including foundational theories and innovative methodologies that underpin current action recognition technologies.
Conference proceedings, notably from the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV), serve as critical platforms for the exchange of ideas and the unveiling of breakthrough technologies in real time. These conferences bring together the global community of researchers, practitioners, and thought leaders, facilitating a vibrant discourse on the latest developments, practical challenges, and future directions in action recognition. Papers presented at these venues often set new benchmarks, propose novel architectures such as convolutional neural networks for video analysis, and introduce comprehensive datasets that become standard resources for the research community.
The contributions within these sources are not only technical in nature but also encompass a broad range of application areas, from surveillance and security to healthcare and entertainment. This breadth underscores the interdisciplinary appeal of action recognition, drawing insights from computer science, psychology, and neuroscience to enhance algorithmic interpretations of human motion. The evolution of research in this field, as chronicled by these main sources, reflects a trajectory of increasing sophistication and applicability, highlighting both the challenges overcome and the vast potential for future exploration.
Moreover, specialized workshops and symposia that accompany major conferences, as well as focused journal issues, highlight dynamic sub-domains within action recognition, such as the analysis of complex actions in sports videos, the role of sensor fusion in improving recognition accuracy, and the exploration of deep learning models for spatio-temporal feature extraction. These venues offer nuanced insights into specific challenges and innovations, enriching the broader discourse with targeted investigations and specialized knowledge.
In summary, the overview of main sources in the human action recognition domain reveals a vibrant and evolving field. The depth and diversity of these sources reflect the ongoing dialogue within the research community, the continuous refinement of methodologies, and the persistent quest to bridge the gap between human cognitive capabilities and computational models. Through this extensive body of work, the field progresses towards more nuanced, accurate, and versatile action recognition systems, capable of understanding and interpreting human behavior in its myriad forms.