
Tennis Strokes Recognition from Generated Stick Figure Video Overlays

Boris Bačić1,a and Ishara Bandara2,b
1Auckland University of Technology, Auckland, New Zealand
2Robert Gordon University, Aberdeen, U.K.

Keywords: Computer Vision, Deep Learning, Spatiotemporal Data Classification, Human Motion Modelling and Analysis (HMMA), Sport Science, Augmented Broadcasting.

Abstract: In this paper, we contribute to the existing body of knowledge on video indexing technology by presenting a novel approach for recognition of tennis strokes from consumer-grade video cameras. To classify four categories covering three strokes of interest (forehand, backhand, serve, and no-stroke), we extract features as a time series from stick figure overlays generated using the OpenPose library. To process the spatiotemporal feature space, we experimented with three variations of LSTM-based classifier models. On a selection of publicly available videos, the trained models achieved an average accuracy between 97% and 100%. To demonstrate transferability of our approach, future work will include other individual and team sports, while maintaining focus on feature extraction techniques with minimal reliance on domain expertise.

1 INTRODUCTION

Automated video indexing of recognised motion patterns and human motion activity has broad application. For example, in the last decade, we have seen enhancements in augmented video broadcasting with real-time game statistics. Overlaid statistics can help commentators to share strategic information which often only competitive-level athletes, coaches and domain experts would intuitively consider. Quantifying motion events has also become pervasive in other contexts such as augmented video coaching, surveillance, elderly care, activity monitoring and various mobile and smartwatch app development.

1.1 Vision and Motivation

In the authors' view, identifying task-specific motion events will enable further advancements in areas such as rehabilitation and smart cities, and will improve privacy, safety and usability of spaces where human activity occurs. To illustrate the demand for quantifying events as part of strategic video analysis, an online search will return numerous examples of commercial software (e.g. LongoMatch, Coach Logic, Nacsport, Metrica Sports, and Sports Code).

LongoMatch, one of the earliest open-source coaching software packages, was originally designed for video replay analysis of team sports with customisable and manual event indexing (during live video capture or in post-match analysis) and rudimentary overlay capabilities (Bačić & Hume, 2012). Today, it is a high-end annual-subscription licensed commercial product, reflecting the opportunities in this area.

Thanks to recent advancements in computer vision, deep learning, recurrent neural networks and human pose estimation, the development of automated indexing of human motion events and sport-specific movement patterns has become less labour intensive and less dependent on expertise-driven feature extraction techniques than in the past. To quantify aspects of the game relying on Computer Vision and Artificial Intelligence (AI), the research questions associated with our work are:
1. Can we develop recognition of sport-specific movement patterns such as tennis strokes?
2. Can we reduce dependence on expert-driven feature engineering and produce simplified

a https://orcid.org/0000-0003-0305-4322
b https://orcid.org/0000-0002-7346-248X

Bačić, B. and Bandara, I.
Tennis Strokes Recognition from Generated Stick Figure Video Overlays.
DOI: 10.5220/0010827300003124
In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP, pages 397-404
ISBN: 978-989-758-555-5; ISSN: 2184-4321
Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

feature extraction techniques relying on common-sense visual observation?
3. If so, can we develop a multi-stage video processing and modelling framework that is transferable to other sports?

1.2 Background and Prior Work

Advancements in motion pattern indexing can be evaluated not only by improving classification performance for a specific task, low-cost real-time computing and extending the number of labelled events of interest, but also by their universal applicability to various sources such as 3D motion data (Bačić & Hume, 2018), video (Bloom & Bradley, 2003; D. Connaghan, Conaire, Kelly, & Connor, 2010; Martin, Benois-Pineau, Peteri, & Morlier, 2018; Ramasinghe, Chathuramali, & Rodrigo, 2014; Shah, Chockalingam, Paluri, Pradeep, & Raman, 2007), and sensor signal processing (Anand, Sharma, Srivastava, Kaligounder, & Prakash, 2017; Damien Connaghan et al., 2011; Kos, Ženko, Vlaj, & Kramberger, 2016; Taghavi, Davari, Tabatabaee Malazi, & Abin, 2019; Xia et al., 2020).

To our knowledge, tennis shot or stroke action recognition relying on computer vision started in 2001, combining computer vision and hidden Markov model (HMM) approaches, before HD TV-broadcast resolution became available (Petkovic, Jonker, & Zivkovic, 2001). Since Sepp Hochreiter and Jürgen Schmidhuber introduced Long Short-Term Memory (LSTM) in 1997, LSTMs have been used in action recognition (Cai & Tang, 2018; Liu, Shahroudy, Xu, Kot, & Wang, 2018; Zhao, Yang, Chevalier, Xu, & Zhang, 2018). In 2017, inertial sensors with Convolutional Neural Networks (CNN) and bi-directional LSTM networks were used to recognise actions in multiple sports (Anand et al., 2017). In 2018, an LSTM with Inception v3 was used to recognise actions in tennis videos, achieving 74% classification accuracy (Cai & Tang, 2018).

For prototyping explainable AI in next-generation augmented coaching software, which is expected to capture an expert's assessment and continue to provide comprehensive coaching diagnostic feedback (Bačić & Hume, 2018), we can rely on multiple data sources, including those operating beyond human vision. Prior work on 3D motion data is categorised as: (1) traditional feature-based swing indexing based on sliding window and thresholding (Bačić, 2004) and an expert-driven algorithmic approach to tennis shot and stance classification (Bačić, 2016c); (2) a featureless approach for accurate swing indexing using an Echo State Network (ESN) (Bačić, 2016b); and (3) further sub-event processing, i.e. phasing analysis via a produced ensemble of ESNs (Bačić, 2016a).

Prior work on video analysis applied Histograms of Oriented Gradients (HOG), Local Binary Pattern (LBP) and Scale Invariant Local Ternary Pattern (SILTP) for human activity recognition (HAR) in surveillance (Lu, Shen, Yan, & Bačić, 2018). A pilot case study on cricket batting balance (Bandara & Bačić, 2020) used recurrent neural networks (RNN) and pose estimation to generate classification of batting balance (from rear or front foot). This prior work on privacy-preservation filtering is aligned with privacy-preserving elderly care monitoring systems and with extracting diagnostic information for silhouette-based augmented coaching (Bačić, Meng, & Chan, 2017; Chan & Bačić, 2018). It is also generally applicable to usability and safety of spaces where human activity occurs, such as smart cities, future environments and traffic safety (Bačić, Rathee, & Pears, 2021).

2 METHODOLOGY

Considering past research, our objective is to produce a relatively simple and generalised initial solution and a human motion modelling and analysis (HMMA) framework for video indexing applicable to tennis. The tennis dataset was created from both amateur and professional players' videos. It is also expected that the produced framework may be easily transferable to other sport disciplines and related contexts such as rehabilitation and improving safety and usability of spaces where human movement occurs. As part of movement pattern analysis, we focused on expressing features as spatiotemporal human movement patterns from faster moving segments (e.g., the dominant hand holding a racquet) relative to the more static trunk segment.

2.1 Stick Figure Overlays as Initial Data Preprocessing

To retrieve a player's motion-based data from video, we generated stick figure overlays using OpenPose (https://github.com/CMU-Perceptual-Computing-Lab/openpose) and its 25-key-point COCO+Foot estimator (Figure 1 and Figure 2). Figure 2 shows an example of the data format representing the key point coordinates of a player in each video frame, recorded as multi time series data. As video overlays, the animated stick figure topology of generated key points (Figure 3) represents a way of extracting information from video to facilitate human


motion modelling and analysis (HMMA) and assist in the feature extraction process.

Figure 1: Stick figure overlay topology: 25 key points COCO+Foot model. Reproduced from (Bandara & Bačić, 2020), copyright permission (IEEE No. 5170730636786).

Figure 3: A mosaic of stick figure overlays generated during the serve, forehand and backhand strokes as intermediate computer vision processing steps.

2.2 Data Collection and Analysis

Data collection for experimental work was carried out on the Google cloud platform using publicly available videos of tennis practice matches, also available via YouTube (12kgp-Tennis, 2019; Back of the line tennis, 2018; Page, 2020; Tenfitmen Tennis Impulse, 2020; Tennis Legend TV, 2019; Top Tennis Training - Pro Tennis Lessons, 2014; TV Tennis Pro, 2020).

Figure 2: Example of 25 key points extracted from a stick figure overlay in a single frame in JSON format. Each key point consists of three variables (x, y, confidence).

The dataset included a balanced distribution of 150 extracted tennis strokes (30 forehands, 30 backhands, 30 serves and 60 no-stroke play), which were labelled manually into their corresponding output classes. All videos were of the same framerate (30 fps) and of duration between 0.8–1.0 seconds (28-31 frames).

Figure 4: Human motion data processing, analysis and modelling framework for video indexing. Adapted from the initial concept (Bandara & Bačić, 2020).

2.3 Feature Extraction Technique

Our approach to Feature Extraction Technique (FET) is based on visual analysis between faster and slower moving body segments (Figure 4 and Table 1). In Table 1, a function d(Pi, Pj) represents the Euclidean distance measure (1) between key points Pi and Pj of the overlaid stick figure, calculated for each video frame.
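The per-frame key points illustrated in Figure 2 can be read from OpenPose's JSON output with a few lines of Python. A minimal sketch, assuming OpenPose's standard per-frame JSON layout with a single detected person (the helper name `load_keypoints` is ours, not part of OpenPose):

```python
import json

def load_keypoints(frame_json: str):
    """Parse one OpenPose frame into a list of 25 (x, y, confidence) triples."""
    data = json.loads(frame_json)
    # OpenPose stores each person's key points as one flat list:
    # [x0, y0, c0, x1, y1, c1, ...] -- 25 points for the BODY_25 (COCO+Foot) model.
    flat = data["people"][0]["pose_keypoints_2d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

# Example with a synthetic single-person frame (all 25 points identical):
frame = json.dumps({"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9] * 25}]})
points = load_keypoints(frame)
print(len(points))  # 25
print(points[0])    # (10.0, 20.0, 0.9)
```

The confidence value can be used to discard frames where pose estimation failed, which the Discussion section notes as an occasional OpenPose limitation.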


d(Pi, Pj) = √((xi − xj)² + (yi − yj)²)    (1)

Pixel coordinates of a stick figure's key point Pn are denoted as (xn, yn) ∈ Pn; 1 ≤ n ≤ 25.

2.4 Feature Space: Visualisation Remarks

Based on visual movement pattern analysis, we identified nine distance-based features that collectively produce the best results (Table 1). As such, the 25-key-point time series representing the human body was reduced to 36%, representing the feature space.

Figure 5: Data snippet showing produced feature space and output class (in last column) based on calculated distances.

Figure 5 shows the feature space as the distance dataset, and Figure 6 indicates characteristic spatiotemporal patterns as distance value variations during a serve, a forehand, a backhand stroke and no-stroke play.

From Figure 6, it can be observed that the feature value subplot of the backhand is similar to that of the forehand. However, the directions of the forehand and backhand values differ, so the approach is also robust to e.g. inside-out forehands (executed from the player's backhand side of the court). Plots of features like hand-to-hand distance are almost identical in both forehand and backhand stroke plots. The serve is always a one-handed stroke, and the dominant hand reaches above the head during a serve. Therefore, in the feature value variation plot of the serve (Figure 6), the dominant-hand-to-dominant-leg pixel distance increases before contact with the ball and decreases after contact.

Regarding the multi-class classification problem investigated here, there are four output classes representing three common stroke categories and no-stroke players' activity (e.g., walking or running). Hence, there is no visible distinctive pattern that we could immediately associate with the no-stroke output class. Another example of no-stroke activity is during game breaks, where players can be taking courtside rest.

Table 1: Feature space description and design rationale.

| Distance Definition | Distance Measure | Design Rationale |
| 1. Dominant hand to chest | d(P4, P1) | To improve separation of serves from forehand and backhand |
| 2. Non-dominant hand to chest | d(P7, P1) | To improve separation of forehand and backhand strokes |
| 3. Dominant hand to dominant side foot | d(P4, P11) | To improve separation of serves and strokes starting from the dominant hand side |
| 4. Non-dominant hand to non-dominant side foot | d(P7, P14) | To identify strokes starting from the non-dominant side |
| 5. Hand to hand | d(P4, P7) | To identify one-handed strokes and serves |
| 6. Dominant hand to non-dominant side hip | d(P4, P12) | To identify the circular motion around the hip in ground strokes and to identify strokes starting from the dominant side |
| 7. Non-dominant hand to dominant side hip | d(P7, P9) | To identify strokes starting from the non-dominant side or circular motion around the hip in ground strokes (forehand and backhand) |
| 8. Body to dominant hand x-axis distance | P4(x) - P8(x) | To identify strokes starting from the dominant side over body's vertical (symmetrical) axis |
| 9. Body to non-dominant hand x-axis distance | P7(x) - P8(x) | To identify strokes starting from the non-dominant side over body's vertical axis |
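The Table 1 features reduce each frame's 25 key points to nine values using the Euclidean distance (1). A minimal per-frame sketch (key points indexed 0-24 following the BODY_25 topology, so e.g. P4 is `kp[4]`; the function names are ours):

```python
import math

def dist(p, q):
    """Euclidean distance (1) between two key points given as (x, y) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def frame_features(kp):
    """Map 25 (x, y) key points of one frame to the nine Table 1 distances."""
    return [
        dist(kp[4], kp[1]),    # 1. dominant hand to chest
        dist(kp[7], kp[1]),    # 2. non-dominant hand to chest
        dist(kp[4], kp[11]),   # 3. dominant hand to dominant side foot
        dist(kp[7], kp[14]),   # 4. non-dominant hand to non-dominant side foot
        dist(kp[4], kp[7]),    # 5. hand to hand
        dist(kp[4], kp[12]),   # 6. dominant hand to non-dominant side hip
        dist(kp[7], kp[9]),    # 7. non-dominant hand to dominant side hip
        kp[4][0] - kp[8][0],   # 8. body to dominant hand x-axis distance
        kp[7][0] - kp[8][0],   # 9. body to non-dominant hand x-axis distance
    ]

# Each frame becomes one 9-element row; 27 consecutive frames form a 9x27 block.
kp = [(float(i), float(2 * i)) for i in range(25)]  # synthetic key points
features = frame_features(kp)
print(len(features))  # 9
```

Features 8 and 9 are signed x-axis differences rather than distances, which preserves the side of the body's vertical axis on which the hand is located.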


Figure 6: Spatiotemporal multi-plot depicting output class patterns from feature value variations. Top-to-bottom subfigures: serve, forehand and backhand patterns (x-axis: frames; y-axis: distance values). Nine colour-coded time series in the legend are arranged by Table 1 row order.

3 CLASSIFIER MODELLING AND RESULTS

As part of the data filtering process, visual inspection of collected videos revealed that one video recording had to be removed from the dataset, due to being recorded from a substantially different camera position and because the majority of its frames did not show a visible player's figure.

The dataset was randomly divided into training and testing portions. 119 strokes (approx. 80%) were used as the training dataset (24 serves, 24 forehands, 24 backhands and 47 no-stroke play), and 30 strokes (approx. 20%) were used as the testing dataset (6 serves, 6 forehands, 6 backhands, 12 no-stroke play). Next, the created time series training dataset was classified by experimenting with three different LSTM networks (LSTM, Bidirectional LSTM and CNN+LSTM). Table 2 provides the summary of produced LSTM model variations used in our experiments.

Table 2: Model and parameter summaries.

| Classifier | Layer | Output Shape | Parameters |
| LSTM | LSTM | (None, 100) | 44,000 |
|  | Dropout | (None, 100) | 0 |
|  | Dense | (None, 100) | 10,100 |
|  | Dense | (None, 4) | 404 |
|  | Total params |  | 54,504 |
| Bidirectional LSTM | Bidirectional LSTM | (None, 27, 200) | 88,000 |
|  | Bidirectional LSTM | (None, 200) | 240,800 |
|  | Dropout | (None, 200) | 0 |
|  | Dense | (None, 100) | 20,100 |
|  | Dropout | (None, 100) | 0 |
|  | Dense | (None, 4) | 404 |
|  | Total params |  | 349,304 |
| CNN+LSTM | TimeDistributed Conv1D | (None, 3, 23, 64) | 1,024 |
|  | TimeDistributed Conv1D | (None, 3, 19, 64) | 20,544 |
|  | TimeDistributed Dropout | (None, 3, 19, 64) | 0 |
|  | TimeDistributed MaxPooling1D | (None, 3, 9, 64) | 0 |
|  | TimeDistributed Flatten | (None, 3, 576) | 0 |
|  | LSTM | (None, 200) | 621,600 |
|  | Dropout | (None, 200) | 0 |
|  | Dense | (None, 4) | 804 |
|  | Total params |  | 643,972 |

Table 3 shows the classification results with the LSTM, Bidirectional LSTM and CNN+LSTM networks. The CNN+LSTM network consists of both CNN layers and LSTM layers.

Table 3: Classification performance.

| Classifier | Output class | Precision | Recall | F1-score |
| LSTM | Backhand | 0.83 | 1.00 | 0.93 |
|  | Forehand | 1.00 | 1.00 | 1.00 |
|  | Serve | 1.00 | 1.00 | 1.00 |
|  | No Stroke | 1.00 | 0.92 | 0.96 |
|  | F1 MCS |  |  | 0.97 |
| Bidirectional LSTM | Backhand | 1.00 | 1.00 | 1.00 |
|  | Forehand | 1.00 | 1.00 | 1.00 |
|  | Serve | 1.00 | 1.00 | 1.00 |
|  | No Stroke | 1.00 | 1.00 | 1.00 |
|  | F1 MCS |  |  | 1.00 |
| CNN+LSTM | Backhand | 1.00 | 1.00 | 1.00 |
|  | Forehand | 1.00 | 0.83 | 0.91 |
|  | Serve | 1.00 | 1.00 | 1.00 |
|  | No Stroke | 0.92 | 1.00 | 0.96 |
|  | F1 MCS |  |  | 0.97 |

Note: F1 multi-class score (F1 MCS). For the reader's convenience, all values are rounded to two decimal points. Achieved classification performance for both the LSTM and CNN+LSTM networks reached 96.67%, rounded to 97% (shown as 0.97), while the Bidirectional LSTM network reached 100% on the validation dataset.


The above-expected experimental results suggest that an improved solution would include modelling and analysis of additional output classes (e.g., volleys, drop shots, serve variations) requiring (sub)phasing movement analysis. Similar to prior work on 3D kinematic data (Bačić, 2016a), the ensemble orchestration control would rely not only on a weighted probabilistic equation but also on expert knowledge captured in a state automata machine. Such an approach allows ensemble modelling on small and large datasets, where parameter optimisation and human-labelling efforts can be further reduced by transfer learning and adaptive system design.

4 IMPLEMENTATION FOR VIDEO STREAMING

A trained model can be used to classify strokes and display overlaid text for video streaming. The model input is a spatiotemporal dataset of nine features (Table 1).

A spatiotemporal dataset subsample should be input to the classifier as a block of experimentally determined size of 27 frames (Figure 7). Buffering of 27 frames (approx. 1 second) represents the rolling window concept in time-series analysis, in which key points from the 2D pose estimation skeleton overlay are converted into the nine distance-based features, generating a 9x27 buffered data block.

Therefore, after 27 frames of data are buffered, a trained model (i.e. classifier) is used to detect and classify a stroke. If a stroke is not detected, the next rolling window of 27 frames is buffered and supplied to the classifier. If a stroke is detected, the overlay with the identified stroke is displayed over the next 27 frames (which are skipped from feature processing, considering minimum times between shots, e.g. for the opposing player's stroke).

5 DISCUSSION

For a prototype, the classification performance results exceeded expectations for the collected dataset (with an approx. 80:20% split used for model training and testing). We expect that expanding the dataset may reduce classification performance, justifying a follow-up investigation into achieving an improved solution that will generalise well on future data. Another limitation is that OpenPose occasionally fails to generate the correct stick figure, warranting further investigation to improve overall robustness and accuracy. Further improvement is intended by using additional videos taken from other vantage points, e.g. in front of the player. Considering the computational performance of pose estimation, we will look at implementation on lower-cost platforms, including tablets and mobiles. In coaching scenarios, the intended platform would also process video feeds from fixed camera positions akin to the dataset used in this paper. Unlike carrying and managing inertial sensors, video is considered: (1) an unobtrusive data source not interfering with the player's feel; and (2) a way to minimise the possibility of motion data interpretation being contested.

During match situations, players may move closer to the net. When players are close to the net, they exchange strokes at a higher frequency than when producing strokes behind the baseline. Therefore, the time between strokes may sometimes be less than a second. For the scope of this research and the proof-of-concept, splits of one second (or longer) between strokes in video have been considered sufficient for stroke identification and classification. Future work will involve modelling an increased number of output classes, including faster stroke exchanges (e.g., containing further information such as direction, depth, drop shots and lob volleys).
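The 27-frame rolling-window buffering described in Section 4 can be sketched with a fixed-length deque. A minimal sketch, where `classify` stands in for any trained model taking a 9x27 block and returning a stroke label or None (the function and parameter names are ours):

```python
from collections import deque

WINDOW = 27       # experimentally determined block size (approx. 1 s at 30 fps)
NUM_FEATURES = 9  # distance-based features from Table 1

def stream_strokes(feature_frames, classify):
    """Yield (frame_index, label) whenever a buffered 27-frame block is classified.

    feature_frames: iterable of 9-element feature vectors, one per video frame.
    classify: callable mapping a list of 27 feature vectors to a stroke label,
              or to None when no stroke is detected.
    """
    buffer = deque(maxlen=WINDOW)
    skip = 0  # frames skipped after a detected stroke (overlay display period)
    for i, frame in enumerate(feature_frames):
        if skip:
            skip -= 1
            continue
        buffer.append(frame)
        if len(buffer) == WINDOW:
            label = classify(list(buffer))
            if label is not None:
                yield i, label
                buffer.clear()
                skip = WINDOW  # the next 27 frames are skipped from processing
            # otherwise the deque keeps rolling, dropping the oldest frame

# Stub classifier: flags a "stroke" when the first feature's window mean exceeds 50.
frames = [[0.0] * NUM_FEATURES] * 27 + [[100.0] * NUM_FEATURES] * 27
hits = list(stream_strokes(
    frames,
    lambda block: "stroke" if sum(b[0] for b in block) / WINDOW > 50 else None))
print(hits)  # [(40, 'stroke')]
```

The deque's `maxlen` automatically evicts the oldest frame, so the window rolls one frame at a time until a stroke is detected, after which the buffer is reset and the display period is skipped, as in Figure 7.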

Figure 7: Strokes classification and overlay annotation as video processing workflow concept.

6 CONCLUSION

This paper contributes to video indexing and human activity recognition by applying a multidisciplinary combination of computer vision, pose estimation and


recurrent neural networks. Related contributions to sport analytics, broadcasting and general information retrieval from low-cost video were also motivated by prior work in golf and tennis relying on sensors and 3D kinematic data. Aligned with prior work, the presented tennis stroke recognition from monocular video is also aimed at contributing to how machines can quantify, assess and diagnose aspects of human movement and provide comprehensive feedback – all contributing to the area of interpretable AI.

The presented video processing and modelling framework, using selected publicly available tennis videos, was implemented in Python on the Google cloud platform. The framework uses generated trajectories of key points (represented as human stick figure overlays), which were further transformed into the spatiotemporal feature space. Multi time series data from the feature space were processed using three variations of LSTM classifiers. As a multi-class classifier, the developed tennis shot recognition system exceeded expected performance (96.67%–100%) and did not rely on specialist expertise or insights for the developed feature extraction techniques.

Using video-based feature extraction techniques to provide diagnostic information without redundant data: (1) minimises reliance on domain expertise; (2) enables interaction with and visualisation of intermediate preprocessing operations (via animated stick figure overlays), which is also important for initially small dataset modelling and a transparent and comprehensive feature engineering process; and (3) maximises the role of AI, computer vision and pose estimation for human motion modelling and analysis (HMMA), and for advancements of sport science.

Our multi-class approach is transferable to signal processing and has been evaluated in prior work on indexing and analysis of two-class classification in cricket. Future work will include application to other sports, alongside broader contexts involving privacy-preserving filtering and data fusion from wearable and equipment-attached sensors.

ACKNOWLEDGEMENTS

The authors wish to acknowledge the contributions of the OpenPose team, in sharing and maintaining their code and documentation.

REFERENCES

12kgp-Tennis. (2019). Nick Kyrgios - Jeremy Chardy (4k 60fps). Retrieved 28 Aug. 2020, from https://www.youtube.com/watch?v=KQI6ZvE14nw
Anand, A., Sharma, M., Srivastava, R., Kaligounder, L., & Prakash, D. (2017). Wearable motion sensor based analysis of swing sports. In ICMLA, 16th IEEE International Conference on Machine Learning and Applications. Cancún, Mexico.
Bačić, B. (2004). Towards a neuro fuzzy tennis coach: Automated extraction of the region of interest (ROI). In FUZZ-IEEE, International Conference on Fuzzy Systems. Budapest, Hungary.
Bačić, B. (2016a). Echo state network ensemble for human motion data temporal phasing: A case study on tennis forehands. In Neural Information Processing (Vol. 9950). Springer.
Bačić, B. (2016b). Echo state network for 3D motion pattern indexing: A case study on tennis forehands. In Image and Video Technology. Lecture Notes in Computer Science (Vol. 9431). Springer.
Bačić, B. (2016c). Extracting player's stance information from 3D motion data: A case study in tennis groundstrokes. In PSIVT 2015 Image and Video Technology. Springer.
Bačić, B., & Hume, P. (2012). Augmented video coaching, qualitative analysis and postproduction using open source software. In ISBS, 30th International Conference on Biomechanics in Sports. Melbourne, VIC.
Bačić, B., & Hume, P. A. (2018). Computational intelligence for qualitative coaching diagnostics: Automated assessment of tennis swings to improve performance and safety. Big Data, 6(4). doi: https://doi.org/10.1089/big.2018.0062
Bačić, B., Meng, Q., & Chan, K. Y. (2017). Privacy preservation for eSports: A case study towards augmented video golf coaching system. In DeSE, 10th International Conference on Developments in e-Systems Engineering. Paris, France.
Bačić, B., Rathee, M., & Pears, R. (2021). Automating inspection of moveable lane barrier for Auckland harbour bridge traffic safety. In Neural Information Processing (Vol. 12532). Springer.
Back of the line tennis. (2018). Tennis practice match. Retrieved from https://www.youtube.com/watch?v=nqC0K4yGdxM
Bandara, I., & Bačić, B. (2020). Strokes classification in cricket batting videos. In CITISIA, 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications. Sydney, NSW.
Bloom, T., & Bradley, A. (2003). Player tracking and stroke recognition in tennis video. In APRS Workshop on Digital Image Computing. Brisbane, QLD.
Cai, J.-X., & Tang, X. (2018). RGB video based tennis action recognition using a deep weighted long short-term memory. Retrieved from https://arxiv.org/abs/1808.00845v2


Chan, K. Y., & Bačić, B. (2018). Pseudo-3D binary silhouette for augmented golf coaching. In ISBS, XXXVI International Symposium on Biomechanics in Sports. Auckland, New Zealand.
Connaghan, D., Conaire, C. Ó., Kelly, P., & Connor, N. E. O. (2010). Recognition of tennis strokes using key postures. In ISSC, 21st Irish Signals and Systems Conference. Dublin, Ireland.
Connaghan, D., Kelly, P., O'Connor, N., Gaffney, M., Walsh, M., & O'Mathuna, C. (2011). Multisensor classification of tennis strokes. In IEEE Sensors. Limerick, Ireland.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8). doi: https://doi.org/10.1162/neco.1997.9.8.1735
Kos, M., Ženko, J., Vlaj, D., & Kramberger, I. (2016). Tennis stroke detection and classification using miniature wearable IMU device. In IWSSIP, International Conference on Systems, Signals and Image Processing. Bratislava, Slovakia.
Liu, J., Shahroudy, A., Xu, D., Kot, A., & Wang, G. (2018). Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12). doi: https://doi.org/10.1109/TPAMI.2017.2771306
Lu, J., Shen, J., Yan, W. Q., & Bačić, B. (2018). An empirical study for human behavior analysis. International Journal of Digital Crime and Forensics. IGI Global. doi: https://doi.org/10.4018/IJDCF.2017070102
Martin, P.-E., Benois-Pineau, J., Peteri, R., & Morlier, J. (2018). Sport action recognition with Siamese spatio-temporal CNNs: Application to table tennis. In CBMI, International Conference on Content-Based Multimedia Indexing. La Rochelle, France.
Page, S. (2020). Tennis practice match points - NTRP 4.5 vs 5.0. Retrieved from https://www.youtube.com/watch?v=dfrec4pjnI0
Petkovic, M., Jonker, W., & Zivkovic, Z. (2001). Recognizing strokes in tennis videos using hidden Markov models. In IASTED, International Conference on Visualization, Imaging and Image Processing. Marbella, Spain.
Ramasinghe, S., Chathuramali, K. G. M., & Rodrigo, R. (2014). Recognition of badminton strokes using dense trajectories. In 7th International Conference on Information and Automation for Sustainability. Colombo, Sri Lanka.
Shah, H., Chockalingam, P., Paluri, B., Pradeep, S., & Raman, B. (2007). Automated stroke classification in tennis. In ICIAR, 4th International Conference on Image Analysis and Recognition (Vol. 4633). Springer.
Taghavi, S., Davari, F., Tabatabaee Malazi, H., & Abin, A. A. (2019). Tennis stroke detection using inertial data of a smartwatch. In ICCKE, 9th International Conference on Computer and Knowledge Engineering. Mashhad, Iran.
Tenfitmen Tennis Impulse. (2020). Tennis match - player vs coach (Tenfitmen - episode 123). Retrieved from https://www.youtube.com/watch?v=uSoD2yyzRgY
Tennis Legend TV. (2019). Roland-Garros 2019: Federer - Schwartzman practice points (court level view). Retrieved 28 Aug. 2020, from https://www.youtube.com/watch?v=vkGwyke5jDU
Top Tennis Training - Pro Tennis Lessons. (2014). Tsonga vs Anderson training match 2014 - court level view. Retrieved 28 Aug. 2020, from https://www.youtube.com/watch?v=RHokxoEsFsc
TV Tennis Pro. (2020). Alexander Zverev practice match vs Andrey Rublev court level view tennis. Retrieved 28 Aug. 2020, from https://www.youtube.com/watch?v=mcR3d9jnWaI
Xia, K., Wang, H., Xu, M., Li, Z., He, S., & Tang, Y. (2020). Racquet sports recognition using a hybrid clustering model learned from integrated wearable sensor. Sensors, 20(6). doi: https://doi.org/10.3390/s20061638
Zhao, Y., Yang, R., Chevalier, G., Xu, X., & Zhang, Z. (2018). Deep residual Bidir-LSTM for human activity recognition using wearable sensors. Mathematical Problems in Engineering, 2018. doi: https://doi.org/10.1155/2018/7316954
