Auto - Fit: Workout Tracking Using Pose-Estimation and DNN
Abstract— Lack of physical fitness increases the risk of adverse health conditions, including coronary heart disease, high blood pressure, stroke, metabolic syndrome, and type 2 diabetes, which leads to a decrease in life expectancy. In this work, we introduce Auto_fit, an application that suggests workouts and tracks them. Auto_fit uses PoseNet for pose estimation to find 17 body keypoints, then uses a DNN classifier to identify the state of the exercise, and finally counts the repetitions performed. We collected videos of trained professionals performing the exercises and used them to train Auto_fit. Auto_fit takes a live video feed and counts the repetitions of the exercise performed. It works on two common exercises and can also run on low-powered single-board computers like the Raspberry Pi. Auto_fit helps in improving physical fitness and thus enables a person to live a longer and healthier life.

Keywords— DNN, PoseNet, Raspberry Pi, Pose estimation, computer vision.

I. INTRODUCTION
Lack of physical fitness increases the risk of adverse health conditions, including coronary heart disease, high blood pressure, stroke, metabolic syndrome, and type 2 diabetes, which leads to a decrease in life expectancy.

Fig 1. The death rate from obesity

From figure 1 we can conclude that the death rate due to obesity in India is lower than the world average, but its consistent growth is a reason for concern. Around 80% of adolescents are not physically active. Lack of physical activity is a key factor in noncommunicable diseases like cardiovascular diseases, cancer, diabetes, and many more. In the U.S., more than 80% of adults and adolescents do not meet the guidelines for physical activity set by the Department of Health & Human Services. In India, the urban, richer, and middle-aged population is more prone to obesity. Women aged around 30 are more likely to be overweight than men of the same age. This is attributed to social customs that restrict agility and physical activity for women.

Much of this is due to the sedentary lifestyle of the modern world. Physical fitness can be improved by sticking to a regular workout routine, and significant results can be seen from basic workouts alone. Many people are willing to start, but due to a lack of knowledge of exercises and proper guidance they are unable to build this into their daily routine. Due to busy schedules, they are also unable to go to gyms or fitness centers to get proper guidance for their workouts. In this paper, we propose a system that suggests the correct form of an exercise to the person and also keeps track of the exercises performed by an individual using pose estimation (action recognition). This enables the person to take up physical activity at their own pace without depending on others.

Auto_fit is particularly helpful for people who cannot or do not want to leave their house for workouts. Auto_fit provides a complete workout plan for an individual, with different types of activities divided across 6 days of the week. Auto_fit also checks for correct posture and notifies the user if the posture is wrong. It records the count of sets and repetitions performed for every exercise and keeps these records to provide better workout suggestions in subsequent weeks. It can log and track data for at most 10 people at a time.

In the first part, it creates the skeleton of the user with 17 points such as the wrists, knees, ankles, etc. These points are
International Journal of Engineering Applied Sciences and Technology, 2020
Vol. 5, Issue 1, ISSN No. 2455-2143, Pages 167-173
Published Online May 2020 in IJEAST (http://www.ijeast.com)
identified using the TensorFlow PoseNet model, which uses MobileNet_V1 under the hood. In the second part of the application, the identified points are passed to a deep neural network model that is trained to identify the correct pose and count the repetitions of an exercise through a live camera feed. Both parts of the application are combined to provide a complete solution for doing workouts at home.

This application is designed so that it can run on low-powered computing devices like the Raspberry Pi, Jetson Nano, etc., to provide a compact and low-cost device that can be easily installed at home.

II. PROBLEM STATEMENT
The problem consists of identifying the 17 body key points in a video frame and then classifying the pose by passing them to a deep neural network that identifies the action performed by the person in that frame. The purpose is to train a deep neural network to identify the correct forms of the exercise.

In [11] the author used convolutional neural networks for image classification with TensorFlow. Paper [12] formulated the idea of a spectral-spatial feature learning (SSFL) method to gather important features of hyperspectral images (HSIs). LeCun et al. (2015) [13] describe useful concepts for deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

IV. TECHNICAL APPROACH
Auto_fit's pipeline consists of two phases, a training phase and a testing phase, each of which is divided into multiple stages, as shown in figure 2 and figure 3. The process starts with the input given for training and ends with the output for the identified pose.

1. Training Phase

Fig 3. Testing Phase of Deep Neural Network
For testing, the video stream is taken live from the camera and fed into the PoseNet model, which generates the coordinates of the body key points in that frame. Those key-point coordinates are then fed into the DNN classifier trained in the first phase. The DNN classifier identifies the pose as the initial or final state of the exercise. This output is used for further calculations such as counting the repetitions of an exercise.

V. POSE ESTIMATOR
The following image is passed on to the model and the result is generated with marked key points on the image.
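The key points produced by the pose estimator have to be turned into a flat numeric array before a classifier can use them. The sketch below shows one possible way to do this. The function name `keypoints_to_features`, the dictionary input format, the scaling of coordinates by frame size, and the resulting 34-value layout are assumptions made for illustration; the paper does not specify them. The keypoint names follow the order commonly documented for PoseNet.

```python
from typing import Dict, List, Tuple

# 17 keypoints in the order commonly documented for PoseNet.
# The exact order used by Auto_fit is not stated; this is an assumption.
KEYPOINT_NAMES: List[str] = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]


def keypoints_to_features(
    keypoints: Dict[str, Tuple[float, float]],
    frame_width: int,
    frame_height: int,
) -> List[float]:
    """Flatten 17 (x, y) pixel coordinates into a 34-value list.

    Coordinates are divided by the frame size so that the features do not
    depend on camera resolution (a hypothetical preprocessing choice).
    """
    if frame_width <= 0 or frame_height <= 0:
        raise ValueError("frame dimensions must be positive")
    features: List[float] = []
    for name in KEYPOINT_NAMES:
        x, y = keypoints[name]
        features.append(x / frame_width)
        features.append(y / frame_height)
    return features


# Usage: every keypoint placed at the centre of a 640x480 frame.
example = {name: (320.0, 240.0) for name in KEYPOINT_NAMES}
vector = keypoints_to_features(example, 640, 480)
print(len(vector))
```

Keeping a fixed keypoint order matters more than the particular order chosen, because the classifier learns a meaning for each position in the array.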
coordinates. This array is fed to the DNN classifier, which predicts whether the given set of coordinates belongs to state 0 or 1 and gives 0 or 1 as output. The output is then converted to the respective state of the exercise and displayed on the screen. We have used binary cross-entropy as the loss function and RMSprop as the optimizer. Our DNN consists of 4 layers: the first layer has 128 neurons, followed by 2 hidden layers with 64 and 32 neurons respectively, and an output layer with 1 neuron that gives 0 or 1 as the result. ReLU is used as the activation function. The model was trained on 7660 examples, validated on 1916 examples, and later tested on 2394 examples.

VII. RESULT
State A - initial position, in which the feet are joined together and the hands are down.
State B - from the initial position we jump, moving our feet apart and our hands up.
From state B we jump again, moving the hands down and the feet close together. This cycle keeps repeating.

Fig 6. Jumpjack exercise

The coordinates from both videos are merged and labeled. This labeled dataset is then split into 3 sets:
1. Training dataset (7661 examples)
2. Validation dataset (1916 examples)
3. Testing dataset (2395 test examples)
We trained the model for 100 epochs and obtained a training accuracy of 96% and a validation accuracy of 95%, with a training loss of 0.12 and a validation loss of 0.28 for the binary cross-entropy loss function.
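Since each repetition is one full cycle from state A to state B and back to state A, repetitions can be counted from the stream of per-frame classifier outputs. The following is a minimal sketch of that idea, not the counting logic actually used in Auto_fit. The function name `count_repetitions` and the `min_frames` debouncing parameter are hypothetical; the debouncing step is an added assumption to keep single misclassified frames from producing false counts.

```python
from typing import Iterable, Optional


def count_repetitions(states: Iterable[int], min_frames: int = 3) -> int:
    """Count A -> B -> A cycles in per-frame states (0 = state A, 1 = state B).

    A change of state is accepted only after `min_frames` consecutive frames
    agree, which filters out isolated misclassifications.
    """
    stable: Optional[int] = None      # last accepted state
    candidate: Optional[int] = None   # state currently being confirmed
    run = 0                           # consecutive frames seen for candidate
    reached_b = False                 # True once A -> B has been observed
    reps = 0

    for state in states:
        if state == candidate:
            run += 1
        else:
            candidate = state
            run = 1

        if run >= min_frames and candidate != stable:
            previous, stable = stable, candidate
            if stable == 1 and previous == 0:
                reached_b = True      # first half of the cycle completed
            elif stable == 0 and reached_b:
                reps += 1             # returned to A: one full repetition
                reached_b = False
    return reps


# Usage: two clean jumping-jack cycles at 5 frames per state.
frames = [0] * 5 + [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
print(count_repetitions(frames))
```

A larger `min_frames` makes the counter more robust to noise but may miss very fast repetitions, so its value would need tuning against the camera frame rate.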
b) Shoulder lateral raise
Lateral raise is an isolation exercise that targets the deltoid muscles. It is part of a strength workout that focuses on muscle growth. This exercise focuses mainly on the lateral or medial head of the deltoid, making the shoulders appear wider and more developed. Strength workouts have several advantages, such as muscle growth, improved bone health, controlled body fat, and a reduced risk of injury. Due to these benefits, it should be included in a workout regimen.

This exercise can be divided into 2 states:
State A: hands down and feet shoulder-width apart
State B: both hands raised to shoulder level

To create a dataset for this exercise we used videos of trained professionals performing the exercise. We split each video into 2 different videos, one containing state A and the other containing state B. Those videos are then passed to PoseNet, which generates the coordinates for all 17 key points.

The coordinates from both videos are merged and labeled. This labeled dataset is then split into 3 sets:
1. Training dataset (1937 examples)
2. Validation dataset (485 examples)
3. Testing dataset (606 test examples)

We trained the model for 100 epochs and obtained a training accuracy of 91% and a validation accuracy of 86%, with a training loss of 0.18 and a validation loss of 0.15 for binary cross-entropy as the loss function.
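To make the classifier structure and the reported loss values easier to interpret, the sketch below runs a forward pass through a network with the layer sizes given in the paper (128, 64, 32, and 1 neurons, with ReLU) and computes binary cross-entropy. It is an illustration under stated assumptions, not the trained model: the weights are random, the 34-value input (17 key points times 2 coordinates) and the sigmoid on the output neuron are assumptions, and RMSprop training is omitted. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer) -> List[float]:
    """Compute W.x + b for one dense layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x: List[float], layers: List[Layer]) -> float:
    """Return the probability of state 1 (threshold at 0.5 for a 0/1 output)."""
    hidden = x
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    return sigmoid(dense(hidden, layers[-1])[0])  # sigmoid output is an assumption


def binary_cross_entropy(y_true: List[int], y_prob: List[float], eps: float = 1e-7) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# 34 inputs is an assumption (17 key points x 2 coordinates).
LAYER_SIZES = [34, 128, 64, 32, 1]
rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]

probability = predict([0.5] * 34, layers)
print(0.0 < probability < 1.0)
print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # equals ln 2 for uninformed guesses
```

As a point of reference, a classifier that always outputs 0.5 has a binary cross-entropy of ln 2, roughly 0.693, so loss values well below that indicate predictions that are both mostly correct and confident.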
Fig 13. Training loss graph

While testing our model on a test data set we got 86% accuracy and 0.55.

VIII. CONCLUSION AND FUTURE WORK
In this paper, we introduced Auto_fit, an application that uses pose estimation and deep learning to provide effective workout logging and tracking. We used PoseNet for pose estimation to evaluate videos of exercises and generate body key points, which are then fed to a DNN classifier to identify the state of the exercise. The state information is in turn used for counting the repetitions and sets performed.

We worked with 2 different exercises, collecting training videos for each, and used both pose estimation and deep learning to provide a repetition count for a specific exercise, as well as machine learning algorithms to automatically determine the state of the exercise in the live camera feed.

Auto_fit does not require very powerful hardware and hence can easily run on low-powered devices like the Raspberry Pi and Nvidia Jetson Nano. This allows it to be installed in a small space and produced in a compact form factor.

IX. REFERENCE
[1] Reimers, Carl & Knapp, G. & Reimers, Anne. (2012). Does Physical Activity Increase Life Expectancy? A Review of the Literature. Journal of Aging Research. 2012. 243958. doi: 10.1155/2012/243958.
[3] Pedersoli, Fabrizio & Benini, Sergio & Adami, Nicola & Leonardi, Riccardo. (2014). XKin: an Open Source Framework for Hand Pose and Gesture Recognition Using Kinect. The Visual Computer: International Journal of Computer Graphics. doi: 10.1007/s00371-014-0921-x.
[4] Islam, Muhammad Usama & Mahmud, Hasan & Ashraf, Faisal & Hossain, Iqbal & Hasan, Md. (2017). Yoga posture recognition by detecting human joint points in real time using Microsoft Kinect. 668-673. doi: 10.1109/R10-HTC.2017.8289047.
[5] Wei, S.-E., Ramakrishna, V., Kanade, T. and Sheikh, Y., "Convolutional Pose Machines," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, doi: 10.1109/CVPR.2016.511.
[6] Cohen, I. and Li, H., "Inference of human postures by classification of 3D human body shape," in Analysis and Modeling of Faces and Gestures, 2003. AMFG 2003. IEEE International Workshop on. IEEE, 2003, pp. 74–81.
[8] Cao, Z., Simon, T., Wei, S. and Sheikh, Y., "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, doi: 10.1109/CVPR.2017.143.
[9] Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In B. Leibe, J. Matas, M. Welling, & N. Sebe (Eds.), Computer Vision - 14th European Conference, ECCV 2016, Proceedings (pp. 483-499).
[14] https://www.tensorflow.org/lite/models/pose_estimation/overview?hl=ru
[15] https://blog.tensorflow.org/2018/05/real-time-human-pose-estimation-in.html