Human Pose Estimation-Based Real-Time Gait Analysis Using Convolutional Neural Network
A. Rohan et al., IEEE Access, vol. 8, 2020
Digital Object Identifier 10.1109/ACCESS.2020.3030086
ABSTRACT Gait analysis is widely used in clinical practice to help in understanding gait abnormalities and their association with underlying medical conditions for better diagnosis and prognosis. Several technologies embedded in specialized devices are used for this purpose, such as computer-interfaced video cameras to measure patient motion, electrodes placed on the surface of the skin to assess muscle activity, force platforms embedded in a walkway to monitor the forces and torques produced between the ambulatory patient and the ground, Inertial Measurement Unit (IMU) sensors, and wearable devices. All of these technologies require an expert, typically a medical specialist, to interpret the recorded data. With recent improvements in the field of Artificial Intelligence (AI), especially in deep learning, it is now possible to create a mechanism in which this interpretation is performed by a deep learning tool such as a Convolutional Neural Network (CNN). Therefore, this work presents an approach in which human pose estimation is combined with a CNN to classify normal and abnormal human gait, with the ability to provide information about the detected abnormalities from an extracted skeletal image in real time.
INDEX TERMS Convolutional neural network, deep learning, gait analysis, pose estimation.
unique features from gait representation. Linear discriminant analysis (LDA) was used in [16] to reduce the dimensionality of the gait representation data into lower-dimensional space-separated classes, but the problem with these techniques is that PCA and LDA do not take advantage of 2-dimensional data. In PCA and LDA, an image or 2-D data has to be converted into a 1-dimensional vector, and this sometimes results in poor recognition performance. Some subspace learning approaches have also been proposed [17]–[20], which started to consider learning the features of an object from the representation of higher-order tensors. Subspace learning approaches are widely used for gait recognition. In [21], the authors proposed matrix-based sparse bilinear discriminant analysis (SBDA) as a sparse learning method effective for gait recognition. Also, Locality Preserving Projections (LPP) [22] and Local Fisher Discriminant Analysis (LFDA) [23] were employed to generate gait features.

Recently, Convolutional Neural Networks (CNNs) have achieved great results in different fields of pattern recognition, detection, and classification, especially in computer vision. A CNN is a class of deep neural network mostly applied to the analysis of visual imagery. In [24], an efficient deep neural network architecture for visual recognition, named GoogLeNet, was proposed; it is a large network comprising around 27 layers and uses max and average pooling, the dropout method, and a softmax classifier. In [25], the authors proposed a method to recognize periodic human actions using a CNN. In [26], a large CNN trained on 1.2 million high-resolution images from the ImageNet dataset was introduced. A CNN architecture to speed up the training time for large-scale video classification was proposed in [27]; one million YouTube videos belonging to 487 sports classes were used to train this architecture.

All of the above-mentioned studies have shown some advancement in the process of gait analysis, but a proper gait analysis mechanism is still unavailable, one in which problematic constraints such as view angle, walking speed, clothing, surface, carrying status, shoes, and elapsed time are considered and minimized in an efficacious way. Therefore, for further progress, this research work proposes an approach in which human pose estimation is combined with a deep neural network, namely a CNN, to classify the normal and abnormal gait of a person. The reason for using human pose estimation for gait analysis is that, in pose estimation, a deep learning-based CNN is used to detect the body points of a person without the worry of any problematic constraint. This gives us skeletal images in which body points are joined to form a skeleton of a person. Only these skeletal images are further used for classification and gait analysis, which frees the subject from wearing any sensor or device on the body and minimizes the effect of any problematic constraint.

This paper is divided into the following sections: Section 1 provides an introduction, Section 2 describes the basic architecture of the system, Section 3 contains the results and discussion, and Section 4 concludes the study.

II. BASIC ARCHITECTURE OF THE SYSTEM
The basic architecture of the proposed approach is shown in Fig. 1. The first step involves recording a live video of a person's gait movement. Each frame in the video is processed through the human pose estimation algorithm to obtain a skeleton image comprising 25 body points. The obtained image is given as an input to a CNN trained to classify multiple classes. In this work, we trained the CNN for five classes: Normal, Abnormal Left Toe, Abnormal Left Foot, Abnormal Right Toe, and Abnormal Right Foot. Once the trained CNN receives the input skeletal image of a person, the output is given as the image with a label of the predicted class.
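To make the frame-by-frame flow of this architecture concrete, the following minimal Python sketch shows one way such a pipeline could be wired together. It assumes a hypothetical estimate_pose() helper (for example, an OpenPose-style 25-keypoint model returning normalized coordinates) and a pre-trained Keras classifier saved as gait_cnn.h5; neither name comes from the paper, and the sketch is illustrative rather than the authors' implementation.

```python
# Illustrative sketch of the proposed pipeline: camera frame -> pose estimation
# -> skeletal image -> CNN classification. Helper/file names are assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

CLASSES = ["Normal", "Abnormal Left Toe", "Abnormal Left Foot",
           "Abnormal Right Toe", "Abnormal Right Foot"]

def estimate_pose(frame):
    """Placeholder for a 25-keypoint pose estimator (e.g., an OpenPose-style
    BODY_25 model). Should return an array of shape (25, 2) with (x, y)
    coordinates normalized to [0, 1]."""
    raise NotImplementedError("plug in your pose-estimation backend here")

def draw_skeleton(keypoints, size=(224, 224)):
    """Render the detected body points on a black canvas (limb connections
    omitted for brevity); only this skeletal image is fed to the classifier."""
    canvas = np.zeros((size[0], size[1], 3), dtype=np.uint8)
    pts = (keypoints * np.array([size[1], size[0]])).astype(int)
    for x, y in pts:
        cv2.circle(canvas, (int(x), int(y)), 3, (255, 255, 255), -1)
    return canvas

model = load_model("gait_cnn.h5")          # assumed pre-trained 5-class classifier
cap = cv2.VideoCapture(0)                  # live camera feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    keypoints = estimate_pose(frame)                        # 25 body points
    skeleton = draw_skeleton(keypoints)                     # skeletal image only
    x = skeleton[np.newaxis].astype(np.float32) / 255.0     # 1 x H x W x 3
    probs = model.predict(x, verbose=0)[0]
    label = CLASSES[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 255, 0), 2)                        # overlay predicted class
    cv2.imshow("gait", frame)
    if cv2.waitKey(1) == 27:                                # Esc to quit
        break
cap.release()
```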
FIGURE 2. Overall pipeline. (a) The method takes the entire image as the input to a CNN that jointly predicts (b) confidence maps for body part detection and (c) PAFs for part association. (d) The parsing step performs a set of bipartite matchings to associate body part candidates, and (e) finally assembles them into full-body poses for all people in the image.
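As a rough illustration of steps (c) and (d), the snippet below scores a candidate limb between two detected body-part peaks by sampling the part affinity field (PAF) along the line joining them, which is the quantity the bipartite matching step ranks. It is a simplified sketch of the idea in [28], [29], not the authors' code, and the array names are assumptions.

```python
# Simplified PAF limb scoring: sample the 2-D affinity field along the segment
# between two candidate joints and average its projection onto the limb direction.
import numpy as np

def limb_score(paf_x, paf_y, joint_a, joint_b, n_samples=10):
    """paf_x, paf_y: H x W affinity-field components for one limb type.
    joint_a, joint_b: (x, y) pixel coordinates of two candidate body parts,
    assumed to lie inside the field."""
    a, b = np.asarray(joint_a, float), np.asarray(joint_b, float)
    direction = b - a
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return 0.0
    direction /= norm
    # Sample points evenly spaced along the candidate limb.
    samples = [a + t * (b - a) for t in np.linspace(0.0, 1.0, n_samples)]
    scores = []
    for x, y in samples:
        xi, yi = int(round(x)), int(round(y))
        field = np.array([paf_x[yi, xi], paf_y[yi, xi]])
        scores.append(float(field @ direction))   # alignment with limb direction
    return float(np.mean(scores))

# Candidate pairs with the highest average alignment are kept by the bipartite
# matching step, and the surviving connections are assembled into per-person skeletons.
```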
1) CNN TRAINING
The training of the CNN was performed offline by collecting and labeling a dataset comprising multiple videos of a person walking under different scenarios. Data collection is one of the vital parts of achieving good classification accuracy. We collected the data by segregating the classes into five categories: Normal, Abnormal Left Toe, Abnormal Left Foot, Abnormal Right Toe, and Abnormal Right Foot. For each category, the walking style of the person was changed to mimic some injury or problem in the feet. For each category we collected 50 videos; each video was recorded using a camera at 60 frames per second (FPS) with a resolution of 1280 × 720 for a duration of 50 seconds. These videos were processed through the human pose estimation algorithm to obtain skeletal images, which were further used for training and testing the CNN. 80% of these 50 videos for each category were used for training the CNN, while 20% were used for testing. The network was trained for 45,000 iterations on a PC with the specifications mentioned in Table 2. The accuracy achieved for classification is 97.3%. To create a problematic gait, we mimicked the walking style by tying an object to the toe of the person under experiment; this gave us the mimicked data for the categories Abnormal Left Toe and Abnormal Right Toe. On the other hand, to mimic problems in the feet, we made the person walk in slippers with a foot cast, which gave us the data for the categories Abnormal Left Foot and Abnormal Right Foot. For the category Normal, the person walked without any foot cast or object attached to the toe. Fig. 5-9 show some example images extracted from the recorded videos with their category labels. It can be seen in Fig. 5-9 that all of this data was collected inside our laboratory and the person was walking on a treadmill.

TABLE 2. PC Specifications.

FIGURE 6. Example of the recorded data for category Abnormal Left Foot.
FIGURE 7. Example of the recorded data for category Abnormal Left Toe.
FIGURE 9. Example of the recorded data for category Abnormal Right Toe.
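The training setup described above (skeletal images from five categories, an 80/20 split, and a fixed number of iterations) could be reproduced along the lines of the hedged Keras sketch below. The directory layout, image size, and network depth are assumptions made for illustration; the excerpt does not specify the exact CNN architecture or hyperparameters beyond the 45,000 iterations.

```python
# Minimal sketch of training a 5-class gait classifier on skeletal images.
# Assumed folder layout: data/train/<class>/*.png and data/test/<class>/*.png
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)   # assumed input resolution
BATCH = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "data/test", image_size=IMG_SIZE, batch_size=BATCH)

# A small CNN stand-in; the paper's actual architecture is not detailed here.
model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(*IMG_SIZE, 3)),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),   # the five gait categories
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Express the reported 45,000 training iterations (batches) as whole epochs.
steps_per_epoch = int(tf.data.experimental.cardinality(train_ds).numpy())
epochs = max(1, 45_000 // max(1, steps_per_epoch))
model.fit(train_ds, validation_data=test_ds, epochs=epochs)
model.save("gait_cnn.h5")
```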
III. RESULTS AND DISCUSSION
The implementation of the proposed approach presented in this work was done in real time by distributing the computational load between the CPU and GPU, which work in parallel to reduce the computational time of the CNN. The recorded time to process a single frame and classify its category was 47 ms, and the frame rate achieved during this process was almost 20 FPS. We compared the computational time and FPS of our network with a CNN based on the ResNet50 architecture; for ResNet50, the time taken to process a single frame was almost 12 ms higher than that of the proposed CNN, with a frame rate of 15 FPS. The results obtained are shown in Fig. 10-13 and correspond, respectively, to Fig. 6-9. Fig. 10-13 show the 25-body-point skeletal images extracted using human pose estimation. The extracted body points are conjoined to form the respective skeletal image, which is further processed through the CNN-based classifier for classification into the categories. It should be noted that in this work 25 body points are used to classify a gait abnormality related only to the human feet. The reason for extracting and using 25 body points rather than just the foot points is that the trained CNN achieved more accurate results, with a very low confusion rate, when more body-point information was provided.

FIGURE 10. Extracted skeletal image using pose estimation for Fig. 6.
FIGURE 11. Extracted skeletal image using pose estimation for Fig. 7.
FIGURE 12. Extracted skeletal image using pose estimation for Fig. 8.
FIGURE 13. Extracted skeletal image using pose estimation for Fig. 9.
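One plausible way to realize the CPU/GPU split and to measure the per-frame latency and FPS reported above is sketched below: a CPU thread grabs and pre-processes frames while the GPU-bound inference runs on the main thread. This is an assumed arrangement for illustration, not the authors' exact implementation, and run_inference() is a hypothetical stand-in for the pose-estimation plus classification step.

```python
# Sketch: a CPU thread handles capture/pre-processing while the main thread runs
# GPU inference; per-frame latency and FPS are measured around the inference call.
import time
import threading
import queue
import cv2

frames = queue.Queue(maxsize=4)

def capture_loop(src=0):
    """CPU side: grab frames and do lightweight pre-processing."""
    cap = cv2.VideoCapture(src)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (1280, 720))
        if not frames.full():
            frames.put(frame)
    cap.release()

def run_inference(frame):
    """Hypothetical GPU step: pose estimation + CNN classification."""
    time.sleep(0.047)          # stand-in for the reported ~47 ms per frame
    return "Normal"

threading.Thread(target=capture_loop, daemon=True).start()

latencies = []
start = time.time()
for _ in range(100):                     # time 100 frames
    frame = frames.get()                 # blocks until the capture thread delivers
    t0 = time.time()
    label = run_inference(frame)
    latencies.append(time.time() - t0)
elapsed = time.time() - start

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.1f} ms")
print(f"throughput:   {len(latencies) / elapsed:.1f} FPS")
```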
To evaluate the performance of the classifier we applied the following three metrics:

(1) Sensitivity: This metric is defined as the proportion of true-positive images that are correctly classified by the classifier. It is calculated from the True Positives (T_pos) and False Negatives (F_neg) of the detected class, as given by (1).

Sensitivity = T_pos / (T_pos + F_neg)    (1)

(2) Precision: This widely used metric is defined as the proportion of True Positives among all the classes detected by the system, as given by (2).

Precision = T_pos / (T_pos + F_pos)    (2)

(3) Frames-Per-Second (FPS): FPS is the rate at which the classifier is capable of processing incoming camera frames.

In order to compute the overall performance of the classifier, we define a composite score metric [30]. The reason for creating such a metric is to measure the overall performance including the FPS, which is a very important parameter in the analysis of computer vision and image processing applications, whereas accuracy alone was used to determine the performance of the CNN-based classifier. This metric consists of a linear combination of Sensitivity and Precision together with the achieved FPS. We parametrize the score with respect to a vector of weights w ∈ [0, 1]^3 as given by (3). We prioritized FPS with a weight of 0.4 over the other two accuracy-related metrics because FPS is a more prominent factor in the performance evaluation of the overall system, whereas the other parameters were equally weighted with 0.2.

Score(w) = w_1 × FPS + w_2 × Sensitivity + w_3 × Precision    (3)
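The three metrics and the composite score in (1)-(3) can be computed directly from per-class prediction counts, as in the hedged sketch below. How FPS is normalized before entering the weighted sum is not spelled out in this excerpt, so the sketch simply scales it against an assumed target frame rate.

```python
# Per-class sensitivity, precision, and the weighted composite score of Eq. (3).
def sensitivity(t_pos, f_neg):
    return t_pos / (t_pos + f_neg) if (t_pos + f_neg) else 0.0

def precision(t_pos, f_pos):
    return t_pos / (t_pos + f_pos) if (t_pos + f_pos) else 0.0

def composite_score(fps, sens, prec, target_fps=30.0, w=(0.4, 0.2, 0.2)):
    """Eq. (3) with weights w = (w1, w2, w3). Normalizing FPS to [0, 1]
    against target_fps is an assumption, not specified in the excerpt."""
    fps_norm = min(fps / target_fps, 1.0)
    return w[0] * fps_norm + w[1] * sens + w[2] * prec

# Illustrative usage with hypothetical counts and the reported ~20 FPS:
sens = sensitivity(t_pos=97, f_neg=3)
prec = precision(t_pos=97, f_pos=3)
print(composite_score(fps=20, sens=sens, prec=prec))
```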
The experimental results are promising in solving some of the typical problem constraints involved in the gait analysis system. The accuracy achieved for classifying normal and abnormal gait is 97.3%, which proves the applicability of the proposed mechanism. Furthermore, in the future, the system can be used to distinguish the gait of different people by adding more data to the training process of the classifier.

REFERENCES
[1] W. Pirker and R. Katzenschlager, "Gait disorders in adults and the elderly: A clinical guide," Wiener klinische Wochenschrift, vol. 129, nos. 3–4, pp. 81–95, Feb. 2017, doi: 10.1007/s00508-016-1096-4.
[2] J. R. Jørgensen, D. T. Bech-Pedersen, P. Zeeman, J. Sørensen, L. L. Andersen, and M. Schönberger, "Effect of intensive outpatient physical training on gait performance and cardiovascular health in people with hemiparesis after stroke," Phys. Therapy, vol. 90, no. 4, pp. 527–537, 2010.
[3] R. D. Sanders and P. M. Gillig, "Gait and its assessment in psychiatry," Psychiatry (Edgmont), vol. 7, no. 7, pp. 38–43, 2010.
[4] W. Khan and A. Badii, "Pathological gait classification and segmentation by processing the hip joints motion data to support mobile gait rehabilitation," Res. C Med. Eng. Sci., vol. 7, no. 3, 2019, doi: 10.31031/RMES.2019.07.000662.
[5] K. K. Haussler, "Equine manual therapies in sport horse practice," Veterinary Clinics North America: Equine Pract., vol. 34, no. 2, pp. 375–389, Aug. 2018.
[6] P. Gottipati, S. Fatone, T. Koski, P. A. Sugrue, and A. Ganju, "Crouch gait in persons with positive sagittal spine alignment resolves with surgery," Gait Posture, vol. 39, no. 1, pp. 372–377, 2014, doi: 10.1016/j.gaitpost.2013.08.012.
[7] M. J. Kennedy, M. Lamontagne, and P. E. Beaulé, "Femoroacetabular impingement alters hip and pelvic biomechanics during gait: Walking biomechanics of FAI," Gait Posture, vol. 30, no. 1, pp. 41–44, Jul. 2009, doi: 10.1016/j.gaitpost.2009.02.008.
[8] M. J. Marin-Jimenez, F. M. Castro, N. Guil, F. de la Torre, and R. Medina-Carnicer, "Deep multi-task learning for gait-based biometrics," in Proc. IEEE Int. Conf. Image Process. (ICIP), Beijing, China, Sep. 2017, pp. 106–110, doi: 10.1109/ICIP.2017.8296252.
[9] T. M. Parker, L. R. Osternig, P. van Donkelaar, and L.-S. Chou, "Balance control during gait in athletes and non-athletes following concussion," Med. Eng. Phys., vol. 30, no. 8, pp. 959–967, 2008, doi: 10.1016/j.medengphy.2007.12.006.
[10] Z. Liu and S. Sarkar, "Simplest representation yet for gait recognition: Averaged silhouette," in Proc. 17th Int. Conf. Pattern Recognit., vol. 4, Aug. 2004, pp. 211–214.
[11] J. Han and B. Bhanu, "Individual recognition using gait energy image," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 2, pp. 316–322, Feb. 2006.
[12] K. Bashir, T. Xiang, and S. Gong, "Gait recognition using gait entropy image," in Proc. 3rd Int. Conf. Imag. Crime Detection Prevention (ICDP), 2009, pp. 1–6.
[13] M. Jeevan, N. Jain, M. Hanmandlu, and G. Chetty, "Gait recognition based on gait pal and pal entropy image," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 4195–4199.
[14] H. Iwama, M. Okumura, Y. Makihara, and Y. Yagi, "The OU-ISIR gait database comprising the large population dataset and performance evaluation of gait recognition," IEEE Trans. Inf. Forensics Security, vol. 7, no. 5, pp. 1511–1521, Oct. 2012.
[15] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[16] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[17] D. Xu, S. Yan, L. Zhang, S. Lin, H.-J. Zhang, and T. S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 36–47, Jan. 2008.
[18] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H.-J. Zhang, "Discriminant analysis with tensor representation," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, Jun. 2005, pp. 526–532.
[19] D. Xu, S. Yan, D. Tao, L. Zhang, X. Li, and H.-J. Zhang, "Human gait recognition with matrix representation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 7, pp. 896–903, Jul. 2006.
[20] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.
[21] Z. Lai, Y. Xu, Z. Jin, and D. Zhang, "Human gait recognition via sparse discriminant projection learning," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 10, pp. 1651–1662, Oct. 2014.
[22] X. He and P. Niyogi, "Locality preserving projections," in Proc. NIPS, vol. 16, 2003, pp. 153–160.
[23] M. Sugiyama, "Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis," J. Mach. Learn. Res., vol. 8, pp. 1027–1061, May 2007.
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[25] E. P. Ijjina and C. K. Mohan, "One-shot periodic activity recognition using convolutional neural networks," in Proc. 13th Int. Conf. Mach. Learn. Appl., Dec. 2014, pp. 388–391.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[27] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1725–1732.
[28] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," 2016, arXiv:1611.08050. [Online]. Available: https://arxiv.org/abs/1611.08050
[29] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," 2018, arXiv:1812.08008. [Online]. Available: https://arxiv.org/abs/1812.08008
[30] A. Rohan, M. Rabah, and S.-H. Kim, "Convolutional neural network-based real-time object detection and tracking for parrot AR drone 2," IEEE Access, vol. 7, pp. 69575–69584, 2019, doi: 10.1109/ACCESS.2019.2919332.

ALI ROHAN (Associate Member, IEEE) received the B.S. degree in electrical engineering from The University of Faisalabad, Pakistan, in 2012, and the M.S. and Ph.D. degrees in electrical, electronics, and control engineering from Kunsan National University, South Korea, in 2018 and 2020, respectively. From 2012 to 2013, he worked as a Development Engineer at the Niagara Group of Industries, Pakistan. From 2013 to 2015, he worked as a Project Engineer for Circle Club, Pakistan. From 2015 to 2016, he worked as a Project Manager for Steam Masters, Pakistan, and also as a Lecturer at the Department of Electrical and Telecommunication Engineering, Government College University Faisalabad, Pakistan. From 2016 to 2020, he worked as a Research Associate with the Factory Automation and Intelligent Control Lab., Kunsan National University. He is currently working as an Assistant Professor with the Department of Mechanical, Robotics, and Energy Engineering, Dongguk University, South Korea. His research interests include machine learning, AI, UAVs, power electronics, fuzzy logic, EV systems, image processing, computer vision, and prognostics and health management (PHM).

MOHAMMED RABAH received the B.A. degree in electronics and communication engineering from the AL-SAFWA High Institute of Engineering, Egypt, in 2015, and the M.S. degree in electronics and information engineering and the Ph.D. degree from Kunsan National University, South Korea, in December 2017 and August 2020, respectively. He is currently working as a Research Engineer with the Turku University of Applied Sciences, Turku, Finland. His research interests include automation, control, and intelligent systems. Furthermore, he is also interested in UAV applications, fuzzy systems, and deep learning.

TAREK HOSNY received the B.S., M.S., and Ph.D. degrees in communication engineering from the Shoubra Faculty of Engineering, Cairo, Egypt, in 2006, 2014, and 2019, respectively. He is currently a Lecturer with the Communication Engineering Department, Al-Safwa High Institute of Engineering, Cairo, Egypt. His research interests include LI-FI, visible light communication systems, machine learning, and optics.

SUNG-HO KIM received the B.S. degree in electrical engineering from Korea University, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from Korea University, in 1986 and 1991, respectively. He completed postdoctoral research at Hiroshima University, Japan, in 1996. He is currently a Professor with Kunsan National University. His research interests include fuzzy logic, sensor networks, neural networks, intelligent control systems, renewable energy systems, and fault diagnosis systems.