A Transfer Learning Based Cross-Subject Generic Model For Continuous Estimation of Finger Joint Angles From A New User
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 4, APRIL 2023
Abstract—Continuous estimation of finger joints based on surface electromyography (sEMG) has attracted much attention in the field of human-machine interface (HMI). A couple of deep learning models were proposed to estimate the finger joint angles for a specific subject. When applied to a new subject, however, the performance of the subject-specific model would degrade significantly due to the inter-subject differences. Therefore, a novel cross-subject generic (CSG) model was proposed in this study to estimate continuous kinematics of finger joints for new users. Firstly, a multi-subject model based on the LSTA-Conv network was built by using sEMG and finger joint angles data from multiple subjects. Then, the subjects adversarial knowledge (SAK) transfer learning strategy was adopted to calibrate the multi-subject model with the training data from a new user. With the updated model parameters and the testing data from the new user, multiple finger joint angles could be estimated afterwards. The overall performance of the CSG model for new users was validated on three public datasets from Ninapro. The results showed that the newly proposed CSG model significantly outperformed five subject-specific models and two transfer learning models in terms of Pearson correlation coefficient, root mean square error, and coefficient of determination. Comparison analysis showed that both the long short-term feature aggregation (LSTA) module and the SAK transfer learning strategy contributed to the CSG model. Moreover, increasing the number of subjects in the training set improved the generalization capability of the CSG model. The novel CSG model would facilitate the application of robotic hand control and other HMI settings.

Index Terms—Surface electromyography (sEMG), finger joint angles, motor intent estimation, generic model, multiuser, cross-subject, transfer learning.

I. INTRODUCTION

ESTIMATION of human motor intent based on surface electromyogram (sEMG) plays an important role in the intuitive human-machine interface (HMI). This technique has recently been increasingly applied in a couple of scenarios, such as space teleoperation, industrial robots, intelligent prosthesis, and neurorehabilitation robots. However, the state-of-the-art literature has mostly focused on the subject-specific model for motor intent identification so far. Namely, this model requires data from a specific subject for model training. The model performance would significantly degrade when applied to a new subject independent of model training or even to the same subject on a different day due to inter-subject and inter-day differences and the nonstationary characteristics of sEMG. Thus, building a generic model for motor intent estimation across multiple users becomes essential to facilitate the application of sEMG-based HMI.

Recently, the framework of feature engineering and pattern recognition (FE-PR) has been mostly investigated towards building a cross-subject generic myoelectric interface. Specifically, several feature transformation methods were proposed aiming at reducing the discrepancies between the target-domain set and the source-domain set [1], [2], [3], [4]. For example, Khushaba et al. implemented a style-independent feature transformation method named canonical correlation analysis (CCA), in which different users' data is projected onto a unified-style space [2]. To realize multiuser hand gesture recognition, Xue et al. proposed another novel framework. They used the CCA to extract inherent user-independent properties from multiuser EMG signals, then an optimal transport was applied to reduce the discrepancies in data distribution between the transformed feature matrix from the training and the testing sets [3]. Similarly, Sheng et al. presented a common spatial-spectral analysis framework to overcome the cumbersome training and retraining procedures that are required across multiple days and multiple users [4]. They used a linear projection to minimize the objective function, which was formulated to measure the diversity of spatial-spectral EMG signals among multiple days and multiple users. All these

Manuscript received 13 September 2022; revised 30 November 2022; accepted 4 January 2023. Date of publication 6 January 2023; date of current version 5 April 2023. This work was supported in part by the National Natural Science Foundation of China under Grants U1913601 and 82161160341, in part by the Natural Science Foundation of Guangdong Province under Grant 2021A1515011892, and in part by the Basic Research Program of Shenzhen Scientific Plan under Grant JCYJ20210324101607022. (Yucheng Long and Yanjuan Geng contributed equally to this work.) (Corresponding authors: Yanjuan Geng; Guanglin Li.)
Yucheng Long, Yanjuan Geng, and Guanglin Li are with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the University of Chinese Academy of Sciences, Beijing 101400, China (e-mail: [email protected]; [email protected]; [email protected]).
Chenyun Dai is with the Center for Intelligent Medical Electronics, School of Information Science and Technology, Fudan University, Shanghai 200433, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/JBHI.2023.3234989
2168-2194 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Indian Institute of Technology Patna. Downloaded on June 17,2023 at 13:21:30 UTC from IEEE Xplore. Restrictions apply.
LONG et al.: TRANSFER LEARNING BASED CROSS-SUBJECT GENERIC MODEL 1915
1916 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 4, APRIL 2023
Fig. 2. The overall structure of the newly proposed cross-subject generic (CSG) model. It was composed of two main parts, the LSTA-Conv network and the subject adversarial knowledge (SAK) based transfer learning strategy. The multi-subject model was first built based on the sEMG and finger joint angles from multiple subjects, and then calibrated by the SAK transfer learning strategy and the training data from a new subject. With the updated multi-subject model, the finger joint angles from the new user could be estimated. Herein, the LSTA-Conv network was composed of the long short-term feature aggregation (LSTA) module, multi-scale convolution [11], and a logistic regressor. Four LSTA modules (N = 4) constitute the backbone of the multiuser model. Multi-scale convolution was designed to aggregate features across different time frames, where DC-n denotes the dilated convolution [12] with a dilation rate of n. A logistic regressor was then applied for finger joint angle estimation, where conv denotes convolution, mean denotes dimension mean operation, and MLP denotes fully connected layer.
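To make the DC-n notation in the caption concrete, the following is a minimal NumPy sketch of a single-channel 1-D dilated convolution with dilation rate n. It is only an illustration of the general operation from [12], not the authors' implementation: the function name `dilated_conv1d`, the kernel values, and the dilation rates in the example are assumptions, since the caption does not list them.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D dilated convolution (cross-correlation form).

    y[t] = sum_k kernel[k] * x[t + k * dilation]
    A dilation of 1 is an ordinary convolution; larger rates widen the
    receptive field without adding kernel weights.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1   # receptive field in samples
    n_out = len(x) - span + 1
    if n_out <= 0:
        raise ValueError("input is shorter than the receptive field")
    # Pick every `dilation`-th sample inside the span and weight it.
    return np.array([np.dot(kernel, x[t:t + span:dilation])
                     for t in range(n_out)])

if __name__ == "__main__":
    signal = np.arange(10.0)        # toy signal, not sEMG data
    kernel = np.ones(3)             # illustrative kernel (assumption)
    for rate in (1, 2, 4):          # illustrative dilation rates (assumption)
        out = dilated_conv1d(signal, kernel, rate)
        print(f"DC-{rate}: receptive field {(len(kernel) - 1) * rate + 1}, "
              f"output length {len(out)}")
```

A multi-scale block in this spirit would run several such branches with different rates in parallel and merge their outputs; how the branches are merged is not specified in the caption.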
for 5 seconds and then rest for 3 seconds for DB2 and DB7, while repeated 10 times for DB1. During movement execution, 10-12 channels of sEMG were recorded with different sampling rates and different devices. Meanwhile, the Cyber Glove II data-glove was used to record 22 finger joint angles with a sampling rate of 20 Hz. Herein, a total of 10 finger joint angles (Fig. 1(b)), namely those of the proximal interphalangeal and metacarpophalangeal joints, were included in this study, because they are the main active joints in the grasping movement.

B. Data Preprocessing

1) Denoising: A fourth-order Butterworth bandpass filter with cutoff frequencies of 5–450 Hz was used to remove the direct current component and high-frequency noise of sEMG, and a notch filter was used to attenuate the power line interference (50 Hz). As for the joint angle signals, a second-order low-pass filter was used to smooth the curve.

2) Amplification: Given the small amplitudes of several sEMG channels, a logarithmic scaling algorithm called the μ-law transformation was used to amplify the sEMG signals [13], [14], which is formulated as:

$$F(x_t) = \operatorname{sign}(x_t)\,\frac{\ln(1 + \mu |x_t|)}{\ln(1 + \mu)} \tag{1}$$

where $t$ is the time, $x_t$ is the sEMG signal, and $\mu$ is an artificially set parameter. In this experiment, $\mu$ was set as 256.

3) Window Segmentation: Considering the requirement of data sufficiency for deep learning, the amplified sEMG and the smoothed kinematics data were converted into multi-channel long exposure by means of window segmentation. For DB2 and DB7, we used a sliding window with time length of 1000 ms and increment of 50 ms, given the tradeoff between acceptable accuracy and controller delay. For DB1, the sliding window size was set as 1600 ms, and the increment was set as 100 ms because of its lower sampling frequency (100 Hz). The last joint angle value within each window was used to construct the joint angle feature matrix.

4) Normalization: Next, the sEMG was normalized to [−1, 1]. Specifically, for each window $S_F$, the sEMG was normalized to $[-S_r, +S_r]$ to keep an appropriate amplitude range, and then linearly mapped to [−1, 1]. The value of $S_r$ was chosen according to the formula:

$$S_r = \max\bigl(\max(S_F),\ \lvert\min(S_F)\rvert\bigr) + \alpha \tag{2}$$

where $\alpha$ is an artificially set parameter.

C. Transfer Learning Based Cross-Subject Generic Model

With the preprocessed EMG and finger joint angles, the performance of the transfer learning based CSG model could be evaluated. Fig. 2 illustrates the overall block diagram and the structure of the newly proposed CSG model. It was composed of two main parts, the LSTA-Conv network and the SAK transfer learning strategy. The LSTA-Conv network could serve as a multi-subject model or a subject-specific model, wherein four LSTA modules were used to extract sEMG features on different temporal scales. The SAK transfer learning strategy was proposed to map the data from multiple subjects to a new user and calibrate the multi-subject model. In this section, the construction of the LSTA module and the SAK transfer learning strategy will be mainly described as follows.

1) Long Short-Term Feature Aggregation Module: As shown in Fig. 3(a), a reconstruct operation at the beginning of the LSTA
was used to reduce the input sequence $X \in \mathbb{R}^{c \times l}$ by half in the time dimension and double the channel dimension, yielding the output $\tilde{X} \in \mathbb{R}^{2c \times \frac{l}{2}}$ without information loss [15]. The reconstructed data was then encoded by means of convolution, and split into two equal parts, which were fed into two parallel branches separately. The long-term branch could capture the sparse long-term relationships from all the temporal signals, while the short-term branch could extract local information from neighboring patches. In this work, the Chunked Transformer was used in the long-term branch for its low computational cost [16]. For the short-term branch, the Cross Stage Partial (CSP) structure was employed for high computational efficiency [17]. Herein, the feature receptive fields on the long-term branch and the short-term branch of LSTA were set as 25 and 3, respectively. Then, the outputs from the two parallel branches were concatenated together, followed by a residual connection. At last, the multi-level features were obtained via convolution.

Fig. 3(b) shows the structure of the Chunked Transformer in the long-term branch. At first, the standard multi-head self-attention (MSA) was used to divide the sEMG features into individual non-overlapped chunks and compute self-attention within each chunk, followed by a two-layer channel Multilayer Perceptron (MLP). Then the MSA with shifted windows was used to extract the cross-chunk connection information across every two consecutive chunks. The output of the Shifted MSA went through the channel MLP to give the output of the Chunked Transformer. Note that a Layer Norm (LN) layer and a residual connection were applied after each MSA and channel MLP. In this work, the depth-wise separable convolution was used to replace the linear mapping and position embedding in MSA, because it could improve the computational efficiency of the model [18], [19], [20], [21]. In addition, one-dimensional convolution was used in the MSA to fit the temporal information from sEMG instead of the two-dimensional convolution. All receptive fields of MSA were set to 25 in this work. The LeakyReLU nonlinearity was applied to the output of MLP for gradient optimization during model training [22].

Fig. 3(c) illustrates the structure of the CSP block in the short-term branch. The input of the CSP block was fed into two parallel branches. The input of the former branch was linearly encoded by a convolution with filter size of 1 and then directly linked to the end of the CSP block. The input of the latter branch went through an encoding convolution with filter size of 1, two convolutions with filter size of 3, and then a residual connection [23]. Then, the output went through a convolution layer with filter size of 1, and was concatenated with the output from the former branch. Note that after each convolution, the obtained data were normalized by using batch normalization [24], followed by activation with the LeakyReLU function.

Fig. 3. The construction of the long short-term feature aggregation (LSTA) module. (a) Two parallel branches, the long-term branch and the short-term branch, comprise the LSTA module. The long-term branch mainly consists of n transformer blocks in stacks, while the core structure of the short-term branch is the cross stage partial (CSP_N) block. The size of the input sequence is c ∗ l, where c denotes the number of features, and l denotes length of the features in the time dimension. Conv denotes convolution, and concat denotes concatenate operation. (b) The structure of a transformer block, where MSA denotes multi-head self-attention, LN denotes layer norm, and MLP denotes multilayer perceptron. (c) The structure of the CSP_N block, where N denotes that the cross stage partial (CSP) block has N residual convolutions. Conv-1 and Conv-3 denote convolution operations with filter sizes of 1 and 3, respectively.

2) SAK Transfer Learning Strategy: Transfer learning strategies have proved effective in transferring prior information from the source domain to the target domain [25], [26], [27]. In this study, the SAK transfer learning strategy was proposed to transfer information from multiple subjects to new users by means of adversarial learning [28], so that the parameters of the multi-subject model could be calibrated at the same time. Fig. 4(a) illustrates the calibration process when using the SAK transfer learning strategy, which comprised a multiple-subject source domain network (Mnet-s), a new-subject target domain network (Nnet-t), and a domain feature discriminator (DFD). The Mnet-s was used to extract features from the source domain (multi-subject training data), the Nnet-t was used to extract features from the target domain (new subject training data), and the DFD was used to minimize the feature distance between the source domain and the target domain.
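The signal path described in Section II-B (μ-law amplification in Eq. (1), window segmentation, and the normalization in Eq. (2)), together with the reconstruct operation at the start of the LSTA module, can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the authors' code: all function names are hypothetical, the joint angles are assumed to be already resampled to the sEMG sampling rate, the value of α is a placeholder because the text does not report it, and the even/odd time-step stacking in `reconstruct` is one plausible lossless way to obtain a 2c × l/2 output, which the text does not spell out.

```python
import numpy as np

def mu_law(x, mu=256.0):
    """Eq. (1): sign(x) * ln(1 + mu*|x|) / ln(1 + mu), with mu = 256."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def sliding_windows(emg, angles, win, step):
    """Cut (channels, samples) sEMG into overlapping windows.

    `angles` is (joints, samples) and is ASSUMED to be aligned with the
    sEMG samples. The label of each window is its last joint angle value.
    """
    starts = range(0, emg.shape[1] - win + 1, step)
    x = np.stack([emg[:, s:s + win] for s in starts])        # (n, c, win)
    y = np.stack([angles[:, s + win - 1] for s in starts])   # (n, joints)
    return x, y

def normalize_window(window, alpha=1e-6):
    """Eq. (2): S_r = max(max(S_F), |min(S_F)|) + alpha, then map to [-1, 1].

    alpha is a placeholder value (assumption); the paper only says it is
    an artificially set parameter.
    """
    s_r = max(window.max(), abs(window.min())) + alpha
    return window / s_r

def reconstruct(x):
    """(c, l) -> (2c, l/2): stack even and odd time steps on the channel
    axis, so every input sample is kept (ASSUMED slicing scheme)."""
    if x.shape[-1] % 2 != 0:
        raise ValueError("time dimension must be even")
    return np.concatenate([x[..., 0::2], x[..., 1::2]], axis=-2)

if __name__ == "__main__":
    fs = 2000                                    # DB2/DB7 sampling rate (Hz)
    rng = np.random.default_rng(0)
    emg = 0.01 * rng.standard_normal((12, 3 * fs))   # synthetic, 12 channels
    angles = rng.uniform(0.0, 90.0, (10, 3 * fs))    # synthetic, 10 joints
    # 1000 ms window with a 50 ms increment, expressed in samples.
    x, y = sliding_windows(mu_law(emg), angles, win=fs, step=fs // 20)
    x = np.stack([normalize_window(w) for w in x])
    print(x.shape, y.shape, reconstruct(x).shape)
```

The window length and increment are passed in samples, so the DB1 setting (1600 ms and 100 ms at 100 Hz) would correspond to `win=160, step=10` under the same assumptions.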
The closer the CC value is to 1, the closer the predicted finger motion trajectory is to the actual trajectory, and the higher the estimation accuracy the method can reach.

2) Root mean square error (RMSE) (°): RMSE was used to evaluate the deviation between the estimated and measured finger joint angles in degrees (°) [39]. RMSE is calculated as:

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} \frac{(p_i - g_i)^2}{N}} \tag{9}$$

3) Coefficient of determination (R2): As a comprehensive evaluation metric to measure the model's overall accuracy [40], it is defined as the percent variability in the true values explained by the estimated values. The larger the R2 value is, the better the estimation performance would

III. RESULTS

A. The Effect of LSTA on the LSTA-Conv Network

Fig. 6 illustrates the continuous estimation performance of finger joint angles when using the LSTA-Conv network for each specific subject, wherein the newly proposed LSTA module was compared with the OLT module and OST module. The results show that the LSTA module outperformed the other two modules, with averaged CC of 0.825 ± 0.015, averaged RMSE of 7.926 ± 0.325, and averaged R2 of 0.727 ± 0.041. These values were 3.9% higher, 1.072 lower, and 6.3% higher when compared with OLT, and 12.8% higher, 1.659 lower, and 6.6% higher when compared with OST in terms of CC, RMSE, and R2, respectively. The statistical analysis indicated that the proposed LSTA module outperformed the OLT in terms of RMSE (p-value < 0.05), and that the LSTA outperformed the OST in terms of CC (p-value < 0.01) and RMSE (p-value < 0.01). These findings
TABLE II
SUMMARY OF THE RESULTS WHEN APPLYING EACH ALGORITHM TO THREE DIFFERENT DATASETS, RESPECTIVELY
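The three metrics used throughout the results (CC, RMSE, and R2) can be computed as in the sketch below. The RMSE follows the form of Eq. (9); the CC and R2 functions use the standard textbook definitions, which is an assumption because the paper's own equations for them are not reproduced in this excerpt. The function names and the toy values are hypothetical.

```python
import numpy as np

def pearson_cc(p, g):
    """Pearson correlation coefficient between predicted p and measured g."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    pc, gc = p - p.mean(), g - g.mean()
    return float(np.sum(pc * gc) / np.sqrt(np.sum(pc ** 2) * np.sum(gc ** 2)))

def rmse(p, g):
    """Root mean square error: sqrt(sum((p_i - g_i)^2) / N)."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sqrt(np.mean((p - g) ** 2)))

def r2_score(p, g):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard form)."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    ss_res = np.sum((g - p) ** 2)
    ss_tot = np.sum((g - g.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

if __name__ == "__main__":
    measured = np.array([10.0, 20.0, 30.0, 40.0])   # toy joint angles (deg)
    predicted = np.array([12.0, 19.0, 33.0, 38.0])  # toy estimates (deg)
    print(pearson_cc(predicted, measured),
          rmse(predicted, measured),
          r2_score(predicted, measured))
```

Note that a constant offset between prediction and measurement leaves CC at 1 while increasing RMSE and lowering R2, which is one reason the three metrics are reported together.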
strategies significantly improved the generalization ability of the LSTA-Conv based multi-subject model. It also outperformed the other two transfer learning strategies, the QTL and the XTL. Statistical analysis showed that compared with the other five algorithms (RNN, LSTM, SPGP, CNN-Attention, and LSTA-Conv) which served as subject-specific models, the newly proposed CSG model (LSTA-Conv+SAK) achieved significantly better performance in terms of CC (p-value < 0.05) and RMSE (p-value < 0.05) when using DB2. Similar results were found when using DB1 and DB7. These outcomes suggest that the novel CSG model could provide sound generalization performance for new users, and yield higher accuracy for continuous estimation of finger kinematics.

Among all three datasets, DB1 brought the worst estimation performance, while DB7 brought the best estimation accuracy, which suggests that the estimation performance of finger kinematics was also dependent on the dataset quality. Note that although DB7 brought the best estimation accuracy, the improvement of estimation performance with DB2 was consistently higher when comparing the newly proposed CSG model with the other two transfer learning based CSG models and five subject-specific models.

IV. DISCUSSION

Aiming at establishing a CSG model to estimate the finger kinematics for new users, a novel transfer learning based model was proposed in this study. Firstly, a novel LSTA module was used to extract richer temporal features of sEMG when building the multi-subject model. Then, the SAK transfer learning strategy was used to update the parameters of the multi-subject model, which was used to estimate the finger kinematics of new users afterwards. The overall performance of the newly proposed CSG model was validated on three public datasets. Compared with five subject-specific models, as well as two transfer learning based approaches, our CSG model brought not only higher accuracy for subject-specific finger kinematics estimation, but also better generalization performance for new users. We also investigated the effect of LSTA on the LSTA-Conv network, the effect of the number of subjects in the training set on the CSG model, and the effect of the SAK transfer learning strategy on the CSG model. This work made an important step towards promoting deep learning approaches in robotic hand control and other HMI scenarios.

To examine if the newly proposed LSTA module could extract richer sEMG features, the finger kinematics estimation performance when using LSTA was compared with that when using OLT and OST. Our results in Fig. 6 showed that the LSTA achieved the best performance. Moreover, the OLT outperformed the OST, which suggests that the long-term features contribute more than the short-term features to accurate estimation of finger joint angles. The possible reason is that a larger temporal scale could provide more representative sEMG features, while little useful information could be obtained from short temporal scales, especially for sEMG, which is easily contaminated by multiple factors during acquisition [41], [42]. This finding is also consistent with previous conclusions that filters with larger receptive fields on temporal scales could provide more informative features to the model [43], [44]. In this work, the newly proposed LSTA module extracted sEMG features by using both long and short temporal scales. This might explain why our LSTA module achieved better performance than the OLT and OST.

How to utilize the training data from multiple subjects to improve the generalization capability of the CSG model is a crucial step. Therefore, how the increasing number of subjects affects the generalization ability of the CSG model was studied. Results shown in Fig. 7 illustrated linearly increased accuracy of the CSG model with an increasing number of subjects used to build the multi-subject model. This finding confirmed the previous conclusion that the diversity of subjects' data plays an important role in improving the generalization performance of the CSG model [6]. However, due to the limited number of subjects in DB2 (N = 35), we only observed improvement in CC, RMSE, and R2 in Fig. 7, but no turning points at which a plateau was reached. Therefore, the optimal number of subjects to reach maximum generalization needs to be further examined.

We further investigated the contribution of the SAK transfer learning strategy to the CSG model by means of comparison. Results in Fig. 8 showed that in terms of CC and RMSE, the SAK based transfer learning strategy improved the generalization ability of the CSG model significantly more than the QTL and XTL. The QTL only updated the parameters of the MLP layer of the multi-subject model [14], while SAK updated all parameters of the multi-subject model as a whole. Better parameter optimization might account for the superiority of the SAK over QTL. On the other hand, an L2-regularisation penalty function was adopted to map the training data from multiple subjects to a new subject when using the XTL, whereas the SAK was designed to work in an adversarial fashion by following the idea of adversarial learning. It could overcome the problem of domain shifting in comparison to the XTL. More importantly, it could exploit the information from the source domain to deal with the problem of insufficient training data in the target domain [45]. These reasons might explain the better performance of the SAK transfer learning strategy in comparison to XTL.

Finally, we conducted overall validation of the CSG model based on three datasets, DB1, DB2, and DB7. Unsurprisingly, once applied as the multi-subject model for each algorithm (RNN, LSTM, SPGP, CNN-Attention, or LSTA-Conv), the performance would degrade significantly in comparison to the subject-specific model. In particular, the performance degraded dramatically when tested with new subjects independent of the training set. However, the adoption of the SAK transfer learning strategy improved the generalization capability of the LSTA-Conv network significantly regardless of the datasets, and even outperformed all the five subject-specific models. Among all three datasets, DB1 brought the worst performance in comparison to DB2 and DB7. The possible reason is that the sampling rate in DB1 was as low as 100 Hz, whereas DB2 and DB7 both have a sampling rate of 2000 Hz. The low sampling rate resulted in insufficient ability to represent the complicated neuromuscular activities, and then caused lower
estimation performance of finger kinematics. As for the higher [11] B. Qian et al., “Dynamic multi-scale convolutional neural network for
generalization performance achieved by the LSTA-Conv +SAK time series classification,” IEEE Access, vol. 8, pp. 109732–109746, 2020,
doi: 10.1109/ACCESS.2020.3002095.
model in comparison to the LSTA-Conv network when using [12] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolu-
DB2, the possible reason is that with DB2, more subjects were tions,” in Proc. 4th Int. Conf. Learn. Representations, (Conference Track
used to build the multi-subject model, which could bring better Proceedings), San Juan, Puerto Rico, May 2-4, 2016. [Online]. Available:
http://arxiv.org/abs/1511.07122
generalization performance. [13] E. Rahimian, S. Zabihi, S. F. Atashzar, A. Asif, and A. Mohammadi,
There were also some limitations in this work. First off, the “XceptionTime: Independent time-window xceptiontime architecture for
newly proposed CSG model is essentially a supervised algo- hand gesture classification,” in Proc. IEEE Int. Conf. Acoust., Speech
Signal Process., 2020, pp. 1304–1308.
rithm, i.e., to calibrate the built multi-subject model requires [14] E. Rahimian, S. Zabihi, A. Asif, D. Farina, S. F. Atashzar, and A. Moham-
data from the new user during transfer learning. Unsupervised madi, “Hand gesture recognition using temporal convolutions and attention
transfer learning without data from the new user is a promising mechanism,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.,
2022, pp. 1196–1200, doi: 10.1109/ICASSP43922.2022.9746174.
research direction in the future work. Secondly, during window [15] J. Wang, T. Xiao, Q. Gu, and Q. Chen, “YOLOv5_CSL_F: YOLOv5’s
segmentation, we used sliding window with time length of 1000 loss improvement and attention mechanism application for remote sensing
ms and increasement of 50 ms for DB2 and DB7. While for image object detection,” in Proc. Int. Conf. Wireless Commun. Smart Grid,
2021, pp. 197–203, doi: 10.1109/ICWCSG53609.2021.00045.
DB1, the sliding window size was set as 1600 ms, and the [16] Z. Liu et al., “Swin transformer: Hierarchical vision transformer using
increasement was set as 100ms because of its lower sampling shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021,
frequencies (100 Hz). The selection of these window parameters pp. 9992–10002, doi: 10.1109/ICCV48922.2021.00986.
[17] C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H.
was based on the tradeoff between acceptable accuracy and Yeh, “CSPNet: A new backbone that can enhance learning capability of
controller delay. In our ongoing work, the optimal window CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work-
configurations which would affect the online performance needs shops, 2020, pp. 1571–1580, doi: 10.1109/CVPRW50498.2020.00203.
[18] H. Wu et al., “CvT: Introducing convolutions to vision transform-
to be intensively investigated. Lastly, all algorithms were only ers,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 22–31,
implemented offline in this work. The newly proposed CSG doi: 10.1109/ICCV48922.2021.00009.
model took less than 2ms to calculate each finger joint angle [19] K. Yuan, S. Guo, Z. Liu, A. Zhou, F. Yu, and W. Wu, “Incorporating con-
volution designs into visual transformers,” in Proc. IEEE/CVF Int. Conf.
sample point. In practical scenarios, evaluating the real-time Comput. Vis., 2021, pp. 559–568, doi: 10.1109/ICCV48922.2021.00062.
performance of this algorithm and accelerating its computation [20] X. Chu et al., “Conditional positional encodings for vision transform-
will be conducted in our future work. ers,” vol. abs/2102.10882, 2021. [Online]. Available: https://arxiv.org/abs/
2102.10882
[21] F. Chollet, “Xception: Deep learning with depthwise separable convo-
REFERENCES lutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017,
pp. 1800–1807, doi: 10.1109/CVPR.2017.195.
[1] T. Matsubara and J. Morimoto, “Bilinear modeling of EMG signals to [22] H.-C. Shin et al., “Deep convolutional neural networks for computer-aided
extract user-independent features for multiuser myoelectric interface,” detection: CNN architectures, dataset characteristics and transfer learn-
IEEE Trans. Biomed. Eng., vol. 60, no. 8, pp. 2205–2213, Aug. 2013, ing,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016,
doi: 10.1109/tbme.2013.2250502. doi: 10.1109/TMI.2016.2528162.
[2] R. N. Khushaba, “Correlation analysis of electromyogram signals for multiuser myoelectric interfaces,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 4, pp. 745–755, Jul. 2014, doi: 10.1109/tnsre.2014.2304470.
[3] B. Xue et al., “Multiuser gesture recognition using sEMG signals via canonical correlation analysis and optimal transport,” Comput. Biol. Med., vol. 130, Mar. 2021, Art. no. 104188, doi: 10.1016/j.compbiomed.2020.104188.
[4] X. Sheng, B. Lv, W. Guo, and X. Zhu, “Common spatial-spectral analysis of EMG signals for multiday and multiuser myoelectric interface,” Biomed. Signal Process. Control, vol. 53, Aug. 2019, Art. no. 101572, doi: 10.1016/j.bspc.2019.101572.
[5] L. Pan, D. Crouch, and H. Huang, “Myoelectric control based on a generic musculoskeletal model: Towards a multi-user neural-machine interface,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 7, pp. 1435–1442, Jul. 2018, doi: 10.1109/TNSRE.2018.2838448.
[6] X. Jiang, B. Bardizbanian, C. Dai, W. Chen, and E. A. Clancy, “Data management for transfer learning approaches to elbow EMG-torque modeling,” IEEE Trans. Biomed. Eng., vol. 68, no. 8, pp. 2592–2601, Aug. 2021, doi: 10.1109/TBME.2021.3069961.
[7] C. Lin, X. Chen, W. Guo, N. Jiang, D. Farina, and J. Su, “A BERT based method for continuous estimation of cross-subject hand kinematics from surface electromyographic signals,” IEEE Trans. Neural Syst. Rehabil. Eng., early access, Oct. 21, 2022, doi: 10.1109/TNSRE.2022.3216528.
[8] M. Atzori et al., “Characterization of a benchmark database for myoelectric movement classification,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 23, no. 1, pp. 73–83, Jan. 2015, doi: 10.1109/TNSRE.2014.2328495.
[9] M. Atzori et al., “Electromyography data for non-invasive naturally-controlled robotic hand prostheses,” Sci. Data, vol. 1, no. 1, pp. 1–13, 2014.
[10] A. Krasoulis, I. Kyranou, M. S. Erden, K. Nazarpour, and S. Vijayakumar, “Improved prosthetic hand control with concurrent use of myoelectric and inertial measurements,” J. Neuroeng. Rehabil., vol. 14, no. 1, pp. 1–14, 2017.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[24] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[25] H.-C. Shin et al., “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016, doi: 10.1109/TMI.2016.2528162.
[26] H. Shan et al., “3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, Jun. 2018, doi: 10.1109/TMI.2018.2832217.
[27] A. van Opbroek, M. A. Ikram, M. W. Vernooij, and M. de Bruijne, “Transfer learning improves supervised image segmentation across imaging protocols,” IEEE Trans. Med. Imag., vol. 34, no. 5, pp. 1018–1030, May 2015, doi: 10.1109/TMI.2014.2366792.
[28] I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[29] Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2502–2511, doi: 10.1109/CVPR.2019.00261.
[30] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243.
[31] Z. Qin, S. Stapornchaisit, Z. He, N. Yoshimura, and Y. Koike, “Multi-joint angles estimation of forearm motion using a regression model,” Front. Neurorobot., vol. 103, 2021, Art. no. 685961.
[32] L. Xuhong, Y. Grandvalet, and F. Davoine, “Explicit inductive bias for transfer learning with convolutional networks,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 2825–2834.
[33] Y. Geng et al., “A CNN-attention network for continuous estimation of finger kinematics from surface electromyography,” IEEE Robot. Automat. Lett., vol. 7, no. 3, pp. 6297–6304, Jul. 2022, doi: 10.1109/LRA.2022.3169448.
[34] M. Xiloyannis, C. Gavriel, A. A. C. Thomik, and A. A. Faisal, “Gaussian process autoregression for simultaneous proportional multi-modal prosthetic control with natural hand kinematics,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 25, no. 10, pp. 1785–1801, Oct. 2017.
[35] E. Snelson and Z. Ghahramani, “Sparse Gaussian processes using pseudo-inputs,” in Proc. Adv. Neural Inf. Process. Syst., 2006, vol. 18, pp. 1257–1264.
[36] K. Englehart and B. Hudgins, “A robust, real-time control scheme for multifunction myoelectric control,” IEEE Trans. Biomed. Eng., vol. 50, no. 7, pp. 848–854, Jul. 2003.
[37] M. G. Asogbon et al., “Appropriate feature set and window parameters selection for efficient motion intent characterization towards intelligently smart EMG-PR system,” Symmetry, vol. 12, no. 10, 2020, Art. no. 1710.
[38] W. Guo et al., “Long exposure convolutional memory network for accurate estimation of finger kinematics from surface electromyographic signals,” J. Neural Eng., vol. 18, no. 2, 2021, Art. no. 026027.
[39] H. Mao, P. Fang, and G. Li, “Simultaneous estimation of multi-finger forces by surface electromyography and accelerometry signals,” Biomed. Signal Process. Control, vol. 70, 2021, Art. no. 103005.
[40] S. Holm, “A simple sequentially rejective multiple test procedure,” Scand. J. Statist., vol. 6, pp. 65–70, 1979.
[41] M. Z. Jamal, “Signal acquisition using surface EMG and circuit design considerations for robotic prosthesis,” Comput. Intell. Electromyogr. Anal.-A Perspective Curr. Appl. Future Challenges, vol. 18, pp. 427–448, 2012.
[42] J. Li and Q. Wang, “Multi-modal bioelectrical signal fusion analysis based on different acquisition devices and scene settings: Overview, challenges, and novel orientation,” Inf. Fusion, vol. 79, pp. 229–247, 2022.
[43] L. Wu, X. Zhang, K. Wang, X. Chen, and X. Chen, “Improved high-density myoelectric pattern recognition control against electrode shift using data augmentation and dilated convolutional neural network,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 28, no. 12, pp. 2637–2646, Dec. 2020, doi: 10.1109/TNSRE.2020.3030931.
[44] A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2017, vol. 30, pp. 6000–6010.
[45] W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, and M.-H. Yang, “Adversarial learning for semi-supervised semantic segmentation,” in Proc. Brit. Mach. Vis. Conf., Newcastle, U.K., Sep. 3-6, 2018, p. 65. [Online]. Available: http://bmvc2018.org/contents/papers/0200.pdf