

Deep Learning for Inertial Positioning: A Survey


Changhao Chen, Member, IEEE, and Xianfei Pan

Abstract— Inertial sensors are widely utilized in smartphones, drones, vehicles, and wearable devices, playing a crucial role in enabling ubiquitous and reliable localization. Inertial sensor-based positioning is essential in various applications, including personal navigation, location-based security, and human-device interaction. However, the measurements of low-cost MEMS inertial sensors are inevitably corrupted by various error sources, leading to unbounded drifts when integrated doubly in traditional inertial navigation algorithms, subjecting inertial positioning to the problem of error drifts. In recent years, with the rapid increase in sensor data and computational power, deep learning techniques have been developed, sparking significant research into addressing the problem of inertial positioning. Relevant literature in this field spans mobile computing, robotics, and machine learning. In this article, we provide a comprehensive review of deep learning-based inertial positioning and its applications in tracking pedestrians, drones, vehicles, and robots. We connect efforts from different fields and discuss how deep learning can be applied to address issues such as sensor calibration, positioning error drift reduction, and multi-sensor fusion. This article aims to attract readers from various backgrounds, including researchers and practitioners interested in the potential of deep learning-based techniques to solve inertial positioning problems. Our review demonstrates the exciting possibilities that deep learning brings to the table and provides a roadmap for future research in this field.

Index Terms— Inertial navigation, deep learning, inertial sensor calibration, pedestrian dead reckoning, sensor fusion, visual-inertial odometry.

Manuscript received 3 July 2023; revised 29 August 2023 and 9 March 2024; accepted 12 March 2024. Date of publication 4 April 2024; date of current version 29 August 2024. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62103427, Grant 62073331, Grant 62103430, and Grant 62103429. The work of Changhao Chen was supported by the Young Elite Scientist Sponsorship Program by China Association for Science and Technology (CAST) under Grant YESS20220181. The Associate Editor for this article was D. F. Wolf. (Changhao Chen and Xianfei Pan are co-first authors.) (Corresponding author: Changhao Chen.) The authors are with the College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]). Digital Object Identifier 10.1109/TITS.2024.3381161

Fig. 1. Inertial sensors are ubiquitous in modern platforms such as smartphones, drones, intelligent vehicles, and VR/AR devices. They play a critical role in enabling completely egocentric motion tracking and positioning, making them essential for a range of applications.

I. INTRODUCTION

THE Inertial Measurement Unit (IMU) is widely used in smartphones, drones, vehicles, and VR/AR devices. It continuously measures linear acceleration and angular rate and tracks the motion of these platforms, as illustrated in Figure 1. With the advancements in Micro-Electro-Mechanical Systems (MEMS) technology, today's MEMS IMUs are small, energy-efficient, and cost-effective. Inertial positioning (navigation) calculates attitude, velocity, and position based on inertial measurements, making it a crucial element in various location-based applications, including locating and navigating individuals in transportation infrastructures (e.g., airports, train stations) [1], supporting security and safety services (e.g., aiding first-responders) [2], enabling smart cities/infrastructure, and facilitating human-device interaction [3]. Compared to other positioning solutions such as vision or radio, inertial positioning is completely egocentric, works indoors and outdoors, and is less affected by environmental factors such as complex lighting conditions and scene dynamics.

Unfortunately, the measurements obtained from low-cost MEMS IMUs are subject to several error sources such as bias error, temperature-dependent error, random sensor noise, and random-walk noise. In classical inertial navigation mechanisms, angular rates are integrated into orientation; based on the acquired attitude, acceleration measurements are transformed into the navigation frame; finally, the transformed accelerations are doubly integrated into locations [4], [5]. Traditional inertial navigation algorithms are designed and described using concrete physical and mathematical rules. Under ideal conditions, sensor errors are small enough to allow hand-designed inertial navigation algorithms to produce accurate and reliable pose estimates. However, in real-world applications, inevitable measurement errors cause significant problems for inertial positioning systems without constraints, which can fail within seconds. In this process, even a minor error can be amplified exponentially, resulting in unbounded error drifts.

Previous researchers have attempted to address the problem of error drifts in inertial navigation by incorporating domain-specific knowledge or other sensors. In the context of pedestrian tracking, exploiting the periodicity of human walking is important, and the process of pedestrian dead reckoning (PDR) involves detecting steps, estimating step length


and heading, and updating the user's location to mitigate error drifts from exponential to linear growth [6]. Zero-velocity update (ZUPT) involves attaching the IMU to the user's foot and detecting the zero-velocity phase, which is then used in Kalman filtering to correct inertial navigation states [7]. Platforms such as drones or robots equipped with other sensors such as cameras or LiDAR can significantly improve the performance of pure inertial solutions by effectively integrating inertial sensors with these modalities through filtering or smoothing [8], [9], [10]. However, these solutions are limited to specific application domains and are unable to address the fundamental problem of inertial navigation.

Recently, deep learning has shown impressive performance in various fields, including computer vision, robotics, and signal processing [11]. It has also been introduced to address the challenges of inertial positioning. Deep neural network models have been leveraged to calibrate inertial sensor noise, reduce the drifts of inertial navigation mechanisms, and fuse inertial data with other sensor information. These research works have attracted significant attention, as they show potential for exploiting massive data to generate data-driven models instead of relying on concrete physical or mathematical models. With the rapid development of deep learning techniques, learning-based inertial solutions have become even more promising.

A. Taxonomy

This survey aims to provide a comprehensive review of deep-learning-based approaches to inertial positioning, including measurement calibration, inertial positioning algorithms, and sensor fusion. To achieve this, we establish a taxonomy of existing deep learning-based inertial positioning approaches, and conduct an analysis of their effectiveness at three levels: sensor level, algorithm level, and application level, as illustrated in Figure 2.

Fig. 2. An overview of our survey structure.

• At the sensor level, deep learning is employed as a calibration method for inertial sensors. It effectively models error sources in inertial measurements and implicitly eliminates measurement errors and noise.
• At the algorithm level, deep neural networks are constructed to partially or completely replace modules in traditional inertial navigation mechanisms, enhancing and correcting IMU integration in general. Additionally, deep learning serves as a powerful tool for fusing inertial data with other sensor modalities such as cameras and LiDAR. Deep learning based IMU integration and fusion methods enable improved positioning accuracy and reliability.
• At the application level, we delve into specific use cases where deep learning methods can be applied to inertial positioning for pedestrians, vehicles, drones, and robots. For instance, we explore how learned motion patterns can enhance pedestrian dead reckoning (PDR) and zero-velocity update (ZUPT) algorithms.

Finally, we thoroughly discuss the advantages and limitations of existing works in this domain. We also identify the key challenges and future opportunities that lie ahead in this research direction.

B. Comparison With Other Surveys

Compared with other deep learning surveys, such as those focused on object detection [12], semantic segmentation [13], and robotics [14], surveys on deep learning based inertial positioning are relatively scarce. While a broader survey on machine learning enhanced inertial sensing does exist [15], our survey narrows the focus to deep learning based inertial positioning, providing deeper insights and analysis of the fast-evolving developments in this area over the past five years (2018-2022). Other relevant surveys, such as those focused on inertial pedestrian positioning [6], indoor positioning [16], step length estimation [17], and pedestrian dead reckoning [18], do not cover recent deep learning based solutions. To the best of our knowledge, this article is the first survey that discusses deep learning based inertial positioning thoroughly and in depth.

C. Survey Organization

The rest of this survey is organized as follows: Section II provides a brief overview of classical inertial navigation mechanisms.


Sections III, IV, and V delve into methods and algorithms pertaining to deep learning based sensor calibration, inertial navigation algorithms, and sensor fusion, respectively. Sections VI, VII, VIII, and IX explore specific applications of deep learning techniques within the realms of pedestrian inertial positioning; inertial positioning for vehicles, UAVs, and robots; IMU-integrated positioning; and human motion and activity recognition, respectively. Section X presents representative public datasets, evaluation metrics, and a performance comparison between learning-based and traditional inertial positioning models. Finally, Section XI concludes by discussing the benefits, challenges, and opportunities.
II. CLASSICAL INERTIAL NAVIGATION MECHANISMS

This section provides an overview of classical inertial navigation mechanisms and highlights their limitations. It begins by presenting the inertial measurement model and the classical strapdown inertial navigation method. Subsequently, two solutions that aim to reduce the drifts of an inertial navigation system, namely pedestrian dead reckoning (PDR) and zero-velocity update (ZUPT), are discussed, with a specific focus on their applicability in pedestrian tracking scenarios. The section finally introduces sensor fusion approaches that integrate inertial data with information from other sensors.

A. Inertial Measurement Model

Inertial measurements acquired from low-cost MEMS IMUs are often corrupted by various types of error sources, resulting in unbounded error drifts when integrated in strapdown inertial navigation systems (SINS). These error sources can be classified into two categories: deterministic errors and random errors [19]. Deterministic errors comprise bias error, non-orthogonality error, misalignment error, scale-factor error, and temperature-dependent error. On the other hand, random errors include random sensor noise and random-walk noise resulting from long-term operation, which are challenging to model and eliminate.

Raw IMU measurements, i.e., accelerations $\hat{a}$ and angular rates $\hat{\omega}$, can be formulated as

$$\hat{a} = a + b_a + n_a, \tag{1}$$

$$\hat{\omega} = \omega + b_\omega + n_\omega, \tag{2}$$

where $b_a$ and $b_\omega$ are the accelerometer bias and gyroscope bias, and $n_a$ and $n_\omega$ are additive noises on the accelerometer and gyroscope.

Traditionally, it is important to calibrate inertial sensors before running an inertial navigation algorithm that integrates inertial data into system states. One effective tool for achieving this is the Allan variance method [20], which models the random process of inertial sensor errors.
rates ω̂, can be formulated by
B. Strapdown Inertial Navigation System

An inertial sensor measures the linear accelerations $a^b(t)$ and angular rates $\omega_{nb}(t)$ of the attached user body at timestep $t$, where $b$ represents the body frame and $n$ denotes the navigation (world) frame; $\omega_{nb}(t)$ denotes the angular rates of the body frame with respect to the navigation frame. To simplify the inertial motion model, this article assumes that the sensor biases and noises in Equations 1 and 2 have been removed in the inertial sensor calibration stage. $(R, p)$ are the orientation and position variables. From the kinematic model of the IMU, we have

$$
\begin{cases}
R_b^n(t+1) = R_b^n(t)\, R_{b_{t+1}}^{b_t} \\
v^n(t+1) = v^n(t) + a^n(t)\, dt \\
p^n(t+1) = p^n(t) + v^n(t)\, dt + \tfrac{1}{2}\, a^n(t)\, dt^2
\end{cases} \tag{3}
$$

where $a^n$, $v^n$, $p^n$ are the acceleration, velocity, and position in the navigation frame, and $R_b^n$ represents the rotation from the body frame to the navigation frame.

Firstly, the orientation is updated by inferring the rotation matrix $\Omega(t)$ via the Rodrigues formula:

$$
\Omega(t) = R_{b_{t+1}}^{b_t} = I + \sin(\sigma)\,\frac{[\sigma\times]}{\sigma} + \bigl(1-\cos(\sigma)\bigr)\,\frac{[\sigma\times]^2}{\sigma^2}, \tag{4}
$$

where the rotation vector $\sigma = \omega(t)\, dt$.

To update the velocity, the accelerations in the navigation frame can be expressed as a function of the measured accelerations, i.e.,

$$a^n(t) = R_b^n(t-1)\, a^b(t) - g^n. \tag{5}$$

Then, the accelerations in the navigation frame $a^n(t)$ are integrated into the velocity in the navigation frame $v^n(t)$, and the location $p^n(t)$ is finally updated by integrating the velocity via Equation 3.

As we can see, in this process, even a small measurement error can be exponentially amplified, leading to the problem of inertial error drifts. In the past, high-precision inertial sensors such as laser or fiber-optic inertial sensors could keep the measurement error small enough to maintain the accuracy of the INS. However, due to the size and cost limitations of current MEMS IMUs, compensation methods are necessary to mitigate the corresponding error drifts. One approach is to introduce domain-specific knowledge or other sensor information.
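A minimal numerical sketch of this mechanization (Equations 3-5), assuming calibrated body-frame measurements, a z-up navigation frame, and a known initial state:

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v x]."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(sigma):
    """Rotation matrix for rotation vector sigma (Equation 4)."""
    angle = np.linalg.norm(sigma)
    if angle < 1e-12:
        return np.eye(3)
    S = skew(sigma)
    return np.eye(3) + np.sin(angle) / angle * S \
        + (1.0 - np.cos(angle)) / angle**2 * (S @ S)

def strapdown_step(R, v, p, acc_b, gyro_b, dt,
                   g_n=np.array([0.0, 0.0, 9.81])):
    """One update of Equations 3-5 (z-up navigation frame assumed)."""
    a_n = R @ acc_b - g_n                       # Equation 5: remove gravity
    R_next = R @ rodrigues(gyro_b * dt)         # Equations 3-4: attitude
    v_next = v + a_n * dt                       # velocity integration
    p_next = p + v * dt + 0.5 * a_n * dt**2     # position integration
    return R_next, v_next, p_next

# A perfectly still, perfectly calibrated IMU stays in place; any residual
# measurement error would be integrated twice and grow into position drift.
R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
for _ in range(100):
    R, v, p = strapdown_step(R, v, p, np.array([0.0, 0.0, 9.81]),
                             np.zeros(3), dt=0.01)
print(p)  # ~[0, 0, 0]
```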


C. Domain Specific Knowledge

1) Pedestrian Dead Reckoning: Pedestrian dead reckoning (PDR) is a method that leverages domain-specific knowledge about human walking to track pedestrian motion. PDR comprises three main steps: step detection, heading and stride length estimation, and location update [6]. In step detection, PDR thresholds the inertial data to identify step peaks or stances and segments the corresponding inertial data. Dynamic stride length estimation is then achieved via an empirical formula, known as the Weinberg formula [21], which considers the segmented accelerations and the user's height. Similar to SINS, heading estimation is done by integrating gyroscope signals into orientation changes and adding these changes to the initial orientation to obtain the current heading. Finally, the estimated heading and stride length are used to update the pedestrian's location. By avoiding double integration of accelerations and incorporating a reliable stride estimation model, PDR effectively reduces inertial positioning drifts. However, inaccurate step detection and stride estimation can still occur, leading to large system error drifts. Moreover, PDR is limited to pedestrian navigation as it depends on the periodicity of human walking.
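As an illustration of the stride and location update steps, a minimal sketch follows; the gain K and the sample window are placeholders of our choosing, not values from [21]:

```python
import numpy as np

def weinberg_stride(acc_vertical, k=0.48):
    """Weinberg-style stride length: K * (a_max - a_min)^(1/4) over one step."""
    return k * (np.max(acc_vertical) - np.min(acc_vertical)) ** 0.25

def pdr_update(pos, heading, stride):
    """PDR location update: advance the 2-D position along the heading."""
    return pos + stride * np.array([np.cos(heading), np.sin(heading)])

# Example: one segmented step window (m/s^2) and a heading obtained by
# integrating the gyroscope; both are dummy values here.
acc_window = np.array([9.4, 9.9, 11.2, 12.6, 10.8, 9.1, 8.5, 9.3])
pos = pdr_update(np.zeros(2), heading=np.deg2rad(30.0),
                 stride=weinberg_stride(acc_window))
print(pos)
```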
2) Zero-Velocity Update: The zero-velocity update (ZUPT) algorithm is designed to compensate for the errors of SINS by identifying the still phase of human walking and using zero-velocity as an observation in a Kalman filter [7]. To facilitate the detection of the still phase, the IMU is typically attached to the user's foot, as it undergoes significant motion and reflects walking patterns well. Techniques such as peak detection [22], zero-crossings [23], or auto-correlation [24] can be used to analyze the inertial data and segment the zero-velocity phase. Once the still phase is detected, zero-velocity is used as a pseudo-measurement in the filtering process, thereby limiting the error drifts of open-loop integration. However, the effectiveness of ZUPT depends on the assumption that the user's foot remains completely still, and any incorrect still-phase detection or small motion disturbances can cause navigation system drifts. Additionally, ZUPT is limited to pedestrian tracking.
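A minimal sketch of such a still-phase detector follows (the window length and thresholds are illustrative assumptions, not values from [22], [23], [24]); the flagged samples would then enter the Kalman filter as zero-velocity pseudo-measurements:

```python
import numpy as np

def zero_velocity_mask(acc, gyro, fs, win=0.1, acc_tol=0.3, gyro_tol=0.25):
    """Flag samples where the foot is (approximately) still.

    acc, gyro: (N, 3) body-frame measurements; fs: sampling rate (Hz).
    A sample is 'still' when, over a short window, the accelerometer
    magnitude stays near gravity and the gyroscope magnitude stays small.
    """
    n = len(acc)
    m = max(1, int(win * fs))
    acc_dev = np.abs(np.linalg.norm(acc, axis=1) - 9.81)
    gyro_mag = np.linalg.norm(gyro, axis=1)
    mask = np.zeros(n, dtype=bool)
    for i in range(n - m):
        mask[i] = (acc_dev[i:i + m].max() < acc_tol
                   and gyro_mag[i:i + m].max() < gyro_tol)
    return mask
```

Choosing acc_tol and gyro_tol is exactly the brittle part that the learned detectors discussed in Section VI.B aim to remove.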
D. Integrating IMU With Other Sensors

Integrating the IMU with other sensors, such as a camera [9], LiDAR [10], UWB [25], or a magnetometer [26], can provide promising results, as it allows for exploiting their complementary properties. By fusing the data from multiple sensors, the accuracy and robustness of pose estimation can be significantly improved, making it a general solution for all platforms. However, in some scenarios, certain sensors, such as visual perception, may not be available or may be highly dependent on the environment, which can negatively affect the egocentric property of inertial positioning. Additionally, in sensor fusion approaches, it is essential to consider various factors such as sensor calibration, initialization, and time synchronization.
E. Discussion

As previously mentioned, classical inertial navigation methods are designed to solve specific problems within their respective domains. However, their performance is often limited due to real-world issues such as imperfect modeling, measurement errors, and environmental influences, resulting in inevitable error drifts. Researchers in the field of inertial navigation are therefore constantly searching for ways to build models that can tolerate measurement errors and mitigate system drifts. In addition to relying on Newtonian physical rules, it has been observed that domain-specific knowledge, whether an experienced human walking model or scene geometry, can serve as a useful constraint in reducing the error drifts of inertial positioning systems. One potential approach to improving inertial positioning accuracy and robustness is to exploit massive inertial data to extract domain-specific knowledge and construct a data-driven model. In the next sections, we will delve deeper into this problem and explore potential solutions.

III. LEARNING TO CALIBRATE INERTIAL SENSOR

Inertial measurements obtained from low-cost IMUs are often affected by various sources of noise, making it challenging to distinguish the true values from the error sources. The error sources are a complex interplay of deterministic and random factors, further complicating the issue. To address the impact of measurement errors, the powerful nonlinear approximation capabilities of deep neural networks can be exploited. A natural approach is to develop a deep neural network that receives the raw inertial measurements as input and produces the calibrated inertial measurements as output, representing the actual platform motion. By training this neural model on labeled datasets using stochastic gradient descent (SGD) [35], the inertial measurement errors can be implicitly learned and corrected by the neural network. It is important to note that the quality of the collected training dataset has a significant impact on the performance of the model.

Before the age of deep learning, attempts were made to use neural networks to learn the measurement errors of inertial sensors. For example, a 1-layer artificial neural network (ANN) [36] was proposed to model the distribution of gyro drifts, and was able to successfully approximate gyro drifts with such a 'shallow' network [27]. This method has an advantage over Kalman filtering (KF) based calibration methods in that it does not require setting hyper-parameters before use, such as the sensor noise matrix in KF.

In recent years, there has been increasing interest in using deep neural networks (DNNs) with multiple layers to solve the inertial sensor calibration problem. With the addition of more layers, neural networks become more expressive and can learn complex relationships between the raw inertial measurements and the true motion of the vehicle. One approach, proposed by [28], uses a Convolutional Neural Network (ConvNet) to remove error noise from inertial measurements. The authors collected inertial data from two grades of IMU under given constant accelerations and angular rates. The ConvNet framework takes raw inertial measurements (from a low-precision IMU) as inputs and tries to output acceleration and angular rate references (from a high-precision IMU). Their experiment shows that deep learning can remove some of the sensor error and improve test accuracy. However, this work has not been validated in a real navigation setup, and thus it cannot demonstrate how learning-based sensor calibration reduces error drifts in inertial navigation. Both of the mentioned methods require reference data from high-precision IMUs as labels to train the networks, as shown in Figure 3 (a). However, acquiring reference data from high-precision IMUs can be costly.

In addition to directly learning from pseudo ground-truth IMU labels, another approach is to enable neural network-based calibration models to produce inertial data that can be integrated into more accurate orientation estimates. This is illustrated in Figure 3 (b). By producing more accurate orientation values, the neural network implicitly removes the noise corrupting the inertial data. For example, OriNet [29] inputs 3-dimensional gyroscope signals into an LSTM network [70] to obtain calibrated gyroscope signals, which are then integrated with the orientation at the previous timestep to generate orientation estimates at the current timestep. A loss function between the orientation estimates and the real orientation is defined and minimized for model training.

TABLE I
A SUMMARY OF EXISTING METHODS ON DEEP LEARNING BASED INERTIAL SENSOR CALIBRATION

Fig. 3. An overview of existing deep learning based inertial sensor calibration methods.

OriNet has been evaluated on a public drone dataset, demonstrating an improvement in orientation performance of approximately 80%. A similar approach is [31], which calibrates the gyroscope using a ConvNet and reports good attitude estimation accuracy. Calib-Net [34] is another ConvNet framework that denoises gyroscope data by extracting effective spatio-temporal features from inertial data. Calib-Net is based on a dilated ConvNet [71] to compensate for the gyro noise, as illustrated in Figure 4. This model is able to significantly reduce orientation error compared to raw IMU integration. When this learned inertial calibration model is incorporated into a visual-inertial odometry (VIO) system, it further improves localization performance and outperforms representative VIOs such as VINS-mono [9]. Other efforts in this direction include the works of [32] and [33].

Fig. 4. An example of gyro calibration results (reprint from Calib-Net [34]). Compared with raw IMU integration, deep learning based calibration models significantly reduce attitude drifts.

Instead of directly calibrating inertial sensors with DNNs, some researchers have explored using DNNs to generate parameters that improve classical calibration algorithms, as shown in Figure 3 (c). One example is the work of [30], which models inertial sensor calibration as a Markov Decision Process and proposes to use deep reinforcement learning [72] to learn the optimal calibration parameters. The authors demonstrated the effectiveness of their approach in calibrating inertial sensors for a visual-inertial odometry (VIO) system.

As discussed above, deep learning-aided inertial sensor calibration methods (listed in Table I) have shown promising results in removing corrupted sensor noise and improving the accuracy of inertial positioning systems. These methods do not require human intervention and can automatically learn error models. However, it is important to note that the learned error model is typically dependent on the specific sensor or platform used. Therefore, a change in sensor or user can result in different data distributions, leading to reduced performance of the learned model. Additionally, further analysis is needed to determine which types of noise can be effectively removed by learning-based calibration methods.


IV. LEARNING TO CORRECT IMU INTEGRATION

In addition to sensor calibration, researchers are exploring various methods for using deep learning to construct inertial positioning models that partially or completely replace classical inertial navigation mechanisms. This section provides an overview of how deep learning can be used to correct IMU integration in general. The next sections will discuss deep learning approaches for pedestrian tracking applications, and present deep inertial solutions for vehicles, UAVs, and robots. A summary of existing works and their contributions is provided in Table II.

In deep learning-based inertial positioning approaches, a user's absolute velocity can be inferred from a sequence of IMU data using a deep neural network. This velocity information can then be used as a key constraint to reduce the drifts of IMU double integration. Figure 5 provides an example of velocity learning from an IMU sequence, where the periodicity of human walking makes it easy to infer the user's moving velocity. Similar observations have been made for vehicles, UAVs, and robotic platforms, which will be discussed in Section VII. Existing works on applying learned velocity to correct IMU integration can generally be divided into three categories, as shown in Figure 6, and are discussed as follows.

One category of deep learning models aims to learn location displacement, i.e., the average velocity multiplied by a fixed period of time, as illustrated in Figure 6(a). The approach proposed by [37] formulates inertial positioning as a sequential learning problem, where 2D motion displacements in polar coordinates, also known as polar vectors, are learned from independent windows of segmented inertial data. This is possible because the frequency of platform vibrations, which can be measured by an IMU, is related to the absolute moving speed when tracking human or wheeled configurations. Based on this observation, the authors propose IONet, an LSTM-based framework for end-to-end learning of relative poses. Trajectories are generated by adding motion displacements to initial locations. To train the neural models, a large collection of data was gathered with a smartphone-based IMU in a room equipped with a high-precision visual motion tracking system (i.e., Vicon) to provide ground-truth pose labels. Once the model is trained, IONet can be used in areas outside the data-collection room. In a two-minute random pedestrian walking scenario, the localization error of IONet is within 3 meters 90% of the time when evaluated across users, devices, and attachments, outperforming some classical PDR algorithms. In trolley tracking, IONet shows performance comparable to representative visual-inertial odometry and is even more robust in featureless areas. However, the supervised IONet requires high-precision poses as training labels, and performance degrades when it is tested on data that differ from the training set. To improve generalization, [41] proposes MotionTransformer, which allows the inertial positioning model to self-adapt to new domains via a generative adversarial network (GAN) [73] and domain adaptation [74], without the need for labels in the new domains. To encourage more reliable inertial positioning, [75] produces pose uncertainties along with poses, offering a belief in the extent to which the learned pose can be trusted. To allow full 3D localization, TLIO [47] proposes to learn 3D location displacements and covariances from a sequence of gravity-aligned inertial data. To avoid the impact of the initial orientation, the inertial data are transformed into a local gravity-aligned frame. The learned displacements and covariances are then incorporated into an extended Kalman filter as observations to estimate the full states of orientation, velocity, location, and IMU bias. In a 3-7 minute human motion scenario, the localization error of TLIO is within 3 meters 90% of the time.

Another category of deep learning models aims to leverage learned velocity to correct accelerations, as illustrated in Figure 6(b). A prominent example is RIDI [38], which trains a deep neural network to predict velocity vectors from inertial data; the gravity-subtracted linear accelerations are then corrected so that they align with the constraints of the learned velocities. The corrected linear accelerations are then doubly integrated to estimate positions. To enhance the accuracy of inertial accelerations, RIDI leverages human walking speed as a prior, which compensates for the drifts of inertial positioning, effectively constraining them to a lower level. RoNIN [49] improves upon RIDI by transforming inertial measurements and learned velocity vectors into a heading-agnostic coordinate frame and introducing several novel velocity losses. To minimize the impact of orientation estimation, RoNIN employs device orientation to transform inertial data into a frame with its Z-axis aligned with gravity. However, a limitation of RoNIN is its reliance on orientation estimation. NILoc [59] is an intriguing trial based on RoNIN, which tackles the neural inertial localization problem, aiming to infer global location from inertial motion history only. This work recognizes that human motion patterns are unique in different locations, which can be utilized as a "fingerprint" to determine the location, similar to WiFi or magnetic-field fingerprinting. NILoc first calculates a sequence of velocities from inertial data and then employs a Transformer-based DNN framework [76] to transform the velocity sequence into a location. However, one fundamental limitation of NILoc is that in some areas, such as open spaces or symmetrical and repetitive places, there may not be a unique motion pattern.

An alternative approach involves incorporating learned velocity into the updating process of a Kalman filter (KF), as shown in Figure 6 (c). Reference [39] uses a ConvNet to infer the current speed from IMU sequences and incorporates this speed into the Kalman filter as a velocity observation to constrain the drifts of SINS-based inertial positioning. This approach is similar to the zero-velocity update (ZUPT) method, which detects zero-velocity phases and uses them as KF observations, but it uses full speeds as observations instead. Incorporating learned velocity allows the KF to handle more complex human motion. A similar trial is [52], which is based on a DNN that infers walking velocity in the body frame and combines it with an extended KF. In addition to the learned velocity, [52] produces a noise parameter so that the KF can dynamically update its parameters, rather than setting a fixed noise parameter.
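A sketch of the window-to-displacement formulation (in the spirit of IONet [37]; the window length, hidden sizes, and dummy inputs are our placeholders): an LSTM maps each IMU window to a polar displacement, and windows are chained into a trajectory:

```python
import torch
import torch.nn as nn

class PolarVectorNet(nn.Module):
    """Map an IMU window (B, T, 6) to (delta_l, delta_psi) per window."""
    def __init__(self, hidden=96):
        super().__init__()
        self.lstm = nn.LSTM(6, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, 2)

    def forward(self, imu):
        h, _ = self.lstm(imu)
        return self.fc(h[:, -1])             # displacement over the window

def chain(deltas, x0=0.0, y0=0.0, psi0=0.0):
    """Accumulate per-window polar displacements into a 2-D trajectory."""
    traj, x, y, psi = [], x0, y0, psi0
    for dl, dpsi in deltas:
        psi = psi + dpsi
        x, y = x + dl * torch.cos(psi), y + dl * torch.sin(psi)
        traj.append((float(x), float(y)))
    return traj

net = PolarVectorNet()
windows = torch.randn(5, 200, 6)             # five 2-second windows at 100 Hz
print(chain(net(windows)))
```

Because each window is regressed independently, the per-window error accumulates only additively, instead of the exponential amplification of raw double integration.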


TABLE II
A SUMMARY OF EXISTING METHODS ON DEEP LEARNING BASED INERTIAL POSITIONING

Fig. 5. The velocity of an attached platform can be inferred from a sequence of inertial measurements via deep neural networks (reprint from L-IONet [48]).

Inertial positioning heavily relies on accurately estimating the device's attitude. Several methods aim to improve orientation estimation to enhance the performance of deep learning based inertial odometry. RIDI, RoNIN, and TLIO still depend on device orientation to rotate inertial data into a suitable frame. To address this problem, IDOL [54] proposes a two-stage process that first learns orientation from data and then rotates the inertial data into the appropriate frame, followed by learning the position. Reference [61] estimates orientation using magnetic data and combines it with learned odometry to reduce positioning drifts while minimizing reliance on device orientation. Figure 7 showcases several examples of deep learning based inertial positioning results.


Fig. 6. An overview of existing methods on learning to correct IMU integration.

Fig. 7. Sample results of deep learning based inertial positioning from (a) a VR device for pedestrian tracking (reprint from TLIO [47]) and (b) a smartphone for trolley tracking (reprint from IONet [37]).

Fig. 8. An overview of existing methods on deep learning based sensor fusion for visual-inertial positioning.

V. LEARNING BASED IMU INTEGRATED POSITIONING

Integrating inertial sensors with other sensors in a multisensor navigation system has been an area of research for several decades. Nowadays, platforms such as robots, vehicles, and VR/AR devices are equipped with GNSS, cameras, IMUs, and LiDAR sensors. Hence, it is natural to consider introducing multimodal learning techniques [77] and designing learning models capable of fusing multimodal information to construct a mapping function from sensor data to pose.

A. Learning Based Visual-Inertial Positioning

Visual-inertial odometry (VIO) has garnered attention as a means of integrating low-cost, complementary camera and IMU sensors that are widely deployed. Monocular vision can capture the appearance and geometry of a scene, but cannot recover the metric scale. The IMU provides metric scale and improves motion tracking in featureless areas, complex lighting conditions, and motion blur. However, a pure inertial solution can only last for a short period. Therefore, an effective fusion of these two complementary sensors is necessary for accurate pose estimation.

Traditional VIO methods integrate visual and inertial information based on filtering [8], [88], fixed-lag smoothing [89], or full smoothing [90]. Recently, deep learning-based VIO models have emerged, directly constructing a mapping function from images and IMU to pose in a data-driven manner. VINet [78] is an end-to-end deep VIO model consisting of a ConvNet-based visual encoder that extracts visual features from two images and an LSTM-based inertial encoder that extracts inertial features from the sequence of inertial data between the two images. As shown in Figure 8 (a), the visual and inertial features are concatenated into one tensor, followed by an LSTM and a fully-connected layer that finally maps the features into a 6-dimensional pose. VINet is trained on public driving datasets such as the KITTI dataset [91] and a public drone dataset, the EuRoC dataset [92]. The learned VIO model is generally more robust to sensor noise than traditional VIO methods, although its performance still cannot compete with state-of-the-art VIOs.


To effectively integrate visual and inertial information, [80] proposes a selective sensor fusion mechanism that learns to choose important features conditioned on the sensor observations, as demonstrated in Figure 8 (b). Specifically, this work proposes two types of fusion: soft fusion, which is based on an attention mechanism and generates a soft mask to reweight features according to their importance, and hard fusion, which is based on the Gumbel soft-max and generates a hard mask of ones and zeros that either propagates or ignores each feature. Experimental evaluation on the KITTI dataset demonstrates that, compared with directly concatenating features [78], selective fusion enhances the performance of deep VIO by 5%-10%. An interesting observation is that the number of useful features is related to the amount of linear/rotational velocity, with inertial features contributing more at high rotation rates (e.g., turning), while more visual features are used as linear velocity increases.
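The soft-fusion idea can be sketched as follows (our minimal rendering of the mechanism in [80]; the feature dimensions are arbitrary): a small network produces a sigmoid mask over the concatenated features, reweighting each channel before pose regression:

```python
import torch
import torch.nn as nn

class SoftFusion(nn.Module):
    """Reweight concatenated visual+inertial features with a learned mask."""
    def __init__(self, vis_dim=512, imu_dim=128):
        super().__init__()
        d = vis_dim + imu_dim
        self.mask_net = nn.Linear(d, d)      # conditioned on both modalities
        self.pose = nn.Linear(d, 6)          # 6-DoF pose head

    def forward(self, f_vis, f_imu):
        f = torch.cat([f_vis, f_imu], dim=-1)
        mask = torch.sigmoid(self.mask_net(f))   # soft, per-channel weights
        # Hard fusion would instead sample a 0/1 mask via Gumbel soft-max.
        return self.pose(mask * f)

fusion = SoftFusion()
pose = fusion(torch.randn(4, 512), torch.randn(4, 128))
print(pose.shape)                            # (4, 6)
```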
Both [78] and [80] are trained in a supervised manner using datasets with high-precision ground-truth poses as training labels. However, obtaining high-precision poses can be difficult or costly in certain cases. Consequently, self-supervised learning-based VIOs, which do not require pose labels, have attracted attention. Self-supervised VIOs leverage the multi-view geometric relations of consecutive images, such as novel view synthesis, as a supervision signal [79], [81], [84], [86]. The task of novel view synthesis involves transforming a source image into a target view and using the difference between the synthesized and real target images as the loss. In VIOLearner [79] and DeepVIO [81], as shown in Figure 8 (c), the pose transformation is generated from an inertial data sequence and used in the novel view synthesis process. In UnVIO [84] and SelfVIO [86], inertial data is integrated with visual data via an attention module applied to the concatenated visual and inertial features extracted from the images and IMU sequence. They show that incorporating inertial data with visual data improves the accuracy of pose estimation, particularly rotation estimation.
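A condensed sketch of this supervision signal (assuming known camera intrinsics, a predicted target-frame depth map, and a relative pose predicted from the IMU sequence; occlusion and boundary handling are omitted):

```python
import torch
import torch.nn.functional as F

def warp_to_target(src_img, depth_tgt, T_tgt_to_src, K):
    """Synthesize the target view from a source image.

    src_img:      (B, 3, H, W) source frame.
    depth_tgt:    (B, 1, H, W) predicted depth of the target frame.
    T_tgt_to_src: (B, 4, 4) relative pose, e.g., predicted from the IMU.
    K:            (3, 3) camera intrinsics.
    """
    B, _, H, W = src_img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)
    cam = torch.linalg.inv(K) @ pix                        # target-camera rays
    pts = depth_tgt.view(B, 1, -1) * cam                   # 3-D points (B,3,HW)
    pts_h = torch.cat([pts, torch.ones(B, 1, H * W)], 1)   # homogeneous
    src = (T_tgt_to_src @ pts_h)[:, :3]                    # in source frame
    uv = K @ src
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)             # project to pixels
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], -1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

# With an identity relative pose and unit depth, the warp reproduces the image;
# the photometric loss is |warp(src) - tgt|, driving depth and pose jointly.
K = torch.tensor([[5.0, 0.0, 4.0], [0.0, 5.0, 4.0], [0.0, 0.0, 1.0]])
img = torch.rand(1, 3, 8, 8)
out = warp_to_target(img, torch.ones(1, 1, 8, 8), torch.eye(4)[None], K)
```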
B. Learning Based IMU/GNSS Integrated Positioning

Kalman filtering (KF) serves as a conventional solution for merging IMU and GPS data in GNSS/IMU integrated navigation systems. However, recent developments leverage deep learning to enhance the accuracy of the filtering algorithms and mitigate IMU/GNSS positioning drifts.

Reference [93] introduces a tight integration of GNSS and INS by converting INS information into Doppler data and integrating it with GNSS tracking loops to mitigate Doppler effects on the GNSS signal. By incorporating a radial basis function neural network and an Adaptive Network-Based Fuzzy Inference System, it effectively bridges GNSS outages, contributing to overall system robustness. Subsequently, [94] introduces a GNSS/INS/odometer integrated system for land vehicle navigation, utilizing a fuzzy neural network (FNN) to refine solution accuracy during prolonged GNSS outages. The experimental results validate its effectiveness in improving position, velocity, and attitude accuracy, particularly under extended GPS signal loss.

Moving beyond the neural network methods of the pre-deep-learning age, [95] proposes a deep learning-based Kalman filtering approach. It incorporates a modeling step alongside the prediction and update steps of the extended KF, addressing IMU errors and correcting positioning drifts with the help of GNSS measurements. Expanding on the dual optimization concept, [96] introduces two neural networks to optimize INS/GNSS navigation during GNSS outages: the first network compensates for inertial navigation system drifts, while the second corrects errors generated by the filtering process, employing a radial basis function network for accurate position data.

In scenarios where GNSS signals are absent, [97] proposes a multi-task learning method. It involves denoising inertial data through a convolutional autoencoder, followed by temporal convolutional network (TCN) processing to address GNSS gaps. The resulting aiding data significantly contributes to deriving an accurate navigation solution within the Kalman filtering (KF) framework. Reference [98] introduces the Self-Learning Square-Root Cubature Kalman Filter (SL-SRCKF). This method employs an LSTM-based framework to continuously obtain observation vectors during GNSS outages, learning the relationship between the observation vectors and the internal filter parameters. The SL-SRCKF's error prediction ability is notably enhanced by the introduction of the long short-term memory (LSTM) network, outperforming other neural networks under various GPS outage conditions.

To refine the parameters used in filtering, [99] introduces a temporal convolutional network (TCN) based framework that learns not only the measurement noise covariances but also the process noise covariances, resulting in higher position accuracy compared to classical INS/GNSS integrated positioning systems. Additionally, [100] proposes a residual neural network with an attention mechanism to predict individual velocity elements of the noise covariance matrix. Through experiments, this work demonstrates that adjusting the non-holonomic constraint uncertainty during dynamic vehicle motions improves positioning accuracy, particularly under large dynamic motions.
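To illustrate where such learned parameters enter the filter, the following sketch shows a generic Kalman measurement update in which the measurement-noise covariance comes from a network (the state layout, observation model, and network outputs are placeholders of ours):

```python
import numpy as np

def kf_update(x, P, z, H, R_learned):
    """Standard Kalman measurement update; R_learned is network-predicted.

    x: (n,) state, P: (n, n) covariance, z: (m,) measurement,
    H: (m, n) observation model, R_learned: (m, m) noise covariance.
    """
    y = z - H @ x                                  # innovation
    S = H @ P @ H.T + R_learned                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# E.g., a velocity observation from a DNN together with its uncertainty:
x, P = np.zeros(9), np.eye(9)                      # toy [pos, vel, att] state
H = np.hstack([np.zeros((3, 3)), np.eye(3), np.zeros((3, 3))])
v_pred, sigma = np.array([1.2, 0.1, 0.0]), 0.05    # network outputs (dummy)
x, P = kf_update(x, P, v_pred, H, sigma**2 * np.eye(3))
```

A larger predicted sigma makes the filter trust the inertial propagation more, which is exactly the adaptive behavior that fixed hand-tuned covariances cannot provide.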
C. Learning to Fuse IMU With Other Sensors

The use of learning-based sensor fusion extends beyond visual-inertial odometry (VIO) and IMU/GNSS integrated navigation to other sensor modalities, such as LiDAR-inertial odometry (LIO), thermal-inertial odometry, and radar-inertial odometry [82], [83], [87]. DeepTIO [82] and MilliEgo [83] employ attention-based selective fusion mechanisms, similar to soft fusion [80], to reweight and fuse features from inertial and visual data, resulting in improved pose accuracy. In addition, unsupervised learning-based LiDAR-inertial odometry [87] generates a motion transformation from the IMU sequence and uses it for LiDAR novel view synthesis to facilitate self-supervised learning of egomotion, similar to VIOLearner [79]. In all these cases, the inclusion of IMU data in deep neural networks enhances pose estimation accuracy and robustness.

VI. LEARNING TO CORRECT PEDESTRIAN INERTIAL POSITIONING

The previous sections addressed the general problems of applying deep learning to correct inertial positioning drifts and to fuse sensors for IMU integrated positioning. This section focuses on the specific use of deep learning to address particular aspects of pedestrian navigation algorithms, namely pedestrian dead reckoning (PDR) and zero-velocity update (ZUPT).

A. Learning to Correct Pedestrian Dead Reckoning

Pedestrian dead reckoning (PDR) error drifts often stem from inaccurate stride and heading estimates. To address these issues, researchers have incorporated deep learning techniques into the processes of step detection, dynamic step length estimation, and walking heading estimation.

To estimate the walking stride more robustly, researchers have sought to solve it in a data-driven way. One such method is SmartStep [101], a deep learning-based step detection framework that achieves 99% accuracy in step detection tasks across various motion modes. Compared to peak/valley detection-based methods, data-driven methods do not require IMUs to be fixed in position, specific motion modes, or pre-calibration and threshold setting. Another approach uses an LSTM to regress the walking stride from raw inertial data [44]. This method has demonstrated effectiveness in various human motions, such as walking, running, jogging, and random movements. Additionally, StepNet [51] learns to estimate step length dynamically, i.e., the change in distance, achieving impressive performance with only a 2.1%-3.2% error rate compared to traditional static step length estimation. The attachment mode of the device, such as in the hand or in a pocket, can also influence walking stride estimation. To address this problem, Bo et al. [65] employed domain adaptation [74] to extract domain-invariant features for stride estimation, which enhanced performance in new domains, such as holding, calling, pocket, and swinging.
Accurate heading estimation is crucial for updating the position in the right direction in PDR. To achieve more accurate and robust heading estimation, Wang et al. [45] utilize a Spatial Transformer Network [102] and an LSTM to learn the heading direction from an inertial sensor attached to an unconstrained device. However, one problem that arises is the misalignment between the device heading and the pedestrian heading, making it difficult to estimate the real walking heading based on sensor data. To address this misalignment issue, [103] introduces a deep neural network to estimate the walking direction in the sensor's frame. The authors derive a geometric model to convert the walking direction from the sensor's frame into a reference frame (i.e., north and east coordinates) by exploiting acceleration and magnetic data. This geometric model is combined with a learning framework to produce heading estimates. When tested on unseen data, this work reports a median heading error of 10°. PDRNet [55] follows the process of a traditional PDR algorithm but replaces the step length and heading estimation modules with deep neural networks. Their experiments indicate that learning step length and heading together outperforms regressing them separately.
B. Learning to Correct Zero-Velocity Update

In pedestrian inertial navigation systems (INS) based on the zero-velocity update (ZUPT), the zero-velocity phase is utilized to correct inertial positioning errors through Kalman filtering. Therefore, the accuracy of zero-velocity detection is crucial in determining when to update the system states. However, traditional threshold-based zero-velocity detection is complicated by the mixed variety of motions experienced by humans, making it challenging to set a reliable threshold for when the user is still.

To address this issue, researchers have explored data-driven approaches that utilize the powerful feature extraction and classification capabilities of deep learning to classify whether the user is in the ZUPT phase. For instance, [40] proposes a six-layer long short-term memory (LSTM) network to detect zero-velocity. The LSTM takes a sequence of IMU data, typically 100 consecutive samples, and outputs the probability that the user is still or in motion at the current timestep. The results of the LSTM-based zero-velocity detection are then fed into a ZUPT-based INS. The proposed approach achieves a reduction in localization error of over 34% compared to fixed-threshold zero-velocity detectors and was shown to be more robust during a mixed variety of motions, such as walking, running, and climbing stairs. Similarly, [46] designs an adaptive ZUPT using convolutional neural networks (ConvNets) to classify zero-velocity phases based on IMU sequences. Deep learning approaches, such as LSTMs and ConvNets, have demonstrated excellent performance in extracting robust and useful features for zero-velocity identification, irrespective of different users, motion modes, and attachment places.
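A minimal sketch in the spirit of [40] (the layer sizes, window length, and dummy inputs are our placeholders): an LSTM consumes a short IMU window and outputs the probability that the current sample lies in the zero-velocity phase:

```python
import torch
import torch.nn as nn

class ZeroVelocityLSTM(nn.Module):
    """Binary still/moving classifier over IMU windows (sketch, after [40])."""
    def __init__(self, hidden=80):
        super().__init__()
        self.lstm = nn.LSTM(input_size=6, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, imu):                   # imu: (B, 100, 6)
        h, _ = self.lstm(imu)
        return torch.sigmoid(self.fc(h[:, -1])).squeeze(-1)

model = ZeroVelocityLSTM()
p_still = model(torch.randn(16, 100, 6))      # probabilities in [0, 1]
# Samples whose p_still exceeds a chosen threshold trigger the ZUPT
# pseudo-measurement in the Kalman filter.
```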


VII. LEARNING TO CORRECT INERTIAL POSITIONING ON VEHICLES, UAV AND ROBOTIC PLATFORMS

As previously mentioned, deep learning methods have shown great potential in addressing the challenges of pedestrian inertial navigation. However, these techniques can also be applied to other platforms, such as vehicles, UAVs, and robots.

These platforms share similarities with pedestrians, such as the ability to infer movement velocity from inertial data. This is because inertial data contains vibration information whose fundamental frequency is proportional to the vehicle speed. Building on the success of IONet [37], [42] proposes AbolDeepIO, an improved triple-channel LSTM network that predicts polar vectors for drone localization from inertial data sequences. AbolDeepIO has been evaluated on a public drone dataset and has shown competitive performance compared to traditional visual-inertial odometry methods like VINS-mono.

When deploying deep learning-based inertial navigation on real-world devices, both prediction accuracy and model efficiency must be considered. To address this, TinyOdom [62] aims to deploy neural inertial odometry models on resource-constrained devices. It proposes a lightweight model based on temporal convolutional networks (TCNs) [104] to learn position displacement and optimizes the model through neural architecture search (NAS) [105], reducing the model size by a factor of 31 to 134. TinyOdom was extensively evaluated on tracking pedestrians, animals, and aerial and underwater vehicles. Within 60 seconds, its localization error is between 2.5 and 12 meters.

Learning-based inertial odometry has also been extended to legged robots by [56]. The learned location displacement is combined with kinematic motion models to estimate the robot's system states at high frequency (400 Hz). In this work, a legged robot successfully completed a field experiment, walking for 20 minutes in a mine with poor illumination and visual feature tracking failures.

In the realm of inertial positioning for vehicles, researchers have proposed various methods to mitigate error drifts and improve accuracy. One such method is presented in [50], where error covariances are learned from inertial data and incorporated into Kalman filtering to update the system states; this approach has been shown to improve inertial positioning performance. Similar to ZUPT-based pedestrian positioning, the zero-velocity update (ZUPT) can also be used for car-mounted inertial navigation systems, as the zero-velocity phase provides valuable context information for correcting system error drifts via Kalman filtering. OdoNet, presented in [66], learns car speed and utilizes it along with a zero-velocity detector to reduce error drifts in car-mounted IMU systems. Deep learning has also been explored for detecting zero-velocity phases in vehicle navigation, for example in [43]. In another study, [57] derives a model with motion terms that are relevant only to the IMU data sequence. This model provides theoretical guidance for learning models to infer useful terms and has been evaluated on a drone dataset, where it outperformed TLIO and other learning methods.

Overall, these studies demonstrate the potential of deep learning-based methods in improving inertial navigation for various platforms, including pedestrians, vehicles, drones, and robots. By leveraging the rich information contained within IMU data, deep learning models can effectively mitigate error drifts and improve the accuracy of inertial positioning systems. Furthermore, by optimizing model efficiency and considering deployment on resource-constrained devices, these techniques can be applied in real-world scenarios.
P OSITIONING A PPLICATIONS Although these tasks are not the primary focus of this survey,
This section explores the applications of learning-based sen- this section provides a brief yet comprehensive overview of
sor fusion for IMU-integrated positioning in vehicles, robots, how deep learning is utilized in these domains.
and pedestrian navigation. Compared to pure inertial position-
ing, IMU-integrated systems demonstrate enhanced robustness A. Human Motion Analysis
and accuracy in complex dynamic scenarios, enabling sus- Data-driven approaches are utilized to reconstruct human
tained operation over extended periods. pose and motion using either a single IMU or multiple
In the realm of vehicle navigation, establishing a robust IMUs attached to the body. These models primarily focus on
tight-integrated IMU/GNSS positioning system is crucial for analyzing human motion rather than localizing users, which
providing accurate global positioning, particularly in challeng- differentiates them from inertial positioning. Several studies
ing environments like tunnels or streets with tall buildings. have applied machine learning to gait and pose analysis, such
Deep learning techniques play a pivotal role in addressing as knee angle estimation for human walking using supervised
challenges such as compensating positioning drifts and esti- support vector regression in [107] and probabilistic parame-
mating filtering parameters. Studies like [95] and [96] utilize ter learning for human gesture recognition in [108] through
deep learning to model error drifts in IMU/GNSS systems, handcrafted motion features extracted from inertial data.
while others such as [99] and [100] focus on learning essential In addition, machine learning methods, such as multi-layer
filtering parameters like measurement noises, process noises, perceptrons (MLPs), have been utilized in IMU data to learn
or velocity for effective fusion in Kalman filtering. sensor displacement for human motion reconstruction in [109],
In the domain of intelligent unmanned platforms like vehi- [110], and [111].
cles, robots, and UAVs, which often operate in complex Recently, deep learning has shown promising performance
and dynamic scenarios, robust perception is vital for reliable in human pose reconstruction. For example, [112] proposed
planning, decision-making, and control. Multisensory posi- Deep Inertial Poser, a recurrent neural network (RNN)-based
tioning supports these objectives, but issues such as camera framework that can reconstruct full-body pose from six IMUs
occlusions, image degradations, and complex illumination attached to the user’s body. TransPose [113], another RNN-
changes can make visual-inertial positioning systems fragile. based framework, enables real-time human pose estimation
Deep learning interventions, as seen in studies like [78] and using six body-attached IMUs. Furthermore, [114] combines
[80], enhance the robustness of visual-inertial positioning by a neural kinematics estimator with a physics-aware motion
extracting more reliable features through the efficacy of deep optimizer to improve the accuracy of human motion tracking.
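As a concrete illustration of the feature-fusion style adopted by selective sensor fusion approaches such as [80], the sketch below gates concatenated visual and inertial features before regressing a 6-DoF pose; the layer sizes and names are our own simplified assumptions rather than any published implementation.

```python
import torch
import torch.nn as nn

class GatedVisualInertialFusion(nn.Module):
    """Simplified soft-gated fusion of visual and inertial features,
    in the spirit of selective sensor fusion for visual-inertial odometry."""
    def __init__(self, vis_dim=512, imu_dim=128):
        super().__init__()
        fused = vis_dim + imu_dim
        self.gate = nn.Sequential(nn.Linear(fused, fused), nn.Sigmoid())
        self.pose = nn.Linear(fused, 6)       # 3-D translation + rotation

    def forward(self, vis_feat, imu_feat):
        f = torch.cat([vis_feat, imu_feat], dim=-1)
        g = self.gate(f)                      # per-channel reliability in [0, 1]
        return self.pose(g * f)               # re-weight, then regress the pose
```

The learned gate can suppress degraded visual channels (e.g., under occlusion) so the pose regression leans more on the inertial branch.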

TABLE III
A SUMMARY OF EXISTING METHODS ON DEEP LEARNING BASED SENSOR FUSION

IX. LEARNING BASED HUMAN MOTION ANALYSIS AND ACTIVITY RECOGNITION

Inertial sensors have diverse applications beyond positioning, such as motion tracking, activity recognition, and more. Although these tasks are not the primary focus of this survey, this section provides a brief yet comprehensive overview of how deep learning is utilized in these domains.

A. Human Motion Analysis

Data-driven approaches are utilized to reconstruct human pose and motion using either a single IMU or multiple IMUs attached to the body. These models primarily focus on analyzing human motion rather than localizing users, which differentiates them from inertial positioning. Several studies have applied machine learning to gait and pose analysis, such as knee angle estimation for human walking using supervised support vector regression in [107] and probabilistic parameter learning for human gesture recognition in [108] through handcrafted motion features extracted from inertial data. In addition, machine learning methods, such as multi-layer perceptrons (MLPs), have been applied to IMU data to learn sensor displacement for human motion reconstruction in [109], [110], and [111].
Recently, deep learning has shown promising performance in human pose reconstruction. For example, [112] proposed Deep Inertial Poser, a recurrent neural network (RNN)-based framework that can reconstruct full-body pose from six IMUs attached to the user's body. TransPose [113], another RNN-based framework, enables real-time human pose estimation using six body-attached IMUs. Furthermore, [114] combines a neural kinematics estimator with a physics-aware motion optimizer to improve the accuracy of human motion tracking.
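To sketch the sequence-to-pose mapping that these RNN-based systems share, the following minimal model (our assumption, not the released code of [112] or [113]) regresses per-frame joint rotations from the concatenated readings of six body-worn IMUs; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class InertialPoseRNN(nn.Module):
    """Simplified RNN pose regressor: sequences of features from six
    body-worn IMUs to full-body joint rotations (axis-angle)."""
    def __init__(self, imu_dim=6 * 12, hidden=256, n_joints=24):
        super().__init__()
        self.rnn = nn.LSTM(imu_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_joints * 3)

    def forward(self, imu_seq):               # (B, T, imu_dim)
        h, _ = self.rnn(imu_seq)              # temporal context per frame
        return self.head(h)                   # (B, T, n_joints * 3)
```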
B. Human Activity Recognition (HAR)

Deep learning can be utilized to exploit inertial information from body-worn IMUs for human activity recognition. For instance, [115] published a popular public dataset for human activity recognition and successfully classified the current activity among six classes, including walking, standing still, sitting, walking downstairs, walking upstairs, and lying down, using support vector machines (SVM). In addition, [116] presents an LSTM-based HAR model that takes a sequence of inertial data as input and outputs class probabilities. Moreover, [117] introduces a ConvNet-based HAR model that achieved a classification accuracy of 97%, outperforming the 96% accuracy of SVM-based HAR models. To reduce onboard computational requirements, [118] presents a learning framework that exploits both features automatically extracted by a DNN and hand-crafted features to achieve accurate and real-time human activity recognition on low-end devices.
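A minimal sketch of such a sequence classifier, with the six classes of [115] and hyperparameters chosen purely for illustration (not the exact model of [116]), could look as follows:

```python
import torch
import torch.nn as nn

ACTIVITIES = ["walking", "walking upstairs", "walking downstairs",
              "sitting", "standing", "lying"]   # six classes as in [115]

class LSTMHar(nn.Module):
    """Minimal LSTM activity classifier over windows of 3-axis
    accelerometer plus 3-axis gyroscope samples."""
    def __init__(self, in_dim=6, hidden=64, n_classes=len(ACTIVITIES)):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # (B, T, 6) IMU window
        h, _ = self.lstm(x)
        return self.fc(h[:, -1])              # logits; softmax gives probabilities
```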
Learning from inertial data can also benefit sports and health applications. For instance, [119] shows that deep learning is effective in detecting Parkinson's disease by assessing the patient's daily activity through the analysis of inertial information from wearable sensors. Additionally, [120] provides instructions for athletes' sports training based on sensor data and activity information.

X. DATASETS AND EVALUATION METRICS

In this section, we present five prominent public datasets widely utilized in deep learning-based inertial positioning research, as outlined in Table IV. Additionally, we introduce the evaluation metrics and compare several representative methods using two well-known datasets.

TABLE IV
PUBLIC DATASETS FOR DATA-DRIVEN INERTIAL POSITIONING

A. The Inertial Positioning Datasets

In the realm of vehicle navigation, the KITTI dataset [91] serves as a widely adopted benchmark. The sensors are rigidly affixed to the car chassis, making the dataset well suited for studying vehicle movements. Specifically, the KITTI Odometry Dataset encompasses data collected from car-driving scenarios, including visual images, LIDAR point clouds, IMU measurements, and ground truth. A high-precision GPS/IMU integrated system provides the ground truth, with raw unsynchronized data packages containing high-frequency inertial data at 100 Hz, and images and GPS-derived ground truth at 10 Hz.
In the domain of drone and robotic research, the EuRoC MAV datasets [92] feature tightly synchronized video streams from a Micro Aerial Vehicle (MAV) equipped with a stereo camera and an IMU. Comprising 11 flight trajectories across two environments, this dataset captures complex motion. The images, captured at 20 frames per second (FPS), and the IMU measurements, recorded at 200 Hz, span the MAV's stationary state, takeoff, flight, and landing at its initial position. A consistent IMU (MEMS IMU ADIS16448) operating at 200 Hz, along with a Vicon motion capture system and a Leica MS50 laser tracker, is used to produce accurate ground truth.
For pedestrian navigation, the Oxford Inertial Odometry Dataset (OxIOD), the Robust Neural Inertial Navigation Dataset (RONIN), and the Smartphone Inertial Measurement Dataset (SIMD) collect IMU data from mobile devices to reflect human motion in everyday life.
• The OxIOD dataset [48] consists of inertial measurements collected with IMUs attached in various ways (handheld, in the pocket, in a handbag, and on a trolley/stroller). It encompasses different motion modes, four types of off-the-shelf consumer phones, and data from five users. With 158 sequences, the dataset covers a total walking distance of 42.5 km and a recording time of 14.72 h.


• The RONIN dataset [49], containing inertial motion data from 100 human subjects, enables users to handle smartphones naturally as in day-to-day activities. It supports unrestricted phone orientation and placement, presenting a challenging task for developing inertial models invariant to device orientation or placement. Additionally, trajectories in the RONIN dataset have a larger spatial span compared to those in the OxIOD dataset.
• The SIMD dataset [69] encompasses over 4500 walking trajectories, totaling approximately 190 hours and covering more than 700 km. It includes diverse scenarios in four cities, both indoors and outdoors, seven phone attitudes, and more than 150 volunteers with their smartphones. The inertial data, collected by embedded smartphone IMU sensors, have synchronized timestamps and include specific force, angular rates, magnetic fields, and GPS-derived location information. Movement readings are also calculated by internal algorithms.

B. Evaluation Metrics and Results

To assess the efficacy of deep learning-based inertial positioning, researchers commonly employ two evaluation metrics: Absolute Trajectory Error (ATE) and Relative Trajectory Error (RTE).
A. Benefits
the average root-mean-squared-error (RMSE) between Unlike traditional geometric or physical inertial positioning
the actual and predicted locations throughout the entire models, the integration of deep learning into inertial posi-
trajectory. A lower ATE signifies superior performance. tioning has led to the development of a range of alternative
• Relative Trajectory Error (RTE): RTE is determined as solutions to address the issue of positioning error drifts. The
the average root-mean-squared-error (RMSE) between corresponding benefits can be summarized as follows:
the actual and predicted locations within a specified 1) Learn to Approximate Complex and Varying Function:
time interval. A lower RTE indicates more accurate The deep neural network has proven to be a powerful and
predictions. versatile nonlinear function that can approximate the complex
Here, we leverage the OxIOD and RoNIN datasets, which are widely used for pedestrian inertial navigation. Following the methods and results outlined in [62], we selected two traditional inertial positioning solutions, namely PDR and SINS, along with three representative learning-based models, namely IONet [37], RoNIN [49], and TinyOdom [62], to compare their performance on these datasets. The performance comparison is summarized in Table V, where ATE and RTE values for each model on both datasets are presented.

TABLE V
THE PERFORMANCE COMPARISON OF REPRESENTATIVE TRADITIONAL AND LEARNING BASED INERTIAL POSITIONING SOLUTIONS ON TWO PUBLIC DATASETS. METRIC DATA IS FROM [62]

PDR, a mainstream classical solution for pedestrian inertial navigation, employs a threshold-based step detector based on accelerometer peaks. Displacement is updated via Weinberg's stride length estimation model, and the heading is computed from the gyroscope readings. Traditional strapdown inertial navigation systems (SINS) directly integrate IMU measurements into position, velocity, and orientation based on Newtonian mechanics. However, due to error propagation from measurement noises, SINS quickly drifts and fails to provide reasonable results. IONet regresses heading and location displacement from data, demonstrating superior performance in relative trajectory estimation on both datasets. Notably, PDR exhibits good performance on the OxIOD dataset, which is characterized by simpler and smoother pedestrian trajectories. However, IONet outperforms PDR on the RoNIN dataset, highlighting the effectiveness of learning-based motion modeling in complex scenarios. RoNIN employs a heading-agnostic coordinate frame aligned with gravity, assuming correct orientation for learning inertial motion. In comparison, IONet and PDR do not make such an assumption. Consequently, RoNIN outperforms IONet and PDR significantly. TinyOdom represents a lightweight learning-based inertial positioning model with significantly fewer neural network weights than RoNIN. While TinyOdom's performance is comparable to RoNIN on the OxIOD dataset, it lags behind RoNIN on the RoNIN dataset.
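For reference, the PDR baseline described above can be condensed into a few lines; this is a generic sketch with illustrative constants (window size, motion threshold, Weinberg coefficient), not the tuned baseline of [62].

```python
import numpy as np

def pdr_track(acc_norm, heading, k=0.48, fps=100):
    """Minimal PDR: windowed step detection on the accelerometer
    magnitude, Weinberg stride length, heading from integrated gyro.
    acc_norm: (N,) accelerometer magnitude; heading: (N,) yaw in rad."""
    pos = [np.zeros(2)]
    window = fps // 2                         # assume at most ~2 steps/s
    for i in range(window, len(acc_norm) - window, window):
        seg = acc_norm[i - window:i + window]
        if seg.max() - seg.min() < 1.0:       # below threshold: no step
            continue
        stride = k * (seg.max() - seg.min()) ** 0.25   # Weinberg model
        psi = heading[i]
        pos.append(pos[-1] + stride * np.array([np.cos(psi), np.sin(psi)]))
    return np.array(pos)
```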

XI. CONCLUSION AND DISCUSSIONS

In recent years, there has been a growing interest in using deep learning to address the problem of inertial positioning. This article provides a comprehensive review of the area of deep learning-based inertial positioning. The rapid advances in this field have already provided promising solutions to address problems such as inertial sensor calibration, the compensation of error drifts in inertial positioning, and multimodal sensor fusion. This section concludes and discusses the benefits that deep learning can bring to inertial navigation research, analyzes the challenges that existing research faces, and highlights the future opportunities of this evolving field.

A. Benefits

Unlike traditional geometric or physical inertial positioning models, the integration of deep learning into inertial positioning has led to the development of a range of alternative solutions to address the issue of positioning error drifts. The corresponding benefits can be summarized as follows:
1) Learn to Approximate Complex and Varying Functions: The deep neural network has proven to be a powerful and versatile nonlinear function that can approximate the complex and variable factors involved in inertial positioning, which are difficult to model manually. For example, when calibrating sensors, the corrupting noises that exist in inertial measurements can be modeled and eliminated in a data-driven way by training on a large dataset using a DNN. Deep learning can also directly generate absolute velocity and position displacement from data, without the need for IMU integration, thus reducing positioning drifts. In pedestrian dead reckoning (PDR), deep learning can estimate step length based on data, rather than empirical equations, and implicitly remove the effects of different users. These works demonstrate that using a large dataset to build a data-driven model can produce more accurate motion estimates, as well as reduce and constrain the rapid error drifts of inertial navigation systems.
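The displacement-regression idea can be sketched as follows; the window length and network layout are assumptions in the spirit of IONet [37], not its published configuration.

```python
import math
import torch
import torch.nn as nn

class DisplacementNet(nn.Module):
    """Sketch of window-based displacement regression: a window of
    6-axis IMU samples -> (stride length dl, heading change dpsi)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(6, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)      # [dl, dpsi]

    def forward(self, imu_window):            # (B, 200, 6) ~ 2 s at 100 Hz
        h, _ = self.encoder(imu_window)
        return self.head(h[:, -1])

def integrate(deltas, x=0.0, y=0.0, psi=0.0):
    """Chain (dl, dpsi) predictions into a 2-D trajectory,
    avoiding the double integration of raw IMU signals."""
    traj = [(x, y)]
    for dl, dpsi in deltas:
        psi += dpsi
        x += dl * math.cos(psi)
        y += dl * math.sin(psi)
        traj.append((x, y))
    return traj
```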
2) Learn to Estimate Parameters: Automatic identification of parameters through data-driven models contributes to paving the way for next-generation intelligent navigation systems that can actively exploit input data and evolve over time without human intervention. In classical inertial navigation mechanisms, certain parameters or modules need to be manually set and tuned before use. For instance, experts with experience need to settle parameters in Kalman filtering, such as observation noise, covariance, and process noise. Deep learning has proven effective in automatically producing suitable parameters for Kalman filtering based on input data [45], [50], [85]. In sensor calibration, reinforcement learning algorithms are used to discover optimal parameters for inertial calibration algorithms [30]. In ZUPT-based pedestrian inertial positioning, deep learning is a viable solution for classifying zero-velocity phases and determining when to update system states.
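A minimal version of such a learned zero-velocity classifier (our sketch; see [40] and [46] for actual designs) is shown below.

```python
import torch
import torch.nn as nn

class ZuptNet(nn.Module):
    """Binary zero-velocity detector over short 6-axis IMU windows;
    a confident positive output triggers a zero-velocity update."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(6, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, imu_window):            # (B, T, 6)
        h, _ = self.lstm(imu_window)
        return torch.sigmoid(self.fc(h[:, -1]))  # P(stance phase)

# usage sketch (hypothetical filter API): apply the zero-velocity
# pseudo-measurement only above a tuned confidence threshold, e.g.
# if ZuptNet()(window).item() > 0.9: filter.update_zero_velocity()
```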
3) Learn to Self-Adapt in New Domains: Unforeseen or ever-changing issues in new application domains, such as changes in motion mode, carrier, and sensor noise, can significantly impact the performance of inertial systems. Learning models offer opportunities for inertial systems to adapt to new changes and overcome these influential factors implicitly by discovering and exploiting the differences in data distributions between domains. For instance, [41] leverages transfer learning to allow an INS to extract domain-invariant features from data, maintaining localization accuracy when the sensor attachment is changed. The introduction of self-supervised learning enables navigation systems to learn from data without high-precision pose as training labels, allowing unlabelled inertial data to be effectively used to improve model performance. In visual-inertial odometry, [79], [81], and [84] introduce novel view synthesis as a supervision signal to train deep VIO in a self-supervised manner. This self-adaptation ability is promising for mobile agents to continuously improve their localization performance in new application scenes.
B. Challenges and Opportunities

Despite the impressive and promising results that deep learning has already offered in inertial positioning, there are still challenges in existing methods when they are applied and deployed in real-world scenarios. To overcome these limitations, several opportunities and potential research directions are discussed below.
1) Generalization and Self-Learning: The generalization problem is a major concern for deep learning-based methods, because these models are trained on one domain (i.e., the training set) but need to be tested on other domains (i.e., the testing set). The possible differences in data between domains can lead to a degradation of prediction performance. Although deep learning-based inertial navigation models have reported impressive results on the authors' own datasets, these works have not been evaluated in comprehensive experiments during long-term operation and across various devices, users, and application scenes. Thus, it is challenging to determine the real performance of these models in open environments. To address the generalization problem, new learning techniques such as transfer learning [121], lifelong learning, and contrastive learning [122] can be introduced into inertial positioning systems, which is a promising direction. For instance, in the future, by exploiting information from physical/geometric rules or other sensors (e.g., GNSS, camera), learning-based inertial positioning models could be trained in a self-supervised manner, enabling mobile agents to learn from data in a lifelong fashion.

2) Black-Box and Explainability: Deep neural networks have been criticized as 'black-box' models due to their lack of explainability and interpretability. As these models are often used to support real-world tasks, it is crucial to investigate what is learned inside deep nets before deploying them, to ensure their safety and reliability. Despite the good results shown by deep learning models in estimating important terms such as location displacement, sensor measurement errors, and filtering parameters, these terms lack concrete mathematical models, unlike traditional inertial navigation. To determine whether these terms are trustworthy, uncertainties should be estimated in conjunction with the inertial positioning method [75] and used as indicators for users or systems to understand the extent to which model predictions can be trusted. In future research, it is important to reveal the governing mathematical or physical models behind the learned inertial positioning neural model and identify which parts of inertial positioning can be learned by deep nets. Introducing Bayesian deep learning into inertial positioning is also a promising direction that could offer interpretability for model predictions [123].
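One practical way to attach such uncertainty estimates, following the Bayesian deep learning perspective of [123], is Monte Carlo dropout; the sketch below assumes any positioning network that contains dropout layers.

```python
import torch

def mc_dropout_predict(model, x, n_samples=20):
    """Monte Carlo dropout: keep dropout active at test time and use
    the spread of repeated predictions as an uncertainty indicator."""
    model.train()                             # keeps dropout stochastic
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # estimate and uncertainty
```

A downstream filter or user interface can then down-weight or flag predictions whose standard deviation is large.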
3) Efficiency and Real-World Deployment: When deploying deep positioning models on user devices, it is crucial to consider the consumption of computation, storage, and energy in system design, in addition to prediction accuracy. Compared to classical inertial navigation algorithms, DNN-based inertial positioning models have a relatively large computational and memory burden, as they contain millions of neural parameters that require GPUs for parallel training and testing. Therefore, online inference of learning models, especially on low-end devices such as IoT consoles, VR/AR devices, and miniature drones, requires lightweight, efficient, and effective models. To achieve this goal, neural model compression techniques, such as knowledge distillation [124], should be introduced to discover the optimal neural structure that balances prediction accuracy and model size. References [48] and [66] have conducted initial trials on minimizing the model size of inertial odometry. Moreover, safety and reliability are also crucial factors to consider. In the future, it is worth exploring the optimal structure of learning-based inertial positioning models, considering model performance, parameter size, latency, safety, and reliability for real-world deployment.
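For pose regression, the distillation objective can be as simple as the following sketch (cf. [106], [124]); the loss weighting is an assumption.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Regression distillation: a small student imitates a large
    teacher's pose predictions while also fitting the ground truth."""
    imitation = F.mse_loss(student_out, teacher_out.detach())
    supervised = F.mse_loss(student_out, target)
    return alpha * imitation + (1.0 - alpha) * supervised
```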
4) Data Collection and Benchmark: Deep learning models' performance relies heavily on data quality, including dataset size, diversity, and consistency between training and testing sets. Ideally, deep learning-based inertial positioning models should be trained on diverse data across users, platforms, motion dynamics, and sensors to improve generalization. However, acquiring such data can be costly and time-consuming, and obtaining accurate ground-truth labels can be challenging. Previous research has varied in training data, model parameters, and evaluation metrics, hindering fair comparisons. In visual navigation tasks, such as visual odometry/SLAM, the KITTI dataset [91] is commonly used as a benchmark to train and evaluate learning-based VO models. However, although published datasets for inertial navigation exist [48], [49], there is still a lack of a common benchmark that is adopted and recognized by mainstream methods in inertial positioning. In the future, a widely adopted dataset and benchmark, covering a variety of application scenarios, will greatly benefit and foster research in data-driven inertial positioning.
5) Failure Cases and Physical Constraints: Deep learning has demonstrated its capability in reducing the drifts of inertial positioning and contributing to various aspects of inertial navigation systems, as discussed in Section IV. However, DNN models are not always reliable and may occasionally produce large and abrupt prediction errors. Unlike traditional inertial navigation algorithms that are based on concrete physical and mathematical rules, DNN predictions lack constraints, and the failure cases must be considered in real-world applications with safety concerns. To enhance the robustness of DNN predictions, possible solutions include imposing physical constraints on DNN models or combining deep learning with physical models as hybrid inertial positioning models. By doing so, the benefits from both learning and physics-based positioning models can be leveraged.
6) New Deep Learning Methods: Machine/deep learning is one of the fastest growing areas of AI, and its advances have influenced numerous fields such as computer vision, robotics, natural language processing, and signal processing. There are significant opportunities for applying deep learning techniques to inertial navigation and analyzing their effectiveness and theoretical underpinnings. In the future, new model structures such as transformers [76], diffusion models [125], and generative models [73], and new learning methods such as transfer learning, reinforcement learning, contrastive learning [122], unsupervised learning, and meta-learning [126], all hold promise for enhancing inertial positioning systems. Furthermore, advances in other domains such as neural rendering [127] and voice synthesis [128] may provide valuable insights into developing more effective inertial positioning systems. Therefore, incorporating these rapidly evolving deep learning methods into inertial navigation will be a significant area of research in the future.
7) Deep Sensor Fusion: Sensor fusion faces challenges such as diverse sensor data formats, temporal synchronization issues, and the complexity of sensor calibration. Real-time processing needs, adaptation to dynamic environments, and the limited labeled data available for multi-sensory models add further complexity. Deep learning in sensor fusion improves accuracy by merging inertial sensor data with other modalities, learning fusion strategies, synchronization, and calibration from data. It adapts and customizes continuously for better performance across applications such as vehicles, robots, pedestrians, and drones.
8) Robustness and Reliability: In practical applications, the challenges associated with handling unforeseen situations and ensuring reliability become particularly critical, especially in safety-critical domains such as autonomous vehicles. The learning models employed may encounter difficulties in adapting to extreme conditions, thereby introducing risks to robust and reliable positioning. To solve these problems, prospective solutions such as diversifying training data, implementing adversarial testing, and adopting new learning models can contribute to the overall reliability of the positioning system. In addition, continuous monitoring, adaptive algorithms, and strict certification standards play pivotal roles in enhancing the overall trustworthiness of the positioning system.

REFERENCES

[1] M. G. Puyol, D. Bobkov, P. Robertson, and T. Jost, "Pedestrian simultaneous localization and mapping in multistory buildings using inertial sensors," IEEE Trans. Intell. Transp. Syst., vol. 15, no. 4, pp. 1714–1727, Aug. 2014.
[2] J.-O. Nilsson, J. Rantakokko, P. Händel, I. Skog, M. Ohlsson, and K. V. S. Hari, "Accurate indoor positioning of firefighters using dual foot-mounted inertial sensors and inter-agent ranging," in Proc. IEEE/ION Position, Location Navigat. Symp. (PLANS), May 2014, pp. 631–636.
[3] A. Bulling, U. Blanke, and B. Schiele, "A tutorial on human activity recognition using body-worn inertial sensors," ACM Comput. Surveys, vol. 46, no. 3, pp. 1–33, Jan. 2014.
[4] P. G. Savage, "Strapdown inertial navigation integration algorithm design—Part 1: Attitude algorithms," J. Guid., Control, Dyn., vol. 21, no. 1, pp. 19–28, Jan. 1998.
[5] P. G. Savage, "Strapdown inertial navigation integration algorithm design—Part 2: Velocity and position algorithms," J. Guid., Control, Dyn., vol. 21, no. 2, pp. 208–221, Mar. 1998.
[6] R. Harle, "A survey of indoor inertial positioning systems for pedestrians," IEEE Commun. Surveys Tuts., vol. 15, no. 3, pp. 1281–1293, 3rd Quart., 2013.
[7] I. Skog, P. Handel, J. O. Nilsson, and J. Rantakokko, "Zero-velocity detection—An algorithm evaluation," IEEE Trans. Biomed. Eng., vol. 57, no. 11, pp. 2657–2666, Nov. 2010.
[8] M. Li and A. I. Mourikis, "High-precision, consistent EKF-based visual-inertial odometry," Int. J. Robot. Res., vol. 32, no. 6, pp. 690–711, May 2013.
[9] T. Qin, P. Li, and S. Shen, "VINS-mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004–1020, Aug. 2018.
[10] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, "FAST-LIO2: Fast direct LiDAR-inertial odometry," IEEE Trans. Robot., vol. 38, no. 4, pp. 2053–2073, Aug. 2022.
[11] Y. Bengio, I. Goodfellow, and A. Courville, Deep Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2017.
[12] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3212–3232, Nov. 2019.
[13] S. Hao, Y. Zhou, and Y. Guo, "A brief survey on semantic segmentation with deep learning," Neurocomputing, vol. 406, pp. 302–321, Sep. 2020.
[14] N. Sünderhauf et al., "The limits and potentials of deep learning for robotics," Int. J. Robot. Res., vol. 37, nos. 4–5, pp. 405–420, 2018.
[15] Y. Li et al., "Inertial sensing meets machine learning: Opportunity or challenge?" IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 9995–10011, Aug. 2022.
[16] P. S. Farahsari, A. Farahzadi, J. Rezazadeh, and A. Bagheri, "A survey on indoor positioning systems for IoT-based applications," IEEE Internet Things J., vol. 9, no. 10, pp. 7680–7699, May 2022.
[17] L. E. Díez, A. Bahillo, J. Otegui, and T. Otim, "Step length estimation methods based on inertial sensors: A review," IEEE Sensors J., vol. 18, no. 17, pp. 6908–6926, Sep. 2018.
[18] Y. Wu, H.-B. Zhu, Q.-X. Du, and S.-M. Tang, "A survey of the research status of pedestrian dead reckoning systems based on inertial sensors," Int. J. Autom. Comput., vol. 16, no. 1, pp. 65–83, Feb. 2019.


[19] X. Ru, N. Gu, H. Shang, and H. Zhang, "MEMS inertial sensor calibration technology: Current status and future trends," Micromachines, vol. 13, no. 6, p. 879, May 2022.
[20] N. El-Sheimy, H. Hou, and X. Niu, "Analysis and modeling of inertial sensors using Allan variance," IEEE Trans. Instrum. Meas., vol. 57, no. 1, pp. 140–149, Jan. 2008.
[21] H. Weinberg, "Using the ADXL202 in pedometer and personal navigation applications," Analog Devices, Wilmington, MA, USA, Appl. Note AN-602, vol. 2, 2002, pp. 1–6.
[22] L. Fang et al., "Design of a wireless assisted pedestrian dead reckoning system—The NavMote experience," IEEE Trans. Instrum. Meas., vol. 54, no. 6, pp. 2342–2358, Dec. 2005.
[23] P. Goyal, V. J. Ribeiro, H. Saran, and A. Kumar, "Strap-down pedestrian dead-reckoning system," in Proc. Int. Conf. Indoor Positioning Indoor Navigat., Sep. 2011, pp. 1–7.
[24] B. Huang, G. Qi, X. Yang, L. Zhao, and H. Zou, "Exploiting cyclic features of walking for pedestrian dead reckoning with unconstrained smartphones," in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 374–385.
[25] D. Feng, C. Wang, C. He, Y. Zhuang, and X.-G. Xia, "Kalman-filter-based integration of IMU and UWB for high-accuracy indoor positioning and navigation," IEEE Internet Things J., vol. 7, no. 4, pp. 3133–3146, Apr. 2020.
[26] S. Yang, J. Liu, X. Gong, G. Huang, and Y. Bai, "A robust heading estimation solution for smartphone multisensor-integrated indoor positioning," IEEE Internet Things J., vol. 8, no. 23, pp. 17186–17198, Dec. 2021.
[27] C. Xiyuan, "Modeling random gyro drift by time series neural networks and by traditional method," in Proc. Int. Conf. Neural Netw. Signal Process., Dec. 2003, pp. 810–813.
[28] H. Chen, P. Aggarwal, T. M. Taha, and V. P. Chodavarapu, "Improving inertial sensor by reducing errors using deep learning methodology," in Proc. IEEE Nat. Aerosp. Electron. Conf. (NAECON), Jul. 2018, pp. 197–202.
[29] M. A. Esfahani, H. Wang, K. Wu, and S. Yuan, "OriNet: Robust 3-D orientation estimation with a single particular IMU," IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 399–406, Apr. 2020.
[30] F. Nobre and C. Heckman, "Learning to calibrate: Reinforcement learning for guided calibration of visual–inertial rigs," Int. J. Robot. Res., vol. 38, nos. 12–13, pp. 1388–1402, Oct. 2019.
[31] M. Brossard, S. Bonnabel, and A. Barrau, "Denoising IMU gyroscopes with deep learning for open-loop attitude estimation," IEEE Robot. Autom. Lett., vol. 5, no. 3, pp. 4796–4803, Jul. 2020.
[32] X. Zhao, C. Deng, X. Kong, J. Xu, and Y. Liu, "Learning to compensate for the drift and error of gyroscope in vehicle localization," in Proc. IEEE Intell. Vehicles Symp. (IV), Oct. 2020, pp. 852–857.
[33] F. Huang, Z. Wang, L. Xing, and C. Gao, "A MEMS IMU gyroscope calibration method based on deep learning," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–9, 2022.
[34] R. Li, C. Fu, W. Yi, and X. Yi, "Calib-Net: Calibrating the low-cost IMU via deep convolutional neural network," Frontiers Robot. AI, vol. 8, Jan. 2022, Art. no. 772583.
[35] S.-I. Amari, "Backpropagation and stochastic gradient descent method," Neurocomputing, vol. 5, nos. 4–5, pp. 185–196, 1993.
[36] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[37] C. Chen, X. Lu, A. Markham, and N. Trigoni, "IONet: Learning to cure the curse of drift in inertial odometry," in Proc. Conf. Artif. Intell. (AAAI), 2018, pp. 6468–6476.
[38] H. Yan, Q. Shan, and Y. Furukawa, "RIDI: Robust IMU double integration," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 621–636.
[39] S. Cortés, A. Solin, and J. Kannala, "Deep learning based speed estimation for constraining strapdown inertial navigation on smartphones," in Proc. IEEE 28th Int. Workshop Mach. Learn. Signal Process. (MLSP), Sep. 2018, pp. 1–6.
[40] B. Wagstaff and J. Kelly, "LSTM-based zero-velocity detection for robust inertial navigation," in Proc. Int. Conf. Indoor Position. Indoor Navig. (IPIN), Sep. 2018, pp. 1–8.
[41] C. Chen et al., "MotionTransformer: Transferring neural inertial tracking between domains," in Proc. Conf. Artif. Intell. (AAAI), vol. 33, 2019, pp. 8009–8016.
[42] M. Abolfazli Esfahani, H. Wang, K. Wu, and S. Yuan, "AbolDeepIO: A novel deep inertial odometry network for autonomous vehicles," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 5, pp. 1941–1950, May 2020.
[43] M. Brossard, A. Barrau, and S. Bonnabel, "RINS-W: Robust inertial navigation system on wheels," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Nov. 2019, pp. 2068–2075.
[44] T. Feigl, S. Kram, P. Woller, R. H. Siddiqui, M. Philippsen, and C. Mutschler, "A bidirectional LSTM for estimating dynamic human velocities from a single IMU," in Proc. Int. Conf. Indoor Positioning Indoor Navigat. (IPIN), Sep. 2019, pp. 1–8.
[45] Q. Wang et al., "Pedestrian heading estimation based on spatial transformer networks and hierarchical LSTM," IEEE Access, vol. 7, pp. 162309–162322, 2019.
[46] X. Yu et al., "AZUPT: Adaptive zero velocity update based on neural networks for pedestrian tracking," in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[47] W. Liu et al., "TLIO: Tight learned inertial odometry," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 5653–5660, Oct. 2020.
[48] C. Chen, P. Zhao, C. X. Lu, W. Wang, A. Markham, and N. Trigoni, "Deep-learning-based pedestrian inertial navigation: Methods, data set, and on-device inference," IEEE Internet Things J., vol. 7, no. 5, pp. 4431–4441, May 2020.
[49] S. Herath, H. Yan, and Y. Furukawa, "RoNIN: Robust neural inertial navigation in the wild: Benchmark, evaluations, new methods," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2020, pp. 3146–3152.
[50] M. Brossard, A. Barrau, and S. Bonnabel, "AI-IMU dead-reckoning," IEEE Trans. Intell. Vehicles, vol. 5, no. 4, pp. 585–595, Dec. 2020.
[51] I. Klein and O. Asraf, "StepNet—Deep learning approaches for step length estimation," IEEE Access, vol. 8, pp. 85706–85713, 2020.
[52] Y. Wang, H. Cheng, and M. Q.-H. Meng, "Pedestrian motion tracking by using inertial sensors on the smartphone," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2020, pp. 4426–4431.
[53] X. Teng et al., "ARPDR: An accurate and robust pedestrian dead reckoning system for indoor localization on handheld smartphones," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2020, pp. 10888–10893.
[54] S. Sun, D. Melamed, and K. Kitani, "IDOL: Inertial deep orientation-estimation and localization," in Proc. AAAI Conf. Artif. Intell., vol. 35, 2021, pp. 6128–6137.
[55] O. Asraf, F. Shama, and I. Klein, "PDRNet: A deep-learning pedestrian dead reckoning framework," IEEE Sensors J., vol. 22, no. 6, pp. 4932–4939, Mar. 2022.
[56] R. Buchanan, M. Camurri, F. Dellaert, and M. Fallon, "Learning inertial odometry for dynamic legged robot state estimation," in Proc. 5th Conf. Robot Learn., vol. 164, 2022, pp. 1575–1584.
[57] M. Zhang, M. Zhang, Y. Chen, and M. Li, "IMU data processing for inertial aided navigation: A recurrent neural network based approach," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2021, pp. 3992–3998.
[58] J. Gong, X. Zhang, Y. Huang, J. Ren, and Y. Zhang, "Robust inertial motion tracking through deep sensor fusion across smart earbuds and smartphone," Proc. ACM Interact., Mobile, Wearable Ubiquitous Technol., vol. 5, no. 2, pp. 1–26, Jun. 2021.
[59] S. Herath, D. Caruso, C. Liu, Y. Chen, and Y. Furukawa, "Neural inertial localization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 6594–6603.
[60] X. Cao, C. Zhou, D. Zeng, and Y. Wang, "RIO: Rotation-equivariance supervised learning of robust inertial odometry," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 6604–6613.
[61] Y. Wang, J. Kuang, Y. Li, and X. Niu, "Magnetic field-enhanced learning-based inertial odometry for indoor pedestrian," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–13, 2022.
[62] S. S. Saha, S. S. Sandha, L. A. Garcia, and M. Srivastava, "TinyOdom: Hardware-aware efficient neural inertial navigation," Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 6, no. 2, pp. 1–32, 2022.
[63] B. Rao, E. Kazemi, Y. Ding, D. M. Shila, F. M. Tucker, and L. Wang, "CTIN: Robust contextual transformer network for inertial navigation," in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 5, 2022, pp. 5413–5421.
[64] B. Zhou et al., "DeepVIP: Deep learning-based vehicle indoor positioning using smartphones," IEEE Trans. Veh. Technol., vol. 71, no. 12, pp. 13299–13309, Dec. 2022.
[65] F. Bo, J. Li, and W. Wang, "Mode-independent stride length estimation with IMUs in smartphones," IEEE Sensors J., vol. 22, no. 6, pp. 5824–5833, Mar. 2022.


[66] H. Tang, X. Niu, T. Zhang, Y. Li, and J. Liu, "OdoNet: Untethered speed aiding for vehicle navigation without hardware wheeled odometer," IEEE Sensors J., vol. 22, no. 12, pp. 12197–12208, Jun. 2022.
[67] Y. Wang, H. Cheng, and M. Q.-H. Meng, "A2DIO: Attention-driven deep inertial odometry for pedestrian localization based on 6D IMU," in Proc. Int. Conf. Robot. Autom. (ICRA), May 2022, pp. 819–825.
[68] Y. Wang, J. Kuang, X. Niu, and J. Liu, "LLIO: Lightweight learned inertial odometer," IEEE Internet Things J., vol. 10, no. 3, pp. 2508–2518, Feb. 2023.
[69] F. Liu, H. Ge, D. Tao, R. Gao, and Z. Zhang, "Smartphone-based pedestrian inertial tracking: Dataset, model, and deployment," IEEE Trans. Instrum. Meas., vol. 73, pp. 1–13, 2024.
[70] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[71] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. Int. Conf. Learn. Represent. (ICLR), 2016, pp. 1–10.
[72] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[73] I. Goodfellow, "Generative adversarial networks," Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[74] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7167–7176.
[75] C. Chen, C. X. Lu, J. Wahlström, A. Markham, and N. Trigoni, "Deep neural network based inertial odometry using low-cost inertial measurement units," IEEE Trans. Mobile Comput., vol. 20, no. 4, pp. 1351–1364, Apr. 2021.
[76] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–9.
[77] D. Ramachandram and G. W. Taylor, "Deep multimodal learning: A survey on recent advances and trends," IEEE Signal Process. Mag., vol. 34, no. 6, pp. 96–108, Nov. 2017.
[78] R. Clark, S. Wang, H. Wen, A. Markham, and N. Trigoni, "VINet: Visual-inertial odometry as a sequence-to-sequence learning problem," in Proc. Conf. Artif. Intell. (AAAI), 2017, pp. 3995–4001.
[79] E. J. Shamwell, K. Lindgren, S. Leung, and W. D. Nothwang, "Unsupervised deep visual-inertial odometry with online error correction for RGB-D imagery," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 10, pp. 2478–2493, Oct. 2020.
[80] C. Chen et al., "Selective sensor fusion for neural visual-inertial odometry," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10534–10543.
[81] L. Han, Y. Lin, G. Du, and S. Lian, "DeepVIO: Self-supervised deep learning of monocular visual inertial odometry using 3D geometric constraints," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Nov. 2019, pp. 6906–6913.
[82] M. R. U. Saputra et al., "DeepTIO: A deep thermal-inertial odometry with visual hallucination," IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 1672–1679, Apr. 2020.
[83] C. X. Lu et al., "MilliEgo: Single-chip mmWave radar aided egomotion estimation via deep sensor fusion," in Proc. 18th Conf. Embedded Netw. Sensor Syst., Nov. 2020, pp. 109–122.
[84] P. Wei, G. Hua, W. Huang, F. Meng, and H. Liu, "Unsupervised monocular visual-inertial odometry network," in Proc. 29th Int. Joint Conf. Artif. Intell., Jul. 2020, pp. 2347–2354.
[85] C. Chen, C. X. Lu, B. Wang, N. Trigoni, and A. Markham, "DynaNet: Neural Kalman dynamical model for motion estimation and prediction," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5479–5491, Dec. 2021.
[86] Y. Almalioglu, M. Turan, M. R. U. Saputra, P. P. B. de Gusmão, A. Markham, and N. Trigoni, "SelfVIO: Self-supervised deep monocular visual–inertial odometry and depth estimation," Neural Netw., vol. 150, pp. 119–136, Jun. 2022.
[87] Y. Tu and J. Xie, "UnDeepLIO: Unsupervised deep LiDAR-inertial odometry," in Proc. Asian Conf. Pattern Recognit. Cham, Switzerland: Springer, 2022, pp. 189–202.
[88] E. S. Jones and S. Soatto, "Visual-inertial navigation, mapping and localization: A scalable real-time causal approach," Int. J. Robot. Res., vol. 30, no. 4, pp. 407–430, Apr. 2011.
[89] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual–inertial odometry using nonlinear optimization," Int. J. Robot. Res., vol. 34, no. 3, pp. 314–334, 2015.
[90] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, "On-manifold preintegration for real-time visual–inertial odometry," IEEE Trans. Robot., vol. 33, no. 1, pp. 1–21, Feb. 2017.
[91] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," Int. J. Robot. Res., vol. 32, no. 11, pp. 1231–1237, Sep. 2013.
[92] M. Burri et al., "The EuRoC micro aerial vehicle datasets," Int. J. Robot. Res., vol. 35, no. 10, pp. 1157–1163, Sep. 2016.
[93] D.-J. Jwo, C.-H. Chuang, J.-Y. Yang, and Y.-H. Lu, "Neural network assisted ultra-tightly coupled GPS/INS integration for seamless navigation," in Proc. 12th Int. Conf. ITS Telecommun., Nov. 2012, pp. 385–390.
[94] Z. Li, J. Wang, B. Li, J. Gao, and X. Tan, "GPS/INS/odometer integrated system using fuzzy neural network for land vehicle navigation applications," J. Navigat., vol. 67, no. 6, pp. 967–983, Nov. 2014.
[95] S. Hosseinyalamdary, "Deep Kalman filter: Simultaneous multi-sensor integration and modelling; a GNSS/IMU case study," Sensors, vol. 18, no. 5, p. 1316, Apr. 2018.
[96] C. Shen, Y. Zhang, J. Tang, H. Cao, and J. Liu, "Dual-optimization for a MEMS-INS/GPS system during GPS outages based on the cubature Kalman filter and neural networks," Mech. Syst. Signal Process., vol. 133, Nov. 2019, Art. no. 106222.
[97] S. Lu, Y. Gong, H. Luo, F. Zhao, Z. Li, and J. Jiang, "Heterogeneous multi-task learning for multiple pseudo-measurement estimation to bridge GPS outages," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–16, 2021.
[98] C. Shen et al., "Seamless GPS/Inertial navigation system based on self-learning square-root cubature Kalman filter," IEEE Trans. Ind. Electron., vol. 68, no. 1, pp. 499–508, Jan. 2021.
[99] F. Wu, H. Luo, H. Jia, F. Zhao, Y. Xiao, and X. Gao, "Predicting the noise covariance with a multitask learning model for Kalman filter-based GNSS/INS integrated navigation," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–13, 2021.
[100] Y. Xiao et al., "Residual attention network-based confidence estimation algorithm for non-holonomic constraint in GNSS/INS integrated navigation system," IEEE Trans. Veh. Technol., vol. 70, no. 11, pp. 11404–11418, Nov. 2021.
[101] N. A. Abiad, Y. Kone, V. Renaudin, and T. Robert, "Smartstep: A robust STEP detection method based on SMARTphone inertial signals driven by gait learning," IEEE Sensors J., vol. 22, no. 12, pp. 12288–12297, Jun. 2022.
[102] M. Jaderberg et al., "Spatial transformer networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 1–12.
[103] A. Manos, T. Hazan, and I. Klein, "Walking direction estimation using smartphone sensors: A deep network-based framework," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[104] C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, "Temporal convolutional networks for action segmentation and detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 156–165.
[105] P. Ren et al., "A comprehensive survey of neural architecture search: Challenges and solutions," ACM Comput. Surv., vol. 54, no. 4, pp. 1–34, 2021.
[106] M. R. U. Saputra, P. Gusmao, Y. Almalioglu, A. Markham, and N. Trigoni, "Distilling knowledge from a deep pose regressor network," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 263–272.
[107] S. Ahuja, W. Jirattigalachote, and A. Tosborvorn, "Improving accuracy of inertial measurement units using support vector regression," Stanford Univ., CA, USA, Tech. Rep., Oct. 2016.
[108] A. Parate, M.-C. Chiu, C. Chadowitz, D. Ganesan, and E. Kalogerakis, "RisQ: Recognizing smoking gestures with inertial sensors on a wristband," in Proc. 12th Annu. Int. Conf. Mobile Syst., Appl., Services, Jun. 2014, pp. 149–161.
[109] A. Mannini and A. M. Sabatini, "Machine learning methods for classifying human physical activity from on-body accelerometers," Sensors, vol. 10, no. 2, pp. 1154–1175, 2010.
[110] A. Valtazanos, D. K. Arvind, and S. Ramamoorthy, "Using wearable inertial sensors for posture and position tracking in unconstrained environments through learned translation manifolds," in Proc. ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN), Apr. 2013, pp. 241–252.
[111] M. Yuwono, S. W. Su, Y. Guo, B. D. Moulton, and H. T. Nguyen, "Unsupervised nonparametric method for gait analysis using a waist-worn inertial sensor," Appl. Soft Comput., vol. 14, pp. 72–80, Jan. 2014.


[112] Y. Huang, M. Kaufmann, E. Aksan, M. J. Black, O. Hilliges, and G. Pons-Moll, "Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time," ACM Trans. Graph., vol. 37, no. 6, pp. 1–15, Dec. 2018.
[113] X. Yi, Y. Zhou, and F. Xu, "TransPose: Real-time 3D human translation and pose estimation with six inertial sensors," ACM Trans. Graph., vol. 40, no. 4, pp. 1–13, Aug. 2021.
[114] X. Yi et al., "Physical inertial poser (PIP): Physics-aware real-time human motion tracking from sparse inertial sensors," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 13167–13178.
[115] D. Anguita et al., "A public domain dataset for human activity recognition using smartphones," in Proc. 21st Int. Eur. Symp. Artif. Neural Netw., Comput. Intell. Mach. Learn., 2013, pp. 437–442.
[116] G. Chevalier, "LSTMs for human activity recognition," Laval Univ., Quebec, Canada, Tech. Rep., 2016.
[117] T. Zebin, P. J. Scully, and K. B. Ozanyan, "Human activity recognition with inertial sensors using a deep learning approach," in Proc. IEEE Sensors, Oct. 2016, pp. 1–3.
[118] D. Ravì, C. Wong, B. Lo, and G.-Z. Yang, "A deep learning approach to on-node sensor data analytics for mobile or wearable devices," IEEE J. Biomed. Health Informat., vol. 21, no. 1, pp. 56–64, Jan. 2017.
[119] B. M. Eskofier et al., "Recent machine learning advancements in sensor-based mobility analysis: Deep learning for Parkinson's disease assessment," in Proc. 38th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2016, pp. 655–658.
[120] J. Windau and L. Itti, "Inertial-based motion capturing and smart training system," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Nov. 2019, pp. 4027–4034.
[121] K. Weiss, T. M. Khoshgoftaar, and D. Wang, "A survey of transfer learning," J. Big Data, vol. 3, no. 1, pp. 1–40, May 2016.
[122] Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, "What makes for good views for contrastive learning?" in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 6827–6839.
[123] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?" in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–13.
[124] J. Gou, B. Yu, S. J. Maybank, and D. Tao, "Knowledge distillation: A survey," Int. J. Comput. Vis., vol. 129, no. 6, pp. 1789–1819, Jun. 2021.
[125] C. Saharia et al., "Photorealistic text-to-image diffusion models with deep language understanding," in Proc. Neural Inf. Process. Syst., 2022, pp. 1–16.
[126] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
[127] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing scenes as neural radiance fields for view synthesis," Commun. ACM, vol. 65, no. 1, p. 99, Dec. 2021.
[128] A. Oord et al., "Parallel WaveNet: Fast high-fidelity speech synthesis," in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 3918–3926.

Changhao Chen (Member, IEEE) received the B.Eng. degree from Tongji University, China, the M.Eng. degree from the National University of Defense Technology, China, and the Ph.D. degree from the University of Oxford, U.K. He is currently an Assistant Professor with the College of Intelligence Science and Technology, National University of Defense Technology. His research interests lie in robotics, computer vision, and cyber-physical systems.

Xianfei Pan received the Ph.D. degree in control science and engineering from the National University of Defense Technology, Changsha, China, in 2008. Currently, he is a Professor with the College of Intelligence Science and Technology, National University of Defense Technology. His current research interests include inertial navigation systems and indoor navigation systems.
