Kegen Yu (Editor)

Positioning and Navigation Using Machine Learning Methods
Navigation: Science and Technology
Volume 14
This series, Navigation: Science and Technology (NST), presents new developments
and advances in various aspects of navigation - from land, marine and aeronautic
navigation to space navigation, and from basic theories and mechanisms to modern
techniques. It publishes monographs, edited volumes, lecture notes and professional
books on topics relevant to navigation - quickly, up to date and to a high quality.
A special focus of the series is the technologies
of the Global Navigation Satellite Systems (GNSSs), as well as the latest progress
made in the existing systems (GPS, BDS, Galileo, GLONASS, etc.). To help
readers keep abreast of the latest advances in the field, the key topics in NST
include but are not limited to:
– Satellite Navigation Signal Systems
– GNSS Navigation Applications
– Position Determination
– Navigational instrument
– Atomic Clock Technique and Time-Frequency System
– X-ray pulsar-based navigation and timing
– Test and Evaluation
– User Terminal Technology
– Navigation in Space
– New theories and technologies of navigation
– Policies and Standards
This book series is indexed in SCOPUS and EI Compendex databases.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Over the past few decades, global navigation satellite systems (GNSS) have made
great advances, especially due to governments' strong support and funding for the
modernization and updating of the systems. GNSS has been dominantly used for
positioning, navigation, and timing (PNT) and associated location-based services
(LBS) in outdoor environments. To enable PNT in GNSS-denied environments such
as indoor environments and underground spaces, a range of local positioning systems
have been developed, which make use of different types of signals such as radio
signals, acoustic signals, infrared, visible light, and the magnetic field. However, it is
still a challenge to meet the increasingly strict requirements on PNT in areas such
as reliability, accuracy, integrability, and safety to enable better LBS, especially in
complex environments.
Machine learning has been applied to a variety of fields with remarkable success.
It has also been leveraged by researchers and engineers to handle complex and
challenging problems in PNT, enhancing performance and the related LBS. Machine
learning can usually provide a better solution to PNT than physics-based approaches
when the application or service is demanding, the environment is complex,
or both. This book focuses entirely on positioning and navigation using machine
learning methods. Specifically, five chapters are related to GNSS positioning and
navigation, while nine chapters deal with local positioning and navigation.
Chapter 1 focuses on the use of machine learning to purify pseudorange
measurements for GNSS positioning. The Gradient Boosting Decision Trees
method is used to correct pseudorange errors in the static scenario, while a random
forest (RF) based pseudorange correction method is used in the dynamic scenario.
Chapter 2 deals with the performance enhancement of the inertial navigation system
(INS), especially during GNSS outages. A deep learning network is proposed to assist
the INS. The network extracts spatial features from the inertial measurement
unit signals and tracks their temporal characteristics to mitigate the measurement
errors. Chapter 3 analyzes the demand for integrity monitoring in high-precision posi-
tioning, overviews the developments of integrity monitoring in civil aviation applica-
tions, discusses the challenges in the calculation procedure of a few parameters for
high-precision positioning, and provides preliminary studies on the generalization
Abstract GNSS signals are easily blocked and reflected by high buildings in urban
areas, causing non-line-of-sight (NLOS) and multipath errors. These errors deteriorate
the positioning accuracy. In this chapter, a machine learning based correction
method is proposed to mitigate the NLOS/multipath errors in the pseudorange. The
results of a static and a dynamic experiment demonstrate the effectiveness of the
proposed method. In the static experiment, the improvements in positioning accuracy
were 75.6% and 75.6% in the horizontal and 71.4% and 70.9% in 3D, compared with
two conventional positioning methods. In the dynamic experiment, two variations
of the pseudorange error correction model (PBC and GBC) are used to improve
positioning accuracy in urban environments. The PBC model achieved positioning accuracy
improvements of 42.9% and 41.1% in the horizontal and 60.1% and 45.7% in 3D,
compared with comparative methods 1 and 2. GBC achieved improvements of 40.8%
and 38.9% in the horizontal and 63.3% and 50.0% in 3D, compared with the same
comparative methods, respectively.
1.1 Introduction
Global navigation satellite systems (GNSS) are widely used to provide positioning
services in urban areas. Although GNSS work well in open environments, in complex
urban environments the GNSS signal is easily blocked, diffracted, or reflected
by high-rise buildings, resulting in non-line-of-sight (NLOS) reception and multipath
interference, which reduces the accuracy of the positioning obtained through GNSS.
In this chapter, NLOS and multipath are defined as follows. NLOS means that at
least one reflected signal is received and the direct signal is not available.
Q. Cheng
Department of Land Surveying and Geo-informatics, The Hong Kong Polytechnic University,
Hong Kong, China
e-mail: [email protected]
R. Sun (B)
College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing, China
e-mail: [email protected]
Multipath consists of a direct signal and at least one reflected signal. Both NLOS and multi-
path can deteriorate the positioning accuracy [1]. Multipath errors are typically several
meters and can be positive or negative. NLOS errors are always positive, reaching tens or
even hundreds of meters in harsh urban canyon scenarios. Unlike other GNSS errors,
including satellite orbit and clock errors and atmospheric errors, which can be
mitigated or eliminated by models or differencing, NLOS/multipath errors are hard to
model (at least currently) and cannot be differenced away, especially for dynamic
applications in challenging environments. This is because NLOS/multipath errors
depend strongly on the surrounding environment and change rapidly from location to
location. Therefore, the mitigation of NLOS/multipath is essential to provide more
accurate positioning services for urban applications. In this chapter, machine learning
is investigated to correct the NLOS/multipath errors of the pseudorange and thereby
improve the positioning accuracy of both static and dynamic receivers. Since no extra sensor
is needed, the proposed algorithm is suitable for low-cost applications and is easily
accepted by the mass market.
Various technologies have been investigated to mitigate the range errors caused by
NLOS/multipath effects and to improve the positioning accuracy. These methods
mainly include hardware design, vision-aided methods, and measurement-based modelling.

Hardware design methods can be divided into two types: antenna design and
receiver design. Several kinds of antennas can be used for NLOS/multipath
mitigation. A typical example is the choke-ring antenna. The specially designed bottom
of this antenna can block reflected signals from the ground. Therefore, it is useful for
mitigating the reflected signals of satellites with a low elevation angle [2–6]. However,
the reflected signals in urban areas mainly come from high buildings, which renders the
ground-oriented choke-ring antenna ineffective. Another example is the dual-polarisation antenna,
which captures both right-hand circularly polarised (RHCP) and left-hand circularly
polarised (LHCP) signals. In general, direct signals are RHCP, while reflected
signals are, with high probability, LHCP, depending on the satellite elevation
and the roughness of the reflecting surface. By analysing the GNSS data from a dual-
polarisation antenna, the NLOS/multipath effect can be mitigated [7–11]. However,
reflected signals sometimes remain RHCP, which degrades this mitigation,
since such reflected signals may be mistakenly considered line-of-sight (LOS)
signals. An antenna array, which consists of more than one antenna, may be able to
estimate the direction of incoming signals and can therefore be used for multipath
mitigation [12–16]. A rotating antenna can also be used for NLOS identification by
analysing the Doppler shift [17]. Nevertheless, the performance and bulky size limit
the wide application of these antennas.
Signal processing design in the receiver has also been investigated by many
researchers. The narrow correlator was proposed to mitigate the influence of code multi-
path [18]. It performs well in mitigating long-delay multipath signals but is less
effective against short-delay multipath.
Machine learning has been widely used to classify GNSS signals as LOS or NLOS/multipath,
and the classified signals are then excluded or down-weighted in the positioning solution;
however, it is difficult to make the best use of these classified signals. Exclusion may deteriorate
the satellite geometry and then reduce positioning accuracy, and the suitable weight of each
signal is difficult to estimate in the weighted least squares method. To avoid these
problems, another important function of machine learning, regression, has attracted
increasing attention for NLOS/multipath mitigation. In regression, the pseudorange
error itself is predicted using machine learning, instead of the type of GNSS
signal. The corrected pseudorange can be used for positioning directly, without dete-
riorating the satellite geometry. The key issue affecting the positioning
performance, however, is whether the pseudorange is appropriately corrected. If a
robust algorithm could obtain the pseudorange error accurately for
each observed satellite, it would potentially be possible to achieve a high accuracy
positioning solution based on the pseudorange error correction method.
1.3 Methodology
The NLOS/multipath errors of GNSS will repeat for a receiver in the same environ-
ment due to the periodicity of the satellites' orbits [60]. For example, an NLOS error
repeats when the positions of the receiver, satellite and reflector do not change. This
characteristic has been used to build a model between NLOS/multipath errors and
GNSS measurements in specific environments. Firstly, the NLOS/multipath errors in
the pseudorange are calculated with the known coordinates of the receiver and satellites.
Then the pseudorange error prediction and correction model is trained using machine
learning with these errors (outputs) and the corresponding GNSS measurements (inputs).
When a user revisits the same routes (covered by the a priori data), the pseudorange error
corrections can be provided in real-time using the trained model.
The framework of the proposed algorithm is shown in Fig. 1.1, which describes
a machine learning based pseudorange error correction algorithm that can be used
to improve positioning accuracy in urban areas. In this algorithm, machine
learning is used to train the rules in advance, during the offline period. In the training
process, firstly, the pseudorange errors are calculated using the known positions of
both the receiver and the satellites. Then, several related inputs from the GNSS measurements
are determined. Lastly, machine learning is used to train on the data and generate the
rules between these inputs and the corresponding labelled pseudorange errors. During
the online period, the trained rules are used to predict the pseudorange errors
of newly received GNSS signals. After the correction of these pseudoranges, more
accurate positioning results can be calculated.

Fig. 1.1 Framework of the proposed algorithm (offline part: pseudorange error calculation, inputs selection and labelling; online part: pseudorange correction rules and positioning)

The received GNSS raw measurements and their derivatives can be used to eval-
uate and estimate the pseudorange error. In general, using more types of inputs
can improve the performance of the training and prediction in machine learning,
but increases the computational load. Here, some common inputs are introduced,
including C/N0, the pseudorange residual η, the satellite elevation angle θe, and the
receiver position. Some of them will be used in the following static and dynamic experiments.
(1) C/N0: C/N0 reflects the strength of a signal. Under the same noise power, the
C/N0 of a reflected (i.e. NLOS) signal is often smaller than that of a direct (i.e.
LOS) signal. As for multipath signals, the value of C/N0 may become larger or smaller
than that of a LOS signal. C/N0 is the most common and effective indicator for
pseudorange error estimation.
(2) Pseudorange residual, η: the pseudorange residual is the inconsistency between
the measured and calculated pseudoranges, expressed as η. It is worth noting
that the pseudorange residual is a different concept from the pseudorange error. The
pseudorange error is the difference between the measured pseudorange and the ideal pseu-
dorange (without any noise or errors), which means the pseudorange error cannot
be calculated in practice, due to the lack of the true position of the receiver. In contrast,
the pseudorange residual can be calculated through the following equation:

η = ρ − G r (1.1)
where ρ is the vector of measured pseudoranges, G is the geometry (design) matrix, and r is the least squares solution:

r = (G^T G)^{-1} G^T ρ. (1.2)
The pseudorange residual has been used to detect NLOS/multipath signals when
sufficient GNSS measurements are available [52]. Therefore, the pseudorange residual is
selected as one of the potential inputs for the pseudorange error prediction.
(3) Satellite elevation angle, θe: GNSS signals are less likely to be blocked or
reflected by the surrounding environment at higher elevation angles and there-
fore suffer less from NLOS/multipath effects. Weighting the measurements based on
the elevation angle to reduce the NLOS/multipath effect is widely used in the
position calculation [61]. The satellite elevation angle is therefore also used for
the pseudorange error prediction.
(4) Positional information: Since the NLOS/multipath errors are environment
related, initial positional information, which can be obtained from an initial single
point positioning solution, helps to associate the measurements with the corresponding
surroundings.
In the offline labelling phase, the raw pseudorange ρ is modelled as

ρ = R + c(δt^r − δt^sv) + I + T + M + ε (1.3)

with the geometric range

R = √((x_sv − x_r)² + (y_sv − y_r)² + (z_sv − z_r)²) (1.4)

where R is the geometric range between the observed satellite and the receiver;
(x_sv, y_sv, z_sv) and (x_r, y_r, z_r) are the coordinates of the satellite and receiver in an
earth centred earth fixed (ECEF) coordinate system; c is the velocity of light in a
vacuum; δt^sv is the satellite clock offset; δt^r is the receiver clock offset; I and T
denote the ionospheric and tropospheric delays; M is the NLOS/multipath error; and ε represents receiver
noise, which is relatively small and negligible.
After the corrections of related errors, the corrected pseudorange ρ c is:
ρ^c = R + c(Δδt^r − Δδt^sv) + ΔI + ΔT + M + ε (1.5)
where the geometric range R can be calculated from the known positions of the
receiver and the observed satellite (broadcast ephemeris); Δδt^r is the residual of the
receiver clock offset after correction, in which the calculated receiver clock error is
obtained from the pseudorange positioning equations with the known ground truth; Δδt^sv is the
residual of the satellite clock offset after correction (broadcast ephemeris); and ΔI + ΔT
are the residuals of the ionospheric and tropospheric delays after the corrections from
the Klobuchar and Saastamoinen models. The pseudorange error Δρ, dominated by
NLOS/multipath, can then be calculated as:

Δρ = ρ^c − R = c(Δδt^r − Δδt^sv) + ΔI + ΔT + M + ε. (1.6)
For every set of inputs from the GNSS measurements, containing some of the carrier-to-noise
ratio C/N0, pseudorange residual η, satellite elevation angle θe, and positional information,
the corresponding pseudorange error Δρ can be calculated following the above steps in the
offline labelling phase. After obtaining the labelled data, machine learning can
be used to train the pseudorange correction model. The details of the datasets and training
for the static and dynamic experiments are discussed in the next sections.
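A minimal sketch of this labelling step is given below. It assumes the clock and atmospheric correction terms are supplied by the chapter's processing chain (receiver clock from the positioning equations with the known ground truth, Klobuchar and Saastamoinen models); the function and variable names are illustrative.

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)

def label_pseudorange_error(rho_meas, sat_pos, rec_pos_truth,
                            dt_sv, dt_r, iono, tropo):
    """Return the NLOS/multipath-dominated pseudorange error of Eq. (1.6).

    rho_meas      : raw measured pseudorange (m)
    sat_pos       : satellite ECEF position (3,) from the broadcast ephemeris
    rec_pos_truth : ground-truth receiver ECEF position (3,)
    dt_sv, dt_r   : satellite / receiver clock offsets (s) used for correction
    iono, tropo   : ionospheric / tropospheric delay corrections (m)
    """
    # Pseudorange after applying clock and atmospheric corrections (rho^c)
    rho_c = rho_meas + C * dt_sv - C * dt_r - iono - tropo
    # Geometric range R from the known positions
    R = np.linalg.norm(np.asarray(sat_pos) - np.asarray(rec_pos_truth))
    # Remaining error, dominated by NLOS/multipath (plus small residual terms)
    return rho_c - R
```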
The Gradient Boosting Decision Trees (GBDT) method is used to correct pseudo-
range errors in this section. The detailed framework of the proposed GBDT based
GNSS pseudorange prediction and correction algorithm is presented in Fig. 1.2.
In the offline phase, GNSS measurements are collected from known points in an
urban canyon and a reference station in an open area to generate the training dataset.
LOS, NLOS and multipath signals are all contained in this training dataset. The
corresponding pseudorange errors are computed as described in the labelling
process. C/N0, θe and η are selected as the inputs and the corresponding pseudorange
error is labelled as the output. During the offline phase, the GBDT algorithm is used
to fit the calculated pseudorange errors, thereby obtaining the rules, which reflect
the relationship between the inputs and the output. The GBDT based training
process is described further in the subsequent sections.
In the online phase, the inputs (C/N0, η and θe) from new GNSS measurements
in the urban canyon are used together with the rules extracted from the offline phase
to predict the pseudorange errors. According to the predicted results, two variations
of the positioning algorithm are tested: (1) positioning based on corrected pseudoranges,
and (2) positioning based on NLOS/multipath signal exclusion or correction.
Fig. 1.2 Framework of the GBDT pseudorange correction based positioning algorithm
GNSS data were collected from several points to generate the training and testing
datasets. To avoid a biased fitting, training dataset D1 contains GNSS measurements
from two points in an urban canyon and one reference station. The data from the urban
area mainly include multipath/NLOS signals, while the data from the reference station
mainly contain LOS signals. The testing dataset was also from the urban canyon.
The details are given in Fig. 1.3.
In this section, GBDT is used to train models for the pseudorange error prediction
and correction. GBDT uses a gradient boosting regression technique to minimise the
decision tree training error [62]. The problem in this section can be defined as:
given a training sample {x_i, Δρ_i}, i = 1, ..., N, of known (x, Δρ) values, the goal is to find a
function that maps x to Δρ such that, over the joint distribution of all (x, Δρ) values,
the expected value of some specified loss function L(Δρ_i, f(x_i)) is minimised. In
particular, x_i = (C/N0_i, η_i, θe_i), i = 1, 2, 3, …, N, where i is the sequence number of the
sample and N is the total number of samples. Δρ_i is the corresponding labelled
pseudorange error of x_i. The GBDT based pseudorange error prediction algorithm
is described as follows.
(1) Initialize the predictor with a constant value:

f_0(x) = arg min_γ Σ_{i=1}^{N} L(Δρ_i, γ) (1.7)
where f_0(x) is a regression decision tree containing only one root node, and
γ is a constant value which is the output of f_0(x). In order to ensure that
the loss function L(Δρ_i, f(x_i)) decreases in each iteration, the weak learner
h_m(x_i; a_m), m = 1, ..., M, is created in the direction of steepest descent (i.e., the
negative gradient direction). m is the sequence number of the iteration. h_m(x_i; a_m)
is a decision tree with the parameter a_m, which determines the splitting variables,
split locations, and terminal nodes of the individual tree.
(2) For m = 1, ..., M:

(2.1) Calculate the negative gradient (pseudo-residual) of the loss for each sample,

ỹ_i = −[∂L(Δρ_i, f(x_i)) / ∂f(x_i)]_{f(x) = f_{m−1}(x)} (1.8)

where the loss function L(Δρ_i, f(x_i)) is the square loss, (1/2)(Δρ_i − f(x_i))², so that ỹ_i = Δρ_i − f_{m−1}(x_i);
(2.2) Create a new dataset by replacing Δρ_i in the training dataset with ỹ_i.
The new dataset is expressed as:

T_m = {(x_1, ỹ_1), (x_2, ỹ_2), ..., (x_i, ỹ_i), ..., (x_N, ỹ_N)}. (1.9)
(2.3) Fit a new weak learner h_m(x; a_m) to the dataset T_m, and update the original predictor with the new predictor multiplied by the
learning rate β to form a stronger predictor:

f_m(x) = f_{m−1}(x) + β h_m(x; a_m)
(3) Output fM (x) as the final predictor after the iteration termination:
f_M(x) = f_0(x) + Σ_{m=1}^{M} β h_m(x; a_m) (1.12)
(4) Once the final predictor fM (x) (i.e. the rules of the GBDT method) is obtained,
the corresponding pseudorange errors of the newly collected GNSS measure-
ments can be predicted. The input x = (C/N0 , η, θe ) is used together with the
rules to predict the pseudorange errors for each observed satellite.
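A minimal sketch of this training-and-prediction procedure is given below, using scikit-learn's GradientBoostingRegressor as a stand-in for the boosting loop above (it implements the same squared-loss gradient boosting with learning rate β). The data arrays here are random placeholders; in practice they come from the offline labelling step.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# X: one row per labelled sample, columns (C/N0, eta, theta_e); y: labelled errors
rng = np.random.default_rng(0)
X_train = rng.uniform([20, -10, 5], [50, 10, 90], size=(1000, 3))  # placeholder data
y_train = rng.normal(0.0, 5.0, size=1000)                          # placeholder labels

gbdt = GradientBoostingRegressor(
    loss="squared_error",   # the 1/2 (d_rho - f(x))^2 loss used above
    learning_rate=0.1,      # beta in Eq. (1.12)
    n_estimators=200,       # M weak learners h_m
    max_depth=3,
)
gbdt.fit(X_train, y_train)

# Online phase: predict the pseudorange error for a newly received measurement
x_new = np.array([[32.0, 4.2, 21.0]])       # (C/N0, eta, theta_e)
d_rho_pred = gbdt.predict(x_new)            # predicted pseudorange error (m)
rho_corrected = 2.15e7 - d_rho_pred         # hypothetical raw pseudorange, corrected
```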
The two variations of the positioning method based on the predicted pseudorange
errors are described in detail below.

(1) Positioning based on pseudorange corrections.

The predicted pseudorange error is removed from the measured pseudorange of each satellite,

ρ_i^c = ρ_i − Δρ̃_i

where ρ_i^c is the corrected pseudorange of the ith satellite, and Δρ̃_i is the predicted
pseudorange error of the ith satellite. With the corrected pseudoranges, the
position coordinates are calculated based on the least squares method.
(2) Positioning based on NLOS/multipath signal exclusion or correction.

The framework of this algorithm is illustrated in Fig. 1.4. Instead of correcting the
pseudoranges directly, the predicted pseudorange errors are used to determine the
type of each signal by comparing their absolute values with a proposed threshold
p. If a predicted absolute pseudorange error |Δρ̃_i| is less than the threshold p,
the corresponding signal is considered a LOS signal. Otherwise, it is
considered a NLOS/multipath signal. Nevertheless, we do not remove all
these NLOS/multipath signals, since this could degrade the satellite geometry
significantly. The PDOP is calculated for all tracked satellites at
each epoch as a reference. Then, candidate PDOPs are calculated by excluding one
satellite at a time. If the exclusion of a satellite increases the PDOP, the
pseudorange of this satellite is corrected. Alternatively, if the removal of
this satellite does not cause the PDOP to increase, this satellite is excluded
before positioning, as sketched in the code below.
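The decision logic of this second variation can be sketched as follows; the threshold value p and the helper computing PDOP from the geometry matrix G are assumptions for illustration.

```python
import numpy as np

def pdop(G):
    """PDOP from the geometry matrix G (unit LOS vectors + clock column)."""
    Q = np.linalg.inv(G.T @ G)
    return float(np.sqrt(np.trace(Q[:3, :3])))

def exclude_or_correct(G, rho, d_rho_pred, p=5.0):
    """Keep LOS satellites as-is; for suspected NLOS/multipath satellites,
    exclude them only when exclusion does not increase the PDOP,
    otherwise correct their pseudoranges with the predicted error."""
    keep = np.ones(len(rho), dtype=bool)
    rho_out = rho.copy()
    pdop_all = pdop(G)
    for i in range(len(rho)):
        if abs(d_rho_pred[i]) < p:
            continue                          # treated as LOS, leave unchanged
        mask = keep.copy()
        mask[i] = False
        # keep at least four satellites so the geometry stays solvable
        if mask.sum() >= 4 and pdop(G[mask]) <= pdop_all:
            keep[i] = False                   # exclusion does not hurt geometry
        else:
            rho_out[i] -= d_rho_pred[i]       # correct instead of excluding
    return G[keep], rho_out[keep]
```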
The static case was carried out on a street in Hong Kong, with glass-covered high-rise
buildings on one side. The training dataset (D1) contained two parts: urban data
and open-sky data. The urban data were collected from two points (P1 and P2)
for about 20 min with a sampling frequency of 5 Hz using a NovAtel
OEM6 geodetic receiver, as seen in Fig. 1.5. The open-sky data were from the SatRef
HKSC station, collected for 4 h with an interval of five seconds using a LEICA GR50 geodetic
receiver. The testing dataset (D2) was formed using another 30 min of data collected
at P1 at a sampling rate of 5 Hz. Table 1.1 provides a summary of the datasets used
in this case.
For better comparison, four methods are tested in this section, including two
conventional positioning methods (positioning with standard outlier detection and
exclusion, and positioning with C/N0 and elevation angle based NLOS/multipath
mitigation) and the two proposed variations (positioning based on pseudorange
corrections, and positioning based on NLOS/multipath signal exclusion or correction).

The 3D positioning error distributions of the four methods are also depicted in
Fig. 1.7. With the pseudorange corrections, the number of epochs with a positioning
error within 30 m increases significantly, while for the conventional methods most
errors fall within 60–90 m. For the NLOS/multipath exclusion or correction based
positioning, most errors are in the 30–60 m range, which is also worse than the
pseudorange correction results.
The positioning performance is further analysed by comparison with conventional
method one (which showed a performance similar to conventional method two).
As shown in Table 1.3, the horizontal and 3D positioning results improved in about 98% and 97% of
the epochs, respectively, with the pseudorange correction, while only around 3% of
the epochs got worse, due to inaccurately predicted errors. The NLOS/multipath
exclusion or correction method, meanwhile, improved the positioning accuracy in
about 81% (3D) and 91% (horizontal) of the epochs, while the positioning accuracy
for about 9% of the epochs (3D and horizontal) did not change. The proportion of worse epochs is
only 0.4% for horizontal positioning but 10% for 3D positioning, since the
height is more easily affected by the reduction of the satellite number in challenging
urban areas.
In summary, the positioning results in the static case show that the pseudorange
correction using machine learning can perform better than not only the two traditional
methods but also the NLOS/multipath exclusion or correction method. The pseudor-
ange correction method does not reduce the satellite number but corrects the pseudorange
NLOS/multipath errors, hence improving the positioning accuracy.

Table 1.3 Algorithm performance evaluation with proportion of epochs in the static case

Proportion of epochs (%)                                   3D                         Horizontal
                                                           Better  Equal  Worse       Better  Equal  Worse
Positioning based on pseudorange corrections               96.50   0.00   3.50        97.80   0.00   2.20
Positioning based on NLOS/multipath signal
exclusion or correction                                    81.13   8.77   10.10       90.53   9.11   0.36
In this section, a random forest (RF) based pseudorange correction method with
two variations, grid-based correction (GBC) and point-based correction (PBC), is
used in the dynamic experiment. The overall framework is shown in Fig. 1.8. The offline
training and online testing parts are similar to the static case. One difference is that
the a priori data were collected in a specific area with the receiver mounted on a vehicle,
instead of at several static points. Another difference is that the inputs are changed
because the receiver is moving. For GBC, C/N0 and the elevation angle are selected as the inputs, while
for PBC, C/N0, the elevation angle, and positional information are used. The calculated
pseudorange error still serves as the output for the machine learning in both variations.
The training datasets are divided according to the constellation type of the data in this
section, i.e. GPS and BDS.

RF is used to train the rules, which map the relation between the selected
inputs and the labelled output. For PBC, all the a priori data are used together to
generate the rules for GPS and BDS, respectively, for the whole area. In contrast, for
GBC, hexagonal grids are first designed to cover this area. Then, the data in each
hexagonal grid are used to train the rules of this grid for GPS and BDS.
Based on the rules trained using RF in the offline phase, the pseudorange errors
are predicted and corrected in the online phase. For PBC, an iterative process
is needed, since the accurate positional information (one of the inputs) is not available. The
result of single point positioning (without NLOS/multipath correction) is used as the
positional information in the first iteration. Then, the positioning result of the current
iteration becomes one of the inputs for the next iteration, until the maximum number
of iterations is reached. For GBC, the initial position is used to determine the hexagonal
grid in which the receiver is located. Then the corresponding rules of this grid are
used to predict and correct the pseudorange errors. The position coordinates can
then be calculated based on the corrected pseudoranges.
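The online phase of both variations can be sketched as below; solve_position (a least squares solver on corrected pseudoranges), grid_of (the hexagonal grid lookup) and the trained model objects are hypothetical stand-ins, and the per-constellation split (GPS/BDS) is omitted for brevity.

```python
import numpy as np

def pbc_online(rho, cn0, elev, model_pbc, solve_position, n_iter=10):
    """Point-based correction: iterate because the position itself is an input."""
    pos = solve_position(rho)                          # initial single point position
    for _ in range(n_iter):
        # pos[:2] stands in for the (E, N) positional inputs of the PBC model
        X = np.column_stack([np.tile(pos[:2], (len(rho), 1)), cn0, elev])
        d_rho = model_pbc.predict(X)                   # predicted errors per satellite
        pos = solve_position(rho - d_rho)              # position with corrected ranges
    return pos

def gbc_online(rho, cn0, elev, models_by_grid, grid_of, solve_position):
    """Grid-based correction: use the rules of the grid the receiver is in."""
    pos = solve_position(rho)                          # initial position -> grid lookup
    model = models_by_grid[grid_of(pos)]
    d_rho = model.predict(np.column_stack([cn0, elev]))
    return solve_position(rho - d_rho)
```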
The inputs and output are determined as follows. Firstly, the a priori data were
collected by a high-grade GNSS/IMU integration system mounted on the roof of a
moving vehicle, along the routes in a target region between 14:00 and 18:00 Beijing
time, every day for almost two months. Reference trajectories were estimated by
post-processing the GNSS/INS integration in the tightly coupled mode with the
Inertial Explorer (IE) software, whose accuracy can reach the cm to dm level. After
obtaining the reference trajectories of the receiver, the corresponding pseudorange
errors can be calculated as described in the labelling process of the previous section. The
raw GNSS measurements of this integration system were also collected for the
training process. They include C/N0 and the elevation angle θe for each satellite at
each epoch.
Accordingly, in PBC every set of inputs from the raw GNSS measurements
includes C/N0, θe, and (E_r, N_r) (positional information from the reference trajectories),
while in GBC the inputs only contain C/N0 and θe. Nevertheless, they have the same
output, the corresponding calculated pseudorange error Δρ.
Based on these collected data, the training datasets can then be constructed
according to their constellation type, i.e. BDS and GPS. In the PBC algorithm,
therefore, two training datasets, namely trainingdataset_BDS and trainingdataset_
GPS, can be generated to extract their respective rules for the target region. In the
GBC algorithm, meanwhile, the number of training datasets takes into account not
only the constellation type but also the number of grids generated in the target
region. For example, trainingdataset_BDS_grid_n and trainingdataset_GPS_grid_n are
constructed for grid n. Then, two sets of rules are extracted respectively from
trainingdataset_BDS_grid_n and trainingdataset_GPS_grid_n.
After the generation of the training datasets, the RF is used to train the rules for
pseudorange correction. RF is an ensemble learning algorithm that combines bagging
with the idea of a feature subspace. It can avoid the problems of insufficient precision
and overfitting that may occur with a single decision tree [63]. Therefore, RF has
been widely used to solve nonlinear problems [64, 65].
To clearly describe the training process, the input vector of the training dataset is
expressed as x and the corresponding output as y. The input vector x in PBC can be
expressed as x = [(E, N), C/N0, θe], and in GBC as x = [C/N0, θe]. When there
are K samples in a training dataset L, it can be expressed as L = {(x_q, y_q)}, q = 1, ..., K.
Figure 1.9 illustrates the use of the random forest for regression model training and
prediction of the pseudorange error. The specific steps are as follows:
(1) Note that there are K samples in the training dataset L. Assuming there are
K_s (K_s < K) samples in a subset, a set of training subsets L_t is generated by
random sampling with replacement from L, with equal probability 1/K for
each sample:

L_t = {(x_q, y_q)}, q = 1, ..., K_s, t = 1, 2, ..., m (1.14)
where m is the total number of sample subsets. Each subset is used to train
a single regression tree.
(2) s features (where s is no more than the total number of input features) are randomly
sampled from the input features and one of them is selected for node splitting in
order to generate a regression tree. At each node of the decision tree, the principle
governing the feature selected for node splitting is to minimize
the mean square error. That is, for an arbitrary feature (represented by A), the
dataset will be divided into two datasets D1 and D2 at an arbitrary divide point
j corresponding to A. The aim is to find the feature A and the splitting point
j that minimize the sum of the mean square errors (MSE) of D1 and D2:

min_{(A,j)} [ min_{c1} Σ_{x_q ∈ D1(A,j)} (y_q − c1)² + min_{c2} Σ_{x_q ∈ D2(A,j)} (y_q − c2)² ]. (1.15)
(3) Steps (1) and (2) are repeated to grow m regression trees. Each tree t gives a
prediction ŷ_t for a new input, and the tree outputs are averaged to obtain the
predicted pseudorange error ŷ of the RF output. Finally,

Δρ̂ = ŷ = (1/m) Σ_{t=1}^{m} ŷ_t. (1.16)
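The bagging-and-averaging procedure of steps (1)–(3) and Eq. (1.16) can be sketched as follows; X and y are assumed to be NumPy arrays from a training dataset, and in practice scikit-learn's RandomForestRegressor implements the same idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_rf(X, y, m=100, subset_ratio=0.8, s_features=2, seed=0):
    """Steps (1)-(2): bootstrap subsets L_t and trees with random feature subsets."""
    rng = np.random.default_rng(seed)
    K = len(X)
    Ks = int(subset_ratio * K)
    trees = []
    for t in range(m):
        idx = rng.integers(0, K, size=Ks)              # sampling with replacement
        # s_features must not exceed the number of input features
        tree = DecisionTreeRegressor(max_features=s_features, random_state=t)
        tree.fit(X[idx], y[idx])                       # MSE split criterion (Eq. 1.15)
        trees.append(tree)
    return trees

def predict_rf(trees, X_new):
    """Step (3) / Eq. (1.16): average the m tree outputs."""
    preds = np.stack([tree.predict(X_new) for tree in trees])
    return preds.mean(axis=0)
```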
In the online phase, C/N0 and the elevation angle are used together with the trained
rules to predict the pseudorange error Δρ̃_k for satellite k. Assuming that the observed
pseudorange is ρ̃_k, the corrected pseudorange ρ_k' can be obtained as:

ρ_k' = ρ̃_k − Δρ̃_k (1.17)
After determining these hexagonal grids, the training dataset needs to be divided
according to the hexagonal grid size, as described in detail below. At each
epoch, the training sample belongs to the hexagonal grid in which the receiver is located,
as shown in Fig. 1.10.
Assume that there is a hexagonal grid with center point o and that the receiver
is at a point B. To describe the steps more easily, a rectangular plane coordinate system
with an x-axis and a y-axis is established, as shown in Fig. 1.10. Let the radius of this
hexagon be a and the coordinates of B be (w, q). The determination of whether point
B is located inside the hexagon with the center point o is described as follows.

(1) Determine whether point B is located inside the outer bounding rectangle of the
hexagon. If it is not, point B does not belong to this grid. Otherwise, point B is located
inside the outer rectangle (represented by the red rectangle in Fig. 1.10) of the hexagon.

(2) Determine whether a − q ≥ w/√3. If this condition is false, point B does not belong
to the hexagon or is on the boundary of the hexagonal grid. If the condition is
true, point B is located inside the hexagon. After the division of the training set,
the pseudorange error correction models are trained grid by grid according to
the above division results, as sketched below.
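A sketch of this membership test is shown below, assuming a "pointy-top" hexagon of circumradius a in the coordinate frame of Fig. 1.10; the exact orientation and grid layout used in the experiment are not reproduced here, so the inequalities and the placeholder grid centres are illustrative.

```python
import math

def in_hexagon(w, q, a):
    """Is point B = (w, q) inside a pointy-top hexagon of circumradius a
    centred at the origin? First the outer bounding rectangle is checked,
    then the slanted-edge condition a - |q| >= |w| / sqrt(3)."""
    w, q = abs(w), abs(q)
    if w > math.sqrt(3) * a / 2 or q > a:      # outside the outer rectangle
        return False
    return a - q >= w / math.sqrt(3)           # inside the slanted edges

def grid_of(e, n, a, centres):
    """Hypothetical lookup: return the index of the hexagonal grid containing
    the east/north position (e, n), given precomputed grid centres."""
    for k, (ce, cn) in enumerate(centres):
        if in_hexagon(e - ce, n - cn, a):
            return k
    return None

# Hypothetical usage with a few placeholder grid centres (25 m resolution)
a = 25.0
centres = [(0.0, 0.0), (math.sqrt(3) * a, 0.0), (0.0, 1.5 * a)]
print(grid_of(10.0, 5.0, a, centres))
```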
The process of GBC based positioning is then as follows. An initial position is first
calculated to determine which hexagonal grid the user is located in. Then, the correc-
tion model of this hexagonal grid is used to correct the received pseudoranges.
Finally, a more accurate position can be obtained using these corrected pseudoranges.
The number of PBC iterations influences both the efficiency and the accuracy.
Therefore, it is essential to analyse the performance under different numbers of iterations. The
root mean square errors (RMSEs) of positioning under different numbers of iterations
are shown in Fig. 1.11. The RMSE is about 28 m for the initial positioning using
CSPP 1 (without pseudorange error correction). It is reduced to about 12 m after one
iteration, which means that the iteration is effective in improving the positioning accuracy.
As the number of iterations increases, however, the accuracy improvement
gradually becomes weaker and tends to converge, while too many iterations
increase the computational burden. A suitable number should therefore be determined
as a compromise between efficiency and accuracy.
This chapter determines the iteration threshold using the proportion of convergent
epochs. If the RMSE of the jth iteration is equal to that of the (j + 1)th and (j + 2)th
iterations, the epoch is considered to have converged at the jth iteration. The proportion
of convergent epochs under different numbers of iterations is shown in Table 1.4
and Fig. 1.12. After the fifth iteration, more than 80% of the epochs had
converged. After the tenth iteration, this proportion was stable at 83.2%, so the
number of iterations was set to ten in this section.
For the GBC algorithm, the spatial resolution is set to 25 m, following
Smolyakov et al. [67]. The heatmap in Fig. 1.13 shows the RMSEs of the
pseudoranges in each hexagonal grid. Since BDS has better visibility than GPS in
the observed area in China, the RMSE level of the BDS pseudoranges is generally
lower than that of GPS.

Fig. 1.13 BDS/GPS heatmap of the satellite pseudorange in the target urban region, where the
different colors indicate different values of the root mean square error (m)
For a better comparison, CSPP 1 (introduced in Positioning with PBC) and CSPP 2
are used as the comparative methods. CSPP 2 is based on CSPP 1, with an additional
elevation angle based weighting strategy and a C/N0 threshold of 25 dB-Hz. The
weighting matrix is:
W = diag(1/sin²(θ_e1), ..., 1/sin²(θ_en)) (1.18)
where θ_en is the elevation angle of satellite n, and diag(·) denotes a diagonal matrix.
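In code, one elevation-weighted least squares step might look as follows. Here the diagonal entries 1/sin²(θe) of Eq. (1.18) are interpreted as relative measurement variances, so the actual least squares weights are their inverses; this interpretation, and the helper's inputs, are assumptions.

```python
import numpy as np

def elevation_weighted_ls(G, rho, elev_deg, cn0, cn0_threshold=25.0):
    """One elevation-weighted least squares update (CSPP 2 strategy).

    Satellites below the C/N0 threshold are discarded, and each remaining
    satellite is weighted by sin^2(theta_e), the inverse of the Eq. (1.18) entries.
    """
    keep = cn0 >= cn0_threshold
    G, rho = G[keep], rho[keep]
    w = np.sin(np.radians(elev_deg[keep])) ** 2          # inverse variances
    W = np.diag(w)
    # Weighted normal equations: (G^T W G) x = G^T W rho
    return np.linalg.solve(G.T @ W @ G, G.T @ W @ rho)
```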
The positioning results of the four algorithms are shown in Fig. 1.14, with a
reference trajectory in black dots. The red and blue dots are the positioning results
of CSPPs 1 and 2, while the green and yellow dots are the positioning results using
the proposed PBC and GBC algorithms. The whole experiment can be divided into
three parts according to the environment, with narrow streets and obstacles such as
tall buildings and overpasses. The details of each part are zoomed in for a clearer
comparison. It is indicated that the proposed algorithms deliver more accurate
positioning results in the NLOS/multipath contaminated areas.
Table 1.5 shows the positioning accuracy of the different methods in terms of
RMSE, and the corresponding positioning error time series are shown in Figs. 1.15
and 1.16. The performance of PBC and GBC is comparable, with horizontal RMSEs
of 5.6 and 5.8 m, and 3D RMSEs of 11.4 and 10.5 m, respectively. Compared with
the results of CSPPs 1 and 2, PBC achieves accuracy improvements of 37.3% and
36.5% in the east, 49.2% and 46.7% in the north, and 63.2% and 47.3% in the up direction.
For GBC, the improvements are 34.7% and 33.8%, 50.8% and 48.3%, and 67.3%
and 53.2%, respectively. This is confirmed by the results in Figs. 1.15 and 1.16,
where the errors of CSPPs 1 and 2 fluctuate strongly. In contrast, both proposed
algorithms effectively reduce this fluctuation, although the performance of GBC
is slightly more stable. This is caused by the different dataset constructions in the
training process. For PBC, the training set contains the prior data of the entire road
segment, while for GBC, the dataset is divided according to the hexagonal grids.
The small individual grids in GBC therefore reflect the local environment more accurately
than the entire road segment that makes up the PBC data.
Fig. 1.14 Positioning results of the four algorithms along the test trajectory, divided into Part 1, Part 2 and Part 3, with the start point marked
In Table 1.5, compared with CSPPs 1 and 2, the error reductions in the up direction
are 17.0 and 8.9 m for PBC, and 18.1 and 10.0 m for GBC, whereas they are about 3 m in
the east and north directions. This means that NLOS/multipath errors cause larger errors
in the up direction than in the horizontal plane.

The error in the east direction is larger than that in the north direction for all algorithms. This is
because the overall orientation of the driving route in the test case is north–south,
while the east–west direction is cross-street at most epochs. Given that the satellite
visibility along the street is better than that in the cross-street direction, there are
large outliers in the eastward error in Fig. 1.15 compared to the north direction, which
considerably increases the RMSE value. Even after correcting the NLOS/multipath
errors, the error in the east direction remains larger than that in the north direction.
The horizontal and 3D error distributions of the four algorithms are shown in
Fig. 1.17. The shape of the error distribution after pseudorange correction changes
1.6 Conclusion
In the static experiment, GBDT was used to predict and correct the pseudor-
ange errors. Based on the corrected pseudoranges, the improvements in positioning
accuracy were 75.6% and 75.6% in the horizontal and 71.4% and 70.9% in 3D,
compared with the two conventional positioning methods, respectively.
In the dynamic experiment, RF was used for the pseudorange correction in urban
environments with two variations, PBC and GBC. The PBC model achieved improve-
ments in horizontal positioning accuracy of 42.9% and 41.1%, and in 3D accuracy
of 60.1% and 45.7%, compared with CSPPs 1 and 2. GBC achieved improvements of
40.8% and 38.9% in the horizontal and 63.3% and 50.0% in 3D, compared with CSPPs
1 and 2, respectively.
The proposed machine learning based pseudorange correction methods do not
require additional sensors and can be used in real-time. They are therefore suitable for
users with low-cost receivers in urban areas, such as smartphones, and can potentially
benefit a wide range of public users.
Acknowledgements The authors thank Guanyu Wang and Linxia Fu for data processing, as well as
those who collected the data. This work was jointly supported by the
University Grants Committee of Hong Kong under the Research Impact
Fund scheme (Grant No. R5009-21), the Research Institute of Land and System, Hong Kong Polytechnic
University, the National Natural Science Foundation of China (Grant Nos. 41974033, 42174025),
and the Natural Science Foundation of Jiangsu Province (Grant No. BK20211569).
References
1. Groves PD, Jiang Z (2013) Height aiding, C/N0 weighting and consistency checking for GNSS
NLOS and multipath mitigation in urban areas. J Navig 66(5):653–669
2. Tranquilla JM, Carr JP, Al-Rizzo HM (1994) Analysis of a choke ring groundplane for multipath
control in global positioning system (GPS) applications. IEEE Trans Antenn Propag 42(7):905–
911
3. Blum R, Bischof R, Sauter UH, Foeller J (2016) Tests of reception of the combination of GPS
and GLONASS signals under and above forest canopy in the Black Forest, Germany, using
choke ring antennas. Int J Forest Eng 27(1):2–14
4. Taghdisi E, Ghaffarian S, Mirzavand R, Mousavi P (2020) Compact substrate integrated choke
ring ground structure for high-precision GNSS applications. In: IEEE International Symposium
on Antennas and Propagation and North American Radio Science Meeting, pp 1705–1706
5. Lin D, Wang E, Wang J (2022) New choke ring design for eliminating multipath effects in the
GNSS system. Int J Antenn Propaga 2022:1–6
6. Rykała Ł, Rubiec A, Przybysz M, Krogul P, Cieślik K, Muszyński T, Rykała M (2023)
Research on the positioning performance of GNSS with a low-cost choke ring antenna. Appl
Sci 13(2):1007
7. Jiang Z, Groves PD (2014) NLOS GPS signal detection using a dual-polarisation antenna. GPS
Solut 18:15–26
8. Palamartchouk K, Clarke PJ, Edwards SJ, Tiwari R (2015) Dual-polarization GNSS obser-
vations for multipath mitigation and better high-precision positioning. In: Proceedings of the
28th international technical meeting of the Satellite division of The Institute of Navigation, pp
2772–2779
9. Guermah B, Sadiki T, El Ghazi H (2017) Fuzzy logic approach for GNSS signal classification
using RHCP and LHCP antennas. In: IEEE 8th annual ubiquitous computing, electronics and
mobile communication conference (UEMCON), pp 203–208
10. Egea-Roca D, Tripiana-Caballero A, López-Salcedo JA, Seco-Granados G, De Wilde W,
Bougard B, ..., Popugaev A (2018) GNSS measurement exclusion and weighting with a dual
polarized antenna: the FANTASTIC project. In: 8th international conference on the localization
and GNSS, pp 1–6
11. Ge X, Liu X, Sun R, Fu L, Qiu M, Zhang Z (2023) A weighted GPS positioning algorithm for
urban canyons using dual-polarised antennae. J Locat Based Serv 17(3):185–206
12. Seco Granados G (2000) Antenna arrays for multipath and interference mitigation in GNSS
receivers. Universitat Politècnica de Catalunya
13. Closas P, Fernández-Prades C (2011) A statistical multipath detector for antenna array based
GNSS receivers. IEEE Trans Wirel Commun 10(3):916–929
14. Daneshmand S, Broumandan A, Sokhandan N, Lachapelle G (2013) GNSS multipath
mitigation with a moving antenna array. IEEE Trans Aerosp Electron Syst 49(1):693–698
15. Vagle N, Broumandan A, Jafarnia-Jahromi A, Lachapelle G (2016) Performance analysis of
GNSS multipath mitigation using antenna arrays. J Glob Position Syst 14(1):1–15
16. Razgūnas M, Rudys S, Aleksiejūnas R (2023) GNSS 2 × 2 antenna array with beamforming
for multipath detection. Adv Space Res 71(10):4142–4154
17. Suzuki T, Matsuo K, Amano Y (2020) Rotating GNSS antennas: simultaneous LOS and NLOS
multipath mitigation. GPS Solut 24:1–13
18. Van Dierendonck AJ, Fenton P, Ford T (1992) Theory and performance of narrow correlator
spacing in a GPS receiver. Navig 39(3):265–283
19. McGraw GA, Braasch MS (1999) GNSS multipath mitigation using gated and high-resolution
correlator concepts. In: Proceedings of the 1999 national technical meeting of the Institute of
Navigation, pp 333–342
20. Garin L, van Diggelen F, Rousseau JM (1996) Strobe and edge correlator multipath mitigation
for code. In: Proceedings of the 9th international technical meeting of the satellite division of
the Institute of Navigation, pp 657–664
21. Irsigler M, Hein GW, Eissfeller B (2004) Multipath performance analysis for future GNSS
signals. In: Proceedings of the 2004 national technical meeting of the Institute of Navigation,
pp 225–238
22. Van Nee RD, Siereveld J, Fenton PC, Townsend BR (1994) The multipath estimating delay
lock loop: Approaching theoretical accuracy limits. In: Proceedings of the 1994 IEEE position,
location and navigation, pp 246–251
23. Meguro J, Murata T, Takiguchi J, Amano Y, Hashizume T (2009) GPS multipath mitigation for
urban area using omnidirectional infrared camera. IEEE Trans Intell Transp Syst 10(1):22–30
24. Shytermeja E, Garcia-Pena A, Julien O (2014) Proposed architecture for integrity monitoring
of a GNSS/MEMS system with a fisheye camera in urban environment. In: International
conferences localization GNSS 2014, Helsinki, Finland, 24–26 June 2014, pp 1–6
25. Tokura H, Kubo N (2016) Effective satellite selection methods for RTK-GNSS NLOS exclusion
in dense urban environments. In: Proceedings of the ION GNSS + 2016, Portland, Oregon,
September, pp 304–312
26. Moreau J, Ambellouis S, Ruichek Y (2017) Fisheye-based method for GPS localization
improvement in unknown semi-obstructed areas. Sensors 17(1):119
27. Horide K, Yoshida A, Hirata R, Kubo Y, Koya Y (2019) NLOS satellite detection using fish-eye
camera and semantic segmentation for improving GNSS positioning accuracy in urban area.
Proc ISCIE Int Symp Stochastic Syst Theory Appl 2019:212–217
28. Wen W, Zhang G, Hsu LT (2019a) GNSS NLOS exclusion based on dynamic object detection
using LiDAR point cloud. IEEE Trans Intell Transp Syst 22(2):853–862
29. Wen W, Zhang G, Hsu LT (2019b) Correcting NLOS by 3D LiDAR and building height to
improve GNSS single point positioning. Navig 66(4):705–718
30. Hassan T, Fath-Allah T, Elhabiby M, Awad A, El-Tokhey M (2022) Detection of GNSS no-
line of sight signals using LiDAR sensors for intelligent transportation systems. Surv Rev
54(385):301–309
54. Suzuki T, Amano Y (2021) NLOS multipath classification of GNSS signal correlation output
using machine learning. Sensors 21(7):2503
55. Lau L, Cross P (2007) Development and testing of a new ray-tracing approach to GNSS
carrier-phase multipath modelling. J Geod 81:713–732
56. Nicolas ML, Jacob M, Smyrnaios M, Schön S, Kürner T (2011) Basic concepts for the
modelling and correction of GNSS multipath effects using ray tracing and software receivers. In:
2011 IEEE-APS topical conference on antennas and propagation in wireless communications,
pp 890–893
57. Zhu F, Ba T, Zhang Y, Gao X, Wang J (2020) Terminal location method with NLOS exclu-
sion based on unsupervised learning in 5G-LEO satellite communication systems. Int J Satell
Commun Netw 38(5):425–436
58. Li L, Elhajj M, Feng Y, Ochieng WY (2023) Machine learning based GNSS signal classification
and weighting scheme design in the built environment: a comparative experiment. Satell Navig
4(1):1–23
59. Ng HF, Zhang G, Hsu LT (2021) Robust GNSS shadow matching for smartphones in urban
canyons. IEEE Sens J 21(16):18307–18317
60. Sun R, Fu L, Wang G, Cheng Q, Hsu LT, Ochieng WY (2021) Using dual-polarization
GPS antenna with optimized adaptive neuro-fuzzy inference system to improve single point
positioning accuracy in urban canyons. Navig 68(1):41–60
61. Eueler HJ, Goad CC (1991) On optimal filtering of GPS dual frequency observations without
using orbit information. Bull Géodésique 65:130–143
62. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat
1189–1232
63. Breiman L (2001) Random Forests. Mach Learn 45:5–32
64. Ham J, Chen Y, Crawford MM, Ghosh J (2005) Investigation of the random forest framework
for classification of hyperspectral data. IEEE Trans Geosci Remote Sens 43(3):492–501
65. Stumpf A, Kerle N (2011) Object-oriented mapping of landslides using Random Forests.
Remote Sens Environ 115(10):2564–2577
66. Biagi L, Caldera S (2013) An efficient leave one block out approach to identify outliers. J Appl
Geodesy 7(1):11–19
67. Smolyakov I, Rezaee M, Langley RB (2020) Resilient multipath prediction and detection
architecture for low-cost navigation in challenging urban areas. Navig 67(2):397–409
Chapter 2
Deep Learning-Enabled Fusion to Bridge GPS Outages for INS/GPS Integrated Navigation
Abstract The low-cost inertial navigation system (INS) suffers from bias and
measurement noise, which results in poor navigation accuracy during global
positioning system (GPS) outages. Aiming to bridge GPS outages and
enhance the navigation performance, a deep learning network architecture named
the GPS/INS neural network (GI-NN) is proposed to assist the INS. The GI-NN combines
a convolutional neural network and a gated recurrent unit neural network to extract
spatial features from the inertial measurement unit (IMU) signals and track their
temporal characteristics. The relationship among the attitude, specific force, angular
rate and the GPS position increment is modelled, while the current and previous
IMU data are used to estimate the dynamics of the vehicle via the proposed GI-NN.
Numerical simulations, real field tests and public data tests are performed to evaluate
the effectiveness of the proposed algorithm. Compared with traditional machine
learning algorithms, the results illustrate that the proposed method can provide a more
accurate and reliable navigation solution in GPS denied environments.
2.1 Introduction
The integrated navigation system based on the Inertial Navigation System (INS)
and the Global Positioning System (GPS) is a high-precision positioning and navigation
solution for most unmanned ground vehicles and unmanned aerial vehicles (UAVs)
when GPS signals are available [1–3]. However, when UAVs equipped with
Y. Zhou (B)
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Avenue,
Xili University Town, Shenzhen 1068, China
e-mail: [email protected]
Y. Liu
Artificial Intelligence and Digital Economy Guangdong Provincial Laboratory, Shenzhen, China
J. Hu
School of Information and Electrical Engineering, Hunan University of Science and Technology,
Xiangtan, China
GPS pass through environments with weak GPS signals, such as dense clusters of skyscrapers
or tunnels, the INS/GPS integrated navigation system falls back to INS-only mode
and, with the presence of inertial measurement unit (IMU) bias drift and scale factor
instability [4], the navigation accuracy sharply decreases. Therefore, it is
necessary to explore more robust integrated navigation systems that adapt to various
complex environments.
complex environments.
To address the issues of GPS interruption on the navigation performance, this
chapter proposes a deep learning network structure-the GPS/INS Neural Network
(GI-NN) to assist in the INS position navigation. The GI-NN combines the Convolu-
tional Neural Network (CNN) and Gated Recurrent Unit Neural Network (GRUNN)
to extract spatial features from the IMU signals and track their temporal information.
It establishes a relationship model among attitude, specific force, angular rate, and
GPS position increments, so as to dynamically estimate the vehicle motion state via
the current and past IMU data. Furthermore, a hybrid fusion strategy is designed to
fit the nonlinear relationship between the sensor measurements and GPS position
increments: when GPS is available, INS data is fused with GPS data using a Kalman
filter to obtain more reliable position and velocity information. Meanwhile, the GPS
and INS data are stored in an onboard computer for the GI-NN model training. When
GPS signals are unavailable, the trained GI-NN model is used to predict the GPS
position increments and generate virtual GPS position values to continue to fuse with
INS data, thereby maintaining high navigation accuracy. Finally, the effectiveness of
the algorithm is validated through simulation, actual experiments and public datasets.
Comparative experimental results with the traditional machine learning algorithms
demonstrate that the proposed method can provide a more accurate and reliable
navigation solution in GPS interrupted environments.
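As an illustration only, the following PyTorch sketch shows the kind of CNN + GRU regressor described above, mapping a window of IMU-derived channels to a GPS position increment; the layer sizes, window length and input channels are assumptions, not the published GI-NN configuration.

```python
import torch
import torch.nn as nn

class GINNSketch(nn.Module):
    """CNN + GRU regressor mapping an IMU window to a position increment."""
    def __init__(self, n_channels=9, hidden=64):
        super().__init__()
        # n_channels: e.g. 3-axis specific force, 3-axis angular rate, attitude angles
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.gru = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)          # GPS position increment (N, E, D)

    def forward(self, x):                         # x: (batch, time, channels)
        f = self.cnn(x.transpose(1, 2))           # -> (batch, 64, time)
        out, _ = self.gru(f.transpose(1, 2))      # -> (batch, time, hidden)
        return self.head(out[:, -1])              # increment over the window

# Hypothetical usage: 1-second windows of 100 Hz IMU data
model = GINNSketch()
window = torch.randn(8, 100, 9)                   # batch of 8 windows
delta_p = model(window)                           # predicted position increments
```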
INS and GPS each have distinct advantages and notable shortcomings. INS is an
autonomous navigation system that does not rely on external electromagnetic infor-
mation, featuring good concealment, strong resistance to interference, and high
short-term precision. However, the long-term errors of the INS diverge without
bound over time due to the error accumulation in the inertial devices. GPS has
long-term stable positioning accuracy, and its errors do not accumulate over time,
but its output data rate is relatively low. Therefore, a combined system can be
constructed to overcome their individual shortcomings, and a robust positioning and
navigation solution with higher accuracy than a single navigation system can be
provided by fully leveraging the complementary characteristics of the INS and GPS.
Fig. 2.2 Local horizontal coordinate system and body coordinate system
pointing to local geographic east, and the Z-axis forming a right-handed Cartesian
coordinate system with the X and Y axes, perpendicular to the ellipsoidal surface of
the Earth. Since the Z-axis can be oriented upward or downward perpendicular to
the Earth's ellipsoidal surface, there are two types of local horizontal coordinate
system: the North-East-Down (NED) system and the East-North-Up (ENU) system.
The NED coordinate system is adopted in this chapter.
4. Body Coordinate System
In practical applications, the measurement axes of the accelerometer and gyroscope
in the MEMS-IMU are determined by the axes of the motion platform on which the
device is mounted, forming the Body Coordinate System. As shown in Fig. 2.2b,
the origin is set at the mass center of the vehicle, the Y-axis points forward along
the vehicle, the X-axis is perpendicular to the Y-axis and points sideways along the
vehicle, and the Z-axis forms a right-handed Cartesian coordinate system with the
X and Y axes, pointing vertically along the vehicle, following the right-hand rule.
The Kalman Filter (KF) [5] is an efficient algorithm in control theory, also known as
the Linear Quadratic Estimator (LQE), which can be used to estimate unknown
variables from a series of observations and measurements, providing greater accuracy
than estimates based on individual measurements [6, 7]. The corresponding state
and measurement equations must be established in order to use the KF for system
state estimation. Assuming that the actual state at time k is derived from the state at
time k − 1, the discrete form of the system state equation can be written as
xk = Fk xk−1 + Bk uk + ωk (2.1)
where xk is the state vector at the kth moment; Fk is the state transition model applied
to the previous state xk−1 ; Bk is the coefficient vector related to the control vector
uk; ωk is the process noise, assumed to be zero-mean and normally distributed with covariance matrix Qk.
The measurement equation can be defined as,
zk = Hk xk + vk (2.2)
where zk is the measurement vector; Hk is the observation model which maps the
actual state space into the observed space; vk is the observation noise assumed to be
zero mean Gaussian white noise with covariance Rk .
If the estimated state vector xk and measurement vector zk can be written in the
form of Eqs. (2.1) and (2.2), and the noise ωk and vk follow the zero mean Gaussian
white noise distribution, the process of KF algorithm can be divided into two parts,
(1) Time Update:

$\hat{x}_k^- = F_k \hat{x}_{k-1} + B_k u_k$ (2.3)

$P_k^- = F_k P_{k-1} F_k^T + Q_k$ (2.4)

(2) Measurement Update:

$K_k = P_k^- H_k^T\left(H_k P_k^- H_k^T + R_k\right)^{-1}$ (2.5)

$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H_k \hat{x}_k^-\right)$ (2.6)

$P_k = \left(I - K_k H_k\right) P_k^-$ (2.7)
In Eqs. (2.3) and (2.4), x̂k⁻ is the a priori estimate of the system state and Pk⁻ is the a priori error covariance, i.e., the predicted state covariance matrix; Kk is the Kalman filter gain matrix. The essence of the Kalman filter lies in the
iterative process of time update and measurement update, as depicted in Fig. 2.3.
First, the filter undergoes initialization, which requires the initial values for the state
estimate x0 and the estimated mean square error P0 . P0 is derived based on x0 and is
typically set to a relatively high value. Besides, the initial estimates of the system noise
covariance matrix Q and the measurement noise covariance matrix R are required,
which are based on prior state information about the system for the optimal state
estimation. In the first step of the time update process, the system state is recursively
updated from time k − 1 to time k, yielding the a priori estimate x̂k⁻. The second step of the time update involves calculating the covariance Pk⁻ of x̂k⁻. This process utilizes all the available
information at time k − 1 to obtain the expected value of the state error variance at
time k.
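As a concrete illustration of this predict-correct recursion, the following minimal NumPy sketch implements one pass of Eqs. (2.3)-(2.7); the function and variable names are illustrative and not taken from any particular toolbox.

```python
import numpy as np

def kf_step(x, P, F, B, u, Q, z, H, R):
    """One Kalman filter iteration following Eqs. (2.3)-(2.7)."""
    # Time update (prediction)
    x_pred = F @ x + B @ u                                    # a priori state, Eq. (2.3)
    P_pred = F @ P @ F.T + Q                                  # a priori covariance, Eq. (2.4)
    # Measurement update (correction)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # Kalman gain, Eq. (2.5)
    x_new = x_pred + K @ (z - H @ x_pred)                     # a posteriori state, Eq. (2.6)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred                 # a posteriori covariance, Eq. (2.7)
    return x_new, P_new
```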
The architecture of the INS/GPS integrated navigation can be classified into loosely
coupled, tightly coupled and ultra-tightly coupled types. The loosely coupled structure has advantages such as simplicity and good robustness, making it widely used in the navigation of UAVs. Therefore, this section takes the loosely coupled structure as the basic
framework and utilizes the KF to fuse the INS/GPS navigation information, aiming
to achieve integrated navigation positioning.
The process of the INS pose estimation is illustrated in Fig. 2.4. The raw measure-
ments of the gyroscopes are used to calculate the attitude matrix of the carrier.
Through this attitude matrix, the specific force information measured by the
accelerometer along the axes of the carrier coordinate system is transformed into
a specific coordinate system (such as the navigation coordinate system), then the
navigation computation is performed.
The INS position estimation algorithm first requires associating the measurements
of the IMU with the navigation information. Since the navigation position infor-
mation is obtained by integrating the IMU measurements, the most direct method
to associate the two is to establish differential equations between them, namely
attitude differential equations, velocity differential equations and position differen-
tial equations. By solving the corresponding differential equations, the related pose
information can be obtained. In order to establish a comprehensive and general
INS differential equation, taking medium-to-high precision inertial navigation as an
example, based on reference [8], the attitude differential equation with the navigation
coordinate system (N-frame) as the reference frame is written as,
$\dot{C}_b^n = C_b^n\left[\omega_{ib}^b \times\right] - \left[\omega_{in}^n \times\right]C_b^n$ (2.9)

where $\omega_{in}^n$ is the angular rate of the n-frame with respect to the i-frame, which includes two components: the rotation of the navigation coordinate frame caused by the Earth rotation, and the rotation of the n-frame caused by the motion of the INS near the curved Earth surface, namely $\omega_{in}^n = \omega_{ie}^n + \omega_{en}^n$, with

$\omega_{ie}^n = \begin{bmatrix} 0 & \omega_{ie}\cos L & \omega_{ie}\sin L \end{bmatrix}^T$ (2.10)

$\omega_{en}^n = \begin{bmatrix} -\dfrac{v_N}{R_M+h} & \dfrac{v_E}{R_N+h} & \dfrac{v_E}{R_N+h}\tan L \end{bmatrix}^T$ (2.11)

where $\omega_{ie}$ is the Earth rotation angular velocity; L and h are the geographical latitude and height, respectively; $v_N$ and $v_E$ are the northward and eastward velocities of the vehicle, respectively; and $R_M$ and $R_N$ are the radii of curvature in the meridian and the prime vertical, respectively.
Fig. 2.4 The flowchart of the INS position and attitude calculation
Correspondingly, the velocity differential equation and the position differential equation are given in Eqs. (2.12) and (2.13), respectively, where $\omega_{ie}^n \times v^n$ represents the acceleration caused by the vehicle moving on the rotating surface of the Earth, and $\omega_{en}^n \times v^n$ is the centripetal acceleration caused by the vehicle movement on the Earth surface,

$\dot{v}^n = C_b^n f_{ib}^b - \left(2\omega_{ie}^n + \omega_{en}^n\right) \times v^n + g^n$ (2.12)

$\dot{r}^n = v^n - \omega_{en}^n \times r^n$ (2.13)
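For illustration, a crude Euler-integration step of Eqs. (2.9), (2.12) and (2.13) can be sketched as follows. Real mechanizations use higher-order attitude updates and re-orthonormalize the direction cosine matrix; all names here are illustrative.

```python
import numpy as np

def skew(w):
    # Skew-symmetric matrix so that skew(w) @ v equals np.cross(w, v)
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def ins_euler_step(C_bn, v_n, r_n, f_b, w_ib_b, w_ie_n, w_en_n, g_n, dt):
    """One Euler step of the attitude, velocity and position differential equations."""
    w_in_n = w_ie_n + w_en_n
    C_dot = C_bn @ skew(w_ib_b) - skew(w_in_n) @ C_bn                  # Eq. (2.9)
    v_dot = C_bn @ f_b - np.cross(2.0 * w_ie_n + w_en_n, v_n) + g_n    # Eq. (2.12)
    r_dot = v_n - np.cross(w_en_n, r_n)                                # Eq. (2.13)
    return C_bn + C_dot * dt, v_n + v_dot * dt, r_n + r_dot * dt
```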
The accuracy of the INS is affected by various factors, including initial alignment
errors, inertial sensor errors, and limitations of the processed algorithms [9]. To fully
understand the impact of these errors on the output parameters of the INS (position,
velocity, attitude), it is necessary to establish differential equations of the INS errors.
Hence, the KF can be utilized to estimate and compensate for these errors. The INS
attitude error can be derived as,
$\dot{\varphi} = \varphi \times \omega_{in}^n + \delta\omega_{in}^n - \varepsilon^n$ (2.14)
where $\varphi$ is the attitude error angle vector and $\varepsilon^n$ is the gyroscope drift in the navigation frame. In the corresponding velocity error equation (2.15), $\delta V^n$ and $V^n$ are the velocity error and velocity in the east, north and upward directions, respectively; $f^n$ is the specific force; $\omega_{ie}^n$ and $\omega_{en}^n$ are the Earth self-rotation rate and the angular rate relative to the Earth in the navigation coordinate system, respectively; and $\nabla^n$ is the accelerometer bias in the navigation frame.
The INS position error equation is written as,
$\begin{cases} \delta\dot{L} = \dfrac{\delta V_N}{R_M+h} - \dfrac{\delta h\,V_N}{(R_M+h)^2} \\[2mm] \delta\dot{\lambda} = \dfrac{\delta V_E\sec L}{R_N+h} + \dfrac{\delta L\,V_E\tan L\sec L}{R_N+h} - \dfrac{\delta h\,V_E\sec L}{(R_N+h)^2} \\[2mm] \delta\dot{h} = \delta V_U \end{cases}$ (2.16)
where δL, δλ and δh are the errors of the latitude, longitude and height; $\delta V_N$, $\delta V_E$ and $\delta V_U$ denote the velocity errors in the north, east and upward directions, respectively; and $R_M$ and $R_N$ are the radii of curvature in the meridian and the prime vertical.
As depicted in Fig. 2.5, the GPS and INS are first decoupled and operated independently to provide their navigation outputs separately. To improve the output performance, the outputs of the GPS and INS are differenced and fed to the KF, and the errors of the INS are then estimated based upon the INS error differential equations. After error correction and compensation, the corrected INS outputs of position, velocity, and attitude form the integrated navigation output.
According to Sect. 2.2.2, the KF state equation of the INS can be established as,

δẋ = Fδx + Gω (2.17)

where δr = [δL, δλ, δh]^T is the position error vector in the latitude, longitude, and altitude directions; δv = [δvn, δve, δvd]^T is the velocity error vector in the north, east, and down directions; δφ = [δφx, δφy, δφz]^T denotes the error vector for
pitch, roll, and yaw angles; δω = [δωx, δωy, δωz]^T is the gyroscope error vector; δf = [δfx, δfy, δfz]^T is the accelerometer error vector; ω denotes Gaussian white noise with unit variance.
In Eq. (2.17), G represents the noise distribution vector, which contains the noise terms associated with each state, and F is the dynamic coefficient matrix composed of the error models of the INS position, velocity, attitude and inertial devices. Its specific form can be derived based on Eqs. (2.14)–(2.16),
$F = \begin{bmatrix} 0_{3\times3} & F_r & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & F_v & 0_{3\times3} & R_b \\ 0_{3\times3} & F_\varphi & 0_{3\times3} & R_b & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & F_\omega & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & F_f \end{bmatrix}$ (2.20)
Therefore, the system equations for the loosely coupled INS/GPS integrated
navigation can be written as,
$\begin{bmatrix} \delta\dot{r} \\ \delta\dot{v} \\ \delta\dot{\varphi} \\ \delta\dot{\omega} \\ \delta\dot{f} \end{bmatrix} = \begin{bmatrix} 0_{3\times3} & F_r & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & F_v & 0_{3\times3} & R_b \\ 0_{3\times3} & F_\varphi & 0_{3\times3} & R_b & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & F_\omega & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & F_f \end{bmatrix} \begin{bmatrix} \delta r \\ \delta v \\ \delta\varphi \\ \delta\omega \\ \delta f \end{bmatrix} + \begin{bmatrix} \sigma_r \\ \sigma_v \\ \sigma_\varphi \\ \sigma_\omega \\ \sigma_f \end{bmatrix}\omega$ (2.21)
Similarly, the measurement equations for the INS/GPS integrated navigation are formed by differencing the INS and GPS position and velocity outputs. With I denoting the identity matrix, the complete measurement equation for the loosely coupled INS/GPS can be written as,
$\begin{bmatrix} r_{INS} - r_{GPS} \\ v_{INS} - v_{GPS} \end{bmatrix} = \begin{bmatrix} I_{3\times3} & 0_{3\times3} & 0_{3\times9} \\ 0_{3\times3} & I_{3\times3} & 0_{3\times9} \end{bmatrix}\delta x_k + \begin{bmatrix} \eta_r \\ \eta_v \end{bmatrix}$ (2.25)
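A minimal sketch of forming the measurement vector and design matrix of Eq. (2.25) for a 15-state error vector is given below; the variable names are illustrative.

```python
import numpy as np

def loose_coupling_measurement(r_ins, r_gps, v_ins, v_gps):
    """Innovation vector and design matrix of the loosely coupled model, Eq. (2.25)."""
    z = np.concatenate([r_ins - r_gps, v_ins - v_gps])   # 6-element position/velocity difference
    H = np.zeros((6, 15))
    H[0:3, 0:3] = np.eye(3)                              # position error block
    H[3:6, 3:6] = np.eye(3)                              # velocity error block
    return z, H                                          # attitude and sensor-bias columns stay zero
```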
This completes the derivation of the system state equation and observation equation for the loosely coupled INS/GPS integrated navigation. When the GPS signal is available, a KF is constructed from these equations, and the time update and measurement update are performed iteratively to estimate the INS error corrections from the INS/GPS fusion. In turn, the INS navigation results can be corrected to improve the accuracy of the INS.
From the previous analysis, it is evident that when the GPS signal is available, the
KF can effectively fuse the INS and GPS measurements, thereby correcting the INS
errors. However, in the scenarios where the GPS signal is interrupted, such as when a
mobile robot enters a tunnel or encounters dense tall buildings, the KF may be unable
to obtain new GPS measurements for an extended period, leading to the inability to
complete the measurement update process and consequently failing to correct the INS
errors. This would result in a rapid degradation of the accuracy of the INS. Therefore,
considering the cost and size constraints of mobile robots, we propose a navigation algorithm assisted by deep learning. In GPS-denied environments, a pre-trained deep learning model is utilized to assist in correcting the INS errors, thereby mitigating the INS error drift.
Here, the INS and GPS data are treated as time-series data, so a deep learning model is designed based on the Gated Recurrent Unit (GRU) neural network to assist the INS navigation. Unlike a feedforward neural network, a Recurrent Neural Network (RNN) forms a directed graph along the sequence through the connections between its nodes, allowing more effective processing of the input time-series data via its internal states [10–12]. However, the RNN can face the vanishing gradient problem, which prevents it from finding appropriate gradients for long-term memory. To solve this issue, Kyunghyun Cho et al. [13] developed the GRU neural network, which is similar to the Long Short-Term Memory (LSTM) network, with a gating mechanism for forgetting but fewer parameters. Therefore, we use the GRU neural network to predict the GPS position increments when GPS fails.
A GRU memory unit is illustrated in Fig. 2.6, where ht−1 is the hidden state of
the previous moment, ht is the current output of the hidden state, and xt is the input
data in the current moment. There are two gates in the GRU structure, i.e., the update
gate and the reset gate. The update gate is responsible for determining how much of
the previous hidden state is to be retained and which portion of the new proposed
hidden state (derived via the reset gate) is to be added to the final hidden state. The reset gate is responsible for deciding which portions of the previous hidden state are to be combined with the current input to form the candidate hidden state.
The forward passing equations of the GRU can be written as,

$r_t = \sigma\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right)$

$z_t = \sigma\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right)$

$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}\left(r_t \odot h_{t-1}\right) + b_h\right)$

$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t$

where $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$ and $W_{hh}$ are the weight matrices connecting the input and the previous hidden state to the reset gate, update gate and candidate hidden state; $b_r$, $b_z$ and $b_h$ are the bias vectors of the reset gate, update gate and candidate hidden state, respectively. σ and tanh are the activation
functions that introduce nonlinearity into the hidden state computation, and ⊙ represents the Hadamard (element-wise) product.
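The following PyTorch sketch implements one GRU step exactly as written in the forward equations above; the weight shapes and parameter names are assumptions made for illustration.

```python
import torch

def gru_cell(x_t, h_prev, W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h):
    """One GRU step; W_x* have shape (hidden, input), W_h* have shape (hidden, hidden)."""
    r_t = torch.sigmoid(W_xr @ x_t + W_hr @ h_prev + b_r)             # reset gate
    z_t = torch.sigmoid(W_xz @ x_t + W_hz @ h_prev + b_z)             # update gate
    h_tilde = torch.tanh(W_xh @ x_t + W_hh @ (r_t * h_prev) + b_h)    # candidate hidden state
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde                        # Hadamard blending of old and new
    return h_t
```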
The main idea of the deep learning-assisted INS/GPS integrated navigation framework proposed in this chapter is to use deep learning to build a model between the INS outputs (velocity, attitude, specific force and angular rate) and the navigation information, so that high navigation accuracy can be maintained when GPS fails. Navigation information generally includes the position error or the position increments between the INS and GPS. The main motivation for using the deep
learning methods is that the INS/GPS system is a nonlinear and complex system,
making it difficult to establish an accurate mathematical model. The accuracy of
the traditional filtering-based methods largely depends on the quality of the mathe-
matical model. When some sensors fail, such as due to the GPS signal interruption,
filtering-based methods also become ineffective. Deep learning-based methods are
data-driven and do not require a precise mathematical model of the system, making
them suitable for handling nonlinear systems. Compared to the filters, deep learning
methods have stronger learning capabilities, while the filter parameters are fixed with
limited adaptability to different scenarios.
Currently, many ML-aided models have been proposed to describe the relationship
between the navigation information and the INS outputs, almost all of which can be
divided into 3 classes in terms of the outputs, i.e., OINS − δPINS , OINS − Xk and
OINS − ΔPGPS . The OINS − δPINS model can be designed to find the relationship
between the INS information and the position error of GPS & INS. The OINS − Xk
model intends to establish the relationship between the output of INS and the state
vector Xk of the KF. In the OINS − ΔPGPS model, the input is the INS output and the output is the GPS position increments. Since the first two models mix the INS and GPS information, they would introduce additional coupled errors compared with the OINS − ΔPGPS model. In the OINS − ΔPGPS model [14], the position increments of
the GPS can be denoted as,
$\Delta P_{GPS} = \iint \dot{V}^n(t)\,dt\,dt$ (2.30)

$\;\;\;\;\;\;\;\;\;\;\;= \iint \left(C_b^n f_{ib}^b(t) - \left(2\omega_{ie}^n(t) + \omega_{en}^n(t)\right) \times V^n(t) + G^n\right)dt\,dt$ (2.31)
where Vn is the velocity of the vehicle in the navigation coordinate system, and Gn
is the gravity vector. Therefore, this chapter selects the (OINS − ΔPGPS ) model to
construct the deep learning-assisted INS/GPS integrated navigation structure.
In Eq. (2.31), $\omega_{ie}^n$ and $G^n$ are affected by the longitude and latitude, while $\omega_{en}^n$ is related to $V^n$. Since the motion range of the mobile robots in actual scenarios is small, the variations in longitude and latitude are minimal. Therefore, $C_b^n$, $f_{ib}^b$ and $V^n$ are the main factors affecting $\Delta P_{GPS}$, where $C_b^n$ can be expressed as follows,
are the main factors affecting ΔPGPS , where Cbn can be expressed as follows,
⎡ ⎤
cosθ cosψ −cosγ sinψ + sinγ sinθ cosψ sinγ sinψ + cosγ sinθ cosψ
⎣ cosθ sinψ cosγ cosψ + sinγ sinθ sinψ −sinγ cosψ + cosγ sinθ sinψ ⎦ (2.32)
−sinθ sinγ cosθ cosγ cosθ
where {θ, γ, ψ} represent the pitch, roll and yaw angles. The attitude angles are mostly obtained by integrating the angular velocity $\omega_{ib}^b$ measured by the gyroscope. In summary, $\Delta P_{GPS}$ is mainly determined by $f_{ib}^b$, $\omega_{ib}^b$, θ, γ and ψ. Therefore, these variables are selected as inputs to the deep learning model, which learns to fit the mathematical relationship between the INS outputs and the GPS position increments when the GPS signal is normal.
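A simple way to organize these quantities as training data is sketched below, assuming synchronized arrays of specific force, angular rate, attitude angles and GPS positions; pairing each epoch's features with the next GPS increment is an assumption about the data layout.

```python
import numpy as np

def build_training_pairs(f_ib, w_ib, att, p_gps):
    """Stack [f_ib, w_ib, pitch/roll/yaw] as inputs and pair them with GPS increments."""
    # f_ib, w_ib, att: (T, 3) arrays; p_gps: (T, 3) GPS positions at the same epochs
    X = np.hstack([f_ib, w_ib, att])     # (T, 9) model inputs
    y = np.diff(p_gps, axis=0)           # (T-1, 3) position increments ΔP_GPS
    return X[:-1], y                     # feature at epoch t paired with the increment t -> t+1
```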
The trained deep learning model (INS/GPS Neural Network, GI-NN) is deployed
on the mobile robot initially. After the robot starts to move, the deep learning-enabled
INS/GPS integrated navigation can be divided into two modes. As shown in Fig. 2.7,
when GPS is available, the navigation system can operate in the online training mode.
The INS would provide the velocity VINS , position PINS , and attitude AINS , which are
fused with the position PGPS provided by the GPS using a KF. Simultaneously, the
estimated velocity error δV , position error δP and attitude error δA are fed back to
the inertial navigation system to reduce its position drift. Furthermore, the INS/GPS
data is stored in the onboard computer for training the GI-NN deep learning model.
Typically, the GI-NN deep learning model is primarily in a training state as GPS
interruptions only represent a small portion of the mobile robot motion duration.
During the long-term training process, IMU data is continuously sent to the GI-NN
deep learning model for training, ensuring that GI-NN is adequately trained with
Fig. 2.7 The INS/GPS integrated navigation system based on GI-NN in the training mode
abundant data. Through this training process, the weights between the hidden layer
neurons can be adjusted to better map the input–output relationship.
Once the GPS signal becomes unavailable, the deep learning-enabled INS/GPS
integrated navigation system enters a prediction mode. During the GPS outage, the
KF cannot acquire new GPS observations, resulting in the inability to update KF
estimations. Consequently, the INS/GPS integrated navigation system can be effec-
tively transformed into an independent inertial navigation system, resulting in more
accumulated errors over time. The block diagram of the prediction mode is shown
in Fig. 2.8. The trained GI-NN deep learning model predicts the GPS position incre-
ments ΔPGPS . By summing all ΔPGPS increments according to Eq. (2.33), a virtual
GPS position can be obtained,
$P_{GPS}(k) = P_{GPS_0} + \sum_{i=0}^{k} \Delta P_{GPS_i}$ (2.33)
where PGPS0 is the GPS position at the moment when the GPS signal fails. This virtual PGPS(k) is then substituted for the interrupted GPS position and fused with the INS by the KF. This hybrid structure of the INS/GPS maintains continuous navigation when GPS signals are lost.
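Accumulating the predicted increments into virtual GPS positions, as in Eq. (2.33), reduces to a cumulative sum; a hedged sketch with illustrative names:

```python
import numpy as np

def virtual_gps_positions(p_gps0, delta_p_pred):
    """Virtual GPS positions from Eq. (2.33): initial fix plus cumulative predicted increments."""
    # p_gps0: (3,) last GPS position before the outage; delta_p_pred: (K, 3) predicted ΔP_GPS
    return p_gps0 + np.cumsum(delta_p_pred, axis=0)
```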
Fig. 2.8 The INS/GPS integrated navigation system based on GI-NN in the prediction mode
$\Delta P_{GPS}^{k+1} = \mathrm{GI\text{-}NN}\left(S_1, S_2, \ldots, S_k\right)$ (2.34)
where W(l) and b(l) are the learned parameters of the l-th convolutional layer, and the rectified linear unit (ReLU) is adopted as the activation function. After stacking four convolutional layers, the acquired feature map contains the correlated features from the different sensor data. The GRU network is then used to capture the temporal sequential dependency; it addresses the exploding and vanishing gradient issues of the traditional Recurrent Neural Network (RNN), and its output at time step t is the hidden state ht given by the GRU forward equations above. Finally, a dropout layer with a probability of 0.25 is applied to avoid overfitting, and a linear layer transforms the high-dimensional features to the output dimension to generate the navigation solutions.
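A rough PyTorch sketch of the described pipeline (four 1-D convolutions with ReLU, a GRU over the time dimension, dropout with probability 0.25 and a linear output layer) is shown below. The channel sizes, hidden dimension and the use of the last time step are assumptions, since the exact layer dimensions are not given here.

```python
import torch
import torch.nn as nn

class GINNSketch(nn.Module):
    def __init__(self, in_channels=9, hidden=64, out_dim=3):
        super().__init__()
        layers, ch = [], in_channels
        for _ in range(4):                                   # four stacked convolutional layers
            layers += [nn.Conv1d(ch, 32, kernel_size=3, padding=1), nn.ReLU()]
            ch = 32
        self.conv = nn.Sequential(*layers)
        self.gru = nn.GRU(input_size=32, hidden_size=hidden, batch_first=True)
        self.dropout = nn.Dropout(p=0.25)                    # regularization against overfitting
        self.head = nn.Linear(hidden, out_dim)               # maps features to the position increment

    def forward(self, x):                                    # x: (batch, seq_len, features)
        feats = self.conv(x.transpose(1, 2))                 # Conv1d expects (batch, channels, seq)
        out, _ = self.gru(feats.transpose(1, 2))             # back to (batch, seq, channels)
        return self.head(self.dropout(out[:, -1]))           # predict from the last time step
```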
To fully test and validate the performance of the proposed algorithm, simula-
tion experiments, physical experiments, and experiments using public datasets are
conducted, and comparisons are implemented with different algorithms: (1) INS; (2)
Multi-layer Perceptron (MLP) model; (3) Long Short-Term Memory (LSTM) neural
network; (4) GI-NN model.
In the simulation tests, the NaveGo [15] toolbox, developed in MATLAB, is utilized
to generate the UAV flight trajectories. After generating the motion trajectories,
the related INS and GPS output data are obtained. The specific parameters of the
sensors during the simulation process are presented in Table 2.1. The deep learning
framework PyTorch 1.8 is employed, and the experimental setup consists of an Intel
Core i7-6700 processor running at 3.4 GHz with 16 GB RAM. Initially, a UAV flight
trajectory of 430 s is generated using NaveGo, including IMU, GPS and navigation
data, namely attitude angles, velocity and position. The simulated trajectory, as shown
in Fig. 2.10, includes UAV climbing, straight flight and turning, covering the basic
motion states.
To train the network model more effectively, 80% of the trajectory data from
0 to 250 s are used for the model training, with the remaining 20% reserved for
the performance evaluation and parameter tuning. To enhance the testing efficiency,
two typical scenarios, the straight-line motion phase (260–290 s) and the turning
phase (370–400 s), are selected for testing. For the recurrent network, the sequence length determines how much contextual information is provided to the model at each step. Considering the high sampling rate of the IMU sensor, a sequence length of 40 and a time window of 200 ms are applied to the IMU sequence data, dividing it into multiple non-overlapping blocks as model inputs.
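The windowing step can be sketched as follows, assuming a synchronized feature array and per-sample position increments; summing the increments over each block as the target is an assumption.

```python
import numpy as np

def make_windows(features, increments, seq_len=40):
    """Cut the sequence into non-overlapping blocks of 40 samples (about 200 ms here)."""
    n_blocks = len(features) // seq_len
    X = np.stack([features[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)])
    y = np.stack([increments[i * seq_len:(i + 1) * seq_len].sum(axis=0) for i in range(n_blocks)])
    return X, y   # X: (n_blocks, 40, n_features), y: (n_blocks, 3)
```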
Fig. 2.10 The simulation trajectory. a The 3D simulation trajectory. b The position of the simulation
trajectory
To mitigate the risk of overfitting and expedite training, the Adam optimizer [16]
with a cosine annealing restart scheduler [17] is employed for training, with a learning
rate initialized to 0.0001, a batch size set to 64, and 200 training epochs executed.
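A hedged training-loop sketch with these settings (Adam, cosine annealing with warm restarts, learning rate 1e-4, batch size 64, 200 epochs) is shown below. The restart period T_0, the mean-squared-error loss and the random placeholder data are assumptions, and GINNSketch refers to the architecture sketch given earlier.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors standing in for the windowed IMU features and ΔP_GPS targets
X, y = torch.randn(1024, 40, 9), torch.randn(1024, 3)
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = GINNSketch()                                   # architecture sketch from above (assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
criterion = torch.nn.MSELoss()

for epoch in range(200):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()                                   # restart-based learning-rate schedule
```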
The GI-NN-assisted integrated navigation system initially operates in the training mode; a 30-s GPS outage is then introduced between 260 and 290 s. As depicted in Fig. 2.11, the north and east position errors are compared for the INS, MLP, LSTM, and GI-NN methods, represented by black, blue, orange and red curves, respectively. For ease of visualization, a logarithmic transformation is applied to the final results. Due to the lack of true GPS position
information, the position error gradually increases over time. In the case of INS,
MLP, LSTM, and GI-NN, the maximum east position errors are 166.8 m, 11.9 m,
7.3 m, and 5.4 m, respectively, while the maximum north position errors are 142.9 m,
7.4 m, 4.5 m and 3.4 m, respectively. Hence, the results indicate that the proposed
GI-NN algorithm outperforms INS, MLP, and LSTM, thereby validating that GI-NN
can more accurately predict the position increments.
Fig. 2.11 The position errors among different algorithms during the period of 260–290 s in the
simulation test. a The east position error. b The north position error
To further validate the proposed method, actual UAV flight tests are conducted. A
quadcopter platform has been established, as depicted in Fig. 2.13a, for collecting
flight data containing various real-world noises. The autopilot of the quadcopter is
PIXHAWK, a high-performance autopilot suitable for various robot platforms such
as fixed-wing aircraft, multi-rotor aircraft, helicopters, cars and boats. The primary
IMU sensors in PIXHAWK are Invensense MPU 6000 and ST Micro LSM 303D,
with a gyroscope bias of 5 °/s and an accelerometer bias of 60 mg [18]. To acquire position reference data, two GPS receivers with different accuracies are simultaneously installed on the quadcopter: a u-blox NEO-M8N receiver and a u-blox NEO-6M receiver. The positioning accuracy of the NEO-M8N exceeds that of the NEO-6M; therefore, data from the NEO-M8N receiver are used as the position reference, while data from the NEO-6M receiver are fused with the INS data by Kalman filtering to obtain the position estimates.
Fig. 2.12 The position errors among different algorithms during the period of 370–400 s in the simulation test. a The east position error. b The north position error
As shown in Fig. 2.13b, the UAV flight trajectory is delineated by a red line, while
the interruptions in the GPS signal, each lasting 35 s, are indicated by the yellow lines.
To ensure the reliability of the GPS signal, the experiment is conducted in an open playground. Throughout the entire testing period, at least seven satellite signals are consistently available. Additionally, intentional GPS signal interruptions are induced by artificial GPS shielding.
Fig. 2.13 The quadcopter equipped with PIXHAWK autopilot and flight trajectory. a The UAV.
b The UAV flight trajectory
Figure 2.14 displays the northward and eastward position errors generated by
different algorithms during the first GPS outage, while Table 2.2 summarizes the
maximum errors in the eastward and northward positions during GPS outages. It is
obvious that when a GPS outage occurs, the KF lacks GPS update data and the integrated system degrades into a pure inertial navigation system. Consequently, the navigation errors rapidly accumulate over time. However, when MLP, LSTM, or GI-NN is utilized for the estimation, the navigation errors are somewhat suppressed. During the first GPS outage, the maximum eastward position errors for the INS-only, MLP, LSTM, and GI-NN methods are 107.8 m, 15.2 m, 11.8 m, and 2.4 m, respectively, while the maximum northward position errors are 36.1 m, 35.5 m, 12.9 m, and 10.6 m, respectively. Comparative analysis reveals that the maximum error of the GI-NN-assisted navigation method is smaller than that of the MLP, LSTM, and INS-only methods. This demonstrates the effectiveness of the GI-NN-assisted integrated navigation method in improving the navigation accuracy during GPS outages and its superior ability to suppress inertial navigation errors compared to the traditional feedforward neural networks.
To further analyze the role of the GI-NN-assisted integrated navigation system
during large-scale maneuvers of UAVs, the moment of the second GPS outage is
chosen to coincide with the turning and direction change phase of the UAV. As
depicted in Fig. 2.15, the eastward and northward position errors of INS, MLP,
LSTM and GI-NN methods are compared, and the results are summarized in Table 2.2.
Fig. 2.14 The position errors among different algorithms during the period of 365–400 s in the real field test. a The east position error. b The north position error
Table 2.2 The max position error (m) among different algorithms in the real field test
Period | INS East | INS North | MLP East | MLP North | LSTM East | LSTM North | GI-NN East | GI-NN North
365–400 s | 107.8 | 36.1 | 15.2 | 35.5 | 11.8 | 12.9 | 2.4 | 10.6
490–525 s | 93.4 | 235.9 | 62.8 | 63.1 | 49.6 | 51.4 | 39.7 | 40.5
Fig. 2.15 The position errors among different algorithms during the period of 490–525 s in the real field test. a The east position error. b The north position error
During the second GPS outage period, compared to the single INS
navigation mode without assistance, GI-NN reduces the positioning errors in the
eastward and northward directions by 57.5% and 82.8%, respectively. Through the
analysis of the actual UAV flight experiments, it is observed that the GI-NN-assisted
integrated navigation method exhibits better position error suppression capability
and higher navigation accuracy compared to the traditional neural networks.
The dataset experiment employs the publicly available INS/GPS integrated naviga-
tion dataset from the NaveGo toolbox [19], which was collected by Gonzalez and
Dabove [20] through driving a vehicle equipped with Ekinox-D IMU and GNSS on
the streets of Turin. Since the testing vehicle entered an underground parking lot, the
dataset includes two 30-s GPS outage scenarios, used to evaluate the performance
of the proposed GI-NN in assisting the INS. The trajectory of the NaveGo dataset is
illustrated in Fig. 2.16, with the positions of the two GPS signal outages marked by
red rectangles.
The results of the first GPS signal outage test are shown in Fig. 2.17 and Table 2.3:
(1) The single INS mode exhibits the maximum position error. For instance, the
maximum errors in the latitude and longitude positions reach 30.2 m and 19.1 m,
respectively; (2) MLP outperforms the first method, with the maximum errors in
the latitude and longitude reduced to 7.5 m and 9.1 m, respectively; (3) LSTM can
achieve further reduction in the maximum errors of the latitude and longitude to
2.8 m and 3.7 m, respectively, compared to those of the MLP; (4) Among the four
test results, the GI-NN method performs the best. Compared to the INS-only
mode, the maximum errors in the latitude and longitude are reduced by 92.7% and
90.1%, respectively.
To compare the performance of the Conv-LSTM neural network model and GI-
NN, experiments are also conducted during the first GPS outage. The maximum
errors in the latitude and longitude for the Conv-LSTM neural network model are
4.1 m and 4.5 m, respectively. From Fig. 2.18, it can be observed that GI-NN achieves
higher accuracy than that of the Conv-LSTM.
Fig. 2.17 The position errors among different algorithms during the first outage in the dataset test. a The latitude position error. b The longitude position error
Table 2.3 The max position error (m) among different algorithms in the dataset test
Outage | INS Lat | INS Lon | MLP Lat | MLP Lon | LSTM Lat | LSTM Lon | GI-NN Lat | GI-NN Lon
Outage 1 | 30.2 | 19.1 | 7.5 | 9.1 | 2.8 | 3.7 | 2.2 | 1.9
Outage 2 | 39.1 | 13.8 | 6.9 | 6.8 | 5.9 | 6.9 | 0.6 | 0.9
Fig. 2.18 The position errors between Conv-LSTM and GI-NN algorithms during the first outage in the dataset test. a The latitude position error. b The longitude position error
From Fig. 2.19 and Table 2.3, it can be observed that although the four models exhibit different performance during the second GPS outage, they show similar results to the first GPS outage: the single INS model has the largest errors, with
a maximum latitude error of 39.1 m and a maximum longitude error of 13.8 m. Both
LSTM and MLP outperform the only inertial model, with LSTM having maximum
latitude and longitude errors of 5.9 m and 6.9 m, respectively, and MLP having lati-
tude and longitude errors of 6.9 m and 6.8 m, respectively. Among them, GI-NN
performs the best, with maximum errors in the latitude and longitude of 0.6 m and
0.9 m, respectively.
Fig. 2.19 The position errors among different algorithms during the second outage in the dataset
test. a The latitude position error. b The longitude position error
2.6 Conclusion
This chapter proposes a novel INS/GPS integrated navigation system based on deep
learning assistance to reduce the navigation error accumulation during GPS signal
interruptions. The main advantages of this method are not only the extraction of
feature representations of navigation sensors from measurement noise but also the
automatic association of current inputs with historical model information. To validate
the performance of the proposed method, numerical simulation, field experiments
and public data experiments are conducted. The experimental results demonstrate
that during a 35-s GPS outage, the navigation accuracy of the GI-NN algorithm
is improved by 58% compared to the single INS algorithm, by 21% compared to
the LSTM algorithm, and by 37% compared to the MLP algorithm. Therefore, the
designed deep learning network, GI-NN, can estimate the intrinsic nonlinear rela-
tionship between INS outputs and GPS position increments, so as to provide accurate
navigation information during GPS interruptions.
References
1. Abdel-Hafez MF, Saadeddin K, Jarrah M (2015) Constrained low-cost GPS/INS filter with
encoder bias estimation for ground vehicles’ applications. Mech Syst Signal Process 58:285–
297
2. Noureldin A, Karamat TB, Eberts MD, El-Shafie A (2009) Performance enhancement of
MEMS-based INS/GPS integration for low-cost navigation applications. IEEE Trans Veh
Technol 58(3):1077–1096
3. Sebesta KD, Boizot N (2014) A real-time adaptive high-gain EKF, applied to a quadcopter
inertial navigation system. IEEE Trans Industr Electron 61(1):495–503
4. Wang J et al (2008) Integration of GPS/INS/vision sensors to navigate unmanned aerial vehicles.
Int Arch Photogram Remote Sens Spatial Inf Sci Conf 37:963–970
5. Kailath T (1968) An innovations approach to least-squares estimation–Part I: Linear filtering
in additive white noise. IEEE Trans Autom Control 13(6):646–655
6. Angus JE (1992) Forecasting, structural time series and the Kalman filter. Technometrics
34(4):496–497
7. Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng
Trans ASME 82(1):35–45
8. Yan G, Wang J (2019) Strapdown inertial navigation algorithm and integrated navigation
principle. Northwestern Polytechnical University Press, Xi'an, China (in Chinese)
9. Li J, Song N, Yang G, Li M, Cai Q (2017) Improving positioning accuracy of vehicular
navigation system during GPS outages utilizing ensemble learning algorithm. Inf Fusion 35:1–
10
10. Wagstaff B, Kelly J (2018) LSTM-based zero-velocity detection for robust inertial navigation.
In: 2018 9th international conference on indoor positioning indoor navigation (IPIN). https://
doi.org/10.1109/IPIN.2018.8533770
11. Nakashika T, Takiguchi T, Ariki Y (2014) Voice conversion using RNN Pre-Trained by recur-
rent temporal restricted Boltzmann machines. IEEE/ACM Trans Audio Speech Lang Proc
23(3):580–587
12. Li J et al (2016) Visualizing and understanding neural models in NLP. In: The North American
chapter of the association for computational linguistics, 681–691
13. Chung J, Gulcehre C, Cho K, Bengio Y (2015) Gated feedback recurrent neural networks. Int
Conf Mach Learn 37:2067–2075
14. Yao Y, Xu X (2017) A RLS-SVM aided fusion methodology for INS during GPS
outages. Sensors 17(3):432
15. Gonzalez R, Giribet JI, Patino HD (2015) NaveGo: a simulation framework for low-cost
integrated navigation systems. Control Eng Appl Inf 17(2):110–120
16. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: 3rd international
conference on learning representations (ICLR), pp 1–15
17. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In:
International conference on learning representations (ICLR 2017), pp 1–16
18. MRO Pixhawk Flight Controller (Pixhawk 1). Accessed: 2017. https://docs.px4.io/master/en/flight_controller/mro_pixhawk.html
19. NaveGo. Accessed: 2018. https://github.com/rodralez/NaveGo
20. Gonzalez R, Dabove P (2019) Performance assessment of an ultra low cost inertial
measurement unit for ground vehicle navigation. Sensors 19(18):3865
Chapter 3
Integrity Monitoring for GNSS Precision
Positioning
3.1 Introduction
With the improvement of GNSS infrastructure and the growing demand for precise
applications, high-precision positioning technologies such as Precise Point Posi-
tioning (PPP) [1], Real-Time Kinematic (RTK) [2, 3], and PPP-RTK have deeply
penetrated into people’s daily lives [4, 5]. However, users often have complex require-
ments when using these technologies, focusing not only on high precision but also on high reliability. Autonomous driving is a typical application scenario that places additional emphasis on reliability and safety [6–9].
Integrity is a performance indicator that reflects the safety and reliability of posi-
tioning. Conducting integrity monitoring can effectively ensure the credibility of
positioning, and has therefore become a necessity for providing reliable services for
high-precision positioning. Theoretical research on integrity monitoring algorithms
for high-precision positioning is currently timely. The GNSS industry has already
established a mature set of software and hardware processes for achieving high-
precision positioning algorithms. The current challenge lies not only in providing
high-precision positioning services but also in offering trustworthy high-precision
positioning services. Integrity monitoring is currently a focal point of the industry,
and it already has some technological foundations.
In this chapter, Sect. 3.1 analyzes the demand for integrity monitoring in high-
precision positioning. Section 3.2 provides a brief overview about the developments
of integrity monitoring for aviation, and presents the key issues of integrity moni-
toring for high-precision GNSS positioning. Section 3.3 comprehensively discusses
the challenges on the FDE and PL calculation procedures for high-precision posi-
tioning. Section 3.4 presents two preliminary studies that aim to extend the integrity
monitoring algorithms to a generalized GNSS positioning procedure. Section 3.5
concludes this chapter.
RTK uses double-differencing (DD) carrier phase observations and combines ambi-
guity resolution to achieve instantaneous centimeter-level positioning accuracy [16,
17]. It is currently the most widely used high-precision positioning method. Double-
differencing observations can eliminate the majority of errors, including satellite
clock errors, atmospheric propagation errors, and receiver clock errors. However,
due to the impact of multipath effects and carrier phase biases, the reliability of
ambiguity resolution may be reduced, ultimately affecting positioning accuracy [18].
Therefore, the key factors for the integrity monitoring of RTK lie in the ambiguity
resolution and biases [19]. For ambiguity resolution, the Carrier RAIM (CRAIM)
is used to constrain the resolution of the L1 carrier phase ambiguity [20], and Solu-
tion Separation (SS)-based integrity monitoring method is employed to monitor the
faulty ambiguity fixes [21]. An adaptive Kalman filtering algorithm with ambiguity
success rate as a dynamic adjustment factor has also been proposed to enhance the
efficiency of ambiguity resolution [22], and an integrity monitoring based ratio test
is also used to replace the traditional ratio tests for ambiguity validation [23]. For
biases, both cycle slips [24] and colored noise [25] in the observations need to be
considered, and enhanced Fault Detection and Exclusion (FDE) algorithms should
be used. As for the remaining atmospheric residuals and the multipath effect, the
new weighting models can be employed to mitigate these influences [26].
PPP can achieve centimeter-level positioning accuracy using only a single receiver
with the use of high-precision external augmentation products, most notably precise
orbit and clock products, and it takes around half an hour to converge to centimeter-
level accuracy. To achieve rapid convergence, PPP must be combined with Uncal-
ibrated Phase Delays (UPD) [27] products and ambiguity resolution for ambiguity
fixing, a technique known as PPP-AR [28]. PPP-AR, combined with high-precision
atmospheric augmentation products calculated by Continuously Operating Reference
Station (CORS) network, allows the realization of PPP-RTK [29]. Integrity moni-
toring for PPP/PPP-RTK should focus on three main aspects: server-end integrity
monitoring, user-end integrity monitoring, and high-precision augmentation prod-
ucts integrity monitoring. Currently, the majority of research is focused on user-
side integrity monitoring. This includes integrity monitoring algorithms based on
Multi-Hypothesis Solution Separation (MHSS) in ARAIM [30–32], integrity moni-
toring algorithms that extend MHSS to Kalman Filters (KF) [33], as well as integrity
monitoring algorithms based on MHSS that consider ambiguity fixing errors [34].
FDE algorithms for PPP/PPP-RTK have also been studied [35, 36]. To enhance the
performance of Positioning, Navigation, and Timing (PNT) services in GNSS-denied
environments, multi-sensor fusion technology has been proposed. For GNSS/INS, the most widely used integrated navigation system, integrity monitoring algorithms for its tightly coupled architecture can be divided into two types: innovation-based or residual-based methods [37] and MHSS-based methods [38, 39]. There is also research that applies RAIM as the integrity monitoring method for PPP/LiDAR loosely coupled SLAM [40].
Integrity monitoring in the civil aviation is based on Single Point Positioning (SPP)
using pseudorange measurements. Therefore, typical FDE algorithms such as RAIM
and ARAIM generally focus on gross errors detection in pseudorange observations,
often manifested as pseudorange outliers. Generally, only pseudorange observations
from GPS or a combination of GPS/Galileo dual systems are used, and the positioning
results are obtained using a snapshot weighted Least Squares (LS) algorithm [14, 42].
On the contrary, high-precision positioning algorithms such as RTK/PPP/PPP-
RTK not only use pseudorange observations but also use carrier phase observations.
Different types of data introduce different outliers, particularly manifested as cycle
slips and faulty-fixed ambiguities on carrier phase observations [43, 44]. The intro-
duction of new observation types renders LS algorithms inadequate, and currently,
high-precision positioning relies on KF algorithms. Additionally, under the influence of various risk sources, the observation errors are actually non-Gaussian.
As a result, the positioning errors do not rigorously obey the Gaussian distribution.
Theoretically, determining the percentiles of non-Gaussian random variables is the core challenge for PL calculation. If the probability density function of the positioning errors can be determined, the percentile for an arbitrary probability can be obtained by integration. The corresponding PL can then be obtained by inverse calculation using numerical methods such as binary search. Although the exact value of the PL can be obtained via Monte Carlo simulation, this is not applicable in real-time situations. In the ICAO regulations and algorithms, the PL is particularly defined in accordance with the Probability of Hazardous Misleading Information (PHMI) and the Time-to-Alert (TTA), which requires integral calculation over multiple intervals within a certain time window, thus further increasing the computational complexity.
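As a small illustration of this inverse calculation, the sketch below finds the PL whose tail probability equals a preset integrity risk by bisection on an assumed (here Gaussian) error distribution; the distribution, the search interval and all numbers are illustrative.

```python
from scipy import stats

def protection_level(cdf, integrity_risk, lo=0.0, hi=100.0, tol=1e-6):
    """Bisection search for the PL such that P(error > PL) equals the integrity risk."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - cdf(mid) > integrity_risk:   # tail probability still too large, so enlarge the PL
            lo = mid
        else:
            hi = mid
    return hi

# Example with a Gaussian overbound of the positioning error (illustrative sigma and risk)
pl = protection_level(stats.norm(loc=0.0, scale=1.5).cdf, integrity_risk=1e-7)
```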
Calculating the PL is the main task of integrity monitoring. Due to the difficulty of integral calculation upon a non-Gaussian probability density function, the overbounding technique is generally used in current integrity monitoring algorithms for aviation applications. With the FDE procedure, significant outlying observations are firstly excluded. Then, by overbounding each error term of the reserved observations, the distribution of the final positioning error can also be overbounded by a Gaussian distribution. As a valid parameter, the PL value should strike a balance between being overly conservative and too permissive. The primary requirement for the PL is that it must successfully envelop the positioning error; this ensures that the system integrity is not compromised by excessive positioning errors. An overly conservative PL value, on the other hand, would lead to a decrease in system availability.
For high-precision positioning modes, the PL calculation should be designed according to the specific positioning and FDE algorithms. Two common issues should always be considered: (1) the stochastic model that describes the observation error characteristics should be capable of overbounding the actual observation errors; and (2) the impacts of undetected faults, mainly including pseudorange outliers, wrongly fixed ambiguities and carrier phase cycle slips, should be accounted for.
Firstly, compared with the functional model, the impacts of the stochastic model on positioning are usually considered to be less significant, since an inaccurate stochastic model would only lead to a more dispersive, though still unbiased, estimation. However, for integrity monitoring, the stochastic model plays a role as important as the functional model. An inaccurate stochastic model would not only destroy the validity of the hypothesis testing in the FDE procedure, but also destroy the validity of the PL for the final system availability determination. The stochastic model for high-precision positioning algorithms should account for the differences in the treatment of each
error term, i.e., the impacts of residual tropospheric and ionospheric delays should
be modelled as the code and phase observation error terms, and compensated for in the
stochastic model, if these residuals have not been parameterized in the RTK or PPP/
PPP-AR/PPP-RTK functional model. Otherwise, if these residual effects have been
parametrized in the function model, prior values with accurate variance information
would be introduced as constraints to mitigate overparametrization effects. In both
cases, conservative but tight variances of each factor are key to producing valid PLs in the positioning domain.
Secondly, even after a cautious FDE procedure, undetected outlying observations may still remain. This is because the FDE procedure relies on hypothesis testing, which always carries some probability of missed detection and wrong identification under a certain fault mode. For high-precision GNSS positioning, carrier phase observations play a dominant role and meanwhile introduce some specific undetected faults. Both wrongly fixed ambiguities and undetected cycle slips behave as integer biases on the carrier phase observations. Although this kind of effect can be mostly eliminated by conducting an overly strict FDE procedure that abandons all doubtful observations, the system availability would be adversely reduced.
on the positioning biases should be modelled and considered during the PL calcu-
lation. This leads to an intrinsic combination between testing and estimation, based
on a mixed-integer model.
In this section, two preliminary studies that aim to extend the integrity monitoring
algorithms to a generalized GNSS positioning procedure have been summarized.
First, a DIA-MAP method that can be used for FDE purpose is presented. Second,
an overbounding method that stochastically characterizes the residual tropospheric
delays is presented.
For FDE purpose, while various test statistics have been developed to address
scenarios involving multiple outliers [56–58], directly comparing individual test
statistics becomes impractical when the number of suspected outliers varies [17]. Consequently, the Detection, Identification and Adaptation (DIA) procedure for multiple
outliers, based solely on residuals, is hindered by masking and swamping effects.
In cases where the maximum suspected outlier count is high, statistical tests tend
to falsely declare more outliers than actually present [59]. Moreover, the presence
of multiple outliers can easily mask each other, complicating the detection and
identification process [60].
To solve this problem, the DIA based on maximum a posteriori estimate (DIA-
MAP) is proposed [61]. By leveraging the prior distribution of gross errors, DIA-
MAP selects the hypothesis with the maximum posterior probability for outlier detec-
tion and identification. With the priors of gross error, DIA-MAP provides a unified
DIA process for both single and multiple outliers. Also, the prior can be flexibly
adjusted rather than fixed to be uniform, so that the DIA method can be adapted to
different application cases.
In the detection step, the validity of the null hypothesis is checked by a global
test, without parameterizing particular alternative hypotheses. Generally, the overall
test statistic is formed as
$T_0 = y^T \Sigma^{-1} \Sigma_{\hat{e}\hat{e}} \Sigma^{-1} y$ (3.1)

accept $H_0$ if $T_0 \le k_\alpha$ (3.2)

with

$k_\alpha = \chi_\alpha^2(m - n)$ (3.3)
Here χα2 (m − n) denotes the corresponding critical value of the Chi-square distri-
bution that is dependent on the confidence level α and the degree of freedom
m − n.
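With SciPy, the critical value and the acceptance decision of Eqs. (3.2)-(3.3) can be sketched as follows, treating α as the false-alarm probability (a convention assumed here); the function name is illustrative.

```python
from scipy.stats import chi2

def global_test(T0, m, n, alpha=1e-3):
    """Accept H0 when T0 stays below the chi-square critical value with m - n degrees of freedom."""
    k_alpha = chi2.ppf(1.0 - alpha, df=m - n)   # upper alpha-quantile as the critical value
    return T0 <= k_alpha
```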
In the identification step, following the MAP principle, if the observation vector y is contaminated by outliers, then the optimal estimates of the hypothesis $H_i$, the gross error $\hat{\nabla}_i$, and the unknown parameter $\hat{x}_i$ are the ones that maximize the posterior probability distribution:

$\left(H_i, \hat{\nabla}_i, \hat{x}_i\right) = \underset{H_j,\, \nabla_j,\, x}{\arg\max}\; p\left(H_j, \nabla_j, x \mid y\right), \quad j \in \{1, \ldots, N\}$ (3.4)
Here $\mathrm{Post}_i$ is a score that reflects the relative magnitude of the posterior probability of each hypothesis, and is given by:

$\mathrm{Post}_i = q_i \ln\varepsilon + (m - q_i)\ln(1 - \varepsilon) - \dfrac{1}{2}\, y^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} y + \dfrac{1}{2}\, y^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} C_i \left(C_i^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} C_i\right)^{-1} C_i^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} y$ (3.6)
where $q_i$ is the number of outliers under $H_i$ and $\varepsilon$ is the preset outlier rate. The estimates of the gross errors and the unknown parameters are respectively given as:

$\hat{\nabla}_i = \left(C_i^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} C_i\right)^{-1} C_i^T \Sigma^{-1}\Sigma_{\hat{e}\hat{e}}\Sigma^{-1} y$ (3.7)

and

$\hat{x}_i = \left(A^T \Sigma^{-1} A\right)^{-1} A^T \Sigma^{-1}\left(y - C_i \hat{\nabla}_i\right)$ (3.8)
where A is the design matrix of unknown parameters, C i is the design matrix of gross
error size under Hi . The flow chart of DIA-MAP is illustrated in Fig. 3.1.
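A NumPy sketch of the posterior score of Eq. (3.6) for one alternative hypothesis is given below; the null hypothesis (no outliers, where the projection term vanishes) would be scored separately, and all names are illustrative.

```python
import numpy as np

def posterior_score(y, Sigma, Sigma_ee, C_i, eps):
    """Posterior score Post_i of hypothesis H_i, following Eq. (3.6)."""
    m = len(y)                                   # number of observations
    q_i = C_i.shape[1]                           # number of suspected outliers under H_i
    W = np.linalg.inv(Sigma) @ Sigma_ee @ np.linalg.inv(Sigma)            # recurring weighting term
    quad = y @ W @ y                             # global misfit term
    proj = y @ W @ C_i @ np.linalg.inv(C_i.T @ W @ C_i) @ C_i.T @ W @ y   # part explained by outliers
    return q_i * np.log(eps) + (m - q_i) * np.log(1.0 - eps) - 0.5 * quad + 0.5 * proj
```

The hypothesis with the largest score would then be selected, after which Eqs. (3.7) and (3.8) give the corresponding gross error and parameter estimates.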
Figure 3.2 shows the positioning RMSEs of the five methods (LS, DIA-
datasnooping, iterative DIA-datasnooping, DIA-p-value, and DIA-MAP) in one-day
data of BeiDou Navigation Satellite System (BDS) SPP examples. Table 3.1 summa-
rizes the average RMSEs of all epochs for each method. One can find that, due to
lack of robustness, the RMSE of LS always shows an increasing tendency when
the gross error enlarges, regardless of the value of ε. As for the DIA-datasnooping,
iterative DIA-datasnooping, and DIA-p-value, the RMSEs remain in a stable trend
with the growth of gross error size when ε = 10−3 . It means that when the gross
error rate is relatively small, the conventionally used methods all show satisfactory
performance in terms of robustness. However, when ε is relatively large, these methods cannot always bound the estimation errors. For DIA-MAP, by contrast, no matter how many outliers occur, the RMSE can always decline to a lower bound as the gross error enlarges.
Tropospheric delay due to the neutral atmosphere is a major error source for GNSS
positioning. Although various Zenith Total Delay (ZTD) models have been proposed
and implemented, residual tropospheric delay can still lead to meter-level biases in
positioning [62]. The Minimum Operational Performance Standards by Radio Tech-
nical Committee for Aeronautics (RTCA MOPS) uses a constant standard deviation
(STD) of 0.12 m to characterize the residual ZTDs [14], equivalent to an upper bound
of 0.64 m for residual ZTDs. However, this value is too conservative, and does not
consider the spatiotemporal-varying characteristic of residual ZTDs [63]. Therefore,
a global and spatiotemporal-varying overbounding method was proposed to evaluate
the performance of a ZTD model and apply it for GNSS positioning and integrity
monitoring [64]. The proposed method is applied to three conventionally used ZTD
models, providing a tighter but still conservative overbounding model of residual
ZTDs.
Table 3.1 Average positioning RMSEs of five methods under different rates and sizes of gross errors (unit: m)
Method | ε = 10⁻³ (∇i/σi = 10, 30, 50) | ε = 10⁻² (∇i/σi = 10, 30, 50) | ε = 10⁻¹ (∇i/σi = 10, 30, 50)
LS | 3.44, 4.52, 6.14 | 4.65, 10.40, 16.77 | 11.09, 31.96, 53.09
DIA-datasnooping | 3.30, 3.30, 3.32 | 3.55, 4.60, 6.26 | 9.98, 28.12, 46.66
Iterative DIA-datasnooping | 3.30, 3.29, 3.29 | 3.46, 3.64, 4.03 | 8.74, 20.71, 33.36
DIA-p-value | 3.34, 3.33, 3.33 | 3.65, 3.41, 5.47 | 7.09, 15.59, 42.79
DIA-MAP | 3.30, 3.29, 3.29 | 3.43, 3.34, 3.33 | 6.68, 9.17, 13.39
The flowchart of stochastic modelling for the residual tropospheric delays is shown in Fig. 3.3. First, to analyze the residual ZTDs, the global grid-wise VMF3 products were selected as the reference, and the residual ZTDs were obtained by subtracting
the ZTDs calculated by a model from the corresponding reference value at each grid
point. Then the residual ZTDs are adaptively banded using the hierarchical clustering
method, which uses two criteria: samples with similar seasonal variations should be
clustered together, and samples at adjacent latitudes should be clustered together.
After the adaptive banding, the residuals at the same latitude band are stacked
along the date axis, and the two-step Gaussian overbounding method [65] was used to calculate the overbounding STD and bias at each day of year (DOY). Then the overbounding biases
were averaged, and the overbounding STDs were fitted using a periodic function.
The overbounding models for residual ZHDs and ZWDs are established separately,
and the overbounding model for the total residual ZTDs is established by combining
the two. The upper bound Δmax(DOY) of the residual ZTDs can be calculated as Δmax(DOY) = b + KIR · σ(DOY), where σ(DOY) is the fitted function of the overbounding STD, b is the averaged overbounding bias, and KIR represents the right-tail quantile of the PDF of a standard normal distribution at the extreme probability level corresponding to the preset integrity risk (IR).
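Assuming the additive combination just given of the averaged bias and the KIR-scaled fitted STD, the bound for a given day of year can be evaluated with a few lines; the function name and the numbers below are purely illustrative.

```python
from scipy.stats import norm

def ztd_upper_bound(sigma_doy, bias, integrity_risk):
    """Upper bound of the residual ZTD: averaged bias plus K_IR times the fitted STD."""
    k_ir = norm.ppf(1.0 - integrity_risk)   # right-tail quantile at the preset integrity risk
    return bias + k_ir * sigma_doy

bound = ztd_upper_bound(sigma_doy=0.05, bias=0.02, integrity_risk=1e-5)
```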
As an illustration, Fig. 3.4 shows the overbounding results for the UNB3 model,
which show global and annual characteristics, with overbounding biases and daily
STD values varying at different latitude bands. Seasonal variations are observed in
3 Integrity Monitoring for GNSS Precision Positioning 71
low and middle latitudes and are asymmetrical in the northern and southern hemi-
spheres. The overbounding model varies across seasons and latitudes, making it
neither overly conservative nor overly optimistic.
Figure 3.5 shows the residual ZTDs overbounding results for UNB3 model using
the external IGS ZTD products, which shows that the calculated upper bound could
successfully overbound the residual ZTDs, and is tighter than 0.64 m recommended
by RTCA in most cases. Therefore, adopting spatiotemporal-varying STDs can
improve positioning and integrity monitoring performance.
The demands of integrity monitoring for GNSS positioning have been steadily growing with the development of intelligent transport systems (ITS) and other safety-of-life (SoL) applications. Initially, in civil aviation, three augmentation
systems, namely GBAS, SBAS, and ABAS, mainly utilizing pseudorange or phase-smoothed pseudorange observations, have been well developed to provide GNSS integrity monitoring services that comply with the standards set by the ICAO. For high-precision positioning technologies such as PPP, RTK and PPP-RTK, however, significant challenges still persist in achieving integrity monitoring services.
Issues revolve around two main aspects: FDE and PL calculation. In general, the FDE for high-precision positioning should be applicable to multi-frequency and multi-system observations under a KF procedure, and should also fulfill cycle slip detection and ambiguity resolution validation. Likewise, the PL should be calculated based on a stochastic model that can overbound the actual code and phase errors, while considering the impacts of undetected pseudorange outliers, carrier phase cycle slips, and wrongly fixed ambiguities.
To generalize the FDE procedure, the authors have proposed a DIA-MAP method which provides a unified DIA process for multiple outliers by utilizing the priors of gross errors. To generalize one term of the stochastic model for PL calculation, a global and spatiotemporal-varying overbounding method has been proposed to envelop the residual tropospheric delays for different ZTD models. These investigations are expected to be further extended to high-precision positioning technologies.
References
1. Zumberge JF, Heflin MB, Jefferson DC, Watkins MM, Webb FH (1997) Precise point posi-
tioning for the efficient and robust analysis of GPS data from large networks. J Geophys Res
Solid Earth 102:5005–5017. https://doi.org/10.1029/96JB03860
2. Feng Y (2008) GNSS three carrier ambiguity resolution using ionosphere-reduced virtual
signals. J Geod 82:847–862. https://doi.org/10.1007/s00190-008-0209-x
3. Li B, Shen Y, Feng Y, Gao W, Yang L (2014) GNSS ambiguity resolution with controllable
failure rate for long baseline network RTK. J Geod 88:99–112. https://doi.org/10.1007/s00190-
013-0670-z
4. Teunissen PJG, Khodabandeh A (2015) Review and principles of PPP-RTK methods. J Geod
89:217–240. https://doi.org/10.1007/s00190-014-0771-3
5. Zhang B, Hou P, Odolinski R (2022) PPP-RTK: from common-view to all-in-view GNSS
networks. J Geod 96:102. https://doi.org/10.1007/s00190-022-01693-y
6. Du Y, Wang J, Rizos C, El-Mowafy A (2021) Vulnerabilities and integrity of precise point
positioning for intelligent transport systems: overview and analysis. Satell Navig 2:3. https://
doi.org/10.1186/s43020-020-00034-8
7. Hassan T, El-Mowafy A, Wang K (2021) A review of system integration and current integrity
monitoring methods for positioning in intelligent transport systems. IET Intell Transp Syst
15:43–60. https://doi.org/10.1049/itr2.12003
8. Yang L, Sun N, Rizos C, Jiang Y (2022) ARAIM stochastic model refinements for GNSS
positioning applications in support of critical vehicle applications. Sensors 22:9797. https://
doi.org/10.3390/s22249797
9. Zhu N, Marais J, Bétaille D, Berbineau M (2018) GNSS position integrity in urban environ-
ments: a review of literature. IEEE Trans Intell Transp Syst 19:2762–2778. https://doi.org/10.
1109/TITS.2017.2766768
10. ICAO (2018) ICAO Standards and recommended practices (SARPs), Annex 10 vol I, Radio
navigation aids
11. RTCA (2017) DO-253D, minimum operational performance standards for GPS local area
augmentation system airborne equipment
12. McGraw GA, Murphy T, Brenner M, Pullen S, Dierendonck AJV (2000) Development of the
LAAS accuracy models, pp 1212–1223
13. Roturier B, Chatre E, Ventura-Traveset J (2001) The SBAS integrity concept standardised by
ICAO. Application to EGNOS
14. Blanch J, Walker T, Enge P, Lee Y, Pervan B, Rippl M, Spletter A, Kropp V (2015) Baseline
advanced RAIM user algorithm and possible improvements. IEEE Trans Aerosp Electron Syst
51:713–732. https://doi.org/10.1109/TAES.2014.130739
15. Hewitson S, Wang J (2006) GNSS receiver autonomous integrity monitoring (RAIM)
performance analysis. GPS Solut 10:155–170. https://doi.org/10.1007/s10291-005-0016-2
16. Parkins A (2011) Increasing GNSS RTK availability with a new single-epoch batch partial
ambiguity resolution algorithm. GPS Solut 15:391–402. https://doi.org/10.1007/s10291-010-
0198-0
17. Teunissen PJ, Montenbruck O (2017) Springer handbook of global navigation satellite systems.
Springer Cham
18. Li L, Shi H, Jia C, Cheng J, Li H, Zhao L (2018) Position-domain integrity risk-based ambiguity
validation for the integer bootstrap estimator. GPS Solut 22:39. https://doi.org/10.1007/s10291-
018-0703-4
19. Wang K, El-Mowafy A (2021) Effect of biases in integrity monitoring for RTK positioning.
Adv Space Res 67:4025–4042. https://doi.org/10.1016/j.asr.2021.02.032
20. Feng S, Ochieng W, Moore T, Hill C, Hide C (2009) Carrier phase-based integrity monitoring for
high-accuracy positioning. GPS Solut 13:13–22. https://doi.org/10.1007/s10291-008-0093-0
21. Gao Y, Jiang Y, Gao Y, Huang G, Yue Z (2023) Solution separation-based integrity monitoring
for RTK positioning with faulty ambiguity detection and protection level. GPS Solut 27:140.
https://doi.org/10.1007/s10291-023-01472-y
22. Wang Z, Hou X, Dan Z, Fang K (2022) Adaptive Kalman filter based on integer ambiguity
validation in moving base RTK. GPS Solut 27:34. https://doi.org/10.1007/s10291-022-01367-4
23. Li L, Li Z, Yuan H, Wang L, Hou Y (2016) Integrity monitoring-based ratio test for GNSS
integer ambiguity validation. GPS Solut 20:573–585. https://doi.org/10.1007/s10291-015-
0468-y
24. Kim D, Song J, Yu S, Kee C, Heo M (2018) A new algorithm for high-integrity detection and
compensation of dual-frequency cycle slip under severe ionospheric storm conditions. Sensors
18:3654. https://doi.org/10.3390/s18113654
25. Gao Y, Gao Y, Liu B, Jiang Y (2021) Enhanced fault detection and exclusion based on Kalman
filter with colored measurement noise and application to RTK. GPS Solut 25:82. https://doi.
org/10.1007/s10291-021-01119-w
26. Wang K, El-Mowafy A, Rizos C, Wang J (2020) Integrity monitoring for horizontal RTK
positioning: new weighting model and overbounding CDF in open-sky and suburban scenarios.
Remote Sens 12:1173. https://doi.org/10.3390/rs12071173
27. Geng J, Chen X, Pan Y, Mao S, Li C, Zhou J, Zhang K (2019) PRIDE PPP-AR: an open-source
software for GPS PPP ambiguity resolution. GPS Solut 23:91. https://doi.org/10.1007/s10291-
019-0888-1
28. Li X, Li X, Yuan Y, Zhang K, Zhang X, Wickert J (2018) Multi-GNSS phase delay estimation
and PPP ambiguity resolution: GPS, BDS, GLONASS, Galileo. J Geod 92:579–608. https://
doi.org/10.1007/s00190-017-1081-3
29. Lyu Z, Gao Y (2022) PPP-RTK with augmentation from a single reference station. J Geod
96:40. https://doi.org/10.1007/s00190-022-01627-8
30. Gunning K, Blanch J, Walter T, Groot LD, Norman L (2018) Design and evaluation of integrity
algorithms for PPP in kinematic applications. Florida, Miami, pp 1910–1939
31. Zhang J, Zhao L, Yang F, Li L, Liu X, Zhang R (2022) Integrity monitoring for undifferenced
and uncombined PPP under local environmental conditions. Meas Sci Technol 33:065010.
https://doi.org/10.1088/1361-6501/ac4b12
32. Zhang W, Wang J, El-Mowafy A, Rizos C (2023) Integrity monitoring scheme for undifferenced
and uncombined multi-frequency multi-constellation PPP-RTK. GPS Solut 27:68. https://doi.
org/10.1007/s10291-022-01391-4
33. Wang S, Zhan X, Xiao Y, Zhai Y (2022) Integrity Monitoring of PPP-RTK based on multiple
hypothesis solution separation. In: Yang C, Xie J (eds) China satellite navigation conference
(CSNC 2022) proceedings. Springer Nature, Singapore, pp 321–331
34. Zhang W, Wang J (2023) GNSS PPP-RTK: integrity monitoring method considering wrong
ambiguity fixing. GPS Solut 28:30. https://doi.org/10.1007/s10291-023-01572-9
35. Elsayed H, El-Mowafy A, Wang K (2023) A new method for fault identification in real-time
integrity monitoring of autonomous vehicles positioning using PPP-RTK. GPS Solut 28:32.
https://doi.org/10.1007/s10291-023-01569-4
36. Innac A, Gaglione S, Troisi S, Angrisano A (2018) A proposed fault detection and exclusion
method applied to multi-GNSS single-frequency PPP. In: 2018 European navigation conference
(ENC), pp 129–139
37. Tanil C, Khanafseh S, Joerger M, Kujur B, Kruger B, Groot LD, Pervan B (2019) Optimal
INS/GNSS coupling for autonomous car positioning integrity, pp 3123–3140
38. Gunning K, Blanch J, Walter T, Groot LD, Norman L (2019) Integrity for tightly coupled PPP
and IMU. Florida, Miami, pp 3066–3078
39. Wang S, Zhai Y, Zhan X (2023) Implementation of solution separation-based Kalman filter
integrity monitoring against all-source faults for multi-sensor integrated navigation. GPS Solut
27:103. https://doi.org/10.1007/s10291-023-01423-7
40. Li T, Pei L, Xiang Y, Wu Q, Xia S, Tao L, Guan X, Yu W (2021) P3-LOAM: PPP/LiDAR
loosely coupled SLAM with accurate covariance estimation and robust RAIM in Urban Canyon
environment. IEEE Sens J 21:6660–6671. https://doi.org/10.1109/JSEN.2020.3042968
41. El-Mowafy A (2019) On detection of observation faults in the observation and position domains
for positioning of intelligent transport systems. J Geod 93:2109–2122. https://doi.org/10.1007/
s00190-019-01306-1
42. Blanch J, Walter T, Enge P, Lee Y, Pervan B, Rippl M, Spletter A (2012) Advanced RAIM
user algorithm description: integrity support message processing, fault detection, exclusion,
and protection level calculation, pp 2828–2849
43. Ge M, Gendt G, Rothacher M, Shi C, Liu J (2008) Resolution of GPS carrier-phase ambiguities
in precise point positioning (PPP) with daily observations. J Geod 82:389–399. https://doi.org/
10.1007/s00190-007-0187-4
44. Zhao Q, Sun B, Dai Z, Hu Z, Shi C, Liu J (2015) Real-time detection and repair of cycle slips
in triple-frequency GNSS measurements. GPS Solut 19:381–391. https://doi.org/10.1007/s10
291-014-0396-2
45. Wu Z, Wang Q, Hu C, Yu Z, Wu W (2022) Modeling and assessment of five-frequency BDS
precise point positioning. Satell Navig 3:8. https://doi.org/10.1186/s43020-022-00069-z
46. Baarda W (1967) Statistical concepts in geodesy. Nederlandse Commissie Voor Geodesie
47. Baarda W (1968) A testing procedure for use in geodetic networks. Netherland Geodetic
Commission
48. Yang L, Wang J, Knight NL, Shen Y (2013) Outlier separability analysis with a multiple
alternative hypotheses test. J Geod 87:591–604. https://doi.org/10.1007/s00190-013-0629-0
49. El-Mowafy A, Wang K (2022) Integrity monitoring for kinematic precise point positioning
in open-sky environments with improved computational performance. Meas Sci Technol
33:085004. https://doi.org/10.1088/1361-6501/ac5d75
50. Gao Y, Li Z (1999) Cycle slip detection and ambiguity resolution algorithms for dual-frequency
GPS data processing. Mar Geod 22:169–181. https://doi.org/10.1080/014904199273443
51. Blewitt G (1990) An automatic editing algorithm for GPS data. Geophys Res Lett 17:199–202.
https://doi.org/10.1029/GL017i003p00199
52. Liu J, Ge M (2003) PANDA software and its preliminary result of positioning and orbit
determination. Wuhan Univ J Nat Sci 8:603–609. https://doi.org/10.1007/BF02899825
53. Teunissen PJG (1995) The least-squares ambiguity decorrelation adjustment: a method for fast
GPS integer ambiguity estimation. J Geod 70:65–82. https://doi.org/10.1007/BF00863419
54. Teunissen PJG (2018) Distributional theory for the DIA method. J Geod 92:59–80. https://doi.
org/10.1007/s00190-017-1045-7
55. Yang L, Shen Y, Li B, Rizos C (2021) Simplified algebraic estimation for the quality control
of DIA estimator. J Geod 95:14. https://doi.org/10.1007/s00190-020-01454-9
56. Ding X, Coleman R (1996) Multiple outlier detection by evaluating redundancy contributions
of observations. J Geod 70:489–498. https://doi.org/10.1007/BF00863621
57. Kok JJ (1984) On data snooping and multiple outlier testing. National Geodetic Survey
58. Teunissen PJG (2006) Testing theory: an introduction. VSSD Press
59. Beckman RJ, Cook RD (1983) Outlier……….s. Technometrics 25:119–149. https://doi.org/10.1080/00401706.1983.10487840
60. Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection
61. Yu Y, Yang L, Shen Y, Sun N (2023) A DIA method based on maximum a posteriori estimate
for multiple outliers. GPS Solut 27:199. https://doi.org/10.1007/s10291-023-01534-1
62. Yang L, Wang J, Li H, Balz T (2021) Global assessment of the GNSS single point positioning
biases produced by the residual tropospheric delay. Remote Sens 13:1202. https://doi.org/10.
3390/rs13061202
63. McGraw GA (2012) Tropospheric error modeling for high integrity airborne GNSS navigation.
In: Proceedings of the 2012 IEEE/ION position, location and navigation symposium, pp 158–
166
64. Yang L, Fu Y, Zhu J, Shen Y, Rizos C (2023) Overbounding residual zenith tropospheric delays
to enhance GNSS integrity monitoring. GPS Solut 27:76. https://doi.org/10.1007/s10291-023-
01408-6
65. Blanch J, Walter T, Enge P (2019) Gaussian bounds of sample distributions for integrity anal-
ysis. IEEE Trans Aerosp Electron Syst 55:1806–1815. https://doi.org/10.1109/TAES.2018.287
6583
Chapter 4
Machine Learning-Aided Tropospheric
Delay Modeling over China
Abstract Real-time precise tropospheric corrections are critical for global navigation satellite system (GNSS) data processing. This chapter aims to develop a new tropospheric delay model over China with an advanced machine learning method. Compared with previous models, the new model features high accuracy, a small number of coefficients and good continuity of service, and it performs well under severe weather conditions. The new model exploits the complementary advantages of numerical weather prediction (NWP) forecasts and real-time GNSS observations with the aid of machine learning, which alleviates the strong dependency on a dense GNSS network and eases the generation of tropospheric corrections. The results provide new insight into augmenting tropospheric delays for the BeiDou satellite-based PPP service across China.
4.1 Introduction
H. Zhang (B) · L. Li
State Key Laboratory of Geodesy and Earth’s Dynamics, Innovation Academy for Precision
Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, China
e-mail: [email protected]
L. Li
College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing,
China
This chapter is organized as follows: Sect. 4.2 describes the datasets used in
this study. Section 4.3 introduces the methodology for generating the WAPTCs.
Section 4.4 presents evaluations on the WAPTCs. Section 4.5 summarizes the
conclusions.
Real-time data from 264 GNSS stations across China, belonging to two national monitoring networks, namely the China Crustal Movement Observation Network (CMONOC) and the Beidou Ground-Based Augmentation System (GBAS) network, are used. The altitudes of the GNSS stations range from sea level to as high as 4570 m, and their distribution is shown in Fig. 4.1. GPS/GLONASS dual-frequency observations are tracked by all stations with a sampling interval of 1 s. The GNSS data cover the period from August 1 to December 31, 2020, spanning summer and winter.
Fig. 4.1 Geographic distribution of the 264 GNSS stations used in this study
The Global Forecast System (GFS) is a numerical weather prediction model developed and operated by the National Centers for Environmental Prediction (NCEP) in the United States. The GFS model provides atmospheric parameters, such as temperature, pressure and specific humidity, at each grid point on 26 isobaric levels with a spatial resolution of 0.25° × 0.25° and an hourly cadence. The ZTDs can be estimated from the hourly GFS fields with the integration method, and are referred to as GFS-ZTDs. We extracted the GFS-ZTDs with a spatial resolution of 1.0° × 1.0° over China, because the resampled result is sufficient for our study while reducing the computational burden.
The GFS runs every 6 h, issuing forecasts at 00:00, 06:00, 12:00, and 18:00 UTC for the first 120 h. The forecasts are successively uploaded with a latency of approximately 3–5 h. Thus, the short-range forecasts (5–10 h forecast horizons) and the 1 s GNSS ZTDs are combined to generate the WAPTC model, which has a temporal resolution of 5 min. The characteristics of the data used are summarized in Table 4.1.
4.3 Methodology
e = 0.01RH · es · fw (4.2)
where RH is the relative humidity, es is the saturation water vapour pressure, and fw
is an enhancement factor. It should be noted that the hydrostatic delay above the top level of the GFS atmospheric profiles is calculated with the Saastamoinen model, while the wet delay above that level is small enough to be ignored. In addition, to stay consistent with the height system used in GNSS, the geopotential heights used in GFS are transformed into ellipsoidal heights using the Earth Gravitational Model 2008.
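To make the processing chain above concrete, the sketch below converts a single GFS pressure-level profile into a zenith delay. It is a minimal illustration only: the refractivity constants are assumed standard (Rüeger-type) values, the trapezoidal integration is simplified, the delay above the model top (handled with the Saastamoinen model in the text) is omitted, and the function names are hypothetical rather than the implementation actually used for the WAPTCs.

```python
import numpy as np

# Assumed standard refractivity constants (Rueger "best average"); illustrative only.
K1, K2, K3 = 77.689, 71.2952, 375463.0   # K/hPa, K/hPa, K^2/hPa

def water_vapour_pressure(rh, es, fw=1.0):
    """Eq. (4.2): e = 0.01 * RH * es * fw, with RH in percent and es in hPa."""
    return 0.01 * rh * es * fw

def ztd_from_profile(p, t, e, h):
    """Integrate refractivity over a single GFS column (illustrative sketch).

    p, t, e, h: pressure [hPa], temperature [K], water vapour pressure [hPa],
    ellipsoidal height [m] at each isobaric level, ordered bottom to top.
    Returns an approximate zenith total delay in metres; the contribution
    above the model top is omitted here.
    """
    pd = p - e                                   # dry partial pressure
    n_hyd = K1 * pd / t                          # hydrostatic refractivity
    n_wet = K2 * e / t + K3 * e / t**2           # wet refractivity
    n_tot = n_hyd + n_wet
    # ZTD = 1e-6 * integral(N dh), evaluated with a trapezoidal rule
    return 1e-6 * np.trapz(n_tot, h)
```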
Tropospheric delay over China shows considerable spatial and temporal variation owing to the diverse topography and dynamic weather patterns. To reduce the modeling complexity, the advanced empirical model IGGtrop is introduced as the background reference to mitigate the main spatiotemporal variation of the tropospheric delay. It is also easier for users to retrieve the ZTD from the IGGtrop model plus a small number of correction coefficients. This approach significantly reduces redundancy and complexity in tropospheric correction modeling and approximation.
Machine learning is a powerful tool that makes complex models easier to build and has been applied to tropospheric modeling. The GRNN model is introduced here to build a flexible mapping between the GNSS-ZTD and the GFS-predicted ZTD and to calibrate the latter with real-time GNSS ZTDs.
The GRNN model has a strong nonlinear mapping ability. The training set includes values of the inputs x and the corresponding values of the output y. The model consists of four layers: the input layer, pattern layer, summation layer, and output layer. The neurons in the summation layer are of two types: one calculates the algebraic sum of the pattern-layer outputs and is called the denominator unit, while the other calculates the weighted sum of the pattern-layer outputs and is called the numerator unit. The output layer simply divides the numerator unit by the denominator unit to yield the desired estimate of y.
In this study, the GRNN model is used to determine the nonlinear relationship between the GFS-ZTD corrections and the GNSS-ZTD corrections using sparse data samples. The structure of the GRNN model is shown in Fig. 4.3. The input variables consist of four parameters, namely the site GFS-ZTD correction, site latitude, longitude, and height. The output variables are the site-wise ZTD corrections estimated by real-time PPP.
Fig. 4.3 Structure of the GRNN model
It should be noted that GNSS-ZTD estimates after convergence are used to avoid the
adverse impacts of inaccurate samples on training.
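The GRNN mapping described above can be sketched as follows, assuming a Gaussian pattern layer with a single spread parameter σ; the function name and feature layout are illustrative assumptions, not the operational code behind the WAPTCs.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """General regression neural network: kernel-weighted average of the targets.

    x_train : (n, d) training inputs, e.g. [GFS-ZTD correction, lat, lon, height]
    y_train : (n,)   training targets, e.g. site-wise real-time GNSS-ZTD corrections
    x_query : (m, d) query points (e.g. grid points of the GFS-ZTD correction field)
    sigma   : spread parameter of the Gaussian pattern layer
    """
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    x_query = np.atleast_2d(np.asarray(x_query, float))

    # Pattern layer: Gaussian kernel of the distance to every training sample
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))

    # Summation layer: numerator (weighted sum) and denominator (algebraic sum)
    numerator = w @ y_train
    denominator = w.sum(axis=1) + 1e-12

    # Output layer: numerator / denominator gives the estimate of y
    return numerator / denominator
```

Because a single σ is shared by all input dimensions, the four input features are usually normalized to comparable scales before training.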
The training process involves only one parameter (i.e., the spread parameter σ) that needs to be set manually in advance. In this study, we adopt a sample-based tenfold cross-validation technique to determine the optimal spread parameter for each epoch. This technique is essentially a resampling process in which every sample is used as test data once, and it performs particularly well with a limited number of data samples. First, we divide all data samples (i.e., the site-wise ZTD corrections) into ten groups randomly and uniformly. Next, each group in turn is used as the test set, while the remaining nine groups are used as learning samples to train the model. The training and evaluation are repeated until each of the ten groups has been used as the test set once. Finally, the optimal σ is determined based on the results of the ten rounds of evaluation. Once the GRNN model is properly trained, it can be used as a tool for obtaining improved ZTD datasets, using the gridded GFS-ZTD corrections as input.
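The tenfold cross-validation for the spread parameter could be sketched as below, reusing the hypothetical grnn_predict function from the previous listing; the candidate σ grid and the random seed are assumptions.

```python
import numpy as np

def select_spread(x, y, candidate_sigmas, n_folds=10, seed=0):
    """Pick the spread parameter with the lowest mean RMSE over ten random folds.

    x : (n, d) array of input features; y : (n,) array of target ZTD corrections.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)          # ten random, roughly equal groups

    best_sigma, best_rmse = None, np.inf
    for sigma in candidate_sigmas:
        errors = []
        for k in range(n_folds):
            test = folds[k]                        # one group as test samples
            train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            pred = grnn_predict(x[train], y[train], x[test], sigma)
            errors.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
        rmse = float(np.mean(errors))
        if rmse < best_rmse:
            best_sigma, best_rmse = sigma, rmse
    return best_sigma

# Example (hypothetical grid): sigma_opt = select_spread(features, corrections, np.linspace(0.05, 2.0, 40))
```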
coordinate parameters used in previous studies can lead to poor numerical stability
in the estimation of the polynomial coefficients. (3) The insufficient polynomial
degrees used in previous studies also limit their modelling accuracies. In this study,
we address the above issues by conducting the following tasks. (1) An improved
tropospheric dataset represented on a finer surface grid is generated and used as
the modelling dataset, which significantly improves the spatial representation. (2) A
proper coordinate scale factor is employed in polynomial approximation to improve
the numerical stability of the coefficient estimation. (3) The polynomial degree is
optimized according to feedback from both internal and external evaluations. The
polynomial approximation has the following form:
$$T_{cor} = \sum_{i=0}^{n}\sum_{j=0}^{n} a_{ij}\,(k\,dL)^{i}\,(k\,dB)^{j} + \sum_{t=1}^{m} b_t\,h^{t} \quad (4.5)$$
where Tcor is the ZTD correction; n is the maximum degree of the polynomial function
accounting for ZTD corrections in the horizontal direction; k is the coordinate scale
factor; dL and dB (in rad) are the latitude and longitude differences, respectively,
between the sample point and centre point of the region; and h is the height (in
km) of the sample point. The second term of the equation accounts for the vertical
variations of the ZTD corrections. Since the vertical ZTD variations are already well
characterized by the IGGtrop model a priori, m = 2 is sufficient for modelling. aij
and bt are the polynomial coefficients, which are estimated using the least-squares
technique. The optimal determination of n and k is described in the following sections.
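As an illustration of Eq. (4.5), the polynomial coefficients could be estimated by ordinary least squares as sketched below; the helper names, the default scale factor k and the degree values are illustrative assumptions, not the operational settings.

```python
import numpy as np

def design_matrix(dL, dB, h, n=13, m=2, k=10.0):
    """Build the design matrix of Eq. (4.5): terms (k*dL)^i (k*dB)^j plus h^t.

    dL, dB: latitude/longitude differences from the regional centre (rad); h: height (km).
    """
    cols = [(k * dL) ** i * (k * dB) ** j for i in range(n + 1) for j in range(n + 1)]
    cols += [h ** t for t in range(1, m + 1)]
    return np.column_stack(cols)

def fit_waptc(dL, dB, h, t_cor, n=13, m=2, k=10.0):
    """Estimate the coefficients a_ij and b_t by least squares."""
    A = design_matrix(dL, dB, h, n, m, k)
    coeffs, *_ = np.linalg.lstsq(A, t_cor, rcond=None)
    return coeffs

def eval_waptc(coeffs, dL, dB, h, n=13, m=2, k=10.0):
    """Reconstruct the ZTD correction at user locations from the coefficients."""
    return design_matrix(dL, dB, h, n, m, k) @ coeffs
```

The scale factor k is exactly the device used in the text to keep the high-degree terms numerically well conditioned.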
The GRNN is a one-pass learning algorithm that features a fast learning process, making it suitable for real-time applications. Moreover, it provides the flexibility to handle a time-varying number of training samples, which overcomes the issue of the time-varying number of valid GNSS stations during training caused by accidental outages and breaks in the real-time stream. We briefly review the operation of the GRNN as follows.
The pattern layer of the GRNN generally adopts a Gaussian function, which can approximate a continuous function with arbitrary precision. Each neural unit in the pattern layer has a basis function, and these basis functions are linearly combined through the weights. With a sufficient number of such basis functions, the network output converges towards the target value and eventually no longer changes, so that the GRNN model is stabilized. The GRNN draws the function estimate directly from the training data and does not need an iterative training procedure; thus, the GRNN does not have a local minimum issue.
Fig. 4.4 Bias, STD and RMS error values of the GFS-ZTDs (left panels) and real-time GNSS-
ZTDs (right panels) in summer (in red) and winter (in blue) months at the 264 GNSS stations
with respect to the GNSS-ZTDs from the postprocessing run. The values in brackets represent the
minimum and maximum values of the bias, STD and RMS for all stations
with around 7.0 mm appears in GFS-ZTD, indicating that GFS-ZTDs are over-
estimated in China, especially in low latitude regions. (3) Real-time GNSS-ZTD
outperforms GFS-ZTD in terms of biases, STD, and RMS. It is thus feasible to use
the real-time GNSS-ZTD to calibrate GFS-ZTDs.
To further illustrate the accuracy of the GFS-ZTD over China, we evaluate the GFS-ZTD with respect to the ZTD derived from ERA5 reanalysis products, namely ERA5-ZTD. Zue et al. (2015) distributed global tropospheric grid products with a horizontal resolution of 1° × 1° based on ERA5 at GFZ. We use the same orography model (1° × 1°) and refractivity coefficients adopted in the GFZ products when generating the GFS-ZTD. Therefore, we compare the GFS-ZTD grid to the ERA5-ZTD grid (from GFZ) directly, without horizontal or vertical adjustment of the ZTD. The bias and RMS of the ZTD differences between the GFS-ZTD and ERA5-ZTD are presented in Fig. 4.5. A significant spatial variation in biases is observed in the GFS-ZTD over China, with most regions showing positive values, indicating that the GFS-ZTD is often overestimated. In addition, the RMS values are related to latitude and are larger in the low-latitude regions of China. These findings are thus confirmed by an independent reference dataset.
As for the spatial characteristics, a significant negative bias appears in the GFS-ZTD over the Sichuan Basin, shown as a notable dark blue area in Fig. 4.5a (bias), which relates to the unique basin topography. Moreover, large RMS values are seen in Fig. 4.5b (RMS), predominantly along the coast. This is mainly attributed to the influence of the Asian and Pacific monsoons, which bring large amounts of water vapour from the ocean to the southern coast of China. The variation of water vapour there is thus larger than over other regions, increasing the uncertainties and leading to worse consistency with the ERA5-ZTD. Therefore, the quality of the GFS-ZTDs is related not only to latitude but also to topography and climate zones.
Fig. 4.5 Bias (left panel) and RMS (right panel) of the GFS-ZTDs with respect to the ERA5-derived ZTDs over China from 1 August to 31 December in 2020
Fig. 4.6 Mean RMS errors of the GFS-ZTDs at the 264 GNSS stations with forecast horizons of 5, 6, 7, 8, 9 and 10 h in summer (a, in red) and winter (b, in blue) months. The GNSS-ZTDs from the postprocessing run are used as references
Fig. 4.7 Histogram of the ZTD errors obtained from the WAPTC (GFS-only) and WAPTC (GNSS/
GFS) at all GNSS stations across China from 1 August to 31 December 2020. The GNSS-ZTDs
from the postprocessing run are used as references. The statistical results for the errors in terms of
bias, STD and RMS values are also depicted
of the ZTD errors across all GNSS stations from August 1st to December 31st,
2020. Three cases with varying polynomial degrees (7, 10, and 13) are illustrated in
Fig. 4.7a, b, c, respectively, aiming to showcase the influence of polynomial degree
on modeling accuracy.
Two key observations are summarized from Fig. 4.7 as follows: Firstly, both
WAPTC (GFS-only) and WAPTC (GNSS/GFS) demonstrate improved accuracy with
higher polynomial degrees. Secondly, WAPTC (GNSS/GFS) outperforms WAPTC
(GFS-only) with smaller biases. This highlights the superior performance of WAPTC
(GNSS/GFS) in mitigating biases compared to solutions relying solely on GFS data.
The GRNN model trained on sparse GNSS stations contributes to the production of
tropospheric datasets with enhanced accuracy compared to GFS-only solutions.
An external validation was conducted to verify the accuracy of WAPTC (GNSS/
GFS) using post-processing GNSS-ZTDs. The validation methodology is briefly
described as follows: WAPTC (GNSS/GFS) was generated epoch by epoch. For
each epoch, 90% of available GNSS stations were randomly selected to train the
GRNN for calibrating the GFS-ZTD correction grid, while the remaining 10% were
designated as external test stations. ZTDs obtained from WAPTC (GNSS/GFS) at the
test stations were then compared against post-processing GNSS-ZTDs to calculate
errors. Simultaneously, WAPTC (GFS-only) was evaluated using the same selected
test stations. Additionally, we generated and evaluated WAPTCs by using only the
sitewise real-time GNSS-ZTDs simultaneously for comparison, which is denoted as
WAPTC (GNSS-only). The evaluation spanned a five-month period from August 1st
to December 31st, 2020, with 1 h intervals. Since test stations were randomly selected
in each epoch, the evaluation results were deemed sufficiently representative.
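Schematically, the per-epoch hold-out evaluation can be organized as below; build_waptc is a placeholder for the full GRNN-calibration plus polynomial-fitting pipeline described above, and the data structures are assumptions for illustration.

```python
import numpy as np

def evaluate_waptc(epochs, stations, build_waptc, reference_ztd, rng_seed=0):
    """Per-epoch 90/10 hold-out evaluation of the WAPTC (GNSS/GFS) products.

    build_waptc(train_ids, epoch) is a placeholder for the pipeline described in
    the text (GRNN calibration of the GFS-ZTD grid + polynomial fit of Eq. 4.5);
    it must return a callable giving the modelled ZTD at any station for that epoch.
    reference_ztd[epoch][station] are the post-processed GNSS-ZTDs used as truth;
    stations is a 1-D array of station identifiers.
    """
    rng = np.random.default_rng(rng_seed)
    errors = []
    for epoch in epochs:
        idx = rng.permutation(len(stations))
        n_train = int(round(0.9 * len(stations)))
        train_ids, test_ids = stations[idx[:n_train]], stations[idx[n_train:]]
        model = build_waptc(train_ids, epoch)        # train on 90% of the stations
        for sid in test_ids:                         # evaluate on the remaining 10%
            errors.append(model(sid) - reference_ztd[epoch][sid])
    err = np.asarray(errors)
    return err.mean(), np.sqrt((err ** 2).mean())    # bias and RMS error
```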
Figure 4.8 presents mean bias and RMS error values for WAPTC (GNSS-only),
WAPTC (GFS-only), and WAPTC (GNSS/GFS) across all randomly selected test
stations during summer and winter seasons, with polynomial degree (n) ranging
from 3 to 16. WAPTC (GNSS-only) consistently reached its minimum RMS error
Fig. 4.8 Mean biases and RMS errors of the ZTDs obtained from the WAPTC (GNSS-only),
WAPTC (GFS-only) and WAPTC (GNSS/GFS) with the polynomial degree varying from 3 to 16 at
the 10% randomly selected test stations during winter (upper panels) and summer (lower panels).
The accuracy of the IGGtrop model is also shown for comparison. The GNSS-ZTDs from the
post-processing are used as references
(11.9 mm in winter and 23.8 mm in summer) at relatively low degrees due to the
limited spatial coverage of GNSS stations. This scarcity makes estimating exces-
sive coefficients challenging and burdensome in GNSS-only conditions. WAPTC
(GNSS-only) exhibited no conspicuous bias at lower degrees before RMS error
began to rise. In contrast, increasing the degree led to decreased RMS error
for WAPTC (GFS-only), primarily attributable to the extensive spatial coverage
of the GFS-ZTD grid. However, WAPTC (GFS-only) displayed noticeable bias.
Ultimately, WAPTC (GNSS/GFS) leveraged the complementary advantages of
both WAPTC(GNSS-only) and WAPTC (GFS-only) to achieve optimal overall
performance.
Furthermore, the mean RMS error of WAPTC (GNSS/GFS) decreases to 10.0
mm (in winter) and 16.0 mm (in summer) when n = 13 and shows no significant
further decrease at higher degrees. This finding aligns with the outcomes depicted in
Figs. 4.7 and 4.8, where fitting errors serve as feedback. Hence, a balance between
accuracy and efficiency can be attained by setting n as 13. With this optimal value of
n, WAPTC (GNSS/GFS) enhances ZTD accuracy by approximately 16% (in winter)
and 23% (in summer) compared to the WAPTC (GNSS-only) scenario, by about 16%
(in winter) and 17% (in summer) compared to the WAPTC (GFS-only) scenario, and
by roughly 50% (in winter) and 62% (in summer) compared to the empirical IGGtrop
model for China.
To gain insight into the performance of the proposed WAPTCs, we conducted eval-
uations focusing on challenging conditions, particularly complex terrain and severe
weather events. These scenarios are known to pose significant challenges to tropo-
spheric models. By scrutinizing WAPTCs under such conditions, we can better
understand their capabilities and advantages.
Complex terrain: Hengduan Mountains
We evaluated WAPTCs in the challenging terrain of the Hengduan Mountains,
selecting 30 GNSS stations across the region. These stations, situated between 150
and 3500 m elevation, represent areas with complex topography. The spatial distri-
bution of these 30 stations is shown in Fig. 4.9. This assessment aims to gauge
the performance of WAPTCs in demanding conditions, particularly in regions with
intricate terrain and low latitudes over China.
The ZTDs obtained from WAPTC (GNSS/GFS) at these test stations were
compared against the corresponding post-processing GNSS-ZTDs to calculate errors.
The biases and RMS errors at all 30 stations are presented in Fig. 4.10. Importantly,
only errors generated when the stations served as test stations (not used in GRNN
training) are included in these results. The experimental period spans from August
1st to December 31st, 2020, with 1 h intervals. Each station functions as a test
station more than 670 times (equivalent to 4 weeks) during this period, ensuring the
external evaluation’s representativeness. This process was repeated using WAPTC
(GFS-only), and corresponding evaluation results are depicted in the left panels of
Fig. 4.10 for comparison.
Fig. 4.9 Spatial distribution of the 30 test stations in the Hengduan Mountains
Fig. 4.10 Biases and RMS errors of the ZTDs obtained from the WAPTC (GFS-only) and WAPTC
(GNSS/GFS) over a mountainous region from 1 August to 31 December 2020. The GNSS-ZTDs
from the postprocessing run are used as references
In Fig. 4.10a, the WAPTC (GFS-only) results have notable positive biases across
most stations with a mean value of 19.2 mm. This highlights a tendency for ZTDs
derived from WAPTC (GFS-only) to be consistently overestimated. However, the
implementation of WAPTC (GNSS/GFS) can effectively mitigate these biases,
reducing the mean bias value to 6.5 mm. This outcome underscores the successful
calibration of GFS-ZTD biases through the trained GRNN model. Furthermore, the
mean RMS error value of WAPTC (GNSS/GFS) at these 30 test stations stands at
16.9 mm, marking a notable 21% enhancement in accuracy compared to WAPTC
(GFS-only).
Severe weather: Typhoon Maysak
To illustrate the potential of the WAPTC under severe weather conditions, Fig. 4.11 shows the case of Typhoon Maysak. The path information of Maysak is shown in Fig. 4.11a. The error time series of the ZTDs obtained from WAPTC (GNSS/GFS) at the two test stations JLCB and HRBN (not used in training the GRNN), which are both near the path of Maysak, are shown in Fig. 4.11b, c, respectively, over a 3 day period (2–4 September 2020). The errors of the ZTDs obtained from the WAPTC (GFS-only) and the IGGtrop model are also presented for comparison purposes.
Fig. 4.11 a The path track of Typhoon Maysak on 3 September 2020. b–c Time series of ZTD errors obtained from IGGtrop, WAPTC (GFS-only) and WAPTC (GNSS/GFS) at stations JLCB (b) and HRBN (c) before, during and after the typhoon event. The absolute ZTDs from the postprocessing run are also presented for reference
From Fig. 4.11, we can see that the IGGtrop model exhibits error peaks at both stations (even exceeding 125 mm at HRBN) as Maysak approaches. This indicates that Typhoon Maysak causes rapid changes in ZTD that cannot be characterized by the empirical IGGtrop model, thus resulting in large errors. The
ZTD errors obtained from WAPTC (GFS-only) are generally stable before, during
and after the typhoon event with a mean RMS error of 19.1 mm at JLCB and 18.0
mm at HRBN. The WAPTC (GNSS/GFS) jointly using the GNSS and GFS further
decreases the mean RMS error to 10.9 mm at JLCB station and 14.9 mm at HRBN
station, which suggests the superiority of the WAPTC (GNSS/GFS) over the WAPTC
(GFS-only) under severe weather conditions.
Performance of WAPTC in augmenting RTPPP
Fig. 4.12 Real-time kinematic PPP ZTD (top panels) and positioning errors (bottom panels) from
Standard PPP and WAPTCs-augmented PPP at the test stations CDDZ and JFSP over a 24 h period
domains, suggesting that WAPTCs can enhance the performance of real-time PPP.
The average STD error of the WAPTCs-augmented PPP solution in the up compo-
nent is approximately 4.0 cm at the two test stations. In contrast, with standard PPP,
this value increases to 6.0–7.0 cm. Consequently, the positioning accuracy in the up
component improves by 23.8% when employing WAPTCs.
4.5 Conclusion
References
1. Boehm J, Werl B, Schuh H (2006) Troposphere mapping functions for GPS and very long
baseline interferometry from European centre for medium-range weather forecasts operational
analysis data. J Geophys Res-Solid Earth 111(B2)
2. Hobiger T, Ichikawa R, Koyama Y, Kondo T (2008) Fast and accurate ray-tracing algorithms for
real-time space geodetic applications using numerical weather models. J Geophys Res-Atmos
113(D20)
3. Li XX, Zhang XH, Ge MR (2011) Regional reference network augmented precise point
positioning for instantaneous ambiguity resolution. J Geodesy 85(3):151–158
4. Wilgan K, Hadas T, Hordyniec P, Bosy J (2017) Real-time precise point positioning augmented
with high-resolution numerical weather prediction model. GPS Solutions 21(3):1341–1353
5. Zheng F, Lou YD, Gu SF, Gong XP, Shi C (2018) Modeling tropospheric wet delays with
national GNSS reference network in China for BeiDou precise point positioning. J Geodesy
92(5):545–560
6. Bisnath S and IEEE (2020) PPP: Perhaps the natural processing mode for precise GNSS PNT,
presented at the 2020 IEEE/ION Position, Location and Navigation Symposium (Plans)
7. Lu C, Zhong Y, Wu Z, Zheng Y, Wang Q (2023) A tropospheric delay model to integrate ERA5
and GNSS reference network for mountainous areas: application to precise point positioning.
GPS Solutions 27(2)
8. European Union (2022) Galileo High Accuracy Service Signal-In-Space Interface Control
Document (HAS SIS ICD) Issue 1.0
9. European Union (2023) Galileo High Accuracy Service Service Definition Document (HAS
SDD) Issue 1.0
10. Cabinet Office (2020) Quasi-Zenith Satellite System Interface Specification Centimeter Level
Augmentation Service (IS-QZSS-L6-003)
11. CSNO (2020) BeiDou Navigation Satellite System Signal in space interface control document
Precise Point Positioning Service Signal PPP-B2b (Version 1.0)
12. Hadas T, Kaplon J, Bosy J, Sierny J, Wilgan K (2013). Near-real-time regional troposphere
models for the GNSS precise point positioning technique. Meas Sci Technol 24(5)
13. Bock O, Tarniewicz J, Thom C, Pelon J, Kasser M (2001) Study of external path delay correction
techniques for high accuracy height determination with GPS. Phys Chem Earth Part A-Solid
Earth Geodesy 26(3):165–171
14. Yao YB, Xu XY, Xu CQ, Peng WJ, Wan YY (2019) Establishment of a real-time local
tropospheric fusion model. Remote Sens 11(11)
15. Hadas T, Teferle FN, Kazmierski K, Hordyniec P, Bosy J (2017) Optimum stochastic modeling
for GNSS tropospheric delay estimation in real-time. GPS Solutions 21(3):1069–1081
16. Dousa J, Elias M, Václavovic P, Eben K, Krc P (2018) A two-stage tropospheric correction
model combining data from GNSS and numerical weather model. GPS Solutions 22(3)
17. Dousa J, Vaclavovic P (2014) Real-time zenith tropospheric delays in support of numerical
weather prediction applications. Adv Space Res 53(9):1347–1358
18. Li XX, Ge MR, Dousa J, Wickert J (2014) Real-time precise point positioning regional
augmentation for large GPS reference networks. GPS Solutions 18(1):61–71
19. Lu CX et al (2017) Improving BeiDou real-time precise point positioning with numerical
weather models. J Geodesy 91(9):1019–1029
20. Vaclavovic P, Dousa J, Elias M, Kostelecky J (2017) Using external tropospheric corrections
to improve GNSS positioning of hot-air balloon. GPS Solutions 21(4):1479–1489
21. Yu C, Li ZH, Penna NT, Crippa P (2018) Generic Atmospheric correction model for interfer-
ometric synthetic aperture radar observations. J Geophys Res-Solid Earth 123(10):9202–9222
22. Wilgan K, Geiger A (2019) High-resolution models of tropospheric delays and refractivity
based on GNSS and numerical weather prediction data for alpine regions in Switzerland. J
Geodesy 93(6):819–835
23. Andrei CO, Chen RZ (2009) Assessment of time-series of troposphere zenith delays derived
from the global data assimilation system numerical weather model. GPS Solutions 13(2):109–
117
24. Lu CX et al (2016) Tropospheric delay parameters from numerical weather models for multi-
GNSS precise positioning. Atmos Meas Tech 9(12):5965–5973
25. Zhang WX et al. (2020) Rapid troposphere tomography using adaptive simultaneous iterative
reconstruction technique. J Geodesy 94(8)
26. Shi JB, Xu CQ, Guo JM, Gao Y (2014) Local troposphere augmentation for real-time precise
point positioning. Earth Planets Space 66
27. Oliveira PS et al (2017) Modeling tropospheric wet delays with dense and sparse network
configurations for PPP-RTK. GPS Solutions 21(1):237–250
28. Specht DF (1991) A general regression neural network. IEEE Trans Neural Netw 2(6):568–576
29. Yuan QQ, Xu HZ, Li TW, Shen HF, Zhang LP (2020) Estimating surface soil moisture from
satellite observations using a generalized regression neural network trained on sparse ground-
based measurements in the continental U.S. J Hydrol 580
30. Zhang B, Yao YB (2021) Precipitable water vapor fusion based on a generalized regression
neural network. J Geodesy 95(3)
31. Li W et al (2015) New versions of the BDS/GNSS zenith tropospheric delay model IGGtrop.
J Geodesy 89(1):73–80
32. Hadas T, Hobiger T, Hordyniec P (2020) Considering different recent advancements in GNSS
on real-time zenith troposphere estimates. GPS Solutions 24(4)
Chapter 5
Deep Learning Based GNSS Time Series
Prediction in Presence of Color Noise
Abstract Global Navigation Satellite System (GNSS) time series prediction plays
a significant role in monitoring crustal plate motion, landslide detection, and main-
tenance of the global coordinate framework. Long Short-Term Memory (LSTM), a deep learning model, has been widely applied in the field of high-precision time series prediction, especially when combined with Variational Mode Decomposition (VMD) to form the VMD-LSTM hybrid model. To further improve the prediction accuracy of the VMD-LSTM model, this chapter proposes a dual variational mode decomposition long short-term memory (DVMD-LSTM) model to effectively handle the noise in GNSS time series prediction.
from the residual terms obtained after VMD decomposition to reduce the prediction
errors associated with residual terms in the VMD-LSTM model. Daily E, N, and
U coordinate data recorded at multiple GNSS stations between 2000 and 2022 are
used to validate the performance of the proposed DVMD-LSTM model. The exper-
imental results demonstrate that compared to the VMD-LSTM model, the DVMD-
LSTM model achieves significant improvements in prediction performance across
all measurement stations. The average root mean squared error (RMSE) is reduced
by 9.86%, and the average mean absolute error (MAE) is reduced by 9.44%, and the
average R2 increased by 17.97%. Furthermore, the average accuracy of the optimal
noise model for the predicted results is improved by 36.50%, and the average velocity
accuracy of the predicted results is enhanced by 33.02%. These findings collectively
attest to the superior predictive capabilities of the DVMD-LSTM model, thereby
enhancing the reliability of the predicted results.
H. Chen · T. Lu
School of Geodesy and Geomatics, East China University of Technology, Nanchang 341000,
China
e-mail: [email protected]
T. Lu
e-mail: [email protected]
X. He (B)
School of Civil and Surveying and Mapping Engineering, Jiangxi University of Science and
Technology, Ganzhou 341000, China
e-mail: [email protected]; [email protected]
5.1 Introduction
Over the past three decades, owing to the rapid advancements in satellite naviga-
tion technology, a global network of Global Navigation Satellite System (GNSS)
continuously operational reference stations has been established. These stations play
a pivotal role as primary sources of information for various purposes, such as moni-
toring crustal plate movements [1], detecting landslides [2], monitoring deformations
in structures like bridges or dams [3], and maintaining regional or global coordinate
frameworks [4]. By analyzing extensive time series data collected from these GNSS
stations, it becomes possible to predict changes in coordinates at regular intervals,
forming a fundamental basis for identifying patterns in motion. This carries signifi-
cant practical and theoretical implications in the fields of geodesy and geodynamics
research [5].
Time series prediction techniques can be broadly categorized into two main
groups: physical simulation and numerical simulation [6]. Traditional methods in
both physical and numerical simulation rely on geophysical principles, linear compo-
nents, periodic elements, and gap filling to construct models. However, these models
often struggle to capture intricate nonlinear data, requiring manual selection of feature
information and model parameters, which can lead to systematic biases and limita-
tions [7]. In contrast, deep learning, an emerging technology, has the capability to
automatically extract relevant information by constructing deep network architec-
tures. Deep learning demonstrates robust learning abilities and excels in handling
extensive and high-dimensional data [8, 9].
Long Short-Term Memory (LSTM), a significantly improved variant of Recurrent
Neural Networks (RNN), effectively tackles the challenges of gradient vanishing,
gradient exploding, and limited long-term memory commonly encountered in
conventional RNNs [10]. Due to its remarkable abilities in long-range time series
forecasting, LSTM has found extensive application across various domains of time
series prediction, including satellite navigation. For instance, Kim et al. enhanced
the precision and stability of absolute positioning solutions in autonomous vehicle
navigation by employing a multi-layer LSTM model [11]. Tao et al. adopted a
CNN-LSTM approach to extract deep multipath features from GNSS coordinate
sequences, thereby mitigating the impact of multipath effects on positioning accuracy
[12]. Additionally, Xie et al. [13] achieved accurate predictions of landslide periodic components using the LSTM model, establishing a landslide hazard warning system.
Variational Mode Decomposition (VMD) is a signal processing methodology
rooted in variational principles. It decomposes signals into distinct mode
components called Intrinsic Mode Functions (IMF), each with varying frequencies
achieved through an optimization process. This process effectively extracts time–
frequency local characteristics from signals, enabling efficient signal decomposition
and analysis [14–18].
The integration of LSTM with VMD, i.e., the VMD-LSTM model, has gained
widespread adoption in various fields for time series prediction. However, most
studies typically follow a common approach: VMD is used to decompose the original
data, predict each Intrinsic Mode Function (IMF) and the residual term separately,
and then combine these predictions to obtain the final result. While this method
yields good results for each IMF, it encounters challenges in effectively capturing
the fluctuation characteristics of the residual term, leading to notable prediction
errors in the model. Additionally, existing methods primarily focus on the accuracy
of prediction results, but often overlook the inherent noise characteristics within the
data. In light of these limitations, this chapter introduces a novel hybrid model known
as the Dual VMD-LSTM (DVMD-LSTM) model, which takes the characteristics
of noise in the data into consideration. By applying VMD decomposition to the
residual components derived from the initial VMD decomposition, the proposed
model adeptly extracts the fluctuation features within the residuals, thereby enabling
high-precision prediction of GNSS time series data.
In this chapter, by fully utilizing multi-site GNSS coordinate data, the proposed hybrid deep learning model is first evaluated and compared with multiple deep learning models using traditional accuracy evaluation metrics, namely the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R²). Subsequently, a noise model is introduced, and the prediction results of each model are analyzed with respect to the optimal noise model and the velocity estimated with that noise model. These multi-level comparisons demonstrate the excellent performance of the proposed hybrid deep learning model, which provides an innovative approach to GNSS time series prediction and important support for research and applications in related fields.
The VMD algorithm is built on the following key concepts: Intrinsic Mode Functions, the Total Practical IMF Bandwidth, and the Hilbert Transform and Analytic Signal.
(1) Intrinsic Mode Function
In the VMD algorithm, the IMF is defined as an amplitude-modulated-frequency-
modulated (AM-FM) signal with the following expression:

$$u_k(t) = A_k(t)\cos\!\big(\varphi_k(t)\big) \quad (5.1)$$

where $A_k(t)$ is the instantaneous amplitude of $u_k(t)$ and the instantaneous frequency of $u_k(t)$ is given by:

$$w_k(t) = \varphi_k'(t) = \frac{d\varphi_k(t)}{dt} \ge 0 \quad (5.2)$$

Since $A_k(t)$ and $w_k(t)$ vary slowly with respect to the phase $\varphi_k(t)$ over the interval $[t-\delta,\, t+\delta]$ (where $\delta \approx 2\pi/\varphi_k'(t)$), $u_k(t)$ can be regarded as a harmonic signal with amplitude $A_k(t)$ and frequency $w_k(t)$; a short numerical illustration of these quantities is given below.
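For illustration, the instantaneous amplitude A_k(t) and frequency w_k(t) of Eqs. (5.1)–(5.2) can be computed from the analytic signal of a mode, e.g. with scipy.signal.hilbert; the sampling rate fs is an assumed parameter of this sketch.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_amp_freq(u_k, fs=1.0):
    """Instantaneous amplitude A_k(t) and frequency w_k(t) of a mode u_k(t)."""
    analytic = hilbert(u_k)                        # u_k(t) + j * H[u_k](t)
    amplitude = np.abs(analytic)                   # A_k(t)
    phase = np.unwrap(np.angle(analytic))          # phi_k(t)
    freq = np.gradient(phase) * fs / (2 * np.pi)   # w_k(t); non-negative for a well-behaved mode
    return amplitude, freq
```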
(2) Total Practical IMF Bandwidth
The signal u_k(t) obtained by decomposition typically comprises two main frequency components: the instantaneous frequency and the carrier frequency. If w_k is the average frequency of a mode, its total practical bandwidth BW_FM increases with the maximum deviation of the instantaneous frequency from the center frequency and with the rate of that deviation, and can be estimated from these quantities together with the envelope bandwidth according to Carson's rule [20].
(3) Hilbert Transform and Analytic Signal
The Hilbert transform of a signal f(t) is defined as:

$$H[f(t)] = f(t) * \frac{1}{\pi t} = \frac{1}{\pi}\int_{-\infty}^{+\infty}\frac{f(\xi)}{t-\xi}\,d\xi \quad (5.5)$$
where H[·] stands for the Hilbert transform and * stands for convolution. From the above equation, the Hilbert transform can be regarded as passing the signal f(t) through a filter whose impulse response is h(t) = 1/(πt).
Since the convolution with h(t) = 1/(πt) involves a non-integrable singularity, the Cauchy principal value is introduced for the solution:

$$\mathrm{p.v.}\!\int_{-\infty}^{+\infty} f(t)\,dt = \lim_{\varepsilon\to 0^{+}}\left(\int_{-\infty}^{-\varepsilon} f(t)\,dt + \int_{\varepsilon}^{+\infty} f(t)\,dt\right) \quad (5.6)$$
where the Hilbert transform of a signal is obtained as the Cauchy principal value
(denoted p.v.) of the convolution integral.
So the Hilbert transform of the signal f (t) can be expressed as:
$$H[f(t)] = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{\mathbb{R}}\frac{f(u)}{t-u}\,du \quad (5.7)$$
The amplitude of the signal f(t) does not change after the Hilbert transform, and the most prominent use of the Hilbert transform is to turn a purely real signal into a complex-valued analytic signal.
The analytic signal obtained from the real signal f(t) through the Hilbert transform is defined as:

$$f_A(t) = f(t) + jH[f(t)] = A(t)\,e^{j\varphi(t)} \quad (5.8)$$

where $j^2 = -1$; the complex exponential term $e^{j\varphi(t)}$ describes the rotation of the complex signal in time; $\varphi(t)$ is the phase, and $A(t)$ denotes the time-domain amplitude. For signals of the form (5.1), the analytic signal can be expressed with the same amplitude function:

$$u_{k,A}(t) = A_k(t)\,e^{j\varphi_k(t)} \quad (5.9)$$
From the above equations, when two analytic signals are mixed, their product automatically becomes a signal composed of two frequencies (i.e., the sum and the difference of the original two frequencies); in the Fourier transform, this corresponds to a pair of transform relations.
(2) By multiplying each mode with an exponential term $e^{-j\omega_K t}$ tuned to its estimated center frequency, the spectrum of each mode is modulated to its respective baseband:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_K(t)\right] e^{-j\omega_K t} \quad (5.13)$$
(3) The bandwidth of each mode, i.e., the squared gradient norm of the demodulated signal, is estimated through Gaussian smoothed demodulation, which leads to the constrained variational problem:

$$\begin{cases} \min\limits_{\{u_K\},\{\omega_K\}} \left\{ \displaystyle\sum_{K} \left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right) * u_K(t)\right] e^{-j\omega_K t} \right\|_2^2 \right\} \\ \text{s.t.}\;\; \displaystyle\sum_{K} u_K = x(t) \end{cases} \quad (5.14)$$
where x(t) represents the original signal. The procedure for solving the
variational problem is described as follows.
$$L(\{u_K\},\{\omega_K\},\lambda) = \alpha\sum_{K}\left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_K(t)\right] e^{-j\omega_K t}\right\|_2^2 + \left\| f(t)-\sum_{K}u_K(t)\right\|_2^2 + \left\langle \lambda(t),\, f(t)-\sum_{K}u_K(t)\right\rangle \quad (5.15)$$
where $\left\| f(t)-\sum_{K}u_K(t)\right\|_2^2$ is a quadratic penalty term that speeds up convergence. To address this now unconstrained variational problem, the Alternating Direction Method of Multipliers (ADMM) is harnessed. The focus is on locating the saddle point of the augmented Lagrangian in Eq. (5.15) through iterative updates of $u_K^{n+1}$, $\omega_K^{n+1}$ and $\lambda^{n+1}$, which yields the optimal solution of the constrained variational model.
That is, the modal component is determined by the minimization of the extended
Lagrange formula:
$$u_k^{n+1} = \arg\min_{u_k}\left\{ \alpha\left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right] e^{-j\omega_k^{n+1}t}\right\|_2^2 + \left\| f(t)-\sum_{i\neq k}u_i^{n+1}(t)+\frac{\lambda(t)}{2}\right\|_2^2 \right\} \quad (5.16)$$
Applying the Fourier transform to the above equation yields the frequency domain
expressions for the modal components and the center frequency, respectively:
$$\hat{u}_k^{n+1}(w) = \frac{\hat{f}(w) - \sum_{i\neq k}\hat{u}_i(w) + \dfrac{\hat{\lambda}(w)}{2}}{1 + 2\alpha\,(w - w_k)^2} \quad (5.17)$$

$$w_k^{n+1} = \frac{\displaystyle\int_0^{\infty} w\,|\hat{u}_k(w)|^2\,dw}{\displaystyle\int_0^{\infty} |\hat{u}_k(w)|^2\,dw} \quad (5.18)$$

where $\hat{u}_k^{n+1}(w)$ is the Wiener filtering of $\hat{f}(w) - \sum_{i\neq k}\hat{u}_i(w)$, $w_k^{n+1}$ is the modal center frequency, and n denotes the number of iterations; the time-domain mode $u_k(t)$ is obtained as the real part of the inverse Fourier transform of $\hat{u}_k(w)$.
The steps for the complete implementation of VMD are as follows (a compact code sketch is given after the steps):
Step 1: Initialize $\{\hat{u}_k^1\}$, $\{w_k^1\}$, $\hat{\lambda}^1$ and n;
Step 2: Update the values of $\hat{u}_k$ and $w_k$ according to Eqs. (5.17) and (5.18);
Step 3: Update the Lagrange multiplier $\hat{\lambda}^{n+1}(w)$ according to

$$\hat{\lambda}^{n+1}(w) = \hat{\lambda}^{n}(w) + \tau\left[\hat{f}(w) - \sum_{k}\hat{u}_k^{n+1}(w)\right] \quad (5.19)$$

Step 4: Given the discrimination accuracy $\varepsilon > 0$, end the iteration and output the result if the termination condition $\sum_k \|\hat{u}_k^{n+1}-\hat{u}_k^{n}\|_2^2 \,/\, \|\hat{u}_k^{n}\|_2^2 < \varepsilon$ is satisfied; otherwise, return to Step 2 and continue the iteration.
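A compact, self-contained sketch of this ADMM loop (Eqs. (5.17)–(5.19) together with the Step 4 test) is given below. It is a simplified illustration under stated assumptions: boundary mirroring and other refinements of reference implementations are omitted, and the initial center frequencies and default parameters are illustrative choices.

```python
import numpy as np

def vmd(signal, K=4, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch implementing the updates of Eqs. (5.17)-(5.19).

    Works best on zero-mean (detrended) input. Returns the K modes u_k(t),
    their center frequencies, and the residual r(t) = x(t) - sum_k u_k(t).
    """
    x = np.asarray(signal, dtype=float)
    T = len(x)
    f_hat = np.fft.fftshift(np.fft.fft(x))         # two-sided spectrum of x(t)
    freqs = np.fft.fftshift(np.fft.fftfreq(T))     # matching frequency axis
    pos = freqs >= 0                               # update only the positive half

    u_hat = np.zeros((K, T), dtype=complex)        # mode spectra
    omega = np.linspace(0.05, 0.45, K)             # assumed initial center frequencies
    lam_hat = np.zeros(T, dtype=complex)           # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq. (5.17): Wiener-filter-like update of mode k
            u_hat[k, pos] = (f_hat[pos] - others[pos] + lam_hat[pos] / 2) / \
                            (1.0 + 2.0 * alpha * (freqs[pos] - omega[k]) ** 2)
            # Eq. (5.18): center frequency = spectral centroid of the mode
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.dot(freqs[pos], power) / (power.sum() + 1e-14)
        # Eq. (5.19): dual ascent of the Lagrange multiplier (tau = 0 disables it)
        lam_hat[pos] += tau * (f_hat[pos] - u_hat[:, pos].sum(axis=0))
        # Step 4: relative-change termination test
        num = np.sum(np.abs(u_hat - u_prev) ** 2)
        den = np.sum(np.abs(u_prev) ** 2) + 1e-14
        if num / den < tol:
            break

    # Spectra hold positive frequencies only; 2*Re{IFFT} recovers the real modes
    modes = 2.0 * np.real(np.fft.ifft(np.fft.ifftshift(u_hat, axes=-1), axis=-1))
    residual = x - modes.sum(axis=0)
    return modes, omega, residual
```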
where W represents the weight matrices, b represents the biases, and $f_{t-1}$ is a vector with elements ranging from 0 to 1. Each element of this vector indicates the degree to which the corresponding element of the previous cell state is retained or forgotten.
(3) The cell state from the previous layer undergoes an element-wise multiplication with the forget vector, and the result is then added to the output of the input gate. This process results in the updated cell state:

$$C_t = f_{t-1} * C_{t-1} + i_t * C_t' \quad (5.24)$$

where $f_{t-1} * C_{t-1}$ determines how much information from the previous memory cell state $C_{t-1}$ is forgotten, while $i_t * C_t'$ determines how much information from the candidate state $C_t'$ is added to the new memory cell state $C_t$.
(4) The value of the subsequent hidden state $h_t$ is determined through the output gate $O_t$, which incorporates information from previous inputs:

$$h_t = O_t * \tanh(C_t) \quad (5.25)$$

A single forward step combining these gate operations is sketched below.
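The gate formulations in this sketch (sigmoid gates and a tanh candidate state) follow the usual textbook definition and may differ slightly in notation from the chapter's gate equations; the parameter layout is an assumption of this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (illustrative sketch).

    W, U, b hold the stacked parameters of the forget, input, candidate and output
    gates: W[g] has shape (hidden, input), U[g] (hidden, hidden), b[g] (hidden,).
    """
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate

    c_t = f * c_prev + i * c_tilde        # cell-state update, cf. Eq. (5.24)
    h_t = o * np.tanh(c_t)                # hidden state, cf. Eq. (5.25)
    return h_t, c_t
```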
In the VMD-LSTM model, each IMF obtained from the VMD decomposition, along with the residual component, undergoes individual prediction, and their predictions are cumulatively combined to derive the final output of the model. It is essential to emphasize that the IMFs, characterized by their stationary nature, achieve superior predictive accuracy when addressed independently, thus significantly enhancing the overall predictive capability of the VMD-LSTM model. Notably, this prediction process, as depicted on the left-hand side of Fig. 5.2, does not involve any decomposition of the residual value.
However, it is crucial to acknowledge that the residual component resulting from
the VMD decomposition of real-world data retains certain fluctuation characteristics
and non-white noise elements, such as high-frequency noise. In response to this,
the model proceeds to conduct further decomposition of the residual terms using
VMD and predicts the decomposed modal components to mitigate the influence
of incomplete VMD decomposition. The DVMD-LSTM model enhances overall
prediction accuracy by replacing the predicted outcomes of the original residual
terms with the combined modal components. This strategic adjustment effectively
reduces the impact of residual terms on prediction accuracy. The detailed workflow
is elucidated in Fig. 5.2.
The precise prediction procedure of the DVMD-LSTM model can be delineated as follows (a schematic code sketch of the full workflow is given after the steps):
Step 1: Commence with the preprocessing of GNSS time series data, encom-
passing tasks such as the removal of outliers, interpolation, and other data prepro-
cessing techniques. Subsequently, feed the preprocessed data into the VMD for
decomposition.
Step 2: Further break down the residue component denoted as “r1 ”, derived from
the initial VMD operation, into individual modal components. Concurrently, conduct
another round of VMD to obtain a new residue component, designated as “r2 ”.
Step 3: Aggregate the modal components derived from the VMD decomposition
of “r1 ” to formulate the Fused Intrinsic Mode Function (Fuse-IMF). This Fuse-IMF
is utilized as a predictive feature within the LSTM model.
Step 4: Employ the individual modal components, extracted from the VMD
decomposition of the original GNSS time series, as distinct features. These features
are input separately into the LSTM model for prediction, yielding K prediction
outcomes, where K signifies the count of modal components generated during VMD
decomposition.
Step 5: Sum the K prediction outcomes generated in Step 4 with the prediction
outcome of the Fuse-IMF to acquire the ultimate prediction result of the DVMD-
LSTM model.
Step 6: Compute the performance metrics to evaluate the model’s performance
across various noise models.
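The six steps above could be wired together as in the sketch below. It reuses the vmd() function sketched earlier in this section, uses a small sliding-window LSTM from tensorflow.keras as the per-component predictor, and treats the window length, number of modes K, network size and training epochs as illustrative assumptions rather than the settings used in this chapter; the chapter's train/test split in time is also omitted for brevity.

```python
import numpy as np
from tensorflow.keras import layers, models

def make_windows(series, window=30):
    """Sliding windows: predict the next value from the previous `window` values."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y                      # shape (samples, window, 1)

def lstm_forecast(series, window=30, epochs=50):
    """Train a small LSTM on one component and return its one-step predictions."""
    X, y = make_windows(series, window)
    model = models.Sequential([layers.LSTM(32, input_shape=(window, 1)),
                               layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=epochs, verbose=0)
    return model.predict(X, verbose=0).ravel()

def dvmd_lstm_predict(coord_series, K=6, window=30):
    """DVMD-LSTM: VMD of the series, second VMD of the residual, per-component LSTMs."""
    # Steps 1-2: first VMD, then decompose the residual r1 again (r2 is discarded)
    imfs, _, r1 = vmd(coord_series, K=K)
    imfs_r1, _, r2 = vmd(r1, K=K)
    # Step 3: fuse the residual modes into a single predictive feature (Fuse-IMF)
    fuse_imf = imfs_r1.sum(axis=0)
    # Steps 4-5: predict every IMF and the Fuse-IMF, then sum the predictions
    prediction = sum(lstm_forecast(c, window) for c in imfs)
    prediction += lstm_forecast(fuse_imf, window)
    return prediction
```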
To evaluate the prediction accuracy and analyze the noise characteristics of the hybrid
model, this research employs several evaluation metrics, including Root Mean Square
Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination
(R2 ). Additionally, the Bayesian Information Criterion (BIC_tp) is utilized to deter-
mine the most appropriate noise model for both the original GNSS time series and the
predicted time series under each model. This aids in assessing whether the prediction
results adequately account for colored noise patterns [23]. The specific definitions
of these evaluation metrics are given as follows:
(1) RMSE:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i − \hat{y}_i)^2}   (5.26)
(2) MAE:

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i − \hat{y}_i\right|   (5.27)
(3) R2:

R^2 = 1 − \frac{\sum_{i=1}^{n}(y_i − \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i − \bar{y})^2}   (5.28)
where yi denotes the actual GNSS data values, ȳ is the mean of the actual GNSS data values, ŷi denotes the predicted results generated by each model, and n is the total number of GNSS data points.
The values of RMSE and MAE serve as crucial evaluation metrics for assessing
model prediction accuracy. Smaller values of RMSE and MAE indicate a higher level
of prediction accuracy in the model, while larger values signify reduced prediction
accuracy.
The coefficient of determination R2 falls within the range of 0 to 1. When R2
approximates 1, it signifies that the prediction model effectively explains the vari-
ability observed in the dependent variable. Conversely, when R2 approaches 0, it
suggests that the predictive model exhibits weak explanatory power.
(4) BIC_tp:

BIC_tp = −2\log(L) + \log\left(\frac{n}{2\pi}\right) v   (5.29)

where L is the likelihood function and v is the number of estimated model parameters.
To visually demonstrate the improvement achieved by the hybrid model for each
evaluation metric, this study introduces the concept of the Improvement Ratio (I). The
Improvement Ratio quantifies the degree of enhancement in each accuracy evaluation
metric. By calculating the I value, one can precisely assess the extent to which the
hybrid model has improved accuracy. The formula for calculating the Improvement
Ratio is expressed as follows:
Iyŷ = (y − ŷ)/y   (5.30)
where y is the value of the accuracy metric obtained for the initial model's predictions, whereas ŷ stands for the value of the same metric for the predictions generated by the hybrid model.
The magnitude of Iyŷ indicates the degree of improvement observed in the evalua-
tion metric achieved by the hybrid model. In other words, a larger value of Iyŷ implies a
larger enhancement in the evaluation metric, while a smaller value suggests a smaller
degree of improvement.
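For reference, the accuracy metrics of Eqs. (5.26)–(5.28) and the improvement ratio of Eq. (5.30) translate directly into a few lines of NumPy; the function names and the example values below are illustrative only.

import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))            # Eq. (5.26)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))                     # Eq. (5.27)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot                                 # Eq. (5.28)

def improvement_ratio(metric_initial, metric_hybrid):
    return (metric_initial - metric_hybrid) / metric_initial     # Eq. (5.30)

# Hypothetical example: an RMSE of 1.00 mm reduced to 0.80 mm gives I ≈ 0.20 (20%).
print(improvement_ratio(1.00, 0.80))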
In this chapter, we have utilized daily time series data consisting of three direc-
tions (i.e. position coordinates), obtained from 8 GNSS stations affiliated with the
Enhanced Solid Earth Science (ESDR) System. The daily solutions were computed
using GAMIT and GIPSY with loose constraints. Subsequently, we employed the following criteria to select the stations:
(1) Temporal Consistency: The chosen station coordinate time series were required
to encompass data spanning from the year 2000 to 2022. The selection of the
same long-term time series data was essential to maintain experiment consis-
tency and to allow the optimal noise model to yield reliable velocity parameter estimates.
(2) Limited Missing Data: Within the time frame spanning from 2000 to 2022, the
selected station data were expected to exhibit an average missing data rate not
exceeding 5%. This stipulation was put in place to ensure the reliability of the
predictive results by minimizing data gaps.
(3) Spatial Evenness: In order to mitigate the influence of inter-regional correla-
tions on the velocity parameters and noise modeling, the selected sites were
deliberately chosen to be evenly distributed in space.
For data preprocessing, the Hector software was used to identify and eliminate
outliers through detecting any step discontinuities present in the raw data. When
step discontinuities were identified, a correction process was applied using the
least squares fitting method. Subsequently, the rectified data underwent interpola-
tion, which was achieved using the Regularized Expectation Maximization (RegEM)
algorithm [24].
The RegEM algorithm combines the Expectation Maximization (EM) algorithm
with regularization techniques. This combination allows for the simultaneous opti-
mization of the likelihood function while considering the model’s smoothness and
facilitating noise reduction. As a result, the algorithm effectively addresses the
challenge of interpolating missing data points.
It is important to note that, due to space limitations, only a comparison of the
interpolation results for the “GOBS” station, which exhibited the highest missing
rate in the E (East), N (North), and U (Up) components, is presented in Fig. 5.4.
As illustrated in the figure, the RegEM method generated favorable interpolation results, with the interpolated values following the overall trend of the series, especially in regions with missing data points. Notably, it effectively preserves the underlying sequence trend even when faced with a significant amount of continuous missing data.
This accomplishment highlights the RegEM method’s ability to overcome the limita-
tions associated with traditional linear interpolation, particularly in areas with contin-
uous data gaps. Additionally, the RegEM method provides high-quality continuous
time series data, which is crucial for the success of subsequent experiments.
The K value that results in the highest SNR for each time series is selected as the optimal K value [25]. The
SNR is calculated as follows:
SNR = 10\,\lg\frac{\sum_{i=1}^{N} f^2(i)}{\sum_{i=1}^{N}\left[f(i) − g(i)\right]^2}   (5.31)
where f (i) is the original signal, g(i) is the reconstructed signal, and N is the length
of the time series.
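A possible implementation of this SNR-based selection of K is sketched below; vmd_decompose is again a hypothetical VMD wrapper, and the reconstruction g(i) is taken as the sum of the K decomposed modes.

import numpy as np

def reconstruction_snr(f, g):
    # Eq. (5.31): SNR between the original series f and its reconstruction g.
    return 10.0 * np.log10(np.sum(f ** 2) / np.sum((f - g) ** 2))

def select_optimal_k(series, candidate_ks, vmd_decompose):
    # Try each candidate K and keep the one whose mode sum gives the highest SNR.
    best_k, best_snr = None, -np.inf
    for K in candidate_ks:
        imfs, _ = vmd_decompose(series, K)          # hypothetical VMD wrapper
        snr = reconstruction_snr(series, np.sum(imfs, axis=0))
        if snr > best_snr:
            best_k, best_snr = K, snr
    return best_k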
The choice of the penalty factor, denoted as α, also exerts a certain influence on the
outcomes of the VMD data decomposition process. Considering the empirical guide-
line that suggests selecting a penalty factor approximately 1.5 times the magnitude
of the decomposed data is optimal, this study maintains experimental consistency by
setting a penalty factor of 10,000 for all the decomposition procedures.
The outcomes of the K value selection for the three directions at each site are
tabulated in Table 5.2, providing valuable insights into the decomposition process.
To ensure fairness and consistency in the experiments, all deep learning models
employed in this study adhered to a uniform dataset division scheme. The dataset
was segregated into three distinct subsets: the training set (Jan 2000–Sept 2011), the
validation set (Jan 2012–Sept 2014), and the test set (Jan 2015–Sept 2022). Each
subset had a specific role in the modeling process:
Training Set: This set was dedicated to training the model parameters and enabling
the model to learn the underlying data features.
Validation Set: This set was used for fine-tuning the model’s hyperparameters and conducting an intermediate evaluation of the model performance.
Test Set: This set played a crucial role in the final assessment of the model’s
performance, serving as the basis for evaluating its effectiveness in practical appli-
cations. Additionally, by obtaining substantial prediction results on the test set, it
becomes possible to evaluate the optimal noise model for prediction accuracy.
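Assuming the preprocessed daily series is stored as a pandas Series with a DatetimeIndex, the chronological split described above can be written as follows (the variable name coord is illustrative).

import pandas as pd

def split_by_date(coord: pd.Series):
    # Chronological split used consistently for all deep learning models.
    train = coord.loc["2000-01-01":"2011-09-30"]   # training set
    valid = coord.loc["2012-01-01":"2014-09-30"]   # validation set
    test = coord.loc["2015-01-01":"2022-09-30"]    # test set
    return train, valid, test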
The primary aim of this dataset partitioning scheme was to ensure that the model
had access to an adequate amount of training data, enabling it to effectively capture
and comprehend the data’s distinctive features. To visually illustrate the differences
in prediction outcomes between the DVMD-LSTM model and the VMD-LSTM
model, this study conducts a comparative analysis of the prediction results for the
decomposed IMFs and residual terms generated by the two hybrid models. Due to
space constraints, this chapter only presents the prediction results of the IMFs and
residual terms in the U (Up) direction for the SEDR station, as shown in Fig. 5.5.
It can be seen from Fig. 5.5 that the LSTM models excel in delivering commend-
able prediction results for each IMF component. However, a noteworthy distinc-
tion emerges when addressing the residual terms. The VMD-LSTM model faces
challenges in effectively capturing the fluctuation characteristics within the residual
terms, which lack apparent regularity. Consequently, this difficulty in modeling the
residual terms leads to lower prediction accuracy, ultimately impacting the overall
performance of the VMD-LSTM model.
To address this issue, the proposed DVMD-LSTM model conducts a secondary
VMD decomposition on the residual terms obtained after the initial VMD decom-
position. This additional decomposition extracts further fluctuation information
within the residual terms, resulting in a substantial improvement in prediction accu-
racy. Compared to the VMD-LSTM model, the overall prediction results of the
DVMD-LSTM model show a 17.30% improvement in RMSE and a 17.65% improvement in MAE.
To assess the potential benefits of conducting multiple VMD decompositions, an
analysis is carried out on the residual terms following the second decomposition.
However, it is observed that these terms lack conspicuous fluctuation characteristics.
Consequently, incorporating these results into the model for prediction fails to yield
significant improvements, and, in some cases, prediction accuracy even decreases.
This suggests that increasing the number of decompositions on the residual terms
may not necessarily enhance the model’s prediction accuracy.
As a result, in this study, the data after the secondary VMD decomposition is used
as the feature input for the subsequent deep learning experiments, as it has proven
to be an effective representation for achieving high prediction accuracy.
Fig. 5.5 Prediction results of each IMF and residual under different models after VMD decomposition in the U direction of the SEDR station (The blue curves represent the original data as well as the IMF components and residual terms obtained from VMD decomposition. The orange curves represent the prediction results of the IMF components, which are identical for the DVMD-LSTM and VMD-LSTM models; the green curve represents the prediction results of the residual terms by the VMD-LSTM model, and the black curve represents the prediction results of the residual terms by the DVMD-LSTM model)
Fig. 5.6 Comparison of predictions and prediction errors of position coordinates at SEDR station
under three different models (sub-figures a, b, c show the coordinate prediction results, and sub-
figures d, e, f show prediction errors)
As the fluctuation amplitude of the original data increases, the prediction errors
of the various models also exhibit varying degrees of escalation. The U (Up) direc-
tion consistently presents the largest errors among the three directions. Compared
to the baseline LSTM model, the VMD-LSTM model excels in capturing the fluc-
tuation trends and amplitudes of the data. It also demonstrates smaller variations
and extremes in the prediction error. This suggests that after undergoing VMD
decomposition, the VMD-LSTM model efficiently captures the inherent fluctuation
characteristics of the original data, leading to more accurate predictions.
Both the VMD-LSTM and DVMD-LSTM models display similar patterns in
prediction fluctuations and trends. However, the DVMD-LSTM model exhibits
notably smaller prediction errors, indicating that it not only retains the advantages
of the VMD-LSTM model in forecasting fluctuation trends and amplitudes but also
achieves a higher level of prediction accuracy.
Table 5.3 Comparison of the prediction results of each GNSS station in the three directions of E,
N, and U under different models (The units of RMSE and MAE in the table are mm)
(a) Comparison of the prediction results of each GNSS station in the direction of E under
different models
Model ALBH BURN CEDA FOOT GOBS RHCL SEDR SMEL
LSTM RMSE 0.89 1.40 1.73 0.58 1.00 1.62 0.68 0.57
MAE 0.65 1.10 1.35 0.44 0.70 1.28 0.53 0.44
R2 0.65 0.51 0.70 0.13 0.86 0.61 0.66 0.40
VMD-LSTM RMSE 0.76 1.16 1.37 0.51 0.86 1.07 0.58 0.40
I/% 13.91 17.00 20.75 12.91 13.74 34.08 15.00 30.80
MAE 0.55 0.92 1.06 0.38 0.58 0.83 0.45 0.30
I/% 14.03 16.70 21.18 13.51 16.08 34.78 15.13 31.08
R2 0.74 0.66 0.81 0.34 0.90 0.83 0.76 0.71
I/% 13.75 30.37 16.00 157.60 4.10 35.51 14.23 77.69
DVMD-LSTM RMSE 0.67 1.02 1.21 0.45 0.77 0.94 0.50 0.34
I/% 24.56 27.00 29.82 22.12 23.53 41.63 27.07 40.11
MAE 0.49 0.82 0.94 0.34 0.52 0.74 0.39 0.26
I/% 24.31 25.78 30.32 22.27 24.50 41.91 26.76 39.98
R2 0.80 0.74 0.85 0.47 0.92 0.87 0.82 0.79
I/% 22.89 45.61 21.83 256.70 6.66 41.40 24.00 95.60
(b) Comparison of the prediction results of each GNSS station in the direction of N under
different models
Model ALBH BURN CEDA FOOT GOBS RHCL SEDR SMEL
LSTM RMSE 0.73 1.39 1.38 0.59 0.86 3.14 0.85 0.55
MAE 0.57 1.11 1.1 0.43 0.63 2.54 0.63 0.42
R2 0.62 0.55 0.46 0.48 0.78 0.46 0.44 0.45
VMD-LSTM RMSE 0.55 1.07 1.05 0.39 0.63 1.71 0.66 0.47
I/% 24.53 22.74 23.54 33.45 26.95 45.59 22.23 15.62
MAE 0.43 0.85 0.83 0.29 0.46 1.31 0.5 0.35
I/% 24.23 23.37 24.05 31.81 26.6 48.53 21.79 16.54
R2 0.78 0.73 0.68 0.77 0.88 0.84 0.66 0.61
I/% 26.18 32.59 48.72 59.65 13.33 81.39 50.49 35.42
DVMD-LSTM RMSE 0.49 0.95 0.9 0.34 0.56 1.58 0.56 0.41
I/% 32.77 31.65 34.5 41.35 34.86 49.55 34.15 26.53
MAE 0.38 0.76 0.72 0.26 0.41 1.21 0.42 0.3
I/% 32.53 32.13 34.33 39.95 34.1 52.28 33.1 26.91
R2 0.83 0.79 0.77 0.82 0.91 0.86 0.76 0.7
I/% 33.33 43.08 66.97 70.25 16.46 86.19 72.34 56.6
The improvements are more evident after VMD decomposition in time series with pronounced fluctuations, which contain more distinctive fluctuation characteristics.
In summary, the DVMD-LSTM model preserves the advantages of the VMD-
LSTM model in predicting fluctuation trends and frequencies while attaining higher
prediction accuracy. The results across different directional components and stations
substantiate the model’s applicability and robustness, affirming its potential for
extensive utilization in the domain of high-precision time series forecasting.
To further assess whether the DVMD-LSTM model effectively accounts for the noise
characteristics in various datasets during the prediction process, it is essential to
consider the prevailing beliefs among domestic and international scholars regarding
optimal noise models for GPS coordinate time series. Currently, two primary models
are widely considered for describing the noise characteristics of GPS coordinate time
series: flicker noise + white noise (FN + WN) and a minor amount of random walk
noise + flicker noise (RW + FN). Furthermore, some scholars have proposed that
the noise in GPS coordinate time series can be considered as power law noise (PL)
and the Gaussian Markov model (GGM).
In this study, we focused on GNSS reference stations in North America with the
same time span. Four combined noise models were considered for analysis, namely
random walk noise + flicker noise + white noise (RW + FN + WN), flicker noise +
white noise (FN + WN), power law noise + white noise (PL + WN), and Gaussian
Markov + white noise (GGM + WN). The training and test data of each station were
examined, and ultimately, eight stations sharing the same optimal noise model were
selected for experimentation. The optimal noise model for each prediction model,
concerning the prediction results for each station, was then determined. The specific
outcomes are presented in Table 5.4.
Table 5.4 The optimal noise model of each station under different models in the three directions
of E, N, and U
Site ENU Optimal noise model
TRUE LSTM VMD-LSTM DVMD-LSTM
ALBH E RW + FN + WN PL + WN RW + FN + WN RW + FN + WN
BURN RW + FN + WN PL + WN PL + WN RW + FN + WN
CEDA RW + FN + WN PL + WN PL + WN RW + FN + WN
FOOT PL + WN GGM + WN FN + WN PL + WN
GOBS RW + FN + WN PL + WN RW + FN + WN RW + FN + WN
RHCL RW + FN + WN GGM + WN PL + WN RW + FN + WN
SEDR RW + FN + WN PL + WN PL + WN RW + FN + WN
SMEL FN + WN PL + WN FN + WN FN + WN
ALBH N RW + FN + WN PL + WN RW + FN + WN RW + FN + WN
BURN FN + WN PL + WN PL + WN PL + WN
CEDA RW + FN + WN PL + WN PL + WN RW + FN + WN
FOOT FN + WN GGM + WN FN + WN FN + WN
GOBS RW + FN + WN PL + WN RW + FN + WN RW + FN + WN
RHCL RW + FN + WN RW + FN + WN PL + WN PL + WN
SEDR FN + WN GGM + WN RW + FN + WN FN + WN
SMEL FN + WN PL + WN FN + WN FN + WN
ALBH U PL + WN PL + WN RW + FN + WN FN + WN
BURN PL + WN GGM + WN PL + WN PL + WN
CEDA PL + WN PL + WN RW + FN + WN PL + WN
FOOT PL + WN PL + WN FN + WN FN + WN
GOBS PL + WN GGM + WN PL + WN FN + WN
RHCL FN + WN PL + WN RW + FN + WN FN + WN
SEDR PL + WN PL + WN PL + WN PL + WN
SMEL PL + WN PL + WN FN + WN PL + WN
From Table 5.4, it becomes evident that different stations exhibit different optimal
noise models, indicating the presence of inconsistent noise characteristics in the
data. The LSTM model shows significant disparities between its prediction results
and the optimal noise models associated with the original data, with an average
accuracy of only 25% across all three directions. Additionally, it is noteworthy that
the predominant optimal noise models tend to be PL + WN and GGM + WN. This suggests
that the LSTM model does not adequately consider the inherent noise characteristics
of GNSS time series during the prediction process.
On the contrary, the VMD-LSTM model demonstrates improved accuracy in
capturing the optimal noise models, achieving an average accuracy of 42.67%.
This indicates that VMD decomposition effectively captures the noise character-
istics within the IMF components, although the noise characteristics in the residual
component are not fully addressed, resulting in relatively lower overall accuracy.
The proposed DVMD-LSTM model further enhances the consideration of noise
characteristics within the residual component by applying VMD decomposition once
again. As a result, the DVMD-LSTM model achieves an impressive average accuracy
of 79.17% in capturing the optimal noise models. In summary, the DVMD-LSTM
model effectively takes into account the noise characteristics of the data during the
prediction process by processing both the original data and the decomposed residual
component.
To evaluate the predictive quality of each deep learning model, this study initially
employs these models to predict the original data. Subsequently, the optimal noise
model and corresponding velocities are computed for the prediction results of each
model. These velocities are then compared with the velocities calculated using the
optimal noise model of the original data with the assistance of the Hector software.
By calculating the absolute error between the prediction results of each model and
the original velocities at various measurement stations, we can obtain the average
absolute error between the velocities computed from each deep learning model’s
prediction results and the velocities derived from the original data. This process
allows us to assess the accuracy of the model’s prediction. The velocities computed
from the prediction results of each deep learning model under the optimal noise
model at different measurement stations are presented in Table 5.5.
It can be seen from Table 5.5 that the average absolute error of the LSTM model in
velocity prediction varies across the three spatial directions. In the E direction, this
error is 0.068 mm/year, while in the N direction, it increases to 0.093 mm/year. In the
U direction, the error is 0.078 mm/year. On the other hand, the VMD-LSTM model
demonstrates a notable improvement in accuracy, with an average absolute error of
0.031 mm/year in the E direction, 0.060 mm/year in the N direction, and 0.060 mm/
year in the U direction.
Meanwhile, the DVMD-LSTM model outperforms both the LSTM and VMD-
LSTM models, showcasing its remarkable predictive accuracy. Specifically, in the
Table 5.5 Velocity values obtained by each station under the optimal noise model
Site ENU Trend (mm/year)
TRUE LSTM VMD-LSTM DVMD-LSTM
ALBH E −0.041 0.020 0.055 −0.044
BURN −0.108 −0.005 −0.051 −0.116
CEDA −0.726 −0.528 −0.693 −0.736
FOOT 0.02 0.015 0.001 0.009
GOBS 0.659 0.656 0.672 0.682
RHCL 0.811 0.666 0.805 0.783
SEDR 0.354 0.341 0.378 0.313
SMEL 0.026 0.009 0.023 0.021
ALBH N 0.327 0.245 0.276 0.295
BURN 0.124 0.080 0.116 0.130
CEDA −0.065 −0.041 −0.227 −0.042
FOOT 0.009 0.029 −0.036 0.005
GOBS 0.063 0.078 0.029 −0.020
RHCL 1.253 0.743 1.132 1.071
SEDR 0.199 0.170 0.212 0.195
SMEL 0.020 −0.001 −0.025 0.017
ALBH U 0.383 0.204 0.131 0.268
BURN 0.241 0.144 0.238 0.216
CEDA 0.016 0.159 0.074 0.137
FOOT 0.194 0.125 0.194 0.202
GOBS 0.301 0.278 0.283 0.262
RHCL 0.298 0.206 0.367 0.264
SEDR 0.017 0.022 0.082 0.04
SMEL 0.195 0.182 0.206 0.183
LSTM model. However, the DVMD-LSTM model stands out with its remarkable
enhancements, reaffirming its exceptional predictive capabilities.
5.5 Conclusion
References
1. Ohta Y, Kobayashi T, Tsushima H et al (2012) Quasi real-time fault model estimation for near-field tsunami forecasting based on RTK-GPS analysis: application to the 2011 Tohoku-Oki earthquake (Mw 9.0). J Geophys Res Solid Earth 117(B2)
2. Cina A, Piras M (2015) Performance of low-cost GNSS receiver for landslides monitoring: test and results. Geomat Nat Haz Risk 6(5–7):497–514
3. Meng X, Roberts GW, Dodson AH et al (2004) Impact of GPS satellite and pseudolite geometry on structural deformation monitoring: analytical and empirical studies. J Geodesy 77:809–822
4. Altamimi Z, Rebischung P, Métivier L et al (2016) ITRF2014: a new release of the international terrestrial reference frame modeling nonlinear station motions. J Geophys Res Solid Earth 121(8):6109–6131
5. Blewitt G, Lavallée D (2002) Effect of annual signals on geodetic velocity. J Geophys Res Solid Earth 107(B7):ETG 9-1–ETG 9-11
6. Chen JH (2011) Petascale direct numerical simulation of turbulent combustion—fundamental insights towards predictive models. Proc Combust Inst 33(1):99–123
7. Klos A, Olivares G, Teferle FN et al (2018) On the combined effect of periodic signals and colored noise on velocity uncertainties. GPS Solut 22:1–13
8. Li Y (2022) Research and application of deep learning in image recognition. In: 2022 IEEE 2nd international conference on power, electronics and computer applications (ICPECA). IEEE, pp 994–999
9. Masini RP, Medeiros MC, Mendes EF (2023) Machine learning advances for time series forecasting. J Econ Surv 37(1):76–111
10. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
11. Kim HU, Bae TS (2019) Deep learning-based GNSS network-based real-time kinematic improvement for autonomous ground vehicle navigation. J Sens
12. Tao Y, Liu C, Chen T et al (2021) Real-time multipath mitigation in multi-GNSS short baseline positioning via CNN-LSTM method. Math Probl Eng 2021:1–12
13. Xie P, Zhou A, Chai B (2019) The application of long short-term memory (LSTM) method on displacement prediction of multifactor-induced landslides. IEEE Access 7:54305–54311
14. Zhao L, Li Z, Qu L et al (2023) A hybrid VMD-LSTM/GRU model to predict non-stationary and irregular waves on the east coast of China. Ocean Eng 276:114136
15. Huang Y, Yan L, Cheng Y et al (2022) Coal thickness prediction method based on VMD and LSTM. Electronics 11(2):232
16. Zhang T, Fu C (2022) Application of improved VMD-LSTM model in sports artificial intelligence. Comput Intell Neurosci
17. Han L, Zhang R, Wang X et al (2019) Multi-step wind power forecast based on VMD-LSTM. IET Renew Power Gener 13(10):1690–1700
18. Xing Y, Yue J, Chen C et al (2019) Dynamic displacement forecasting of dashuitian landslide in China using variational mode decomposition and stack long short-term memory network. Appl Sci 15:2951
19. Dragomiretskiy K, Zosso D (2013) Variational mode decomposition. IEEE Trans Signal Process 62(3):531–544
20. Carson JR (1992) Notes on the theory of modulation. Proc Inst Radio Eng 10(1):57–64
21. Malhotra P, Vig L, Shroff G et al (2015) Long short term memory networks for anomaly detection in time series. In: ESANN 2015, p 89
22. Jin Y, Guo H, Wang J et al (2020) A hybrid system based on LSTM for short-term power load forecasting. Energies 13(23):6241
23. He X, Bos MS, Montillet JP et al (2019) Investigation of the noise properties at low frequencies in long GNSS time series. J Geodesy 93(9):1271–1282
24. Tingley MP, Huybers P (2010) A Bayesian algorithm for reconstructing climate anomalies in space and time. Part II: comparison with the regularized expectation–maximization algorithm. J Climate 23(10):2782–2800
25. Mei L, Li S, Zhang C et al (2021) Adaptive signal enhancement based on improved VMD-SVD for leak location in water-supply pipeline. IEEE Sens J 21(21):24601–24612
Chapter 6
Autonomous UAV Outdoors
Navigation—A Machine-Learning
Perspective
Abstract Unmanned Aerial Vehicles (UAVs) are increasingly gaining traction due
to their potential and major use case applications. The UAV is typically required to
navigate autonomously in highly dynamic environments to deliver on its intended
applications. Existing UAV positioning and navigation solutions face several chal-
lenges, particularly in dense outdoor settings. To this end, we present various tech-
nological approaches for autonomous UAV navigation outdoors. This chapter aims
to provide an efficient real-time autonomous solution that enables the UAV to navi-
gate through a dynamic urban or suburban environment. Particularly, we evaluate
the performance of Machine Learning (ML)-based techniques in UAV navigation
solutions. The computational complexity involved in standard optimization-based
methods hinders their utilization for UAV navigation in dynamic environments. The
use of ML-based approaches can potentially enable near-optimal UAV navigation,
while providing a practical real-time calculation that is needed in such dynamic appli-
cations. We provide a comprehensive detailed analysis to evaluate the performance
of each of the presented ML-based UAV navigation methods as compared to other
existing navigation approaches that we also discuss in this chapter.
6.1 Introduction
UAVs are increasingly becoming integrated in many applications both in the civilian
and military domains. The use cases of UAVs include goods delivery, surveillance
and reconnaissance missions, traffic monitoring, combat missions, search and rescue
missions, etc. In addition, their role in communication systems is proliferating at
an accelerating pace [1]. It is expected that UAVs will assume a critical role in
non-terrestrial network (NTN) arrangements in 6G and beyond.
Several types of UAVs exist with different aerodynamic and robotic features.
The four main types of UAVs include the single rotor, multirotor, fixed wing and
hybrid vertical take-off and landing UAVs. The most widely used type of UAVs
is the Multirotor type. This type can be further classified into different categories
depending on the number of rotors on the UAV e.g., tri-rotor, quadrotor, etc.
The simple kinematic model of a UAV is given by

x(t + Δt) = x(t) + v cos θ Δt,   y(t + Δt) = y(t) + v sin θ Δt,   (6.1)

where (x(t), y(t)) corresponds to the initial 2D position of the UAV at time t, while v and θ represent the velocity and heading angle of the UAV [2].
To characterize the motion of a quadrotor UAV, we use two coordinate systems,
namely, the space coordinate system and the body coordinate system [3]. The used
body coordinate system’s elements are the pitch, roll and yaw angles which describe
the angle of rotation of the UAV around the x, y and z axes, respectively. The rotational
axis model is therefore given by
R_x(∅) = \begin{bmatrix} \cos∅ & \sin∅ & 0 \\ −\sin∅ & \cos∅ & 0 \\ 0 & 0 & 1 \end{bmatrix},   (6.2a)

R_y(θ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cosθ & \sinθ \\ 0 & −\sinθ & \cosθ \end{bmatrix},   (6.2b)

R_z(ψ) = \begin{bmatrix} \cosψ & 0 & −\sinψ \\ 0 & 1 & 0 \\ \sinψ & 0 & \cosψ \end{bmatrix},   (6.2c)
where ∅, θ and ψ correspond to the pitch, roll, and yaw angles, respectively.
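As a numerical illustration, the matrices of Eq. (6.2) can be transcribed directly into code; the composition order used in the example is only one possible convention, and the angle values are arbitrary.

import numpy as np

def Rx(phi):    # Eq. (6.2a), as given above
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def Ry(theta):  # Eq. (6.2b)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def Rz(psi):    # Eq. (6.2c)
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

# Example: combined rotation for pitch 0.1, roll 0.2, yaw 0.3 rad
# (one possible composition order).
R = Rz(0.3) @ Ry(0.2) @ Rx(0.1)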
The human-controlled UAVs are usually piloted remotely by an operator without
the need for a pilot onboard. Autonomous UAVs, on the other hand, are self-
operated with no human intervention at all. This autonomy is achieved via an onboard
autopilot system, computing systems, and a set of sensing systems in addition to the
other onboard devices that are required for the mission at hand. It is therefore required
to devise the systems, both hardware and software, that enable the autonomous UAV
to conduct the aviation duties such as the self-localization and navigation, in addition
to the other duties that constitute the mission for which the UAV was launched. It can
be safely stated that the failure to properly localize and navigate the UAV accurately
would result in compromising the mission or application for which it was launched.
Therefore, ensuring the robustness and accuracy of the localization and navigation
techniques used by autonomous UAVs is central to the success of the entire mission
that they support.
Since there is a need for the autonomous UAV to self-localize and self-navigate,
this implies the need to communicate with external resources to help in the local-
ization and navigation process. For this purpose, there are many possibilities for
this communication. One of the most widely used approaches is the one relying
on the Global Navigation Satellite System (GNSS) in conjunction with the Inertial
Measurement Unit (IMU) that is installed onboard the UAV. The GNSS provides
autonomous geospatial localization services that utilize satellites.
The use of GNSS systems requires direct line-of-sight (LoS) communication with
the supporting satellites. Therefore, the use of such a system would only be possible
in outdoor environments. Another candidate communication solution to use for
the UAV localization and navigation is Wi-Fi. It can provide a proper solution in
closed (indoor) areas and open areas with limited space such as malls and univer-
sity campuses. The cellular communication system also presents a strong contender
for such tasks. The wide prevalence of such systems in urban areas and the general
robustness of its operations nowadays enable it to be a reliable alternative in outdoor
and indoor environments, depending on its level of coverage in a given area where
the UAV is required to operate. The localization and navigation techniques that can
be used with autonomous UAVs generally need to have the following characteristics:
• Accuracy
• Real-time operation
• Efficiency
• High reliability
• Ability to respond to obstacles and threats.
There are many methodologies that have been followed in the literature and practi-
cally to devise the navigation techniques for autonomous UAVs, as will be discussed
throughout this chapter. One of the most promising methodologies that are currently
widely explored for this purpose is the one that depends on machine-learning (ML)
techniques. ML techniques have the potential to provide real-time calculation with a
close-to-optimal solution. They operate in many different configurations, depending
on the type of used technique. In this chapter, we discuss the different aspects that
relate to the development of navigation techniques for autonomous UAVs. We detail
the challenges of the UAV navigation solutions and present some proposed solutions
to address the existing limitations.
Various wireless technologies are currently utilized for enabling UAV navigation
applications. The main enabling requirement for the UAV to deliver on its intended
missions is its capability to determine its location at any point in time [4]. The UAV
navigation solutions face many challenges in outdoor environments. The navigation
techniques require a high level of accuracy to function correctly. Furthermore, the UAV must be able to determine its position with high accuracy. We
present in the following a brief overview of some of the main wireless enabling
technologies commonly used for UAV localization and navigation applications.
The pseudo range measured by the receiver from satellite i can be expressed as

P_i = s\,[t_r(T_2) − t_s(T_1)],   (6.3)

where s corresponds to the speed of the GNSS signal (speed of light) and [t_r(T_2) − t_s(T_1)] represents the signal propagation time to reach the receiver. The pseudo range estimated by the receiver incorporates the geometric distance d_i between the satellite and the receiver as well as synchronization clock errors and other error terms due to the signal propagation through the atmosphere. More specifically, the pseudo range equation can be rewritten as

P_i = d_i + s(Δt_r − Δt_s) + T + α_f STEC + K_{p,r} + K_p^s + M_p + ε_p,   (6.4)
where Δ tr and Δ ts correspond to the receiver and satellite clock offsets respectively,
T is the tropospheric delay, and αf STEC corresponds to a frequency dependent
ionospheric delay term. Kp,r and Kps represent receiver and satellite instrumental
delay terms, respectively, Mp corresponds to the effect of multipath, and εp represents
the receiver noise term. The signal propagation time can also be more accurately
estimated through carrier phase measurements. Once the receiver determines the
locations of 4 satellites along with its distance to each of them, it uses the triangulation
principle [5, 6] to calculate its position on earth as shown in Fig. 6.2.
The GNSS can provide submeter localization accuracy in open outdoor environ-
ments. However, the GNSS does not perform well in dense urban and suburban
environments due to signal blockage and reflections from high rise structures. As
a result of this blockage, there may not be sufficient satellites to estimate the posi-
tion of the UAV. Moreover, strong multipath signals would significantly degrade the
positioning performance.
The global positioning system receivers suffer positioning errors due to some of
the following potential sources of error:
This method relies on the use of camera images and videos for location determination.
It determines poses from the camera images relative to a coordinate system of the
surrounding environment, which may or may not be known in advance. Several
image feature options are possible to use in the localization task, namely, points,
lines, circles, etc. Of these feature options, points are the most widely used. For
known environments, the determination of the camera pose location using a cloud
of points is known as the problem of perspective-n-point (PnP) [7]. If n ≥ 6 then
the problem is linear. If n = 3, 4 or 5, then we have a non-linear problem. If n = 2,
then the problem is under-determined and has no unique solution. When n = 3, the problem has at most 4 solutions.
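In practice, the PnP problem for a known environment can be solved with standard computer vision tooling; the sketch below uses OpenCV's solvePnP with made-up 3D–2D point correspondences and an assumed pinhole camera matrix.

import numpy as np
import cv2

# Six known 3D landmarks (metres) and their observed image projections (pixels);
# all values here are invented for illustration.
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                          [1, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float64)
image_points = np.array([[320, 240], [400, 238], [322, 180],
                         [398, 178], [318, 300], [396, 302]], dtype=np.float64)

# Assumed intrinsics: focal length 800 px, principal point (320, 240), no distortion.
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
if ok:
    R, _ = cv2.Rodrigues(rvec)       # rotation matrix of the estimated camera pose
    camera_position = -R.T @ tvec    # camera centre expressed in the world frame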
One of the important UAV trajectory planning solutions relies on the existing cellular
infrastructure to navigate the UAV. Some of such solutions associate the UAV with the
cellular network. Specifically, the cellular network is responsible for navigating the
UAV from a source to a destination. The trajectory planning algorithms in this case
are executed at the cellular base station side. Other UAV solutions rely on cellular
signals to navigate the UAV without actually interacting with the cellular network.
Particularly, in such solutions, the UAV detects the periodic broadcast signals trans-
mitted by the existing cellular infrastructure. For illustration purposes, we present the
periodic broadcast signals for the 5G technology as an example shown in Table 6.1.
Specifically, the 5G cellular stations broadcast the Synchronization Signal Block
(SSB) periodically for synchronization purposes and to enable communication with
the mobile users. The UAV can detect and utilize such broadcast signals for localiza-
tion and navigation applications. The mission objectives determine the requirements
of the UAV navigation solution.
The navigation solution aims to optimize a specific trajectory planning cost metric.
The formulation of the trajectory planning cost metric depends on the application
requirements. Many trajectory planning solutions aim to determine the shortest path
or minimize the mission duration time with collision avoidance [10–12]. Other UAV
navigation solutions formulate the navigation metric as a composite joint objective
cost metric incorporating multiple objectives as follows
J(n|π_{sd}) = \sum_i w_i J_i(n),   (6.5)
network and involve the transmission of detectable signals, which might compromise
the success of certain missions.
The imaging-based localization and navigation solutions, on the other hand,
require substantial training and knowledge of the environmental terrain. Furthermore,
vision-based solutions cannot be used in unknown dynamic environments.
Cellular networks are widely available worldwide providing an attractive alterna-
tive to GNSS, and imaging-based solutions. Furthermore, cellular networks typically
have geometrically favorable configurations suitable for UAV navigation solutions
in outdoor urban environments. As such, we focus our attention in the remainder of
this chapter on cellular-based UAV navigation solutions.
To solve the UAV navigation problem, optimization-based techniques including
Graph Search Algorithms, Ant Colony Optimization (ACO) and Genetic Algorithms,
are commonly used. However, such optimization-based methods are typically itera-
tive in nature and involve a high computational complexity. For example, the compu-
tational complexity of the exact optimization-based methods, e.g., Exhaustive Search
(ES), is in the order of O(nA) where n represents the variable problem size corre-
sponding to the number of steps to reach the destination and A represents the 3-D
flight area of the UAV. Similarly, the complexity of the heuristic optimization-based
methods is in the order of O(nt) where t represents the number of iterations needed for
the algorithm to converge to the optimum solution. As such, iterative optimization-
based techniques do not satisfy the real-time calculation objective governed by the
dynamics of the environment and are impractical for use given the limited computa-
tional power onboard the UAV. Machine Learning (ML)-based methods can alterna-
tively be utilized to solve the UAV navigation problem with near optimal accuracy.
ML-based methods provide an attractive alternative to optimization-based techniques
given their potential to satisfy the real-time calculation requirement.
The UAV is typically required to navigate through a complex and dynamic environ-
ment. The UAV should have the ability to determine a possible path from a starting
point to a destination by optimizing a given trajectory planning cost metric. For
this purpose, the UAV navigation solution should consider the environmental condi-
tions and the UAV dynamic constraints. This necessitates the development of a UAV
navigation solution that is suitable for dynamic outdoor urban and suburban environ-
ments. The UAV positioning solutions should provide the ability to localize the UAV
up to a decimeter 3D accuracy for security and control reasons. While GNSS and
Imaging-based solutions can be utilized to this end, they face several challenges in
urban environments. As detailed in Sect. 6.2, cellular signals provide a favorable alter-
native for UAV navigation solutions in such environments. Accordingly, we utilize
cellular signals to navigate the UAV along its route given the practicality and geomet-
rically convenient configurations of such signals in outdoor environments. The UAV
navigation solution is responsible for determining the optimal path to navigate to
the destination. Moreover, the UAV navigation solution needs to consider the limi-
tations of the UAV itself. The UAV has limited power and computational capability
hence the need for an efficient navigation technique. Typically, the UAV is required
to find the shortest path to reach its destination. For cellular-based UAV navigation,
the UAV is also required to maintain connectivity to the cellular network along its
path. In addition, it is imperative that the UAV determines a collision free path for
successful mission completion. The UAV navigation solution needs to be capable of
detecting and reacting to dynamic obstacles and threats in real time. Specifically, the
UAV navigation solution should aim to determine a trajectory from a given starting
point to a destination in such a way that optimizes a navigation cost metric while
observing the real-time calculation limits. The navigation cost metric incorporates
several objectives including, but not limited to, minimizing the path length, avoiding
collision with dynamic threats, and maintaining cellular connectivity.
We consider dense urban and suburban outdoor environments for our use case appli-
cation. This environment generally contains static and dynamic objects/obstacles that
will normally be faced by a flying UAV. It is expected that the UAV will be tasked
to conduct a given mission where it needs to fly from an initial point to a destination
and possibly return back. We assume the presence of an existing cellular infrastruc-
ture that can be accessed for use in UAV positioning and navigation. We assume the
cellular base stations (gNBs) are randomly placed 200–1000 m apart, as shown in
Fig. 6.4. While flying in this environment, the UAV is expected to be exposed to
different environmental conditions that would normally affect the communication
signals on which the UAV relies for its localization and navigation. The chosen path
loss model will depend on the technique being used and the operational conditions
of the UAV [14].
In case of UAVs flying at low altitude, we assume a probabilistic path loss model
given by

PL_i = P_{LOS}\,∅_{LOS} + P_{NLOS}\,∅_{NLOS},   (6.6)
where ∅LOS and ∅NLOS represent the mean path losses in case of line-of-sight (LoS)
and non-line-of-sight (NLoS) situations, respectively. PLOS is the probability of LoS
situation with a given cellular base station, gNBi , whereas PNLOS is the probability
of NLOS situation. ∅LOS and ∅NLOS are calculated as
∅_{LOS} = 20\log\left(\frac{4πF d_i}{c}\right) + η_{LOS},   (6.7a)

∅_{NLOS} = 20\log\left(\frac{4πF d_i}{c}\right) + η_{NLOS},   (6.7b)
where F is the cellular carrier frequency, c is the speed of light, and d_i is the Euclidean distance between the UAV and gNB_i. η_LOS and η_NLOS represent the mean additional
losses for the LoS and NLoS communications, respectively. PLOS and PNLOS are
dependent on the environmental parameters, a and b, and are given by
P_{LOS} = \frac{1}{1 + a\exp\left(−b\left(\frac{180}{π}\tan^{−1}\left(\frac{h_i}{r_i}\right) − a\right)\right)},   (6.8)

where h_i and r_i denote the altitude of the UAV and its horizontal distance to gNB_i, respectively, and P_{NLOS} = 1 − P_{LOS}.
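A direct transcription of Eqs. (6.6)–(6.8) is given below; the environmental parameters a and b, the excess losses, and the example link geometry are illustrative values only, not results from this chapter.

import numpy as np

def mean_path_loss_db(d_i, h_i, r_i, F, a, b, eta_los, eta_nlos):
    # Probabilistic mean path loss between the UAV and gNB_i, Eqs. (6.6)-(6.8).
    c = 3.0e8                                           # speed of light (m/s)
    fspl = 20 * np.log10(4 * np.pi * F * d_i / c)       # shared free-space term of (6.7a/b)
    phi_los, phi_nlos = fspl + eta_los, fspl + eta_nlos
    elevation_deg = np.degrees(np.arctan2(h_i, r_i))    # (180/pi) * arctan(h_i / r_i)
    p_los = 1.0 / (1.0 + a * np.exp(-b * (elevation_deg - a)))   # Eq. (6.8)
    return p_los * phi_los + (1.0 - p_los) * phi_nlos   # Eq. (6.6)

# Illustrative example: 500 m slant range, UAV 100 m above ground, 3.5 GHz carrier.
print(mean_path_loss_db(d_i=500.0, h_i=100.0, r_i=490.0, F=3.5e9,
                        a=9.61, b=0.16, eta_los=1.0, eta_nlos=20.0))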
In case the UAV is flying at high altitude, we consider a Rician channel propagation
model to represent the path loss between the cellular base stations and the UAV given
by
\min_{π} C_π   (6.11)

subject to

π(0) = s,   (6.11a)
π(T) = d,   (6.11b)
v_{min} ≤ v(t) ≤ v_{max}, ∀t ∈ [0, T],   (6.11c)
where s and d correspond to the starting point and the destination, respectively,
while t = T represents the mission completion time. The navigation technique aims
to optimize the formulated multi-objective optimization problem considering the
limitations of the UAV and the environment. The constraints (6.11a) and (6.11b)
allow the UAV to fly from a given starting point, s, to the destination, d , within time
T . Finally, constraint (6.11c) limits the UAV velocity within the allowable range.
For example, we incorporate a multi-objective UAV navigation approach to opti-
mize several objectives such as finding the shortest path, avoiding collisions with
dynamic threats, maintaining connectivity with the cellular network as well as estab-
lishing a different return path for the UAV from the destination back to the start
to conclude its mission. We therefore formulate the navigation cost metric Cπ as a
multi-objective metric composed of several constituents given by
C_π = \sum_i w_i C_i,   (6.12)
We let C_1 correspond to the shortest path objective, given by

C_1 = a(t) + b(t),   (6.13)

where a(t) corresponds to the distance from the starting point to the current position of the UAV while b(t) is a heuristic corresponding to the Euclidean distance to
the destination. We denote the collision avoidance objective by C2 . Particularly, C2
penalizes the cost of the trajectory when the UAV is close to a possible collision
according to
C_2 = \sum_{o∈O_t} \begin{cases} 1, & \text{if } |X − o| < d_{min} \\ 0, & \text{otherwise} \end{cases},   (6.14)
where O_t corresponds to the set of obstacles detected within the vicinity of the UAV at time t, X denotes the current position of the UAV, and d_min represents the minimum distance to be maintained for collision avoidance. We let C_3 correspond to the objective of maintaining connectivity with
the cellular network. We aim to maintain a certain signal-to-noise ratio (SNR) level
with the nearest n base stations, denoted by gNBi ∀i = 1..n. Consequently, we let
C3 be given by
C_3 = \sum_{i=1}^{n} \frac{γ_{th} − γ_i}{δ_i},   (6.15)
where γi represents the instantaneous SNR measured by the UAV from gNBi while
γth represents the desirable SNR threshold to be maintained. We let δi correspond to
a normalization factor given by
δ_i = \begin{cases} n(γ_{th} − γ_i), & \text{if } (γ_{th} − γ_i) > 0 \\ ∞, & \text{otherwise} \end{cases}.   (6.16)
Finally, we let C_4 capture the different return path requirement. For this purpose, we assign a binary value to penalize trajectories that do not satisfy the different return path objective. Particularly, C_4 is given by

C_4 = \begin{cases} 1, & \text{if } X ∈ π_{sd} \\ 0, & \text{otherwise} \end{cases}.   (6.17)
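Putting the constituents together, the weighted cost of Eq. (6.12) evaluated at a candidate UAV position can be sketched as follows; all argument names, the safety distance, and the weights are assumptions for illustration.

import numpy as np

def trajectory_cost(X, start, goal, obstacles, snr_list, snr_threshold, initial_path, w):
    # X, start, goal: 3-D positions; obstacles, initial_path: lists of positions;
    # snr_list: SNRs (dB) from the n nearest gNBs; w: designer-chosen weights w1..w4.
    d_min = 10.0                                               # assumed safety distance (m)
    # Eq. (6.13): distance covered so far (approximated) plus Euclidean distance-to-go.
    c1 = np.linalg.norm(X - start) + np.linalg.norm(goal - X)
    # Eq. (6.14): count detected obstacles closer than d_min.
    c2 = sum(1 for o in obstacles if np.linalg.norm(X - o) < d_min)
    # Eqs. (6.15)-(6.16): each gNB below the SNR threshold contributes 1/n to the cost.
    n = len(snr_list)
    c3 = sum(1.0 / n for g in snr_list if snr_threshold - g > 0)
    # Eq. (6.17): penalize positions that revisit the initial source-destination path.
    c4 = 1.0 if any(np.allclose(X, p) for p in initial_path) else 0.0
    return w[0] * c1 + w[1] * c2 + w[2] * c3 + w[3] * c4       # Eq. (6.12)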
Our objective is to solve the formulated UAV navigation problem to determine the
optimal path the UAV should follow to reach its destination within practical real-time
boundaries. Opportunely, machine learning-based methods can potentially provide
a near-optimal solution with practical real-time calculation that is needed in such
dynamic applications.
ML-based navigation techniques can adopt an offline or online learning approach
depending on the UAV computational capability and the dynamics of the environ-
mental state. Particularly, the deep supervised learning and reinforcement learning-
based methods are commonly applied to solve the presented UAV navigation
problem. To this end, we present a deep supervised learning-based UAV naviga-
tion algorithm and a reinforcement learning-based algorithm to solve the formulated
UAV navigation problem in a real-time fashion with near optimal accuracy. Further-
more, we analyze the efficiency of each of the presented approaches under a set of
various operational conditions.
In case the deep supervised learning-based approach is utilized, a deep neural network
learns the environmental mapping of the suburban outdoor environment. Particularly,
the UAV navigation problem is formulated as a black-box mapping between the inputs
and the outputs. The inputs of the deep neural network correspond to the state of the trajectory planning environment, whereas the output of the algorithm corresponds
to the optimized trajectory the UAV should follow. This mapping is used to navigate
the UAV to its destination, as shown in Fig. 6.5.
This approach requires prior knowledge of environmental maps as well as training
data to learn the mapping process. The deep neural network architecture is composed
of n-hidden layers each comprised of h nodes per layer. The deep neural network
hyperparameter settings including the weights and dimensions, and the learning
parameters, e.g., the performance goal e, the batch size B, and the control parameter
MU, are updated during the offline training phase as summarized in Algorithm 6.1.
1: for epochs 1 to E
2: while (performance> ℯ) do
3: Sample random batch from the training data set.
4: Evaluate network performance for the test data set.
5: Implement back propagation training.
6: Update hidden layer weights and biases.
7: if (gradient< ) then
8: break // end Epoch
9: if ( ) then
10: break // end Epoch
11: end
12: end
13: end
14: end
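A minimal offline-training sketch in the spirit of Algorithm 6.1, written here with PyTorch, is shown below; the state encoding, network dimensions, training data, and stopping threshold are placeholders chosen for illustration.

import torch
import torch.nn as nn

# Fully connected network mapping an encoded planning state to a trajectory vector.
# Dimensions are assumptions (6 hidden layers of 120 nodes, as evaluated later).
state_dim, trajectory_dim = 32, 60
layers = [nn.Linear(state_dim, 120), nn.ReLU()]
for _ in range(5):
    layers += [nn.Linear(120, 120), nn.ReLU()]
layers += [nn.Linear(120, trajectory_dim)]
model = nn.Sequential(*layers)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder training pairs (state, optimal trajectory) produced by an offline planner.
states = torch.randn(1024, state_dim)
trajectories = torch.randn(1024, trajectory_dim)

for epoch in range(100):                     # epochs E
    for i in range(0, len(states), 64):      # batch size B
        batch_s, batch_t = states[i:i + 64], trajectories[i:i + 64]
        loss = loss_fn(model(batch_s), batch_t)
        optimizer.zero_grad()
        loss.backward()                      # back-propagation training step
        optimizer.step()
    if loss.item() < 1e-3:                   # performance goal e
        break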
Specifically, the Q-learning agent aims to maximize the Q-value of the observed
system state as follows
Q(s, a) ← (1 − α)Q(s, a) + α\left[r(s) + γ \max_{a'} Q(s', a')\right],   (6.18)
where s is the system state including information pertaining to the position of the UAV
and the surrounding environment. The Q-learning agent aims to minimize the cost
of the trajectory. Hence, the reward, r, corresponds to the negative of the formulated
trajectory cost metric.
To enforce the UAV system and dynamic constraints, the penalty method may be
utilized. The penalty method is a method commonly used to reformulate a constrained
objective function to an unconstrained optimization problem to facilitate calculation.
As such, the penalty method can be utilized to enforce the constraints given by
where pi is a penalty assigned for each constraint violation. The action, a, represents
the trajectory that the UAV navigates to its destination while α is the learning rate of
the reinforcement learning agent. The discount factor, γ, is a parameter for adjusting
the weight of anticipated future rewards in the learning process.
The Q-learning agent utilizes a deep Q-network (DQN) to predict the Q-value for
a given action based on the system state. As shown in Fig. 6.6, the input to the DQN
is composed of a state and action pair while the output is a scalar value proportional
to the expected system reward. The trajectory that maximizes the cumulative reward
is selected as the optimal trajectory.
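The update of Eq. (6.18) is straightforward to express in tabular form; in the DQN variant, the table lookup below is replaced by the network's value prediction. The discretized state and action spaces in the sketch are assumptions for illustration.

import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # One Q-learning step following Eq. (6.18); Q is an (n_states, n_actions) table.
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * td_target
    return Q

# Toy example: 100 discretized UAV states and 6 motion primitives (illustrative only);
# the reward is the negative of the trajectory cost metric defined earlier.
Q = np.zeros((100, 6))
Q = q_update(Q, s=0, a=2, reward=-5.0, s_next=1)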
Several field tests have been conducted to verify the performance of cellular networks
and machine learning-based methods in UAV positioning and navigation solutions.
For example, a field test was conducted to demonstrate that cellular signals can
improve the UAV’s position root-mean squared error and the maximum position error
by 30.69% and 58.86% respectively as compared to GNSS-based UAV localization
[11]. Another field experiment was conducted to evaluate the effectiveness of cellular-based UAV localization using deep learning. Specifically, cellular field data were collected, and an augmentation process was proposed to train a deep neural network to localize the UAV in an urban environment [28]. The authors aim to enhance the accuracy of fingerprint map-based localization methods by generating synthetic data that reflects the typical pattern of wireless localization information. The experimental results of this study provided promising outcomes. Another field test was conducted to measure the performance of a UAV positioning and navigation solution in a 3-D setup
utilizing carrier phase measurements while assuming limited GNSS presence. The
authors proposed to leverage the relative stability of cellular base transceiver station
(BTS) clocks to enable precise navigation with cellular carrier phase measurements.
According to the conducted experimentations, this technique realizes a location esti-
mation Root Mean Square Error (RMSE) of 0.8 m using 7 CDMA BSs and 0.36 m
using measurements from 9 CDMA BSs [26]. Another experiment was conducted
to analyze the performance of mobile device localization using cellular networks.
The authors utilize multiple features and metrics of the LTE networks to generate a
fingerprint grid map to localize the mobile device. The authors utilize a one-to-many
augmenter to generate synthetic data to improve the performance of the proposed
localization solution. According to the field experiments, the proposed technique
achieves a localization accuracy of 13.7 m in an outdoor environment [25].
To validate the performance of the proposed ML approaches, we first find the optimal
bound of the solutions. Then we compare the performance of the ML solutions against
this optimal bound. In addition, we also present a GNSS-based technique from the
literature and compare the proposed ML based techniques’ performance against it.
The Optimal Bound
In order to assess the quality of the proposed ML solutions from the perspec-
tive of the optimality of the solution, we present a benchmark optimization-based
technique. The optimal solution of the UAV navigation problem formulated in
(6.11) can be found by utilizing classic optimization methods. However, exact
optimization-based methods cannot be used as the formulated UAV navigation
problem becomes intractable in high dimensional spaces [26]. Therefore, we resort
to heuristic optimization-based techniques to determine the optimal bound of the
solution. Particularly, we utilize the Ant Colony Optimization (ACO) method as
a heuristic optimization technique [27]. The ACO technique iteratively solves for
the optimal bound utilizing several ants which search the action space in parallel,
as summarized in Algorithm 6.3. The ant movements through the action space are
governed by a state transition probability which is proportional to the concentration
of the ant pheromone levels. The state transition probability is thus given by
P_{ij}^{(k)}(t) = \frac{φ_{ij}^{α}(t)\,μ_{ij}^{β}(t)}{\sum_{j∈allowed_k} φ_{ij}^{α}(t)\,μ_{ij}^{β}(t)},   (6.20)
where ρ < 1 is the global pheromone volatility coefficient and Δ φ ij (t) is the
pheromone increment amount.
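The transition rule of Eq. (6.20) can be sketched as follows; the pheromone and heuristic arrays hold φij(t) and μij(t) for the nodes currently allowed to ant k, and the values of α and β are illustrative.

import numpy as np

def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    # Eq. (6.20): probability of ant k moving to each allowed next node j.
    weights = (pheromone ** alpha) * (heuristic ** beta)
    return weights / weights.sum()

# Example with three candidate next waypoints (placeholder values).
p = transition_probabilities(np.array([0.5, 0.2, 0.3]), np.array([1.0, 2.0, 1.5]))
next_node = np.random.choice(len(p), p=p)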
where Ni,k (u) is the NURBS basis function while k represents the degree of the
curve. The weight of each waypoint comprising the NURBS curve is given by hi .
The technique then utilizes a Bayesian filtering algorithm to further improve the
localization accuracy of the incoming GPS signals.
Performance Tuning of the ML-Based Positioning Techniques
We present the effect of various technique-specific hyper parameter settings on
the performance of the presented ML-based UAV navigation techniques. For this
purpose, we conduct MATLAB experiments to simulate the navigation of the UAV
in a 500 × 500 × 300 m3 outdoor suburban environment.
(1) Performance of the deep supervised learning-based technique
We determine the most suitable technique-specific hyper parameter settings of
the presented deep supervised learning-based approach to deliver best results. As
shown in Fig. 6.7, we demonstrate the effect of various neural network architec-
tures on the performance of the deep learning-based technique [28]. Specifically,
we vary the number of nodes per layer and the number of layers of the deep neural
network. The performance of the presented deep supervised learning-based technique
improves with increasing the deep neural network dimensions. This is expected as the
neural network learns the environmental mapping more effectively with increasing
the number of layers. Furthermore, the performance improves with increasing the
number of nodes per layer allowing the network to accurately capture the non-linear
relationships between the inputs and the outputs. The results demonstrate that a deep
neural network architecture with 6 layers composed of 120 nodes per layer delivers
optimized results.
(2) Performance of the reinforcement learning-based technique
We now investigate the effect of the various technique specific hyper parameter
settings on the performance of the reinforcement learning-based UAV navigation
solution. We demonstrate the effect of varying the DQN architecture to determine
Fig. 6.7 Deep Learning-based results a Trajectory length vs. no. nodes per layer. b Trajectory
length vs. no. of hidden layers
Fig. 6.8 Reinforcement Learning-based results a Trajectory length vs. no. nodes per layer.
b Trajectory length vs. no. network layers
the suitable number of layers and nodes per layer needed to deliver the best results.
The results demonstrate that the performance improves with increasing the number
of nodes per layer of the DQN network, as shown in Fig. 6.8. This is expected as
the DQN agent is able to more accurately map the Q-value function to each state-
action pair as it learns through direct interaction with the environment. However, the
performance of the proposed solution declines as the number of layers of the DQN
increases beyond 3 layers as it becomes prone to overfitting.
Fig. 6.9 Overall results a Traversed distance versus trajectory norm b Collisions versus trajectory norm c Cellular connectivity versus gNB concentration d Return path length over initial trajectory versus vicinity
solution does not provide an effective mechanism for establishing a different return
route. Figure 6.9d shows the results.
(2) Computational complexity evaluation
We analyze the computational complexity of the presented ML-based UAV navigation solutions and express the complexity of each solution in big-O notation, which describes the limiting behavior of a function as its argument tends towards a particular value or infinity. Heuristic optimization-based methods, including the ACO-based solution, are iterative in nature.
The computational complexity of the ACO-based solution is proportional to the number of iterations needed to determine the optimal bound. The big-O complexity of the ACO-based solution is given by
where n is the variable problem size, m is the number of ants, and j corresponds to the number of needed iterations. Hence, this optimization-based method cannot be used for large problem sizes in real time.
The online computational complexity of the deep supervised learning-based approach is proportional to the architectural size and dimensions of the deep neural network. The big-O complexity of this technique is given by
where n is the variable problem size corresponding to the length of the trajectory and C > 0 is a constant corresponding to the network dimensions and is given by
where ∂ corresponds to the dimensions of the Q-network and A represents the action space, which is proportional to the degrees of freedom of the UAV. Consequently, the ML-based solutions may potentially satisfy the real-time calculation objective, as they involve lower computational complexity than the optimization-based methods.
Simulation results demonstrate that the presented machine learning-based UAV
navigation solutions perform closely to the optimal bound while providing much
faster computational results as shown in Fig. 6.10. The UAV can therefore determine
a feasible path to navigate to its destination efficiently within real-time bounds.
(3) Remarks on the comparative performance between the deep learning and the
reinforcement learning based solutions
According to our results, the ML-based techniques can be practically used for
UAV navigation applications. Both the deep supervised learning and reinforcement
learning-based solutions perform closely to the optimal bound in terms of traversed
trajectory length while maintaining connectivity with the cellular network. The ML-
based solutions also managed to fulfill the collision avoidance and different return
path objectives of the multi-objective UAV navigation problem.
Based on our analysis, the computational complexity of the deep supervised learning-based solution is proportional to the dimensions of the deep neural network needed to learn the environmental map, whereas the complexity of the reinforcement learning agent is governed by the degrees of freedom of the UAV. As such, the deep supervised learning-based technique is effective for navigating a small, complex terrain with a high degree of freedom. However, the reinforcement learning-based solution is recommended over the deep supervised learning-based technique in large navigation areas, where the computational complexity required by the deep learning-based technique to learn the environmental mapping becomes significantly large.
Field test experimentation results show that the use of cellular networks to support
the UAV navigation produces promising results, particularly in outdoor urban envi-
ronments. Several experimental studies have shown that ML-based techniques for UAV localization and trajectory planning can indeed be used in real-life applications. Specifically, the ML-based algorithms have the potential to enable the UAV to adapt to the dynamics of the environment in real time. The simulation results presented in this chapter are consistent with the findings obtained from such field experiments. In particular, the simulations conducted also demonstrate the effectiveness of utilizing cellular signals for UAV navigation. This is attributed to the fact that we have modelled the performance of ML-based and optimization-based UAV positioning and navigation algorithms in an urban environment with actual cellular infrastructure parameters. The cellular signals from the surrounding infrastructure can be utilized to position and navigate the UAV to its destination effectively. Both experimental and simulation results show that ML-based solutions can provide near-optimal results with lower computational complexity compared to standard optimization-based methods. We can therefore safely conclude from these observations that simulation can be a reasonable alternative to field tests when such trials cannot be conducted for one reason or another. However, the final stage of testing a given technique prior to field use should still be carried out in real-life field environments.
6.5 Conclusions
References
1. Alsuhli G, Fahim A, Gadallah Y (2022) A survey on the role of UAVs in the communication
process: a technological perspective. Comput Commun 194:86–123. https://doi.org/10.1016/
j.comcom.2022.07.021
2. Li B, Li Q, Zeng Y, Rong Y, Zhang Y (2022) 3D trajectory optimization for energy-efficient UAV
communication: a control design perspective. IEEE Trans Wirel Commun 21(6):4579–4593.
https://doi.org/10.1109/TWC.2021.3131384
3. Zhi Z, Liu L, Liu D (2020) Enhancing the reliability of the quadrotor by formulating the control
system model. In: International conference on sensing, measurement & data analytics in the
era of artificial intelligence (ICSMD), Xi’an, China 242–246. https://doi.org/10.1109/ICSMD5
0554.2020.9261660
4. Afifi G, Gadallah Y (2021) Autonomous 3-D UAV localization using cellular networks:
deep supervised learning versus reinforcement learning approaches. IEEE Access 9:155234–
155248. https://doi.org/10.1109/ACCESS.2021.3126775
5. Tippenhauer NO, Pöpper C, Rasmussen KB, Capkun S (2011) On the requirements for
successful GPS spoofing attacks. In: Proceedings of the 18th ACM conference on computer
and communications security, vol 18, pp 75–86
6. Patil V, Atrey PK (2020) GeoSecure-R: secure computation of geographical distance using
region-anonymized GPS data. In: IEEE sixth international conference on multimedia big data
(BigMM), vol 6, 28–36. https://doi.org/10.1109/BigMM50055.2020.00015
7. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with
applications to image analysis and automated cartography. Commun ACM 24:381–395
8. Wang J, Gu D, Liu F (2019) Research on autonomous positioning method of UAV based
on binocular vision. In: Chinese automation congress (CAC), Hangzhou, China 3588–3593.
https://doi.org/10.1109/CAC48633.2019.8996413
9. Zhou Y, Tang D, Zhou H, Xiang X, Hu T (2019) Vision-based online localization and trajectory
smoothing for fixed-wing UAV tracking a moving target. In: IEEE/CVF international confer-
ence on computer vision workshop (ICCVW), Seoul, Korea. https://doi.org/10.1109/ICCVW.
2019.00024
10. Banerjee P, Corbetta M (2020) In-time UAV flight-trajectory estimation and tracking using
Bayesian filters. In: IEEE aerospace conference, pp 1–9. https://doi.org/10.1109/AERO47225.
2020.9172610
11. Ragothaman S, Maaref M, Kassas ZM (2019) Multipath-optimal UAV trajectory planning for
urban UAV navigation with cellular signals. In: IEEE 90th vehicular technology conference
(VTC2019-Fall), Honolulu, HI, USA, pp 1–6. https://doi.org/10.1109/VTCFall.2019.8891218
12. Ge J, Liu L, Dong X, Tian W (2020) Trajectory planning of fixed-wing UAV using Kinodynamic
RRT* algorithm. In: International conference on information science and technology (ICIST),
Bath, London, and Plymouth, United Kingdom, vol. 10, pp 44–49. https://doi.org/10.1109/ICI
ST49303.2020.9202213
13. Lachow I (1995) The GPS dilemma: balancing military risks and economic benefits. Int Secur
20(1):126–148. JSTOR, www.jstor.org/stable/2539220. Accessed 5 June 2021
14. Afifi A, Gadallah Y (2022) Cellular network-supported machine learning techniques for
autonomous UAV trajectory planning. IEEE Access 10:131996–132011. https://doi.org/10.
1109/ACCESS.2022.3229171
15. Susarla P et al (2020) Learning-based trajectory optimization for 5G mmWave uplink UAVs.
In: IEEE international conference on communications workshops (ICC workshops), Dublin,
Ireland, pp 1–7. https://doi.org/10.1109/ICCWorkshops49005.2020.9145194
16. Zeng Y, Xu X (2019) Path design for cellular-connected UAV with reinforcement learning.
arXiv:1905.03440
17. Bast SD, Vinogradov E, Pollin S (2019) Cellular coverage-aware path planning for UAVs.
In: IEEE international workshop on signal processing advances in wireless communications
(SPAWC), Cannes, France, vol 20, pp 1–5. https://doi.org/10.1109/SPAWC.2019.8815469
Abstract The spatially discernible indoor magnetic field indicates locations through
different magnetic readings at various positions. Therefore, magnetic positioning has
garnered attention due to its promising localization accuracy and infrastructure-free
nature, significantly reducing the investment in localization. Since the magnetic field
covers all indoor environments, magnetic positioning holds the potential to create
a ubiquitous indoor positioning system. This chapter investigates the stability of
the magnetic field concerning factors such as devices, testers, materials, and dates.
Compensation methods for different types of magnetic features are studied based on
fluctuation patterns to achieve accurate positioning results. Evolutionary algorithm-
based optimization strategies are proposed for online localization, tailored to the types of magnetic features used. Testing experiments validate the feasibility and efficiency
of utilizing evolutionary algorithms to enhance magnetic positioning performance.
7.1 Introduction
The indoor geomagnetic field is a composite of the Earth’s magnetic field and those
of electromagnetic sources (e.g., power supplies) and/or ferromagnetic materials
(e.g., iron furniture, central air conditioners) [1]. The uneven distribution of such a
combined magnetic field has been exploited for indoor localization. Starting with
robot localization in corridor environments, Suksakulchai et al. [2] used magnetic
field disturbances as recognition signatures for localizing a robot. Experimental
outcomes validated the feasibility of using sequential magnetic disturbance data
for positioning in corridor environments. Gozick et al. [3] investigated the stability and long-term variation of the ambient indoor magnetic field, affirming its utility for distinguishing indoor landmarks, guideposts, and rooms. They confirmed that indoor magnetic maps can be developed for localization with only a mobile phone. Li et al. [4] observed that the characteristics of the magnetic field change with location, making magnetic fingerprinting possible. They identified the limited information content of magnetic data as a significant drawback of magnetic positioning. Accord-
ingly, numerous efforts have focused on developing precise and stable magnetic posi-
tioning methods. Magnetic localization methods can be categorized into two main
groups based on the utilization of magnetic features: single-point-based magnetic
positioning (SPMP) and sequence-based magnetic positioning (SBMP) [5].
SPMP utilizes discrete features including 3-axis components, horizontal intensity,
and total intensity of the magnetic field for positioning. As depicted in Fig. 7.1, SPMP
requires an offline pre-built magnetic features database (also known as magnetic
maps) for online matching. During the offline stage, the fingerprint associated with
the ith position can be described as:
\Xi_i = \{ B_x, B_y, B_z, B_h, B, x_i, y_i \}   (7.1)
where x_i and y_i denote the coordinates of the ith indoor position, and B_x, B_y, B_z, B_h, and B are the three-axis, horizontal, and total intensities of the magnetic field, respectively,
which are also magnetic features. A discrete magnetic database is constructed by
generating fingerprints at all the arranged indoor positions. The database can also be
generated by crowdsourcing methods described in [6, 7], or the SLAM approach in
[8, 9]. These methods aim at reducing labor-intensive measurement workload.
For online matching, the k-nearest neighbors (KNN) algorithm [10], and mean
square deviation (MSD) [11] are commonly used. However, directly employing
KNN and MSD with low-dimensional magnetic feature data does not yield optimal
positioning results. Researchers have tackled this issue by proposing optimized
approaches or introducing supplementary information to magnetic features. For
example, [12] proposes the multi-magnetic fingerprint fusion (MMFF) method,
which combines coarse estimation with fine estimation for accurate positioning.
In [13], the multi-parameter matching model of least magnetic distance (MPMD)
is proposed to optimize the low accuracy problem of MSD and offer better noise
immunity than MSD. Moreover, various optimization works are explored by using
the particle filter, such as the integrated particle filter [14], genetic particle filter (GPF)
[10], improved particle filters (IPF) [15, 16], sensitivity-based adaptive particle filter
[17], and more. Generally, the particle filter can handle the problem of low accuracy caused by low-dimensional magnetic data well, yet it suffers from high computational complexity because a large number of particles are required to perform filtering. To enhance the
specificity of magnetic fingerprints, researchers have integrated phone attitudes with
magnetic features to generate an augmented magnetic vector [18], aiming to reduce
mismatching and improve localization accuracy. In [7], orientation-aided magnetic
fingerprints are designed for processing crowdsourced magnetic data, improving
fingerprint fidelity when massive data are used.
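For illustration, a minimal sketch of KNN matching over a discrete database of the form (7.1) is given below; the value of k, the absence of feature weighting, and the toy fingerprint values are assumptions rather than any specific method from the cited works.

import numpy as np

def knn_locate(database, query, k=3):
    """Single-point magnetic matching by k-nearest neighbours.

    database: one row per reference point, columns = [Bx, By, Bz, Bh, B, x, y] as in Eq. (7.1).
    query:    measured features [Bx, By, Bz, Bh, B] at the unknown point.
    Returns the average coordinates of the k closest fingerprints.
    """
    feats, coords = database[:, :5], database[:, 5:]
    dists = np.linalg.norm(feats - query, axis=1)   # Euclidean feature distance
    nearest = np.argsort(dists)[:k]
    return coords[nearest].mean(axis=0)

# Toy database with three reference points (values are arbitrary, in microtesla and metres).
db = np.array([
    [30.1, -5.2, 40.3, 30.5, 50.7, 0.0, 0.0],
    [28.7, -4.8, 41.0, 29.1, 50.2, 1.0, 0.0],
    [31.5, -6.0, 39.2, 32.0, 50.9, 0.0, 1.0],
])
print(knn_locate(db, np.array([30.0, -5.0, 40.5, 30.3, 50.6])))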
SBMP relies on sequential magnetic data to estimate indoor positions. As indi-
cated in Fig. 7.1, SBMP needs an offline pre-built magnetic database for online
matching. Generally, the sequential magnetic features contain the total intensity of
the magnetic field along the designated routes. To construct a sequential magnetic
database, the tester should hold a device while moving along the predefined routes
to simultaneously measure total magnetic intensities and position information. The
ith sequential fingerprint with n magnetic intensities can be expressed as:
\Xi_i = \{ B_1, x_1, y_1, B_2, x_2, y_2, \ldots, B_j, x_j, y_j, \ldots, B_n, x_n, y_n \}   (7.2)
where B_j is the jth total magnetic intensity of the magnetic sequence, j ∈ {1, ..., n}, and n denotes the number of total magnetic intensities. In practical usage, n should be determined according to the required positioning accuracy: generally, the higher the required precision, the longer the magnetic sequence, i.e., the larger n. For online matching, the length of the real-time
measured consecutive magnetic data should be consistent with n. Therefore, online matching evaluates the similarity between the measured magnetic intensities and the fingerprint sequence, while the positions in the fingerprint defined by Eq. (7.2) are not involved in the matching process. However, if a sequential magnetic fingerprint is recognized, its nth position is output as the positioning result.
Since measuring the magnetic sequence only requires walking along the planned
routes, building the sequential magnetic database offers time advantages over generating a discrete magnetic database, which requires repeated measurements at reference positions. However, the construction of a sequential magnetic database is restricted by users' walking patterns and speeds. To address this problem, Kuang et al. [19] and Ashraf et al. [20] proposed to construct sequential magnetic data by connecting discrete magnetic features at reference points. Although this approach is
free from moving along the planned routes, it diminishes the specificity of the magnetic sequence and lacks adaptability to varying walking speeds.
Among online matching algorithms of SBMP, dynamic time warping (DTW)
[21, 22], waveform-based DTW [23], and least squares method (LS) [11] are popular
methods for evaluating sequence similarity. In [2], a sequential least-squares approx-
imation method is used to match real-time measured magnetic data with stored signa-
tures. DTW and LS are easy to implement but DTW faces high time costs with large
datasets. Other efforts like the Gauss–Newton iterative (GNI) method [19], binary
grid (BG) [24], leader–follower mechanism [25], and bags of words (BOW) [26]
have been made. These methods try to find connections between magnetic sequences
and positions by processing sequence data. The sequential magnetic data describe the data patterns of specific locations or routes, and these specific patterns provide favorable conditions for machine learning. For instance, in [27], a convolutional neural network (CNN) is adopted to learn relationships between magnetic patterns and indoor positions, achieving localization accuracy better than 1.01 m in 75% of the cases. Moreover, recurrent neural networks (RNN) [28], deep recurrent neural networks (DRNN) [29], and probabilistic neural networks (PNN) [30] have also been introduced for SBMP, yielding promising positioning results. Although these works demonstrate good performance using machine learning models for sequence matching, the high time cost should always be considered for practical application.
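As a concrete illustration of DTW-based sequence matching, the following minimal sketch compares a measured intensity sequence against stored route fingerprints; the absolute-difference cost, the nearest-fingerprint decision rule, and the toy data are assumptions rather than the exact procedures of the cited works.

import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D magnetic sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def match_sequence(measured, fingerprints):
    """Return the index of the stored sequence (Eq. 7.2 style) with the lowest DTW cost."""
    costs = [dtw_distance(measured, fp) for fp in fingerprints]
    return int(np.argmin(costs)), costs

stored = [np.array([50.0, 51.2, 53.5, 52.1, 49.8]),
          np.array([47.0, 46.5, 45.9, 46.8, 48.0])]
measured = np.array([49.5, 51.0, 53.8, 51.9, 50.1])
print(match_sequence(measured, stored))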
It is observed that classic distance-based matching in SPMP is inadequate for high-precision positioning due to the limited magnetic information. Although machine learning-based SBMP models achieve promising results, their high algorithmic complexity is not well suited for implementation on mobile phones. To tackle these problems, this chapter concentrates on optimizations of the SBMP and SPMP methods. Additionally, the chapter investigates the factors influencing magnetic data measurements and the corresponding compensation methods (explained in Sect. 7.2). To address the limitations of distance-based SPMP and the high algorithmic complexity of machine learning-based SBMP, we introduce the mind evolutionary algorithm (a machine learning method, described in Sect. 7.3) for SPMP and the enhanced genetic algorithm-based extreme learning machine (EGA-ELM, discussed in Sect. 7.4) for SBMP, respectively.
equipment, attitudes, dates, and diverse indoor media (iron, wood, electronic devices,
etc.).
(1) Different mobile phones. The magnetic data measured by one type of mobile
phone is usually different from that measured by another type of mobile phone. Even
for different mobile phones of the same type, variations still exist due to production
differences. Figure 7.2 shows the magnetic sequences measured by three mobile
phones along the same indoor route. It can be seen that there are significant differ-
ences in the recorded geomagnetic curves. Taking the sequence measured by Honor 9
as the reference, the other two curves display a uniform upward or downward shift. If
the magnetic sequence database is constructed based on the measurements of Honor
9, leveraging other mobile phones to perform magnetic positioning will introduce
substantial errors. However, the three magnetic curves in Fig. 7.2 still reveal consis-
tent changing tendencies, demonstrating that performing sequence matching using
the changing trend of the magnetic sequence will be better than directly using discrete
individual magnetic measurements.
(2) Different attitudes of devices. The 3-axis magnetic readings are recorded in
the device frame, influenced by the manner in which individuals hold their devices.
Within the same area, Fig. 7.3 presents the generated 3-axis magnetic maps corre-
sponding to three distinct device attitudes (30°, 45°, and 60°, respectively). It can
be found that identical geomagnetic component maps exhibit distinct patterns across varying device postures. Even within the same area, the same component showcases significant fluctuations due to different device orientations. This suggests that constructing a magnetic database for positioning based on a fixed device orientation cannot accommodate the diversity of phone-holding styles. Therefore, device orientation poses a crucial challenge that demands primary consideration during both offline database construction and online positioning, which require magnetic field data independent of device attitudes.
(3) Long-time observations. The Earth's magnetic field is susceptible to solar influence, which in turn also affects the indoor magnetic field, since the latter results from the amalgamation of the Earth's magnetic field and the magnetization generated by indoor magnetic materials. As illustrated in Fig. 7.4, within the same indoor area, maps of the total magnetic intensity were generated at an 18-month interval. During this time interval, the structure of the testing area remained fixed. Figure 7.4 reveals
Fig. 7.3 The magnetic maps of the 3-axis components of the magnetic field under three attitudes
within the same area
Fig. 7.4 The magnetic map of the same area with an 18-month interval. a map first generated;
b map generated after 18 months. The units of the axes in the figure are decimeters
alterations in the magnetic field’s strength after 18 months. Therefore, due to the
fluctuation of the magnetic field’s strength, it is necessary to regularly update the
magnetic database for accurate localization.
(4) Magnetic sequences measured on different dates. To observe the temporal variations in magnetic sequences, the total intensities were measured on three different dates at one-month intervals using the same device along the same indoor route. As depicted in Fig. 7.5, the measured magnetic intensity sequences exhibit distributions similar to those in Fig. 7.2. This observation indicates that, although the magnetic sequence offers better differentiation than discrete magnetic features, the magnetic intensity sequences still change dynamically over time because the magnetic intensity readings themselves change, as observed in Fig. 7.4. This kind of sequential change exhibits periodic upward or downward shifts, yet the consistent changing trend of magnetic strength along the route remains for different devices (see Fig. 7.2) and dates.
(5) Magnetic sequences measured by different users. In real-life scenarios, users have different heights, leading to different heights of the device relative to the ground surface during data collection. Figure 7.6 shows the magnetic sequences collected by three users with heights of 1.92, 1.82, and 1.60 m, respectively. All three users walked along the same indoor route using the same device and very similar holding postures. It can be seen that the magnetic curves exhibit a consistent changing trend among users. Moreover, similar to the patterns in Figs. 7.2 and 7.5, the magnetic sequences obtained at different heights display very similar variation trends, exhibiting an overall upward or downward shift. Therefore, inferring from the changing trend of the magnetic curves would yield better results than the direct utilization of measured magnetic data. This inference applies across varying dates and devices.
(6) Different materials. An indoor scenario such as an office building contains various devices and materials. Figure 7.7 illustrates the investigated materials, including wood furniture, iron cabinets, fire hydrants, concrete walls, computers, and power supply equipment. An experiment was conducted to collect magnetic data near these materials. Initially, magnetic data were collected at a fixed distance from each material (e.g., 1 m) for 100 s at a sampling rate of 50 Hz. Subsequently, the mobile phone was brought closer to the material for another 100 s of measurement, followed by a return to the original position for an additional 100 s of data collection. This approach aims to identify the materials that impact the stability of the magnetic field, aiding in determining whether compensation or mitigation is necessary for positioning when the user is near any of these materials.
Figure 7.8a illustrates that the magnetic intensity remains relatively stable when a
person moves in close proximity to the phone. In Fig. 7.8b, as the phone approaches
the wood material, there are no discernible fluctuations in the measurement sequence.
This indicates that both wood material and the human body have negligible influence
Fig. 7.7 The investigated materials. a wood furniture, b iron cabinet, c fire hydrant, d concrete
wall, e computer, f power supply equipment
on the intensity of the magnetic field. However, the measured magnetic intensities
decrease sharply when the mobile phone approaches iron cabinets, fire hydrants,
and power supply devices within a certain distance, as shown in Fig. 7.8c, d, and g,
respectively. Conversely, when the phone nears a concrete wall or computer hard-
ware, the magnetic intensity shows a sudden increase, as shown in Fig. 7.8e and f,
respectively. These experimental results reveal that common indoor materials have a
certain range of influence on geomagnetic measurements. Once the mobile phone is
within a certain range of some materials, the measured magnetic intensity undergoes
significant change, but there is no effect when the phone is outside this range. This
result emphasizes the necessity of considering intensity fluctuations when locating
the user who is near such materials.
(7) Summary of influencing factors. The impact of several factors and patterns
on indoor magnetic field intensity is summarized as follows:
➀ The indoor magnetic field intensity exhibits variability over time. These changes are related to the inherent characteristics of the Earth's geomagnetic field and the ferromagnetic materials inside buildings. However, in structurally stable indoor buildings, the interrelations of magnetic field intensities among various points within the area remain consistent.
➁ Factors such as device height, dates, and hardware models can introduce overall deviations in magnetic field measurements. Variations in device attitudes lead to dynamic changes in the three-axis components. However, compared to the fluctuations in the three-axis components, the total intensity of the geomagnetic field is relatively stable.
➂ Non-ferrous materials such as wood and the human body do not influence geomagnetic measurements. On the other hand, concrete walls, iron materials, electronic devices, and similar ferromagnetic elements do have an impact. However, this impact is local and momentary: it becomes negligible when moving away from the magnetic materials' range of influence (e.g., beyond 1 m) and then has no bearing on the measurement of magnetic field data.
Fig. 7.8 The magnetic intensity readings influenced by different materials. a human b wood, c iron
cabinet, d fire hydrant, e concrete wall, f computer, g power supply equipment
Section 7.2.1 introduced the dynamic changes of the magnetic field measurements.
Considering the dynamic nature of the data, compensation is necessary before online
positioning. This section will present methods for compensating different types of
magnetic data.
Discrete magnetic data transformation. Figure 7.9 shows the geographic coor-
dinate system (GCS) and the carrier coordinate system (CCS). Generally, the
measured 3-axis components are in the CCS, which varies with the holding styles
of phones. Since GCS is a system of latitude, longitude and altitude coordinates,
the 3-axis components of the magnetic field in the GCS will be independent of the
holding styles of phones. Therefore, using the magnetic data in the GCS will intro-
duce fewer errors because the data is more stable than that in the CCS. This involves
the transformation between GCS and CCS.
As Fig. 7.9b shows, the angles at which the smartphone rotates around the x, y,
and z axes are called pitch, roll, and yaw, respectively. The rotation matrix from GCS
to CCS is defined as:
C_g^b = C_3^n C_2^n C_1^n =
\begin{bmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}   (7.3)
where C_1^n, C_2^n, and C_3^n represent the rotation matrices about the z, x, and y axes, respectively, and C_g^b is an orthogonal matrix denoting the rotation from GCS to CCS.
Equation (7.3) reveals the rotation sequence z → x → y. To perform the rotation, the three angles must first be obtained. The magnetic and gravity data are utilized for the angle calculation; that is, the pitch and roll are computed using gravity data and the yaw is derived from magnetic data.
Fig. 7.9 The geographic coordinate system and carrier coordinate system. a geographic coordinate system; b carrier coordinate system
Since gravity always points to the center of the Earth, the gravity readings can be expressed as G = [0, 0, g]^T, where g = 9.8 m/s² when the phone is placed horizontally. If the gravity is measured in the CCS, the gravity vector is denoted as G^b = [g_{bx}, g_{by}, g_{bz}]^T. The relationship between G and G^b is as follows:
\begin{bmatrix} g_{bx} \\ g_{by} \\ g_{bz} \end{bmatrix} =
\begin{bmatrix}
\cos\gamma\cos\psi + \sin\gamma\sin\psi\sin\theta & -\cos\gamma\sin\psi + \sin\gamma\cos\psi\sin\theta & -\sin\gamma\cos\theta \\
\sin\psi\cos\theta & \cos\psi\cos\theta & \sin\theta \\
\sin\gamma\cos\psi - \cos\gamma\sin\psi\sin\theta & -\sin\gamma\sin\psi - \cos\gamma\cos\psi\sin\theta & \cos\gamma\cos\theta
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ g \end{bmatrix}   (7.4)
According to Eq. (7.4), pitch, roll, and gravity have the following relationship:
\begin{bmatrix} g_{bx} \\ g_{by} \\ g_{bz} \end{bmatrix} =
\begin{bmatrix} -\sin\gamma\cos\theta \\ \sin\theta \\ \cos\gamma\cos\theta \end{bmatrix} g   (7.5)
Therefore, pitch and roll can be calculated by using gravity data as follows:
\gamma = \arctan(-g_{bx} / g_{bz}), \quad \gamma \in [-\pi, \pi]
\theta = \arcsin(g_{by} / g), \quad \theta \in [-\pi/2, \pi/2]   (7.6)
Assuming the 3-axis magnetic data in the CCS and the GCS are B = [b_x, b_y, b_z]^T and B^g = [b_{gx}, b_{gy}, b_{gz}]^T, respectively, the yaw is computed as:
\psi = -\arctan\left( \frac{\sin\gamma\sin\theta\, b_x + \cos\theta\, b_y - \cos\gamma\sin\theta\, b_z}{\cos\gamma\, b_x + \sin\gamma\, b_z} \right), \quad \psi \in [0, 2\pi]   (7.7)
Based on (7.6), (7.7), the magnetic data, and the gravity data, the rotation angles and C_g^b can be obtained. The relationship between B and B^g is defined as:
B = C_g^b B^g   (7.8)
which describes the transformation of the magnetic data from GCS to CCS. However,
the required transformation for magnetic positioning is from CCS to GCS. According
to the rules of the matrix operations, the following equation can be obtained:
C_b^g = (C_g^b)^{-1} = (C_g^b)^T   (7.10)
Fig. 7.10 The 3-axis magnetic components transformed into the geographic coordinate system
Figure 7.10 presents the transformed maps of the 3-axis components shown in Fig. 7.3. It can be seen that the intensity of the x component is nearly zero (×10⁻³ μT), and the maps of the y and z components are essentially identical and overlap, unlike the separate maps in Fig. 7.3. Therefore, utilizing the above process for discrete magnetic data compensation produces more stable 3-axis data and standard magnetic maps for online positioning.
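The transformation chain of Eqs. (7.3)-(7.10) can be sketched as follows; arctan2 is used instead of a plain arctangent for quadrant handling, the pitch uses the measured gravity norm rather than a fixed g, and the sample sensor readings are arbitrary, so this is an illustrative sketch rather than the exact implementation used for Fig. 7.10.

import numpy as np

def attitude_from_sensors(g_b, b_b):
    """Roll (gamma), pitch (theta), yaw (psi) from carrier-frame gravity and
    magnetic readings, following Eqs. (7.6) and (7.7)."""
    gbx, gby, gbz = g_b
    gamma = np.arctan2(-gbx, gbz)                     # roll, Eq. (7.6)
    theta = np.arcsin(gby / np.linalg.norm(g_b))      # pitch, Eq. (7.6)
    bx, by, bz = b_b
    num = np.sin(gamma) * np.sin(theta) * bx + np.cos(theta) * by \
          - np.cos(gamma) * np.sin(theta) * bz
    den = np.cos(gamma) * bx + np.sin(gamma) * bz
    psi = -np.arctan2(num, den)                       # yaw, Eq. (7.7)
    return gamma, theta, psi

def ccs_to_gcs(b_b, g_b):
    """Rotate a carrier-frame magnetic vector into the geographic frame
    via the transpose of C_g^b from Eq. (7.3), i.e. Eq. (7.10)."""
    gamma, theta, psi = attitude_from_sensors(g_b, b_b)
    c3 = np.array([[np.cos(gamma), 0, -np.sin(gamma)],
                   [0, 1, 0],
                   [np.sin(gamma), 0, np.cos(gamma)]])
    c2 = np.array([[1, 0, 0],
                   [0, np.cos(theta), np.sin(theta)],
                   [0, -np.sin(theta), np.cos(theta)]])
    c1 = np.array([[np.cos(psi), -np.sin(psi), 0],
                   [np.sin(psi), np.cos(psi), 0],
                   [0, 0, 1]])
    c_bg = c3 @ c2 @ c1        # GCS -> CCS rotation, Eq. (7.3)
    return c_bg.T @ b_b        # CCS -> GCS transformation

b_gcs = ccs_to_gcs(np.array([25.0, -10.0, 38.0]), np.array([0.5, 1.2, 9.7]))
print(b_gcs)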
Deviation mitigation of magnetic sequence data. As stated in Sect. 7.2.1, the sequential magnetic data has an overall deviation caused by factors such as different mobile phones, dates, and testers. For a planned indoor route, although the measured magnetic patterns differ, the same changing trend, independent of these influencing factors, exists. Therefore, extracting this trend for online positioning is important for high-precision localization. We propose to use the wave-like features of the original magnetic sequences. If a magnetic sequence S has n samples, we construct a new data sequence, termed slope data S', by:
S'(i) = \begin{cases} (S(i+m) - S(i))/m, & i + m \le n \\ (S(n) - S(i))/(n - i), & i + m > n \text{ and } i < n \end{cases}   (7.12)
Fig. 7.11 The extracted magnetic slope curves. a different dates; b different mobile phones
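A minimal sketch of the slope-data construction in Eq. (7.12) is given below; the look-ahead step m and the toy intensity values are assumptions used only for illustration.

import numpy as np

def slope_sequence(s, m=3):
    """Slope features of a magnetic intensity sequence per Eq. (7.12)."""
    n = len(s)
    s_prime = np.zeros(n - 1)
    for i in range(n - 1):                 # i < n (0-based index here)
        if i + m < n:
            s_prime[i] = (s[i + m] - s[i]) / m
        else:
            s_prime[i] = (s[-1] - s[i]) / (n - 1 - i)
    return s_prime

raw = np.array([50.0, 50.4, 51.1, 52.0, 51.6, 50.9, 50.2, 49.8])
print(slope_sequence(raw, m=3))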
The SPMP methods (e.g., KNN, MSD) face challenges in accurately evaluating similarities because similar magnetic readings are captured at different indoor locations. To address this issue, we propose utilizing the enhanced mind evolutionary algorithm (EMEA) for location estimation. Unlike traditional distance-based methods, EMEA employs a global search strategy to find the optimal positioning result rather than directly assessing the Euclidean distance between the real-time measured magnetic data and the magnetic fingerprints. Owing to the high sampling rate of the magnetometer, a large amount of geomagnetic data can be gathered within one second, providing favorable conditions for applying learning algorithms. Theoretically, within these data there should always be a piece of collected magnetic data that closely matches the ground-truth position. Following this principle, EMEA is employed to search for the optimal position using all the collected magnetic data. The following parts introduce the theory behind EMEA and define the EMEA-based SPMP. Experimental tests are presented to show the advantages of EMEA-based SPMP over distance-based methods.
where Smax and Smin denote the maximal and minimal scores of the subgroup, respec-
tively, and ς is a threshold. If one subgroup is mature, the highest score will be posted
on the global billboard. The local competitions within all the subgroups are executed
until all the subgroups mature.
Dissimilation. This step is also called global competition. It involves comparing
scores on the global billboard to determine which subgroups to discard or regenerate.
The superior subgroup with the lowest score will be replaced by a temporary subgroup
with a better score. A new temporary subgroup will be regenerated in the search space.
Similartaxis and dissimilation are operated independently, with the global bill-
board capturing evolutionary information from each generation, steering the process
toward an optimal direction. The evolutionary process finishes if the number of iter-
ations reaches the predefined number or no superior subgroup is replaced. Under this
condition, the best individual of the best superior subgroup is output as the searching
result.
The performance of MEA depends on the number of subgroups, which is defined
before the evolution starts. However, after the subgroups are generated, the subgroups
may intersect (Fig. 7.13a) or the individuals of one subgroup are relatively scattered
(Fig. 7.13b). In these cases, fixed numbers of superior or temporary subgroups will
not cover all search space, which will be detrimental to the optimal search. Therefore,
the enhanced MEA is proposed to address these shortcomings.
EMEA dynamically assigns the numbers of superiors or temporary subgroups
using variance calculation and center control. The definitions are as follows:
Variance calculation. After generating the subgroups, the variance of indi-
viduals within one subgroup should be evaluated. If a subgroup is expressed as
{zi , i = 1, 2, 3, . . . , n}, the variance is calculated as:
var = \frac{1}{n} \sum_{i=1}^{n} (z_i - \bar{z})^2   (7.14)
where \bar{z} is the mean value of {z_i}, which are the characteristics of the individuals. A
large variance suggests distributions similar to those in Fig. 7.13b, indicating a need
for subdivision of that subgroup.
Center control. Once the subgroup centers are confirmed, the distance between
each pair of centers should be calculated. This helps determine whether a pair of
subgroups are too close or intersect, as shown in Fig. 7.13a. In this case, the algorithm
combines two subgroups and generates a new group in the search space.
To perform MEA optimization, both variance calculation and center control need
to define thresholds empirically. For example, in the case of magnetic positioning,
coordinates can be considered as the characteristic of individuals. Therefore, the
T = M + N   (7.15)
\begin{cases} N = N + 1, & s_i^{max} < \delta \\ M = M + 1, & s_i^{max} > \delta \end{cases}   (7.16)
where T, M, and N denote the numbers of total, temporary, and superior subgroups, respectively, s_i^{max} is the maximal score of the ith subgroup, and δ is a predefined threshold.
As illustrated in Fig. 7.14, EMEA independently determines the numbers of
superior and temporary subgroups through the above strategies, then followed by
executing similartaxis and dissimilation for evolution, ultimately leading to the final
optimal result.
where G(k) represents the population of EMEA, g_k(x_i, y_i) is the ith individual of G(k), and x_i and y_i are the coordinates of the ith individual, i ∈ {1, 2, ..., n}. The following step is to score the individuals. For single-point magnetic localization, the scoring is related to the previous true magnetic position M(k − 1), as shown in Fig. 7.15. The ith individual's score is calculated as:
s\{k, i\} = \frac{1}{\sqrt{(X_{k-1} - x_i)^2 + (Y_{k-1} - y_i)^2}}   (7.18)
where (Xk−1 , Yk−1 ) denotes the coordinates of M(k − 1). As defined in Sect. 7.3.1,
the individuals with the highest scores are selected as the centers of subgroups.
Then, the variance calculation is made by computing the coordinate variance of the subgroups:
var = \frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]   (7.19)
where \bar{x} and \bar{y} represent the mean coordinates of the individuals. If the variance
surpasses a predefined threshold, a subgroup is divided. During our test, the threshold
of variance is 0.5. The next step is center control. With the coordinates of the selected
centers, the distance between two subgroups is computed as:
d = \sqrt{(x_{ci} - x_{cj})^2 + (y_{ci} - y_{cj})^2}   (7.20)
where (x_{ci}, y_{ci}) and (x_{cj}, y_{cj}) represent the positions of the ith and jth centers, respec-
tively. Two close subgroups will be combined if their center distance is below a
certain threshold, which is set as 1 m.
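The population handling described above can be sketched as follows; the snippet only covers individual scoring (Eq. 7.18), the variance check (Eq. 7.19), and the center-distance check (Eq. 7.20), while the similartaxis and dissimilation iterations of the full EMEA are omitted, and the population size, number of subgroup centers, and toy coordinates are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def score(population, prev_pos):
    """Individual scores per Eq. (7.18): closer to the previous position scores higher."""
    d = np.linalg.norm(population - prev_pos, axis=1)
    return 1.0 / np.maximum(d, 1e-9)

def subgroup_checks(subgroups, var_thresh=0.5, dist_thresh=1.0):
    """Variance calculation (Eq. 7.19) and center control (Eq. 7.20)."""
    split, merge = [], []
    centers = [g.mean(axis=0) for g in subgroups]
    for k, g in enumerate(subgroups):
        var = np.mean(np.sum((g - g.mean(axis=0)) ** 2, axis=1))
        if var > var_thresh:
            split.append(k)                  # scattered subgroup -> subdivide
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if np.linalg.norm(centers[i] - centers[j]) < dist_thresh:
                merge.append((i, j))         # overlapping subgroups -> combine
    return split, merge

# Candidate positions drawn around the previous fix; coordinates in metres.
prev_pos = np.array([10.0, 4.0])
population = prev_pos + rng.normal(scale=2.0, size=(60, 2))
scores = score(population, prev_pos)
centers = population[np.argsort(scores)[-4:]]        # best individuals seed the subgroups
subgroups = [population[np.argsort(np.linalg.norm(population - c, axis=1))[:15]]
             for c in centers]
print(subgroup_checks(subgroups))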
To evaluate EMEA's feasibility for magnetic positioning, testing was conducted at the China University of Mining and Technology (CUMT), Xuzhou. The magnetic database is constructed in the form of Eq. (7.1), and the 3-axis components of the magnetic field data are transformed to the GCS through the coordinate transformation described in Sect. 7.2.2. Comparative evaluations included state-of-the-art approaches such as KNN [10], multi-magnetic fingerprint fusion (MMFF) [12], and mean square deviation (MSD) [11], alongside the MEA- and EMEA-based magnetic positioning methods. The localization results are shown in Fig. 7.16.
The red lines in the error boxes indicate the median errors of the different approaches. The median errors of the distance-based algorithms (KNN, MSD, and MMFF) are greater than 2 m. Generally, distance-based methods take the mean value of the measured magnetic data for the positioning calculation. Although this operation can reduce the influence of random errors and measurement noise, it also loses the diversity of the data. Therefore, mismatching frequently occurs, leading to poor positioning accuracy. This can be seen in Fig. 7.16, where large errors are frequently generated by the three distance-based methods.
As shown in Fig. 7.17, an extreme learning machine (ELM) operates with a simplified structure, featuring only an input layer, a single hidden layer, and an output layer. Unlike the complex structures of CNN or RNN models, ELM utilizes a single hidden layer for learning, ensuring a low time cost. Constructing an ELM involves the definition of three key parameters: ω, b, and β. The connection between the input layer and the hidden layer is determined by the weight vector ω and the activation threshold b, while the hidden and output layers are linked via the weight vector β. Given that an ELM model has n input nodes and l hidden nodes, with input data X = [x_1, x_2, ..., x_Q]^T, x_j = [x_{1j}, x_{2j}, ..., x_{nj}]^T ∈ R^n, and output data Y = [y_1, y_2, ..., y_Q]^T, y_j = [y_{1j}, y_{2j}, ..., y_{mj}]^T ∈ R^m, the model is expressed by the following equation:
y_j = \sum_{i=1}^{l} \beta_i G_i(\omega_i, b_i, X), \quad j = 1, 2, \ldots, Q   (7.21)
where \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{il}]^T, \omega_i = [\omega_{1i}, \omega_{2i}, \ldots, \omega_{ni}]^T, and G_i(\omega_i, b_i, X) is the output of the ith hidden node. G_i(\omega_i, b_i, X) is an entry of the hidden layer's output matrix G, which is described as:
G(\omega_1, \omega_2, \ldots, \omega_l, b_1, b_2, \ldots, b_l, x_1, x_2, \ldots, x_Q) =
\begin{bmatrix}
f(\omega_1 \cdot x_1 + b_1) & f(\omega_2 \cdot x_1 + b_2) & \cdots & f(\omega_l \cdot x_1 + b_l) \\
f(\omega_1 \cdot x_2 + b_1) & f(\omega_2 \cdot x_2 + b_2) & \cdots & f(\omega_l \cdot x_2 + b_l) \\
\vdots & & & \vdots \\
f(\omega_1 \cdot x_Q + b_1) & f(\omega_2 \cdot x_Q + b_2) & \cdots & f(\omega_l \cdot x_Q + b_l)
\end{bmatrix}_{Q \times l}   (7.22)
where T ' is the transpose matrix of T. In theory, an ELM model with Q input training
data samples can approximate any desired training error ε under the condition that
f (•) is infinitely differentiable within any interval. Therefore, the training error ε is
expressed as:
\| G_{Q \times l} \beta_{l \times m} - T' \| < \varepsilon   (7.24)
which indicates that an ELM network is defined when parameters ω, b and β are
known. Moreover, β can be calculated using ω, b and T as follows:
\beta = H^{\dagger} T   (7.25)
The output matrix of the hidden layer is related to ω and b, which are involved
in the calculation of the weight vector β. Therefore, an ELM network is uniquely
determined by the assigned ω and b. However, ensuring optimal parameters remains
a challenge. Better initial parameters contribute to ELM approaching the training
target with smaller errors. The relationship between initial parameters and training
errors can be represented as a nonlinear function g(•) as follows:
\begin{cases}
g(\omega_{11}, \omega_{12}, \ldots, \omega_{1n}, b_{11}, b_{12}, \ldots, b_{1n}) \rightarrow e_1 \\
g(\omega_{21}, \omega_{22}, \ldots, \omega_{2n}, b_{21}, b_{22}, \ldots, b_{2n}) \rightarrow e_2 \\
\quad \vdots \\
g(\omega_{n1}, \omega_{n2}, \ldots, \omega_{nn}, b_{n1}, b_{n2}, \ldots, b_{nn}) \rightarrow e_n
\end{cases}   (7.27)
Equation (7.27) denotes that g(•) is a nonlinear function of the independent vari-
ables ω and b. Therefore, the problem of determining the optimal parameters is
transformed into solving the extreme value of the nonlinear function.
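A minimal ELM training sketch following Eqs. (7.22) and (7.25) is shown below; the sigmoid activation, hidden-layer size, and random toy data are assumptions, and the optimal selection of ω and b discussed next is not yet included.

import numpy as np

def train_elm(X, T, hidden=64, seed=0):
    """Train a basic ELM: random (w, b), then solve beta via the pseudo-inverse (Eq. 7.25)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    w = rng.uniform(-1.0, 1.0, size=(n_features, hidden))   # input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=hidden)                  # activation thresholds
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))                   # hidden-layer output, cf. Eq. (7.22)
    beta = np.linalg.pinv(H) @ T                             # beta = H^dagger T, Eq. (7.25)
    return w, b, beta

def elm_predict(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return H @ beta

# Toy data: magnetic sequence segments (rows) mapped to 2-D positions.
X = np.random.default_rng(1).normal(size=(200, 10))
T = np.random.default_rng(2).normal(size=(200, 2))
w, b, beta = train_elm(X, T)
err = np.mean(np.abs(elm_predict(X, w, b, beta) - T))        # training error, cf. Eq. (7.35)
print(err)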
where p_{si} is the selection probability and g_i is the fitness of the ith individual. For every round of selection, the roulette-wheel method is performed to select several candidates, and the best individual is then extracted from the candidates based on their fitness values; better individuals have a larger probability of being selected. Since a population has N individuals, the above process is executed N times.
Adaptive crossover. This operation is made according to a crossover probability
pc , which remains constant in classic GA. Crossover exchanges genes between two
individuals, aiming to produce improved offspring during evolution. To facilitate the
passage of beneficial genes to the next generation, a dynamic crossover probability
is calculated as follows:
p_c = p_{cmax} - \frac{n \times (p_{cmax} - p_{cmin}) \times F_i - m}{m \times F_i}   (7.29)
where p_{cmax}, p_{cmin}, and F_i represent the maximal crossover probability, the minimal crossover probability, and the average fitness of the ith generation of the population, respectively; m and n denote
the maximal and current numbers of evolutions, respectively. Equation (7.29) links
the current crossover probability with the mean fitness of the population and the
number of evolutions. Given the following crossover condition:
p_c \le r, \quad p_c, r \in [0, 1]   (7.30)
Similar to the adaptive crossover probability, the adaptive mutation probability is defined as:
p_m = p_{mmax} + \frac{n \times (p_{mmax} - p_{mmin}) \times F_i - m}{m \times F_i}   (7.31)
where p_{mmax}, p_{mmin}, and F_i represent the maximal mutation probability, the minimal mutation probability, and the average fitness of the ith generation of the population, and m and n denote
the maximal and current numbers of evolutions, respectively. Given the mutation
condition:
p_m \le t, \quad p_m, t \in [0, 1]   (7.32)
To develop the localization model, the key step is to employ the EGA to estimate the optimal initial parameters ω and b for the extreme learning machine (ELM). This problem was stated in Sect. 7.4.1. Following the EGA process, the EGA-based ELM can be constructed as follows:
Chromosome formation. Every individual of the EGA represents a potential solution for the optimal parameter estimation. ω and b are used for the formation of chromosomes, whose genes are formed using the value encoding method [34]. If the ELM has N input nodes and l hidden nodes, the number of elements of ω is N × l. The length of an individual's chromosome is calculated as:
where ω_{ij} represents the weight connecting the ith input node and the jth hidden node, and b_j is the jth activation value.
Fitness function definition. To evaluate an individual’s adaptability, EGA needs
a fitness function to assign scores. As defined in (7.27), the individuals are scored by
g(•). If the training data has Q samples, the training error is calculated by:
F = 1 \Big/ \sum_{i=1}^{Q} |y_i - t_i|   (7.35)
where F denotes the fitness of the individual, yi and t i , i = {1, 2, . . . , Q}, represent
the ELM prediction and the target labels, respectively.
Convergence definition. The convergence condition is defined by specifying the
maximum number of iterations, which means that EGA stops evolution if the number
of iterations reaches the predefined number. If a training goal is set, the condition can be defined by using the mean fitness of the population as follows:
\bar{F} = \sum_{j=1}^{N} F_j   (7.36)
segments depends on the changing trend (see Figs. 7.2, 7.5 and 7.6) of the magnetic field and should be consistent with the number of input nodes of the ELM model. Each magnetic sequence segment should be labeled with only one indoor position.
As Fig. 7.19 shows, with these definitions, testers should collect magnetic data and partition it according to the segmentation method, which can be achieved by using a sliding window. After labeling the magnetic segments, inputting the data into the EGA model yields the optimal initial parameters. Then, the ELM-based positioning model is built using these optimal initial parameters.
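The overall EGA-ELM construction can be sketched as a genetic search over the flattened (ω, b) chromosome with the fitness of Eq. (7.35); for brevity the sketch uses fixed crossover and mutation probabilities instead of the adaptive ones in Eqs. (7.29) and (7.31), and the population size, number of generations, hidden-layer size, and toy data are assumptions.

import numpy as np

rng = np.random.default_rng(3)

def fitness(chrom, X, T, n_in, hidden):
    """Fitness per Eq. (7.35): reciprocal of the summed absolute training error."""
    w = chrom[:n_in * hidden].reshape(n_in, hidden)
    b = chrom[n_in * hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    beta = np.linalg.pinv(H) @ T
    return 1.0 / (np.sum(np.abs(H @ beta - T)) + 1e-9)

def evolve(X, T, n_in, hidden=16, pop=20, gens=30, pc=0.8, pm=0.05):
    length = n_in * hidden + hidden                  # chromosome: all w followed by all b
    population = rng.uniform(-1, 1, size=(pop, length))
    for _ in range(gens):
        fits = np.array([fitness(c, X, T, n_in, hidden) for c in population])
        probs = fits / fits.sum()
        parents = population[rng.choice(pop, size=pop, p=probs)]   # roulette-wheel selection
        children = parents.copy()
        for i in range(0, pop - 1, 2):               # single-point crossover
            if rng.random() < pc:
                cut = rng.integers(1, length)
                children[i, cut:], children[i + 1, cut:] = \
                    parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        mutate = rng.random(children.shape) < pm     # uniform mutation
        children[mutate] = rng.uniform(-1, 1, size=mutate.sum())
        population = children
    fits = np.array([fitness(c, X, T, n_in, hidden) for c in population])
    return population[np.argmax(fits)]               # best (w, b) chromosome found

X = rng.normal(size=(100, 8))                        # toy magnetic segments
T = rng.normal(size=(100, 2))                        # toy position labels
best_chromosome = evolve(X, T, n_in=8)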
times longer than that of the EGA-ELM model, respectively. Although the RBF model construction is fast, its mean positioning accuracy is about 1.5 m, which is worse than that of the EGA-ELM. It can be concluded that the proposed EGA-ELM method can achieve high-precision localization with a low time cost. This feature is significant for the real-time updating of sequence-based magnetic positioning models.
7.5 Summary
This chapter provides extensive studies for magnetic positioning using evolutionary
algorithms. The factors that affect the magnetic measurements are investigated. Devi-
ation mitigation methods for different types of magnetic features are presented. To
improve magnetic positioning, an enhanced MEA and an EGA-ELM are proposed
for performing single-point-based magnetic positioning (SPMP) and sequence-based
magnetic positioning (SBMP), respectively. The testing results demonstrate that
the EMEA-based SPMP outperforms the classic distance-based methods such as
KNN, MSD, etc., and the EGA-ELM-based SBMP performs better than popular
machine learning models (such as BP, CNN, etc.) in terms of positioning accuracy
and the time cost of the models. The experimental results verify the feasibility of using evolutionary algorithms for enhanced magnetic positioning. Future research will focus on extending and integrating the proposed methods with other indoor localization methods.
Fig. 7.20 Comparisons of positioning errors and model construction time of different models. a average positioning error comparison; b average model construction time comparison
References
5. Sun M et al (2021) Indoor geomagnetic positioning using the enhanced genetic algorithm-based
extreme learning machine. IEEE Trans Instrum Meas 70:1–11
6. Chen L, Wu J, Yang C (2020) MeshMap: a magnetic field-based indoor navigation system with
crowdsourcing support. IEEE Access 8:39959–39970
7. Hou L et al (2019) Orientation-aided stochastic magnetic matching for indoor localization.
IEEE Sens J 20(2):1003–1010
8. Solin A et al (2018) Modeling and Interpolation of the ambient magnetic field by gaussian
processes. IEEE Trans Robot 34(4):1112–1127. https://doi.org/10.1109/tro.2018.2830326
9. Viset F, Helmons R, Kok M (2022) An extended Kalman filter for magnetic field SLAM using
gaussian process regression. Sensors 22(8):2833
10. Sun M et al (2020) Indoor positioning integrating PDR/geomagnetic positioning based on the
genetic-particle filter. Appl Sci-Basel 10(2):668
11. Kang R, Cao L (2017) Smartphone indoor positioning system based on geomagnetic field. In:
2017 Chinese automation congress (CAC), pp 1826–1830
12. Liu GX et al (2020) Focusing matching localization method based on indoor magnetic map.
IEEE Sens J 20(7):10012–10020. https://doi.org/10.1109/JSEN.2020.2991087
13. Wang J et al (2019) Performance test of MPMD matching algorithm for geomagnetic and RFID
combined underground positioning. IEEE Access 7:129789–129801
14. Shi LF et al (2023) Pedestrian indoor localization method based on integrated particle filter.
IEEE Trans Instrum Meas 72:1–10
15. Huang H et al (2018) An improved particle filter algorithm for geomagnetic indoor positioning.
J Sens 2018. https://doi.org/10.1155/2018/5989678
16. Zhang M et al (2017) Indoor positioning tracking with magnetic field and improved particle
filter. Int J Distrib Sens Netw 13(11). https://doi.org/10.1177/15501477177418
17. Zheng M et al (2017) Sensitivity-based adaptive particle filter for geomagnetic indoor
localization. In: 2017 international conference on communications in China (ICCC), pp 1–6
18. Lee S, Chae S, Han D (2020) ILoA: indoor localization using augmented vector of geomagnetic
field. IEEE Access 8:184242–184255
19. Kuang J et al (2018) Indoor positioning based on pedestrian dead reckoning and magnetic field
matching for smartphones. Sensors 18(12):21. https://doi.org/10.3390/s18124142
20. Ashraf I et al (2019) GUIDE: smartphone sensors-based pedestrian indoor localization with
heterogeneous devices. Int J Commun Syst 32(15):e4062. https://doi.org/10.1002/dac.4062
21. Subbu KP, Gozick B, Dantu R (2011) Indoor localization through dynamic time warping. In:
2011 IEEE international conference on systems, man, and cybernetics, pp 1639–1644
22. Qiu K et al (2018) Indoor geomagnetic positioning based on a joint algorithm of particle filter
and dynamic time warp. In: 2018 ubiquitous positioning, indoor navigation and location-based
services (UPINLBS), pp 1–7
23. Hui L et al (2014) TACO: a traceback algorithm based on ant colony optimization for geomag-
netic positioning. China conference on wireless sensor networks. Springer, Berlin Heidelberg,
pp 208–222
24. Ashraf I, Hur S, Park Y (2018) MPILOT-magnetic field strength based pedestrian indoor
localization. Sensors 18(7):22. https://doi.org/10.3390/s18072283
25. Stein P et al (2014) Leader following: a study on classification and selection. Robot Auton Syst
75:79–95
26. Montoliu R, Torres-Sospedra J, Belmonte, O (2016) Magnetic field based indoor positioning
using the bag of words paradigm. In: 2016 international conference on indoor positioning and
indoor navigation, pp 1–7
27. Ashraf I et al (2020) MINLOC: magnetic field patterns-based indoor localization using
convolutional neural networks. IEEE Access 8:66213–66227
28. Bae HJ, Choi L (2019) Large-scale indoor positioning using geomagnetic field with deep neural
networks. In: IEEE international conference on communications (ICC), pp 1–6
29. Bhattarai B et al (2019) Geomagnetic field based indoor landmark classification using deep
learning. IEEE Access 7:33943–33956
30. Chen Z et al (2023) Geomagnetic vector pattern recognition navigation method based on
probabilistic neural network. IEEE Trans Geosci Remote Sens. https://doi.org/10.1109/TGRS.
2023.3273552
31. Chengyi S, Yan S, Keming X (2000) Mind-evolution-based machine learning and applications.
In: Proceedings of the 3rd world congress on intelligent control and automation, vol 1, pp 112–117
32. Subbu KP, Gozick B, Dantu R (2013) LocateMe: magnetic-fields-based indoor localization
using smartphones. ACM Trans Intell Syst Technol 4(4):73
33. Bajpai P, Kumar M (2010) Genetic algorithm–an approach to solve global optimization
problems. Indian J Comput Sci Eng 1(3):199–206
34. Kumar A (2013) Encoding schemes in genetic algorithm. Int J Adv Res IT Eng 2(3):1–7
Chapter 8
Indoor Acoustic Localization
8.1 Introduction
The chapter then explores the core algorithms that drive indoor acoustic localiza-
tion systems. It details the mathematical models and algorithmic strategies designed
to address and mitigate common challenges like NLOS conditions and multipath
interference, which are prevalent obstacles in indoor localization.
In the experimental results section, the chapter presents empirical data and case
studies that validate the theoretical approaches discussed earlier. This part not only
demonstrates the practical application and effectiveness of the technology but also
highlights areas requiring further research and development.
The chapter concludes with a summary that synthesizes the key discussions, reit-
erates the potential of indoor acoustic localization technology, and outlines future
directions for research. This final section solidifies the chapter’s contribution to the
field, advocating for continued advancement and application of this promising tech-
nology in satellite-denied environments.
In this section, the positioning base stations use Chirp signals. The base station is con-
trolled by STM32 and comes with a built-in GNSS chip for timing. All localization
algorithms are based on Time of Arrival (TOA) values for localization. At the same
time, a LoRa chip is used to synchronize the base station and the positioning terminal and to calculate the TOA value between them. Chirp signals are signals whose frequency increases linearly with time, also known as linear frequency modulation (LFM) signals [11, 12]. They have good correlation properties and are widely used in the field of communications. The expression for a Chirp signal is as follows:
s(t) = \cos(2\pi f_0 t + \pi k t^2)   (8.1)
In Eq. 8.1, f_0 represents the starting frequency, k denotes the frequency change slope of the Chirp signal, which controls the rate at which the frequency changes over time, and T signifies the duration of the signal. To ensure that the Chirp signals exhibit better correlation properties and higher signal recognition rates, this chapter selects a Chirp signal duration of 45 ms, a modulation frequency range from 16 to 19 kHz, and a signal repetition rate of 1 Hz.
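The transmitted waveform can be sketched as follows; the sampling rate and the edge-taper length are assumptions not fixed by the chapter, while the 45 ms duration and the 16-19 kHz band follow the parameters given above.

import numpy as np

fs = 48000                  # assumed sampling rate (Hz)
T = 0.045                   # chirp duration, 45 ms
f0, f1 = 16000.0, 19000.0   # modulation band, 16-19 kHz
t = np.arange(int(fs * T)) / fs
k = (f1 - f0) / T           # frequency change slope of Eq. (8.1)
chirp = np.cos(2 * np.pi * f0 * t + np.pi * k * t ** 2)

# Suppress low-frequency leakage at the start/end: Blackman tapers on the edges,
# rectangular (unchanged) in the core region, as described in the text below.
edge = int(0.005 * fs)                # 5 ms taper on each side (assumed length)
taper = np.blackman(2 * edge)
window = np.ones_like(chirp)
window[:edge] = taper[:edge]
window[-edge:] = taper[edge:]
tx_signal = chirp * window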
When positioning devices emit near-ultrasonic high-frequency signals, audible noise may occur at the start and end phases of the sound, known as low-frequency leakage. To address this issue, this chapter applies a Blackman window function at the start and end phases of the signal and a rectangular window function elsewhere. By attenuating the energy at these phases, the low-frequency leakage is reduced, while the energy in the signal's core region is maintained to ensure the transmission distance. The signal after the window function transformation can be obtained as:
where X(f) and S(f) are the Fourier transforms of x(t) and s(t), respectively, S*(f) is the complex conjugate of S(f), and the variable τ represents the time delay between the received signal and the transmitted signal.
Using the waveform formed by the GCC, the arrival time of the signal can be
inferred from the position of the peak as:
Using the aforementioned algorithm, the GCC between received Chirp signals at different signal-to-noise ratios and the original Chirp signal clearly identifies the position of the maximum correlation peak. Further, by estimating the signal propagation delay, the TOA values can be obtained and the position coordinates can be solved. Reliable TOA estimation requires the base stations and the ranging tags to be synchronized.
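As an illustration of this TOA estimation step, the sketch below cross-correlates a received frame with the known reference chirp in the frequency domain and reads the delay from the correlation peak; the function name, the power-of-two FFT length, and the 343 m/s sound speed in the usage comment are assumptions made for the example, not details taken from the text.

```python
import numpy as np

def gcc_toa(x, s, fs):
    """Sketch: frequency-domain cross-correlation of a received frame x with the
    reference chirp s; returns the delay (seconds) of the correlation peak."""
    nfft = 1 << int(np.ceil(np.log2(len(x) + len(s))))
    X = np.fft.rfft(x, nfft)
    S = np.fft.rfft(s, nfft)
    r = np.fft.irfft(X * np.conj(S), nfft)   # circular cross-correlation
    lag = int(np.argmax(np.abs(r)))
    if lag > nfft // 2:                      # wrapped indices correspond to negative lags
        lag -= nfft
    return lag / fs

# Usage sketch: with emitter and receiver synchronized, Eq. (8.7) gives the range
# distance = gcc_toa(received_frame, reference_chirp, fs) * 343.0  # assumed speed of sound (m/s)
```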
However, in real environments, due to the influence of noise, multipath propagation, and NLOS conditions, the peaks formed by the GCC are not pronounced, and it is difficult to directly determine the signal's arrival time from the cross-correlation peak. To address this issue, a frame-by-frame normalized GCC detection method is proposed, which can overcome the multipath effect in long-range estimation, as shown in Fig. 8.1.
The frame-by-frame normalized cross-correlation detection involves dividing the
signal into frames, performing cross-correlation calculations for each frame, and
normalizing the results of the cross-correlation signals. This process filters out the
effects of multipath and NLOS on the original signal, thereby better identifying the
peak values in the cross-correlation results. Figure 8.2 illustrates the frame-by-frame
normalized GCC detection process.
The steps for frame-by-frame normalized cross-correlation are as follows [10]:
1. Set the sliding-window frame length L and the sliding-window step length Δ, which should be greater than the length of the Chirp signal.
2. Calculate the cross-correlation coefficient for each frame according to the step length. Take the first half of each frame and combine them, resulting in a new combined cross-correlation signal frame with a frame length of L/2.
Fig. 8.1 Waveform (a), spectrum (b) and GCC diagram (c) of the received Chirps signal for the
NLOS case
3. Normalize the maximum amplitude of each frame signal, A_max, and the maximum value of the cross-correlation of each combined frame, C_max.
4. Use wavelet decomposition and the derivative method to find the position of the
peak.
Wavelet transform can be represented as
W f(u, s) = s^n \frac{d^n}{du^n}\left( f * \bar{\theta}_s \right)(u), \quad (8.6)
where \bar{\theta}_s is a rapidly decaying smoothing function determined by the wavelet's vanishing moments, and f denotes the original signal being transformed. It is evident that the wavelet transform can remove noise at
different frequencies. The algorithm for peak finding using wavelet decomposition
is as follows:
1. Perform continuous wavelet transform on the output cross-correlation signal to
obtain the wavelet transform coefficient spectra at different decomposition scales;
the wavelet basis is 'db2'.
2. Reconstruct the wavelet transform coefficient spectra at different decomposition
scales.
3. Use the first derivative to find the extreme points for peak detection and merge
the results of peak detection to form a sequence of peaks.
4. Determine the appropriate correlation peaks from the peak detection results using
thresholding.
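A rough sketch of the two procedures above is given below. It stands in for the wavelet step with a discrete 'db2' decomposition/reconstruction (via the PyWavelets package) rather than a continuous transform, and finds peaks from first-derivative sign changes; the function names, the decomposition level, and the thresholds are illustrative assumptions rather than values from the text.

```python
import numpy as np
import pywt  # PyWavelets, providing the 'db2' wavelet

def framewise_normalized_gcc(rx, ref, frame_len, step):
    """Sketch of steps 1-2: slide a window over the received signal, cross-correlate
    each frame with the reference chirp, normalize each frame's correlation by its
    own maximum, and keep the first half of every frame."""
    out = []
    for start in range(0, len(rx) - frame_len + 1, step):
        frame = rx[start:start + frame_len]
        c = np.abs(np.correlate(frame, ref, mode="full"))
        c /= c.max() + 1e-12                    # per-frame normalization
        out.append(c[: len(c) // 2])
    return np.concatenate(out)

def wavelet_peaks(corr, level=3, rel_threshold=0.6):
    """Sketch of steps 3-4: denoise the correlation curve with a 'db2'
    decomposition/reconstruction, then locate maxima via the first derivative
    and keep those above a relative threshold."""
    coeffs = pywt.wavedec(corr, "db2", level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]   # drop detail (noise) scales
    smooth = pywt.waverec(coeffs, "db2")[: len(corr)]
    d = np.diff(smooth)
    peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1  # derivative sign change
    return peaks[smooth[peaks] > rel_threshold * smooth.max()]
```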
The results of the frame-by-frame normalized detection of the signal from Fig. 8.1 are shown in Fig. 8.3. The originally intricate and hard-to-distinguish cross-correlation waveform now exhibits six equally spaced, highly recognizable peaks, corresponding to the 1 Hz refresh rate of the signal. Therefore, frame-by-frame normalized cross-correlation detection can well overcome the multipath effect caused by the bending space, and the arrival time of the signal can be calculated more accurately.
Based on the signal arrival time, the distance between the sending end and the
receiving end can be calculated as:
d = t_0 c, \quad (8.7)
where t_0 is the TOA value and c is the velocity of sound in air.
Suppose there are n base stations, the coordinates of the i-th base station are (x_i, y_i, z_i), i = 1, 2, \ldots, n, and the localization target coordinates are (x_t, y_t, z_t). The vector of true distances is
X = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} \sqrt{(x_1 - x_t)^2 + (y_1 - y_t)^2 + (z_1 - z_t)^2} \\ \sqrt{(x_2 - x_t)^2 + (y_2 - y_t)^2 + (z_2 - z_t)^2} \\ \vdots \\ \sqrt{(x_n - x_t)^2 + (y_n - y_t)^2 + (z_n - z_t)^2} \end{bmatrix}. \quad (8.8)
Assume that the distances from the device to the base stations, estimated from the measured arrival times and used to form the Time Difference of Arrival (TDOA) values, are
\hat{X} = \begin{bmatrix} \hat{d}_1 \\ \hat{d}_2 \\ \vdots \\ \hat{d}_n \end{bmatrix} = \begin{bmatrix} t_1 c \\ t_2 c \\ \vdots \\ t_n c \end{bmatrix}. \quad (8.9)
X_{\mathrm{TDOA}} = \hat{X} - \hat{X}(1). \quad (8.10)
Common TDOA positioning systems are typically composed of nodes and tags. After receiving signals from the nodes, a tag calculates the differences in signal arrival times and from them generates a positioning result. This process relies on a premise: the positions of the nodes are known. In satellite navigation systems, satellites can be considered as nodes whose positions are provided by precise atomic clocks and ephemerides. However, indoor positioning systems have neither determined orbits nor fixed placement positions, and thus, after system deployment, the nodes' positions need to be calibrated. Given that the TDOA algorithm requires the nodes' positions as known inputs, their accuracy directly impacts the overall precision of the positioning system. Therefore, accurately determining the nodes' positions is
a core step for TDOA positioning. The calibration of nodes can be divided into
manual calibration and self-calibration. Manual calibration requires the use of mea-
suring tools such as tape measures, laser rangefinders, and total stations, which is
not only time-consuming and labor-intensive but also prone to introducing calibra-
tion errors. When the area to be covered by the positioning system is large and
requires a number of nodes, the difficulty and time required for precise calibration
increase significantly. Adopting self-calibration technology based on signal inter-
actions between nodes can avoid the errors introduced by manual calibration. Its
accuracy depends on the signal interaction mode and the self-calibration algorithm,
avoiding errors introduced by human factors, which is beneficial for improving the
efficiency and precision of node calibration. It also overcomes difficulties faced with
manual calibration in adverse environments. Therefore, the use of self-calibration
technology based on signal interactions between nodes has significant importance in
the practical application of positioning systems. In recent years, there have been a
series of attempts to use Building Information Modeling (BIM) to improve the accu-
racy of self-calibration or positioning. These attempts have improved self-calibration
accuracy in experiments but lack theoretical support and analysis. It is necessary to
theoretically prove that map information constraints can reduce calibration errors
caused by ranging errors. In the field of positioning, the CRLB is commonly used as
the theoretical limit of positioning performance. The aim of this section is to quan-
titatively analyze the theoretical performance limits of self-calibration technology
with BIM and, based on this, design a self-calibration algorithm that integrates BIM.
The performance of the self-calibration algorithm will be analyzed and compared
with CRLB, optimizing the self-calibration algorithm under the condition that the
building information model remains unchanged.
For simplicity, consider a two-dimensional plane model. The main conclusions
of this section can also be extended to three-dimensional situations, which will not
be repeated here. The distance estimations of a series of unknown nodes are shown
as Eq. 8.10.
Nodes with known positions (anchor nodes) are denoted as:
a = \{a_1, a_2, a_3, \ldots, a_m\}. \quad (8.11)
The ranging result between unknown nodes x_i and x_j is denoted as d_{ij} (1 \le i < j \le n), and the ranging result between unknown node x_i and known node a_j is denoted as d_{ij} (1 \le i \le n, 1 \le j \le m).
The self-calibration technique uses the known anchor positions a, as well as all ranging results d_{ij} (1 \le i < j \le n) and d_{ij} (1 \le i \le n, 1 \le j \le m), to calculate the positions of all unknown nodes X. To simplify notation, consolidate all unknowns into a vector Z, which contains all unknown coordinates in X. The two-dimensional coordinates of the n unknown nodes amount to 2n unknowns, which is the dimension of the vector Z:
Z = [z_1 \; z_2 \; z_3 \; \ldots \; z_{2n}]^T. \quad (8.12)
To further simplify notation, consolidate all ranging results into a vector d, which contains a total of N = n(n-1)/2 + mn ranging values. The cooperative self-calibration problem is to solve for the positions of all the unknown nodes using the ranging vector d.
In typical indoor environments such as malls and museums, BIM can provide
constraint models of the walls on which nodes are placed. When nodes are placed on
walls, their two-dimensional coordinates will inherently be subject to constraints.
The general model can be represented as follows:
f_i(z_{s_1}, z_{s_2}, z_{s_3}, \ldots, z_{s_p}) = 0, \quad i = 1, 2, \ldots, l. \quad (8.14)
Here, f_i(\cdot) represents the i-th constraint function, with a total of l constraint functions, and S represents the set of all constrained unknowns.
The above provides a universal constraint model, which often degenerates into
linear constraints in actual application scenarios. A single linear constraint can be
represented as:
f = e_i^{T} Z = 0, \quad 1 \le i \le 2n. \quad (8.16)
Here, e_i is a 2n-dimensional vector with the i-th element equal to 1 and the remaining elements equal to 0. This vector represents the constraint produced when the i-th coordinate in the to-be-determined coordinate vector Z is known. In contrast, a full constraint appears as
pairs of linear constraints, and a pair of full constraints can be represented as:
f = \begin{bmatrix} e_i^{T} \\ e_{n+i}^{T} \end{bmatrix} Z = 0, \quad 1 \le i \le n. \quad (8.17)
Suppose the unbiased estimator \hat{Z} of the parameter Z satisfies k (k < 2n) continuous and differentiable constraints
f(\hat{Z}) = 0. \quad (8.18)
The k \times 2n matrix F(Z) is the gradient matrix of these constraints, satisfying the relationship
F(Z) = \frac{\partial f(Z)}{\partial Z^{T}}. \quad (8.19)
Assuming the gradient matrix F(Z) has full row rank (meaning the k constraints are independent), there exists a matrix U \in \mathbb{R}^{2n \times (2n-k)} whose columns form an orthogonal basis for the null space of F(Z), implying the following relationship:
F(Z) U = 0. \quad (8.20)
Under these constraints, the error covariance of \hat{Z} is bounded as
E\left[(\hat{Z} - Z)(\hat{Z} - Z)^{T}\right] \ge U (U^{T} J U)^{-1} U^{T}. \quad (8.21)
For any vector x, the quadratic form satisfies
x^{T} (F J^{-1} F^{T}) x = (F^{T} x)^{T} J^{-1} (F^{T} x) \ge 0. \quad (8.24)
From the above, it follows that F J^{-1} F^{T} is a non-negative definite matrix, and thus its inverse (F J^{-1} F^{T})^{-1} is also non-negative definite, so the subtracted term J^{-1} F^{T} (F J^{-1} F^{T})^{-1} F J^{-1} is non-negative definite as well. Therefore,
CRLB(\hat{Z}) = J^{-1} - J^{-1} F^{T} (F J^{-1} F^{T})^{-1} F J^{-1} \le J^{-1}. \quad (8.26)
This proves that the Cramér-Rao matrix of the estimator \hat{Z}, subject to BIM constraints, is less than or equal to the unconstrained Cramér-Rao matrix in the positive semidefinite sense. The diagonal elements of CRLB(\hat{Z}) are lower bounds on the variances of the elements of \hat{Z}. This shows that BIM is helpful in improving the performance of self-calibration.
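The bound in Eqs. (8.20)–(8.21) can be evaluated numerically as sketched below, under the assumption that J is the Fisher information matrix of the unknown coordinate vector Z and F is the constraint gradient matrix; the helper name and the example constraint in the comment are purely illustrative.

```python
import numpy as np
from scipy.linalg import null_space

def constrained_crlb(J, F=None):
    """Sketch: constrained CRLB U (U^T J U)^{-1} U^T, where the columns of U span
    the null space of the constraint gradient F; without constraints it reduces
    to the usual J^{-1}."""
    if F is None or F.size == 0:
        return np.linalg.inv(J)
    U = null_space(F)                 # orthonormal basis of null(F), so F @ U = 0
    return U @ np.linalg.inv(U.T @ J @ U) @ U.T

# Example (illustrative): a semi-constraint fixing the first coordinate of Z would use
# F = np.eye(1, 2 * n); the diagonal of constrained_crlb(J, F) lower-bounds the
# coordinate variances and is never larger than that of the unconstrained bound.
```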
In practical scenarios, another typical setup is for nodes to be fixed at the corners of walls. In this case, since the positions of nodes placed at corners can be completely determined by the BIM, this forms a "full constraint", equivalent to increasing the number of anchor nodes. Additionally, when the number of nodes placed at corners is sufficient, it is possible to eliminate the pre-arranged anchor nodes. A sufficient number of corner nodes can serve the role of anchor nodes, considered
as “self-constraint”, implying that effective constraints can be formed solely with
nodes, without the need for anchor nodes. This section’s simulation experiment aims
to compare the impact of BIM on CRLB under different constraint scenarios with
the same number of constraints. The node and anchor node positions for this experiment are set in a 20 m × 15 m rectangular room. Stars numbered 1 to 8
represent nodes uniformly distributed on the four walls (considering the practical
scenario where nodes are as evenly distributed as possible, with two nodes fixedly
allocated to each wall); small squares represent known anchor nodes with positions at
(5,5), (10,10), and (15,5) units; the rectangular outline represents walls. Simulation
layouts and results show the comparison of the average variance of node coordinate
errors under unconstrained, semi-constrained, fully constrained, and self-constrained
scenarios (this metric is directly related to the CRLB, reflecting the theoretical lower
limit of self-positioning errors), comparing 2, 4, 6, and 8 constraints respectively.
For semi-constraints, the experiment sequentially adds groups of constraints: the x-
coordinates of nodes 1 and 2, the y-coordinates of nodes 3 and 4, and so on. As a
contrast, full constraints progressively turn the coordinates of nodes 1, 3, 5, and 7 into
known constraints, similarly increasing the number of constraints from 2 to 8. After
10,000 Monte Carlo simulations, the average variance under unconstrained condi-
tions is used as the baseline to compare the performance under semi-constrained and
fully constrained conditions, with results shown in Fig. 8.4. The experimental results
indicate that self-constraint, semi-constraint, and full constraint can effectively lower
the theoretical error lower limit, with self-constraint being the least effective and full
constraint being the most effective, but all significantly reduce the theoretical error
lower limit. With two constraints, semi-constraint can reduce the error lower limit
by 25.1%, and with eight constraints (under the most ideal semi-constraint condi-
tion, where every node has one coordinate determined), it can reduce the unknown
coordinate error lower limit by 47.5%.
In the TDOA scenario we’re focusing on here, the target that needs to be located sends
out a signal, which is then picked up by multiple sensor nodes. These nodes work
on the signals they receive, comparing them in pairs using something called GCC to
figure out where the target is. The usual way of doing TDOA positioning picks one
node as a reference, and subtracts the signal arrival times at the other nodes from it
to get a set of TDOA values. When we know what the original signal’s waveform
looks like, we can cross-correlate the received signals with an ideal version of the
signal to pinpoint the signal’s arrival time. We can also cross-correlate the received
signals with each other. However, when the original signal’s waveform is unknown—
like with sounds of explosions, underwater vessels, or bird calls, which are not
artificially modified signals - then we only can cross-correlate the received signals
with each other to calculate the differences in signal arrival times. But, the existing
GCC techniques might not always work perfectly because signals can get distorted
8 Indoor Acoustic Localization 199
as they travel, and the cross-correlation values do not always meet the zero-sum
condition. This is where the full set TDOA comes in handy, as it can significantly
improve positioning accuracy in these situations. The research on the following
techniques are diving into to further explore full set TDOA positioning [23–25].
In a passive localization system, the signal received by node i is denoted as u_i(t). It is the signal sent from the source, s(t), which, after traveling for a delay d_i, is recorded at the node. During its journey the signal is influenced by the channel conditions, which introduce an error η_i. Therefore, under ideal conditions, the received signal can be modeled as follows:
u_i(t) = s(t - d_i) + \eta_i. \quad (8.27)
The error η_i should follow a Gaussian distribution when channel conditions are
good. This error introduces some inaccuracies in the TDOA values when cross-
correlating the received signals. Furthermore, depending on the GCC technique
used, there can be an impact on the TDOA measurements. This impact relates to
the signal’s spectral intensity . S(w) and the noise’s spectral intensity . N (w). The the-
oretical expression varies with the specific technique used, which we will not detail
here. We denote the function that calculates the position of the maximum value after cross-correlation processing as \Phi(\cdot), so the TDOA value d_{ij} obtained from cross-correlating signals u_i and u_j can be calculated as
d_{ij} = \Phi(u_i, u_j). \quad (8.28)
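As a minimal sketch of Eq. (8.28), assuming sampled real-valued signals and a common sampling rate fs, the peak of the pairwise cross-correlation can be located as follows; the function name is an illustrative choice.

```python
import numpy as np

def pairwise_tdoa(u_i, u_j, fs):
    """Sketch of Phi(u_i, u_j): arrival-time difference between two received
    signals from the peak of their cross-correlation (no reference waveform needed)."""
    c = np.correlate(u_i, u_j, mode="full")
    lag = int(np.argmax(np.abs(c))) - (len(u_j) - 1)   # samples by which u_i lags u_j
    return lag / fs                                    # d_ij in seconds
```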
Full-set TDOA localization requires the coordinates of the sensor nodes and the TDOA measurement values d_{ij}. The coordinates of the sensor nodes are acquired through manual calibration or self-calibration techniques, and their accuracy will ultimately affect the precision of the full-set TDOA solution; the TDOA values are obtained through GCC between the signals received by pairs of nodes. With n nodes, this method can yield up to n(n − 1)/2 TDOA values. Generally, these TDOA values contain line-of-sight (LOS) errors introduced by signal distortion, the cross-correlation algorithm, and other factors; their covariance matrix Q can be estimated after numerous measurements.
For uniform representation, subsequent parts use the measurement r_{ij}, which accounts for the signal propagation speed, instead of the TDOA measurement, with its true value denoted as r_{ij}^{o} and defined as
r_{ij}^{o} = \|u^{o} - s_i\| - \|u^{o} - s_j\|.
Here, u^{o} and s_i respectively represent the true position of the target to be located and the position of the i-th node. Squaring both sides of the above equation yields a linear expression:
B n \approx h - G\theta, \quad (8.32)
where \theta represents a vector containing the true coordinates u and auxiliary variables, and B, G, and h are, respectively, matrices or vectors that can be calculated from the node coordinates S and the measurement values r_{ij}. Therefore, the unknown vector \theta can be directly calculated using weighted least squares:
\theta = (G^{T} W G)^{-1} G^{T} W h, \quad (8.33)
where W is the known error weight matrix. This yields a closed-form weighted least
squares solution, upon which further optimization can be performed to enhance its
positioning performance, making it reach the CRLB performance. This solution can
serve as an initial estimate for a more accurate solution.
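The closed-form step of Eq. (8.33) is a few lines of linear algebra, as sketched below; how B, G, h, and W are assembled from the node coordinates and the measurements r_ij depends on the specific formulation and is not reproduced here, so the weighting comment is only a hedged note.

```python
import numpy as np

def wls_estimate(G, h, W):
    """Sketch of Eq. (8.33): theta = (G^T W G)^{-1} G^T W h."""
    A = G.T @ W @ G
    b = G.T @ W @ h
    return np.linalg.solve(A, b)   # solve the normal equations without forming an explicit inverse

# Note: in Chan-type TDOA solutions the weight W is often chosen close to the inverse
# covariance of the equation error; the text only requires W to be known.
```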
Simulation results in Fig. 8.6 indicate that the SDP approach can achieve superior
performance under low noise conditions and is among the closest to the MLE among
various schemes. However, this outcome is only ideal when the channel conditions are
perfect, and the collected TDOA values contain only LOS information. When the set
includes NLOS information, relying solely on SDP techniques can result in significant errors. Moreover, in real-world scenarios, NLOS noise conditions vary, as they are influenced by the channel and the signals used and are caused by reflection, refraction, diffraction, and other factors. Thus, using a single probability distribution to model NLOS noise is not a universally applicable solution. Further research will focus on developing a universal NLOS signal discrimination scheme to broaden the applicability of this approach.
[Figure: (a) experimental scene with marked dimensions of 10 m, 30 m, and 50 m; localization station comprising a UWB module, a speaker, and a signal processing module]
From the comparison between the near-ultrasound positioning results and UWB positioning, as well as Fig. 8.8, it can be concluded that in the narrow, metal-shielded, and bending space of cable tunnels, the tested performance of the near-ultrasound base stations exceeds that of the existing UWB positioning technology in both accuracy and stability.
The evaluation study uses the root mean square error (RMSE) as a performance measure, defined as RMSE = \sqrt{\sum_{i=1}^{C} \|u^{(i)} - u^{o}\|^{2} / C}, where u^{(i)} is the estimate from the i-th run of the Monte Carlo (MC) method and C is the number of MC experiments. Unless otherwise stated, the RMSEs of the different methods are generated from 1000 MC trials.
We start with source localization scenarios in 2D space (dimension K = 2), where the source and M sensors are uniformly distributed in a square with a side length of 100 m. In each MC trial, the source broadcasts the ranging signal M − 1 times, and the M sensors can hear the signals from the source. In the j-th broadcast, sensor j is the reference sensor and the TDOA measurements r_{j+1,j}, r_{j+2,j}, \ldots, r_{M,j} are recorded. Thus we can obtain a suitable full TDOA set, where the noise correlation matrix R consists of M − 1 diagonal blocks, the diagonal elements in each block are 1, and the other elements are 0.5. Unless otherwise stated, the measurements in the full TDOA set have the same variances, so the covariance matrix is Q = σ²R, where σ² is the measurement noise power. The above scenario is just one way to obtain a full TDOA set in the simulations. In the real world the measurements come from the results of GCC, and the corresponding covariance matrix can be evaluated from real data.
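The simulated noise structure just described can be reproduced with a short sketch such as the one below; the function names are illustrative, and the block sizes simply follow the broadcast scheme stated above (block j collects r_{j+1,j}, ..., r_{M,j}).

```python
import numpy as np
from scipy.linalg import block_diag

def full_set_covariance(M, sigma2):
    """Sketch: R is block diagonal with M-1 blocks of sizes M-1, M-2, ..., 1;
    each block has 1 on the diagonal and 0.5 elsewhere. Returns Q = sigma2 * R."""
    blocks = [0.5 * np.ones((M - j, M - j)) + 0.5 * np.eye(M - j) for j in range(1, M)]
    return sigma2 * block_diag(*blocks)

def rmse(estimates, true_position):
    """RMSE over C Monte Carlo runs: sqrt( sum_i ||u^(i) - u^o||^2 / C )."""
    err = np.asarray(estimates) - np.asarray(true_position)
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```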
A practical NLOS rejection method’s performance should be guaranteed in dif-
ferent scenarios, including the LOS environments. Although localizing using LOS
measurements is not a challenge for the current approaches, they may cause considerable performance loss since many LOS TDOAs are mistaken as NLOS. Therefore, it is essential to investigate whether the proposed GS-RSC scheme's performance is promising with LOS measurements.
Fig. 8.9 The algorithms' MSEs versus the LOS noise levels (10 log10(MSE (m²)) plotted against 20 log10(σ (m)); curves: GS-RSC, GS-RSC(SDP), G2+G3, R-DeN, DS, and the CRLB)
LOS noises with known variance σ² are added to the true range-difference measurements to simulate noisy TDOA measurements. Besides the proposed method, we additionally include several state-of-the-art full-set algorithms for comparison: G2+G3 [7], R-DeN [22], and the data-selective (DS) method [1]. In addition, we use the CRLB presented earlier as the criterion. G2+G3 and R-DeN are only NLOS measurement detection algorithms without localization capability; to give a fair comparison, we use a closed-form solution, the Chan algorithm [6] or the CFS solution in this chapter, to localize the source after removing the NLOS measurements.
The results of the compared full-set algorithms over multiple noise levels are shown in Fig. 8.9, generated by 1000 MC trials in a network of 7 sensors. Besides the CFS, we also provide the results generated by the SDP-FS solution, shown as 'GS-RSC(SDP)' in Fig. 8.9. Due to their data-cleaning strategies, the DS and R-DeN methods suffer a small localization performance loss. DS only keeps a fixed number of measurements and discards other useful LOS measurements, causing performance loss, especially when the LOS noise power is equal to or larger than 0.1 m². The R-DeN method needs to decide a stopping threshold in its calculation, which limits its performance at small LOS noise levels. GS-RSC and G2+G3 perform similarly relative to the CRLB since their rejection modules keep nearly all the LOS signals, especially when the noise level is insignificant. GS-RSC(SDP) is more robust when σ² is larger than 1 m², i.e., beyond 20 log10(σ (m)) = 0. Since the
GS-RSC(SDP) outperforms the other methods only when the LOS noise is significant, and its performance is otherwise close to that of GS-RSC, we do not show its results in the subsequent figures.
The localization scenarios with outliers have been examined in the former numer-
ical examples. Besides, NLOS propagation is also a common phenomenon worthy
of discussion. Specifically, if the path between a sensor, e.g., s_1, and the source is blocked by obstacles, the corresponding range measurement will contain a positive bias, and therefore the TDOA measurements related to s_1 share the same NLOS bias. In the following numerical examples, we consider a randomly selected sensor s_i suffering from NLOS propagation, whose range measurement r_i is the sum of its true value r_i^{o} and an NLOS bias η_i. The NLOS-path error η_i is uniformly distributed, with η_i ∼ U(20, 40) m. Due to the NLOS propaga-
tion, the full set contains six relevant NLOS measurements. Besides, we randomly
chose 0 to 4 TDOAs to become outliers to simulate a more complex situation. First,
we examine the full-set algorithms' ability to detect outliers. Table 8.2 reveals the identification accuracy of the methods with classification capability when the variance of the LOS noise is 0.1 m². We also show whether they can accurately remove
the measurements polluted by the NLOS-path error. The results reveal that although
the G2+G3 and R-DeN methods can still find the outliers under the interference of
the NLOS path, the NLOS path is a latent harmful factor that they cannot detect,
significantly limiting their localization performance.
Furthermore, in Fig. 8.10, we show the algorithms’ localization performance ver-
sus the LOS noise’s level. Since the G2+G3 and R-DeN methods cannot eliminate
the influence carried by the NLOS path, their performance is dominated by the unde-
tected NLOS measurements, which means that the NLOS path will significantly
aggravate the localization error, even though they can deal with outliers. On the con-
trary, the GS-RSC and DS approaches can still maintain high accuracy and efficiently
identify the NLOS measurements. Especially, the GS-RSC still achieves the CRLB
performance in the presence of the NLOS path and outperforms the DS method in
most cases.
Fig. 8.10 The algorithms' MSEs versus the LOS noise levels with an NLOS path (10 log10(MSE (m²)) plotted against 20 log10(σ (m)); curves: GS-RSC, G2+G3, R-DeN, DS, and the CRLB)
8.6 Conclusion
References
15. Liu K, Liu X, Li X (2015) Guoguo: enabling fine-grained smartphone localization via acoustic
anchors. IEEE Trans Mob Comput 15(5):1144–1156
16. Liu Y, Zhang W, Yang Y, Fang W, Qin F, Dai X (2019) RAMTEL: robust acoustic motion
tracking using extreme learning machine for smart cities. IEEE Internet Things J 6(5):7555–
7569
17. Shang Y, Ruml W (2004) Improved MDS-based localization. In: IEEE INFOCOM 2004, vol 4.
IEEE, pp 2640–2651
18. Shang Y, Ruml W, Zhang Y, Fromherz MP (2003) Localization from mere connectivity. In: Proceedings of the 4th ACM international symposium on mobile ad hoc networking and computing, pp 201–212
19. Shi Q, He C, Chen H, Jiang L (2010) Distributed wireless sensor network localization via
sequential greedy optimization algorithm. IEEE Trans Signal Process 58(6):3328–3340
20. Simonetto A, Leus G (2014) Distributed maximum likelihood sensor network localization.
IEEE Trans Signal Process 62(6):1424–1437. https://doi.org/10.1109/TSP.2014.2302746
21. Tomic S, Beko M, Dinis R (2014) RSS-based localization in wireless sensor networks using con-
vex relaxation: noncooperative and cooperative schemes. IEEE Trans Veh Technol 64(5):2037–
2050
22. Velasco J, Pizarro D, Macias-Guarasa J, Asaei A (2016) TDoA matrices: algebraic proper-
ties and their application to robust denoising with missing data. IEEE Trans Signal Process
64(20):5242–5254
23. Wang Y, Ho K, Wang Z (2023a) Robust localization under NLOS environment in the presence of isolated outliers by full-set TDoA measurements. Signal Process 212:109159
24. Wang Y, Sun P, Wang Z (2023b) Towards low-complexity state estimation for rigid bodies
based on range difference measurements. Electron Lett 59(22):e13020
25. Wang Y, Sun P, Wang Z (2023c) Towards robust and accurate cooperative state estimation for
multiple rigid bodies. IEEE Trans Veh Technol
26. Wang Z, Zheng S, Ye Y, Boyd S (2008) Further relaxations of the semidefinite programming
approach to sensor network localization. SIAM J Optim 19(2):655–673
27. Wang Z, Zhang H, Lu T, Gulliver TA (2018) Cooperative RSS-based localization in wireless
sensor networks using relative error estimation and semidefinite programming. IEEE Trans
Veh Technol 68(1):483–497
Chapter 9
Scalable and Accurate Floor
Identification via Crowdsourcing
and Deep Learning
Fuqiang Gu, You Li, Yuan Zhuang, Jingbin Liu, and Qiuzhe Yu
9.1 Introduction
F. Gu (B)
College of Computer Science, Chongqing University, Chongqing, China
e-mail: [email protected]
Y. Li · Y. Zhuang · J. Liu
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing,
Wuhan University, Wuhan, China
Q. Yu
Meituan Co., Beijing, China
Floor identification has attracted the attention of researchers in recent years. An early
method for floor identification is SkyLoc [5], which uses widely available cellular
signals to identify the current floor of a user in multi-floor buildings. SkyLoc can
achieve a floor identification accuracy of about 73% by selecting features with high
relevance for fingerprint matching. However, it requires manually building a radio map
consisting of cellular RSS and corresponding floor levels, which is time-consuming
and labor-intensive. Also, the achieved accuracy is not quite satisfactory. Ai et al. [7]
propose a method that uses WiFi signals to locate the floor and accelerometer and
barometer readings to detect the floor change. While it reaches a high accuracy of
99%, it requires an intensive site survey process to construct a radio map. In [17], a
deep learning-based AP-independent floor identification method is introduced, which
leverages WiFi signals to generate images that are then fed to a convolutional neural
network for floor identification. Zhang et al. [6] present a floor identification method using cellular signals, which first uses a denoising autoencoder for data noise reduction and feature extraction, and then utilizes a Long Short-Term Memory (LSTM) network
for floor identification. Qi et al. [9] introduce the confidence interval of WiFi signals,
on which they further develop a fast floor identification method. However, such
methods still require a troublesome site survey.
To expedite the troublesome site survey process, researchers have made several
efforts. Khaoampai et al. [8] propose a method called FloorLoc-SL, which collects
WiFi fingerprints via a self-learning algorithm. While FloorLoc-SL does not require
a site survey process, it achieves only an accuracy of 87% and asks the user to
input the initial floor number when starting the system. FTrack [18] locates the floor
by using smartphone accelerometer readings to detect the traveled floors. While
FTrack reports an accuracy of 90%, it is not robust to varying device orientation
and user’s motion states, and requires the knowledge of initial floor level. F-Loc
[19] improves the FTrack by considering both WiFi signals and accelerometer read-
ings, and reports an accuracy of 95%. F-Loc constructs the WiFi radio map through
Figure 9.2 provides an overview of the proposed UnFI method, which consists of a
training phase and an identification phase. During the training phase, various sensor
readings from smartphones are collected to create a fingerprint database. Specifically,
GNSS, WiFi, and barometer data are gathered through a crowdsourcing approach,
requiring no effort from the user, such as a site survey or manual input of the initial
floor level. In the identification phase, the user’s current floor number is determined by
comparing the measured WiFi RSS with the data stored in the fingerprint database.
The testing device can be a low-end phone equipped only with WiFi. To ensure
high floor identification accuracy, we have developed a deep learning-based method,
which will be detailed later.
UnFI consists of three key components: ground floor detection, floor association,
and floor identification. The ground floor detection and floor association occur during
the training phase, while floor identification takes place in the identification phase.
Ground Floor Detection: This component detects the ground floor by utilizing
features extracted from GNSS and magnetometer measurements. To establish the
ground floor’s pressure value, the user must walk on the ground floor of the building
at least once, recording sensor data (GNSS, WiFi, magnetometer, and barometer
data) for a period of time, such as 5 min. The barometric pressure recorded on the
ground floor serves as a reference value for determining different floors.
Floor Association: Using the reference barometric pressure value, the floor associ-
ation component associates the collected WiFi RSS measurements with the corre-
sponding floor levels. To expedite the construction of the fingerprint database, we
employ semi-supervised learning with sensor data sequences that may not include
ground floor data.
Floor Identification: After the fingerprint database is built, it is used to identify the
current floor level of a user via a deep learning method.
The following subsections will elaborate on each of these three components.
We identify the ground floor by utilizing features extracted from GNSS and magne-
tometer measurements to detect the transition between indoor and outdoor environ-
ments. This ground floor information is crucial for associating WiFi RSS measure-
ments with the corresponding floor levels. Although the light sensor has been effec-
tively used for indoor/outdoor (IO) switch detection [21], it is influenced by the
phone’s orientation and weather conditions, requiring the user to hold the phone so
Fig. 9.3 An example of using the change in the number of visible satellites for IO switch detection.
a The number of visible GNSS satellites changes as the user exits and enters a building. b Indoor,
semi-indoor (entrance or exit areas), and outdoor scenarios
that the screen faces the ceiling or sky. To develop a more robust ground floor detec-
tion method, we rely on GNSS and magnetometer signals to detect the IO switch
(Fig. 9.3).
Let ΦGNSS denote the sequence of the GNSS measurements, namely
ΦGNSS = {g_1, g_2, \ldots, g_T}, (9.1)
where g_t is the GNSS measurement at time t, and T is the ending time of collecting this sequence. Similarly, the magnetometer measurement sequence and barometric pressure sequence are expressed as
ΦMag = {m_1, m_2, \ldots, m_T}, (9.2)
ΦPressure = {p_1, p_2, \ldots, p_T}, (9.3)
where m_t and p_t represent the magnetic field and barometric pressure collected at time t, respectively. It should be noted that the lengths of the three sequences are usually
different due to the varying sampling rates of the different sensors. However, we can use time interpolation to easily bring the three sequences to the same length.
Therefore, we use the same measurement index in Eqs. (9.1)–(9.3). To achieve the IO
switch detection, we need to segment each of these three measurement sequences into
N shorter sequences using a sliding window. Thus, we can obtain three measurement sequence sets S_GNSS = {S_GNSS^1, \ldots, S_GNSS^N} from ΦGNSS, S_Mag = {S_Mag^1, \ldots, S_Mag^N} from ΦMag, and S_Pressure = {S_Pressure^1, \ldots, S_Pressure^N} from ΦPressure.
To achieve robust IO detection, we extract 9 different features from satellite
measurement sequences, including the number of visible GNSS satellites, the mean,
variance, standard deviation, maximum, minimum, median, range, and interquartile range of the visible satellites' carrier-to-noise ratio (CNR). Similarly, 8 different features are obtained from the magnetometer sequences: the mean, variance, standard deviation, maximum, minimum, median, range, and interquartile range. Based on these extracted
features, we use the popular ResNet [22] neural network to detect IO switch due to
its excellent performance. However, the original ResNet was proposed for dealing
with images, which is not directly suitable for our case. Therefore, we modify the
popular ResNet network from a 2D network into a 1D network to adapt it to our one-dimensional sensor sequences.
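A possible implementation of this feature extraction is sketched below, under the assumption that each GNSS window provides the CNR values of the visible satellites and each magnetometer window provides three-axis samples (reduced here to the field magnitude); the feature ordering is an illustrative choice, since the chapter does not fix it.

```python
import numpy as np

def gnss_features(cnr_window):
    """Sketch: 9 GNSS features per window -- the number of visible satellites plus
    8 statistics of their carrier-to-noise ratios (CNR)."""
    cnr = np.asarray(cnr_window, dtype=float)
    q75, q25 = np.percentile(cnr, [75, 25])
    return np.array([cnr.size, cnr.mean(), cnr.var(), cnr.std(), cnr.max(),
                     cnr.min(), np.median(cnr), cnr.max() - cnr.min(), q75 - q25])

def mag_features(mag_window):
    """Sketch: 8 magnetometer features per window, computed on the magnetic field
    magnitude of the three-axis samples."""
    m = np.linalg.norm(np.asarray(mag_window, dtype=float), axis=1)
    q75, q25 = np.percentile(m, [75, 25])
    return np.array([m.mean(), m.var(), m.std(), m.max(), m.min(),
                     np.median(m), m.max() - m.min(), q75 - q25])
```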
The ground floor identification algorithm using the GNSS and magnetometer
measurements is described in Algorithm 1. This algorithm takes as input the sensor
reading sequences ΦGNSS , ΦMag , and ΦPressure , and outputs the reference pressure
pg and time Tg on the ground floor. Firstly, the GNSS measurement sequence,
magnetometer sequence, and pressure sequence are segmented into sequences
using a sliding window. Based on the resulting sequences of GNSS and magne-
tometer measurements, we extract time-series features and statistical features. After
extracting features, the ResNet1D method is used to detect IO scenarios. If the user
is detected moving from indoors to outdoors or from outdoors to indoors, the median timestamp of the several GNSS measurement sequences that correspond to being indoors before or after the switch happens is taken as the reference time Tg on the ground floor. Note that n1 and n2 in lines 11–15 are constant parameters
to avoid detection error, which are both set to 3 in this work. Finally, based on the
reference time, the sequence of pressure whose timestamps is closest to the reference
time is selected, and the average value of this pressure sequence is considered as the
reference pressure Pg on the ground floor.
Fingerprint-floor level association consists of two stages: initial labeling and finger-
print expansion. During the initial labeling stage, RSS measurements from sequences
that traverse the ground floor and multiple floors are labeled using the reference pres-
sure and time data collected on the ground floor. In the fingerprint expansion stage,
RSS measurements from all sequences, even those not including ground floor data,
are labeled through semi-supervised learning.
We first obtain the reference pressure and time on the ground floor, and then associate
the RSS measurements in the sequence with corresponding floor levels according
to the change of barometric pressure. Figure 9.4 shows the change of barometric
pressure across multiple floors. It is observed that different phones report different barometric pressure values even though they are placed on the same floor, and the barometric pressure varies noticeably as the floor changes. This implies that one cannot
directly use the absolute values of barometric pressure to identify floors. However,
the pressure difference between two floors is relatively stable and independent of
Fig. 9.4 The change of barometric pressure across multiple floors (the user carrying two phones
of different models at the same time and walking across multiple floors). It shows that the relative
pressure difference is stable between different floors regardless of phone models
phone models. This means that we could recognize different floors by using the rela-
tively stable pressure difference. It should be also noted that the barometer readings
are affected by the environment temperature, humidity, altitude (or height), and the
used devices [11]. Figure 9.5 shows that the barometric pressure varies over time
and different devices report different barometric values even when they are put on
the same desk to measure the barometer readings during the same period of time.
Fortunately, the barometric pressure in indoor environments is relatively stable
during a short period of time (e.g., 10 min). The pressure variation (around 0.1 hPa)
on the same floor during the short period of time is much smaller than the difference
(around 0.4 hPa given the floor height is about 3 m) between different floors. Thus,
it is feasible to use the pressure difference during a short period of time to recognize
the floor change.
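Based on these observations, a minimal floor-association rule can be sketched as follows; the 0.4 hPa-per-floor figure comes from the example above (about 3 m floor height), while the rounding rule and the function name are assumptions made only for illustration.

```python
def relative_floor(p_ground, p_current, hpa_per_floor=0.4):
    """Sketch: floor level relative to the ground floor from the pressure difference.
    Pressure decreases with height, so a lower reading means a higher floor; rounding
    absorbs the ~0.1 hPa short-term drift observed on a single floor."""
    return round((p_ground - p_current) / hpa_per_floor)

# Example: p_current = p_ground - 1.2 hPa  ->  relative_floor(...) == 3 (three floors up)
```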
After obtaining a small set of labeled fingerprints from initial labeling stage, we use
semi-supervised learning to obtain more labeled fingerprints in order to achieve a
higher floor identification accuracy (fingerprint expansion). In the fingerprint expan-
sion stage, only WiFi RSS measurements and barometer readings collected from the
traces crossing multiple floors are required.
Here we first give the pseudo-code of the fingerprint expansion algorithm in
Algorithm 2, and then present our analysis. This algorithm takes as input a dataset
consisting of RSS measurements (represented by ΦRSS ) and barometer readings
(represented by ΦPressure ), along with labeled fingerprints acquired during the initial
Fig. 9.5 The change of barometric pressure over time at the same location point. It implies that the
barometric pressure is only relatively stable at the same point during a short period of time (e.g.,
10 min)
labeling stage (denoted by L). Initially, it identifies the entry and exit points of stairs
and elevators using pressure measurements, with the respective timestamps noted as
Tb . Subsequently, data captured while traversing stairs and elevators are excluded
to ensure the fidelity and accuracy of the fingerprint repository. The remaining RSS
data are then divided into sequences {SRSS }Ni=1 , where N signifies the sequence count.
The relative floor labels Fr are derived from pressure readings and are sequentially
numbered either upwards or downwards (increasingly for descending and decreas-
ingly for ascending floors). The estimated floor labels for each RSS sequence are
recorded as Fi . Here, Fr is a vector of length N and Fi is a vector of length ni , where
ni is the number of samples in the ith RSS sequence.
It should be noted that the acquired floor labels Fi may be prone to inaccuracies
due to classification errors, necessitating refinement through neighboring constraints
from Fr . For instance, fingerprints collected on a single floor might be erroneously
categorized into different floors. To mitigate this, the algorithm simultaneously eval-
uates the proximity relationships inferred from pressure measurements and those
derived from the classified results of fingerprint sequences. During the refinement
process, each floor label within the initial sequence’s floor set is sequentially consid-
ered as the starting floor. The corresponding probability of accurate classification
(denoted by pj ) is calculated by dividing the number of fingerprints with labels
matching the presumed label l1 by the total number of fingerprints in the sequence.
Subsequently, this probability pj is iteratively adjusted through a similar process
applied to subsequent sequences. Upon updating all probabilities, the initial floor
with the highest probability is identified. Consequently, the absolute labels corre-
sponding to the unlabeled fingerprints are obtained and integrated into the fingerprint
database L alongside the existing fingerprints.
After constructing the required fingerprint database, we can identify the floor level for incoming RSS measurements. Let x represent a fingerprint, which is a vector of WiFi RSS measurements, namely
x = (rss_1, rss_2, \ldots, rss_M),
with M denoting the total number of visible WiFi APs in the environment. Since raw WiFi RSS values are negative, we adopt the positive RSS data description method in [23] and transform the raw RSS values into positive values, namely
prss_i = \begin{cases} rss_i - rss_{\min}, & \text{if } AP_i \in x \text{ and } rss_i > \tau \\ 0, & \text{otherwise,} \end{cases} \quad (9.5)
where τ is an RSS threshold, which is set to −100 dBm in this study, indicating whether a WiFi AP is detected in a fingerprint, and rss_min is the minimum RSS from the WiFi APs. APs with RSS lower than τ are considered not detected. Thus, x can be re-written as
x = (prss_1, prss_2, \ldots, prss_M).
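A sketch of the transformation in Eq. (9.5) is given below; the default value of rss_min (−100 dBm) is an assumption made for illustration, since the text only states that rss_min is the minimum RSS observed from the WiFi APs.

```python
import numpy as np

def to_positive_rss(raw_rss, rss_min=-100.0, tau=-100.0):
    """Sketch of Eq. (9.5): shift detected readings to positive values and map
    undetected APs (RSS <= tau, or missing) to 0."""
    x = np.asarray(raw_rss, dtype=float)
    return np.where(x > tau, x - rss_min, 0.0)
```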
labeling and fingerprint expansion) and two sets for testing. In the shopping mall,
nine sets of data were collected, where five were used for training, and four for
testing. The dataset profiles are given in Table 9.1.
We first evaluate the proposed IO switch detection method with other popular
methods including Support Vector Machines (SVM), Random Forests (RF), and
Naive Bayes (NB) based on the measurements of GNSS and magnetometer. These
measurements are first segmented into sequences using a window of 3 s, and the
resulting sequences are then split into training data (80%) and test data (20%). The
proposed IO detection is based on the popular ResNet network (specifically ResNet-
18). However, ResNet is used to deal with images and cannot be directly used for
dealing with GNSS and magnetic data. Therefore, we modify the popular ResNet
network from a 2D network to a 1D network to adapt it to our case, and re-train the whole
network. We use accuracy as the performance metric for IO switch detection and
floor identification. The accuracy is defined as the probability of detection, which is
defined as the ratio of correctly predicted samples over the total number of samples.
Table 9.2 shows that the proposed method performs the best, achieving an accu-
racy of 97.7%, which is higher than other methods. The proposed method can also
make use of multi-modal data, which is not true for RF, NB, and SVM methods.
It can be also seen that GNSS-based methods usually have better performance than
magnetometer-based methods. This might be attributed to the fact that GNSS measurements in outdoor environments are more distinguishable from those in indoor environments due to obstacles such as buildings and trees. The high-accuracy IO detection ensures that
we can accurately detect the ground floor by voting the results from several sequences,
and further enables us to automatically and accurately label WiFi fingerprints.
We have first conducted experiments to evaluate the floor association accuracy, and
experimental results show that the proposed floor association method can correctly
associate all the fingerprints with corresponding floors with an accuracy of 100%.
This is due to the use of the stable characteristics of the barometer readings that the
pressure difference between two floors is stable during a short period of time.
We then show the floor identification performance of the proposed method using
different amounts of training data and compare its accuracy with that of state-of-the-
art methods. Note that this training data is not labeled manually, but is automatically
labeled by detecting the ground floor and using the relatively stable air pressure differ-
ence between two floors. The baseline methods are K-Nearest Neighbors (KNN) [25]
(the number of k is set to 3), SVM [26], NB [27], RF [28], and Autoencoder (AE) [29].
For the AE, we use two layers for pretraining with 512 and 256 neurons, respectively,
and add a softmax layer on the top to identify floors.
Figure 9.6 shows the performance of the proposed method and the five baseline
methods on the data collected in the office building. We can see that the floor identi-
fication accuracy of the different methods generally increases as the amount of training data rises. When using the data collected from one trajectory for training,
KNN performs the best (84.4%), which is followed by AE (84.1%) and our method
(83.8%). When using more training data, the performance of the proposed method is
Fig. 9.6 The floor identification accuracy of different methods in the office building
significantly improved, and reaches up to 99.3% with training data from four trajec-
tories, which is higher than the baseline methods. This is because our method uses
more layers and needs more data to be well trained.
Figure 9.7 demonstrates the accuracy of these methods on the data collected in
the shopping mall. One can find that the proposed method outperforms the baseline
methods except for the case of using the data from three trajectories for training,
where the AE method (91.9%) performs slightly better than our method (91.3%).
When using the data from four trajectories for training, our method can achieve
an accuracy of 98.6%, which is significantly higher than AE (86.1%), RF (85.8%),
SVM (83.8%), KNN (80.7%), and NB (62.4%). However, using more training data
does not necessarily improve the floor identification accuracy of some methods since
more training data means that there are more APs visible, resulting in a higher input
dimension. This can be justified by the decrease in the floor identification accuracy
of KNN and RF when the training data increases from three trajectories to four
trajectories.
Fig. 9.7 The floor identification accuracy of different methods in the shopping mall
were used for collecting training data and test data. During the data collection, both
phones were held in hand together by the same participant in the two cases. Figures 9.8
and 9.9 show the floor identification accuracy of different methods using one set of
training data and multiple sets of training data, respectively. We can see that all the
methods perform better when the test data are collected from the same phone used to
collect training data. Moreover, the proposed method outperforms significantly the
baseline methods, and is much more robust to hardware diversity/heterogeneity of
different phones. When using multiple sets of training data, the effect of hardware
diversity is significantly reduced, which is justified by the accuracy improvement
for both methods shown in Fig. 9.9 compared to that shown in Fig. 9.8. It is also
observed that the achieved accuracy with phone 2 is higher than that with phone 1.
This might be because the WiFi sampling rate of phone 2 is about 1.5 times higher
than that of phone 1 and the WiFi signal strength of phone 2 is more stable.
We finally compare the computational cost of the proposed method with the baseline
methods. These algorithms were run on a PC with an Intel i9-10900K CPU and a
NVIDIA GeForce RTX 2080 GPU. From Fig. 9.10, we can see that AE witnesses
the lowest computational cost due to its shallow structure, which is followed by NB
and SVM. The consumed time of the proposed method is about 5.8 s, which is about
1.5 times that of the KNN method, which has the highest computational cost among
Fig. 9.8 The floor identification accuracy of different methods using one set of training data
Fig. 9.9 The floor identification accuracy of different methods using multiple sets of training data
the five baseline methods. Given that the number of test data samples is 4878, the
computational time of the proposed method is about 1.2 ms per sample, which is still
very low. Since the computational capability of modern smartphones is powerful, we
think the computational cost is acceptable for real-time localization.
In this section, we compare the proposed UnFI method with the state-of-the-art
methods in terms of identification accuracy, sensors required for collecting training
data and test data, requirement for site survey, and other constraints or assumptions.
Note that the accuracy for the state-of-the-art methods is from the corresponding
papers and obtained from different datasets in different environments. From Table 9.3,
we can see that the proposed UnFI method can achieve a very competitive floor
identification accuracy (about 99%) and has no requirement for site survey. Existing
methods have either the requirement for site survey, which is time-consuming and
labor-intensive, or other constraints such as initial floor knowledge and user encoun-
ters. For example, B-Loc achieves a similar accuracy as our method, but it assumes
that users meet each other at the elevators, which is not a realistic assumption. Also,
it needs to give the initial floor information during the construction of barometric
map.
9.5 Conclusion
In this chapter, we present a novel method that can achieve high-accuracy floor iden-
tification without any effort from the user. Different from existing methods, which
suffer from varying limitations, our method does not require site survey, user encoun-
ters, initial floor knowledge, and other assumptions. Experimental results show that
the proposed UnFI can achieve an accuracy of about 99% in floor identification
outperforming a number of state-of-the-art methods.
Funding This paper is supported by the National Natural Science Foundation of China (No.
42174050, 41874031, 42111530064), and Venture & Innovation Support Program for Chongqing
Overseas Returnees (No. cx2021047).
References
23. Torres-Sospedra J, Montoliu R, Trilles S, Belmonte OB, Huerta J (2015) Comprehensive anal-
ysis of distance and similarity measures for wi-fi fingerprinting indoor positioning systems.
Expert Syst Appl 42(23):9263–9278
24. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
25. Hu X, Shang J, Gu F, Han Q (2015) Improving wi-fi indoor positioning via ap sets similarity
and semi-supervised affinity propagation clustering. Int J Distrib Sens Netw 11(1):109642
26. Zhang S, Guo J, Wang W, Hu J (2018) Floor recognition based on svm for wifi indoor
positioning. In: China satellite navigation conference, pp 725–735
27. Ashraf I, Hur S, Shafiq M, Park Y (2019) Floor identification using magnetic field data with
smartphone sensors. Sensors 19(11):2538
28. Zhang X, Sun W, Zheng J, Xue M, Tang C, Zimmermann R (2022) Towards floor identification
and pinpointing position: a multistory localization model with wifi fingerprint. Int J Control
Autom Syst 20:1484–1499
29. Gu F, Khoshelham K, Valaee S, Shang J, Zhang R (2018) Locomotion activity recognition
using stacked denoising autoencoders. IEEE Internet Things J 5(3):2085–2093
Chapter 10
Indoor Floor Detection and Localization
Based on Deep Learning and Particle
Filter
10.1 Introduction
The growing demand for location-based services (LBS) has catalyzed active research
in the field of localization. Although satellite-based schemes offer robust LBS out-
doors, they are inefficient indoors due to obstructions caused by buildings. Because
most people spend the majority of their time inside buildings, research related to indoor localization has become an important topic. As the most commonly used elec-
tronic devices, smartphones possess a combination of micro-electromechanical sys-
tems (MEMS) sensors and progressively improving computational capabilities. This
makes them promising platforms for the development of indoor localization solu-
tions. Among these, the most straightforward method to obtain a smartphone user’s
location is to use received wireless signal information, including ultra-wideband
(UWB), Bluetooth low energy (BLE), and Wi-Fi, to name a few [2–7]. The funda-
mental idea is to utilize the received signal strength (RSS) readings taken online from
anchor nodes, and the smartphone then matches these readings to the location of the
sample point bearing the closest RSS fingerprint [4]. However, these technologies are
not always available in specific situations due to dependence on infrastructure (e.g.,
disasters, power outages), and manual setup and maintenance are time-consuming
and labor-intensive.
As an alternative viable in indoor localization systems, pedestrian dead reckon-
ing (PDR) methods refer to techniques that can perform localization tasks without
reliance on external infrastructure. PDR utilizes the inertial measurement unit (IMU)
components (e.g., accelerometer, gyroscope, magnetometer) on a smartphone as
input devices to iteratively update the user’s location. However, since PDR’s current
estimation is derived from previous states, the estimation errors accumulate as the
location changes. Thus, conventional PDR is used in conjunction with a dedicated
IMU to ensure accuracy in practical applications. To improve the performance of
cheap IMU-based PDR, fusion algorithms have been proposed, such as the gradient
descent algorithm (GDA) [8], complementary filter (CF) [9, 10] and Kalman filter
(KF) [11–13]. These fusion algorithms efficiently combine data from multiple sen-
sors, mitigating individual sensor errors and reducing noise. They can improve the
accuracy of estimated parameters, such as location and orientation, and hence result
in a more precise and reliable localization outcome.
Moreover, environmental information can be used as prior knowledge for bound-
ary constraints and location calibration. Since the map information fits conveniently
into particle propagation, an efficient way that combines the particle filter (PF)
and spatial information was successfully implemented in [14–16]. However, the PF
approach suffers from two notorious problems: multimodality and sample impover-
ishment [17–19]. The multimodality issue in positioning arises from building struc-
tures that allow multiple propagation possibilities, especially when the initial loca-
tion is not provided [20]. On the other hand, particles tend to collapse at one or
a few locations when the PF relies too heavily on measurement data, resulting in
a loss of diversity, known as sample impoverishment. This can cause failure when
particles cross walls due to inevitable noise and get eliminated. Resampling further
concentrates particle distribution [21]. These problems are more likely to occur when
measurements are insufficient.
Furthermore, the initial state is vital for relative methods like PDR. Although the
initial location and height can be acquired when the user enters a building, in most
cases the localization process begins indoors where the user’s exact location is typi-
cally unknown. PF can compute user location without an initial state, by distributing
particles on a floor plan, updating them via PDR estimation, and assigning weights
until convergence [15]. This requires no infrastructure but high computation because
it needs enough particles to cover relevant state space areas. Additionally, since
IMU data lacks absolute location information, many iterations are needed for par-
ticle convergence. These problems arise due to insufficient information from PDR,
necessitating the inclusion of more user movement data.
Another drawback of PDR is that it only calculates the two-dimensional (2D)
location, while most buildings are multi-floor structures. The user’s altitude in a
multi-floor building can be represented as a floor number, and a three-dimensional
(3D) location can be derived by integrating the 2D location with the floor number.
Nowadays, most modern smartphones come equipped with barometers, enabling the
widespread use of barometric-based methods for floor detection. Numerous floor
localization methods based on barometric pressure have been developed [22–25],
which can broadly be classified as reference station-based or pressure difference
measurement-based floor localization methods [26]. Since station-based methods
require the deployment of infrastructure within the environment, they are not con-
sidered. On the other hand, a relationship between atmospheric pressure and height
$h$ was constructed in [27] as
$$h(p, p_0) = 44330 \cdot \left[ 1 - \left( \frac{p}{p_0} \right)^{\frac{1}{5.255}} \right], \quad (10.1)$$
where $p$ and $p_0$ are the current barometer reading and the standard pressure at sea level in mbar. However, due to atmospheric pressure error caused by weather factors,
the altitude calculated by (10.1) is inaccurate. Therefore, it is generally accepted to
estimate the change in altitude through the pressure difference, instead of directly
calculating the altitude. Another challenge of the pressure difference measurement-
based method is that barometric measurement is also affected by smartphone usage
and environmental factors. Although the pressure distribution of each floor, even with various noises, occupies a distinguishable range overall, waiting for enough data to characterize the pressure distribution in a specific region results in a significant delay, which hinders integration with 2D locations. Moreover, height
information can not only be combined with 2D location to provide 3D location, but
can also be used to optimize 2D localization since both are derived from the user’s
motion measurements. However, only a few solutions took advantage of this feature
to optimize the performance [17, 28]. The reasons could be as follows.
• Delays that result from slow floor transition detection lead to a loss of correlation
between height information and 2D location.
• Most floor detection solutions merely compute altitude without the capacity to
extract features associated with floor transitions, while altitude alone does not
suffice to correct the user’s 2D location.
Therefore, the use of height information to improve 2D location requires the ability to
detect fast floor transitions and extract vertical motion features. Moreover, despite the
wide usage of the PF in indoor localization, handling the state of particles when the
user changes floors is a tricky issue, and its optimal strategy for multi-floor scenarios
remains an open problem.
PDR has gained widespread adoption on smartphone platforms due to its lightweight
implementation and usability in areas lacking wireless signal coverage. A typical
PDR system comprises three components: step detection, step length calculation,
and heading direction estimation. It provides polar coordinates, such as {step length,
heading direction}, to represent steps. The location of a step can be calculated by
summing these vectors. Step detection is usually achieved by identifying peaks in
vertical acceleration [30]. Other methods such as threshold detection and adaptive
detection are also in use [31, 32]. Step length estimation can be derived based on
features of the acceleration data, such as peaks/valleys, variance, walking speed,
etc. [33–35]. Heading estimation through IMU sensors is generally achieved in two
ways: (1) by integrating gyroscope readings to compute the change in angle over a
short period of time, and (2) by determining the absolute orientation relative to the
north using readings from accelerometers and magnetometers [36]. However, PDR
is a recursive method, which inevitably leads to errors accumulating over time. These
errors come from missing steps, inappropriate step length compensation coefficients,
and distorted directions. Particularly, consumer-grade sensors are subject to cost and
size limitations, which results in smartphone sensor-based PDR only maintaining
good accuracy over short paths. To overcome these limitations, data fusion tech-
niques are commonly used to combine various sensor inputs from smartphones and
additional accessible information about the user.
When pedestrians move across different areas, smartphone sensor readings exhibit
corresponding distinct patterns, which can be utilized for floor detection. There are a
few PF approaches that assist in determining the user’s vertical movement by prop-
agating particles to predefined vertical transition areas [15, 37]. However, particles
tend to move to broader areas, while floor transition zones are generally narrow.
When a user moves to another floor, only a small number of correct particles enter
the transition zone. Reference [38] demonstrated that magnetic field signals from the
smartphone sensor exhibit temporal stability and spatial resolution, proving signifi-
cantly beneficial for magnetic field cartography. Considering the ubiquitous presence,
reliability, and low cost of magnetic field signals, floor detection based on magnetic field measurements has been presented in [39–41]. Altitude calculation based on
features extracted from different IMU modes during the user’s vertical movement
was successfully implemented in [42–44]. Reference [42] proposed two acceleration
integration methods to determine height difference, and a mapping table was formed
from distinct movement patterns for floor change estimation using travel time and
step count [43]. An inherent issue with IMU-based floor detection is that unpre-
dictable actions from the user severely impact IMU measurements. Consequently,
these systems typically maintain optimal performance under constant user behaviors.
The barometer sensor avoids this problem because its measurement is dominated by
atmospheric pressure instead of user motion. Reference [22] used crowdsourcing
to develop a barometric fingerprint map for floor localization. This map clustered
barometric readings from each floor, using shared timestamps to gather real-time
fingerprints. Reference [44] tracked changes in floor location by identifying user’s
ascending or descending activities based on changes in atmospheric pressure and
altitude. Because a floor number only provides a rough range of information, it
needs to be combined with other data (e.g., 2D locations) when higher location pre-
cision is required. Therefore, besides height estimation, the challenge of the floor
detection approach also lies in its effective incorporation with other solutions when
implemented within varied systems.
In recent years, DL has been widely used in the analysis and processing of sen-
sor data, creating significant advances in data feature engineering and providing
many solutions in LBS [2, 45–50]. Reference [2] combined UWB localization with
a long short-term memory (LSTM) network to predict user locations based on dis-
tance information derived from a time of arrival (ToA) distance model, offering
enhanced accuracy in UWB localization systems. Reference [48] utilized a bidirec-
tional LSTM architecture to map IMU signals to varying rates of motion, offering
robust and accurate velocity estimates even under dynamically changing IMU orien-
tations. Experimental results show that it achieved an error rate less than 0.10 m/s for
instantaneous velocity and less than 29 m/km for travelled distance. Reference [49]
presented a hierarchical Seq2Seq model, termed DeepHeading, which utilizes spa-
tial transformer networks (STNs) and LSTM technologies. DeepHeading’s encoder
operates by taking in sensor data over time intervals of one step, while its decoder
predicts heading based on state vectors received from the encoder. Reference [50]
presented StepNet, a suite of deep learning-based approaches for predicting step-
length or change in distance, which surpassed traditional methods in the trajectories
examined. An important aspect of the DL schemes in LBS is that, since the platform
for LBS is generally a mobile device, the power consumption and computational
complexity must be considered.
Figure 10.1 illustrates an overview of the proposed scheme, which consists of two
modules: DL-based floor detection and PDR-PF with clustering. As illustrated in
Fig. 10.1, the scheme reads barometer and IMU sensor data, while the user is walking
in a building. The DL-based floor detection receives barometric data as the input to
perform the floor tracking. Meanwhile, the PF incorporates the PDR estimation based
on the IMU data, the prediction from floor detection, and the data from the smartphone
database to calculate the 2D location. Finally, the results of floor detection and PF are
combined to achieve indoor multi-floor localization. The DL-based floor detection
is introduced in this section, which is responsible for floor transition detection and
floor number calculation.
We divide multi-floor localization into two stages: the first stage where the user enters
the building and moves around freely, and the second stage where the localization
begins. To provide the height information required for initializing 2D localization,
it is necessary during the first stage to obtain the floor number from the entrance
or other technologies (e.g., GPS) [51, 52] and track the floor, while the user moves
around with the smartphone. The floor number will be used to provide the correct
floor plan in the second stage.
The barometer measurement, primarily based on altitude, is also affected by short-
term noise from user activity and the ambient environment, as well as long-term
drift caused by weather in practical applications. We presume that for most cases,
the height of the device relative to the user’s body is within a specific range, as
shown in Fig. 10.2. There are several representative modes of smartphone usage
listed: (a) calling, (b) typing, (c) swinging, and (d) pocket [53]. Here, (a) represents the highest position of the smartphone, (c) represents the lowest position, and (d) represents a different surrounding environment. The data in Fig. 10.3 show an example of barometric
data collected including the above cases. From Fig. 10.3, means and variances of
barometric data collected on the same floor differ due to height and environmental
changes, with outliers appearing during usage changes.

Fig. 10.3 Examples of raw and smoothed barometric data and associated time lag effect

Our previous work [29] leveraged time-series pressure data for step action recognition, proposing a multi-
layered perceptron (MLP)-based method to detect floor transitions from noisy data.
However, this assumed the user consistently held the smartphone in front of the body, so under free-activity scenarios other user behaviors are easily misrecognized as stair steps.
The typical approach to smoothing these transient pressure fluctuations is to utilize
a lowpass filter, such as simple moving average (SMA) or weight smoothing, as
follows [54].
$$x_t^d = \frac{\sum_{i=t-m+1}^{t} x_i}{m}, \quad (10.2)$$
$$x_t^d = (1 - \beta) \cdot x_{t-1}^d + \beta \cdot x_t. \quad (10.3)$$
Here, $x_t$ and $x_t^d$ indicate the $t$th sampling data and smoothed sampling data, respectively, $m$ is the size of the average window, and $\beta$ is the smoothing factor; they are used to control the smoothing effect.
is a known problem with the smoothing algorithm. Figure 10.3a and b demonstrate
the smoothing and delay effect of (10.3) with different values of .β, and they were
smoothed under a sampling frequency of 20 Hz. In Fig. 10.3b, the barometer reading
with $\beta = 0.6$ only smoothed out a few severe outliers and caused a delay of less than 0.05 s, while the smoothed barometer reading with $\beta = 0.03$ exhibited a clear height correlation at the cost of a 5.1 s delay, which means that the user may take 6–8 steps before the pressure measurement shows the characteristics of the
flat floor. These missing steps constitute a significant error in our system because
the floor transition signal generated from floor detection is exploited in the PF com-
ponent to correct the PF’s estimation (to be described in Sect. 10.4.2.6). The time
difference between the step action and the 2D position should be as small as possible
to ensure their correlation. Therefore, we design a Seq2Seq model that can predict
the correct step action from barometric data containing noise and outliers instead
of heavily relying on the smoothing filter. In addition, because a delay of 0.05 s is
acceptable, we utilize (10.3) with .β = 0.6 to preliminarily smooth the data.
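As an illustration of the delay-versus-smoothing trade-off, the following sketch implements the two lowpass filters of (10.2) and (10.3). The 20 Hz sampling rate and the β values are taken from the text; the synthetic data and variable names are illustrative only.

```python
import numpy as np

def sma(x, m):
    """Simple moving average of (10.2): mean of the last m samples."""
    out = np.empty(len(x))
    for t in range(len(x)):
        out[t] = x[max(0, t - m + 1):t + 1].mean()
    return out

def exp_smooth(x, beta):
    """Weight smoothing of (10.3): x_t^d = (1 - beta) * x_{t-1}^d + beta * x_t."""
    out = np.empty(len(x))
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = (1.0 - beta) * out[t - 1] + beta * x[t]
    return out

# Synthetic barometer stream sampled at 20 Hz (values in hPa, illustrative).
rng = np.random.default_rng(0)
raw = 1013.0 + 0.01 * np.arange(200) / 20.0 + rng.normal(0.0, 0.05, 200)

light = exp_smooth(raw, beta=0.6)    # removes only severe outliers, ~0.05 s delay
heavy = exp_smooth(raw, beta=0.03)   # strong smoothing, but a delay of several seconds
```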
We identified the potential of the Seq2Seq DL model for handling noisy time-series pressure data. Seq2Seq is an encoder-decoder framework built on recurrent neural networks (RNNs) [55] and consists of three components: an encoder, a decoder, and a
state vector that connects them. The encoder is responsible for compressing the input
sequence into a state vector as the initial hidden state of the decoder, and then the
decoder predicts the probability of each class from the state vector. The Seq2Seq
model can deal with various tasks such as many-to-many, many-to-one, and one-to-
many, as shown in Fig. 10.4. We apply the Seq2Seq model to the many-to-one task.
Unlike the MLP, which is the most widely employed DL model, the output of Seq2Seq is determined by both current and previous inputs; thus, it is well suited for handling
sequences such as time-series data. In this chapter, the Seq2Seq model receives
time-series pressure data that includes the previous and current barometer readings
to confirm whether the current barometric fluctuation is caused by noise or height
change.
The training data was collected from Hyeongnam Engineering Building which
is a typical large multi-floor building at Soongsil University, with a height of 17 cm
for each stair. Regarding the data collection, a barometer reading is recorded in
the smartphone database once a step is detected. The step’s label is determined
based on the region where the user is located. For example, if a user starts climbing the stairs at the 30th step and exits the staircase at the 60th step, then the labels for the 30th to the 60th steps would be assigned as "Going up." Moreover, unlike
IMU sensors, barometric measurements are primarily driven by changes in altitude.
Thus, the impact of individual user characteristics (e.g., weight, gender, height)
on barometric measurements is negligible compared to the noise induced by user
activity. The primary impact of different users on barometric measurements comes
from their walking styles on the staircase. Therefore, we ensure the inclusion of
features from various movement patterns in the training data by randomly taking
one or two stairs while climbing stairs during data collection. There were 3,726
barometric data collected for model training, which included 14 events of ascending
stairs and 14 events of descending stairs. Next, we applied a data augmentation
method to the collected data, as follows.
$$\delta_i = x_{i+1}^d - x_i^d, \quad (10.4)$$
$$x_1^a = x_{last}^d - \delta_1, \quad (10.5)$$
$$x_i^a = x_{i-1}^a - \delta_i, \quad (10.6)$$
where $\delta_i$ is the pressure difference of adjacent steps, $x_k^d$ is the $k$th smoothed pressure datum, $x_{last}^d$ is the last pressure datum of the collected dataset, and $x^a$ is the barometer data generated by data augmentation. Through data augmentation, the ascending and descending pressure data can be mutually transformed. A thorough analysis of the reasoning behind this data augmentation method, its feasibility, and the optimization strategies for ensuring high dataset quality can be found in [29]. We concatenated $x^d$ and $x^a$ as the training data $x^w$. By performing data augmentation, the size of the training dataset was increased to 7,098, and the number of events for ascending and descending stairs was expanded to 28.
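A minimal sketch of the mirror-style augmentation in (10.4)–(10.6), assuming the smoothed pressure sequence is a 1-D array; the boundary handling of the indices is our interpretation and the variable names are ours.

```python
import numpy as np

def augment(x_d):
    """Generate a mirrored pressure sequence per (10.4)-(10.6): an ascending
    sequence becomes a descending one and vice versa."""
    x_d = np.asarray(x_d, dtype=float)
    delta = np.diff(x_d)                 # (10.4): delta_i = x_{i+1}^d - x_i^d
    x_a = np.empty_like(delta)
    x_a[0] = x_d[-1] - delta[0]          # (10.5): start from the last collected datum
    for i in range(1, len(x_a)):
        x_a[i] = x_a[i - 1] - delta[i]   # (10.6): walk backwards through the differences
    return x_a
```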
Subsequently, a sliding window method was used to convert the data into learnable
forms, as follows.
$$X_k^w = \{x_{k-s+1}^w, \ldots, x_k^w\}, \quad (10.7)$$
where $s$ stands for the window size. $X_k^w$ is a subset of the dataset, which contains the pressure change from the previous $s$ steps. Its label is determined by the label of the $k$th step. It is recommended that the value of $s$ be between 10 and 20 to ensure that the barometer sequence of size $s$ is sufficient to represent the pressure change information over a short period, and $s = 15$ in this study. Next, mean centering is used to shift the feature's center to 0.
$$X_k = X_k^w - \mu_k = \{x_{k-s+1}, \ldots, x_k\}, \quad (10.8)$$
where $\mu_k$ is the mean of $X_k^w$ and $X_k$ is the input of the model. Our model predicts
the step action based on the pressure changes over the past .s steps. The reason
for employing mean centering instead of normalization or standardization lies in
our desire to shift the data close to 0 to aid the model training, while refraining
from scaling operations that modify the data’s original units. Furthermore, the main
advantage of using fixed-length . X k as the input (i.e., many-to-one) to predict a step
action instead of generating the output whenever each input is read (i.e., many-to-
many), is that the Seq2Seq model is capable of fitting the trajectories of different
lengths well and eliminates the effect of outliers that accumulate over time and the
weather factors. The performance of the many-to-many approach becomes unstable
in long path scenarios, which results from the accumulation of outliers in previous
inputs. Additionally, early barometer data have little correlation with the current step
action. In contrast, the fixed-length input means that the model’s prediction only
depends on past .s measurements and has approximate performance for a sequence
with arbitrary lengths. Furthermore, since pressure fluctuations typically require tens
of minutes to hours to produce a significant altitude drift [56], a pressure sequence of size 15, which corresponds to the pressure change over a 10-s period, avoids long-term errors arising from weather factors.
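A sketch of the input construction in (10.7)–(10.8), assuming one barometer reading per detected step; helper and array names are illustrative.

```python
import numpy as np

def make_windows(xw, s=15):
    """Build model inputs per (10.7)-(10.8): for each step k, take the window of
    the last s pressure values and mean-centre it so the feature centre is shifted
    to 0 without rescaling. Returns an array of shape (num_steps - s + 1, s)."""
    xw = np.asarray(xw, dtype=float)
    windows = []
    for k in range(s - 1, len(xw)):
        Xk_w = xw[k - s + 1:k + 1]            # (10.7): sliding window of size s
        windows.append(Xk_w - Xk_w.mean())    # (10.8): mean centring, no scaling
    return np.stack(windows)
```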
Table 10.1 lists the hyperparameters adopted in the Seq2Seq model. The hyper-
bolic tangent (tanh) was used instead of sigmoid for faster and better training. The
Xavier uniform initialization was utilized as a weight initializer to make the variance
of the output of each layer roughly equal to the variance of its input, to prevent the
gradients from becoming too large or too small during training [57]. They are both
hyperparameters commonly used in RNN training, and their definitions are given in
(10.9) and (10.10), where $fan_{in}$ and $fan_{out}$ indicate the number of input units and output units in the weight tensor, respectively.
$$\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}, \quad (10.9)$$
$$W_{i,j} \sim U\left(-\sqrt{\frac{6}{fan_{in} + fan_{out}}},\; \sqrt{\frac{6}{fan_{in} + fan_{out}}}\right). \quad (10.10)$$
Figure 10.5 provides an overview of the proposed Seq2Seq DL model. The decoder
and encoder of the model are both composed of an LSTM layer with 16 hidden
units [58]. The model is initialized using the Xavier uniform initialization, then sequentially receives the past $k$ barometric readings. As the model processes the time-series
data, it utilizes the hidden state transitions of the Seq2Seq model to extract the
temporal dependencies in the data, thereby effectively extracting the inherent features
associated with ascending or descending stairs. Subsequently, the dense (or fully-
connected) layer outputs the probabilities of three distinct classes: “Normal,” “Going
up,” and “Going down.” The model then updates its weights according to the ground
truth. Each of these actions represents a potential pattern of walking behavior.
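As an illustration, a minimal PyTorch sketch of an encoder-decoder classifier in the spirit of Fig. 10.5: both the encoder and the decoder are LSTMs with 16 hidden units (tanh is the default LSTM activation), Xavier-uniform initialization is applied to the weight matrices, and a dense layer outputs the three step-action classes. The exact decoder wiring and the remaining hyperparameters of Table 10.1 are not reproduced here, so this is an approximation rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class Seq2SeqStepAction(nn.Module):
    def __init__(self, hidden=16, num_classes=3):
        super().__init__()
        # Encoder compresses the window of s mean-centred pressure values.
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        # Decoder is driven by the encoder's state vector (many-to-one use).
        self.decoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)   # Xavier uniform, cf. (10.10)

    def forward(self, x):                    # x: (batch, s) pressure window
        x = x.unsqueeze(-1)                  # -> (batch, s, 1)
        _, (h, c) = self.encoder(x)
        # Single decoder step seeded with a zero input and the encoder state.
        dec_in = torch.zeros(x.size(0), 1, 1, device=x.device)
        out, _ = self.decoder(dec_in, (h, c))
        return self.classifier(out[:, -1, :])  # logits: Normal / Going up / Going down

model = Seq2SeqStepAction()
logits = model(torch.randn(4, 15))           # 4 windows of s = 15 steps
```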
Figure 10.6 shows simple step action recognition examples. The test data was
collected within a building with a floor height of 3.5 m, where the tester ascended the
stairs and then returned via the elevator. From Fig. 10.6, the barometric data exhibits
notable noises even when the tester is walking on a flat floor, making it difficult to
differentiate between pressure differences caused by changes in altitude and those
caused by noise. For such data, the Seq2Seq model (left in Fig. 10.6) correctly iden-
tifies the majority of step actions in real-time. Additionally, the proposed Seq2Seq
model exhibits sensitivity to sudden changes in barometric data, which allows it to
immediately detect the first step after the user takes the elevator, as indicated by
the red arrow in Fig. 10.6. This characteristic is crucial for our approach because it
enables the immediate detection of an elevator event from the first step through the
floor decision algorithm described in the following subsection (Algorithm 2).

Algorithm 2 UpdateFloor()
Input: relative pressure map RM, pivot floor, p_delta
Output: floor number
1: // Obtain pressure value of pivot floor from RM
2: p_pivot ← RM[pivot floor]
3: // Calculate new pressure value after floor transition
4: p_new ← p_pivot + p_delta
5: floor number ← the floor with the closest pressure value to p_new in RM
6: return floor number
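A direct Python transcription of Algorithm 2, assuming the relative pressure map RM is a mapping from floor label to a reference pressure value (the construction of RM is described in the text but not shown here); the example map values are hypothetical.

```python
def update_floor(rm, pivot_floor, p_delta):
    """Algorithm 2: choose the floor whose reference pressure in the relative
    pressure map rm is closest to the pivot floor's pressure shifted by p_delta."""
    p_pivot = rm[pivot_floor]           # pressure value of the pivot floor
    p_new = p_pivot + p_delta           # pressure value after the floor transition
    return min(rm, key=lambda floor: abs(rm[floor] - p_new))

# Hypothetical relative pressure map (hPa) for floors F1-F4, for illustration only.
rm = {"F1": 1013.2, "F2": 1012.8, "F3": 1012.4, "F4": 1012.0}
print(update_floor(rm, "F1", -0.8))     # -> "F3"
```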
In our method, the user’s vertical movement is represented by step actions instead of
directly calculating the altitude. The motivation is that, due to factors such as pressure drift and user behavior, the barometer reading can vary even if the altitude is the same (i.e., the user is on the same floor). These short-term and long-term noises in barometer measurements make the height calculation inaccurate. The step action sequence contains no absolute atmospheric pressure values, thus
avoiding the above problems. The only requirement is to ensure that DL predictions
are accurate and robust enough. Based on the step action sequence, we know the
exact step of the floor transition that occurred. Therefore, this study estimates the
height change based on barometric pressure difference only when the region changes
are detected.
Algorithm 1 explains the proposed floor decision algorithm. We obtain the step
action sequence according to the prediction of the Seq2Seq model, which implies
the user’s vertical movement. For example, a sequence of “Going up” steps means
that the user is climbing the stairs. We can update the floor number according to the
number of such steps. However, this method has two drawbacks: (1) the number of
steps required to walk up a floor varies depending on the user’s climbing method,
and (2) false floor transitions on a flat floor are also recognized as floor transitions.
Therefore, it is necessary to calculate the height difference through the barometer
reading. Although the atmospheric pressure drifts due to weather conditions, the
pressure difference in a short time interval is credible [59].
Before applying the step action sequence, incorrect DL model predictions need
to be eliminated. Outlier data is typically isolated and unordered, whereas height-
induced pressure changes are persistent and ordered. Therefore, confirming a floor
transition through multiple step actions can eliminate most of the incorrect pre-
dictions. When a step action different from the previous steps is detected, the proposed method does not immediately confirm that the region has changed, but instead enqueues the step data in memory and waits for new step action predictions until the queue length exceeds $n_{wait}$. At that point the current region is changed, and a floor transition signal is generated.
The floor decision algorithm first calculates the pressure difference $p_{delta}$ when a region change is confirmed. In particular, the pressure value $p$ of the previous floor is calculated through a lowpass filter (line 2 in Algorithm 1), while the current pressure value $p^a$ is obtained from the average of the data in the queue (line 10 in Algorithm 1). This minimizes the time interval over which the pressure difference is calculated, avoiding the impact of long-term drift errors, and smooths the pressure values of the previous and current floors against short-term noise. At line 12 in Algorithm 1, a step action not equal to "Normal" indicates that the user moves from the flat floor to an elevator or stairs, and the decision between elevator and staircase is made based on the pressure difference $p_{delta}$. Otherwise, if the step action is "Normal," it
means the user is returning to the flat floor, and the floor number must be updated
accordingly.
and the steps in the queue are updated. The proposed floor detection can immediately estimate the floor number when a step is detected, with only a delay of $n_{wait}$ steps when the region changes.
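Algorithm 1 itself is not reproduced above, so the following Python sketch only mirrors the behaviour described in the text: a change of step action is buffered until more than $n_{wait}$ consistent predictions confirm it, the pressure difference between the lowpass-filtered previous-floor pressure and the queue average is then computed, and the floor number is updated from the relative pressure map when the user returns to a flat floor. The class name, the confirmation length, the filter coefficient, and the elevator threshold are all illustrative assumptions.

```python
from collections import deque

N_WAIT = 3               # confirmation queue length (illustrative)
ELEV_THRESHOLD = 0.35    # hPa; a jump this large on leaving the floor suggests an elevator (assumed)

class FloorDecision:
    def __init__(self, rm, start_floor):
        self.rm = rm                        # relative pressure map: floor -> reference pressure
        self.floor = start_floor
        self.region = "Normal"              # "Normal", "Going up" or "Going down"
        self.p_floor = rm[start_floor]      # lowpass-filtered pressure of the current floor
        self.queue = deque()

    def on_step(self, action, pressure):
        """Process one step (Seq2Seq prediction plus raw barometer reading)."""
        if action == self.region:
            self.queue.clear()
            if self.region == "Normal":     # keep refining the flat-floor pressure
                self.p_floor = 0.97 * self.p_floor + 0.03 * pressure
            return self.floor
        self.queue.append(pressure)
        if len(self.queue) <= N_WAIT:       # wait for enough consistent evidence
            return self.floor
        p_delta = sum(self.queue) / len(self.queue) - self.p_floor
        if action == "Normal":              # back on a flat floor: update the floor number
            p_new = self.p_floor + p_delta
            self.floor = min(self.rm, key=lambda f: abs(self.rm[f] - p_new))
            self.p_floor = self.rm[self.floor]
        else:                               # leaving the flat floor: elevator or stairs?
            self.mode = "elevator" if abs(p_delta) > ELEV_THRESHOLD else "stairs"
        self.region = action
        self.queue.clear()
        return self.floor
```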
the clustering results. PDR is described first because it drives the entire scheme.
Then, PF will be introduced in detail, including map constraint, clustering, and CN
matching-based location correction.
The Android and iOS operating systems respectively provide the SensorEvent and
CoreMotion classes to report motion information from the onboard sensors of devices
[60, 61], which enables us to estimate the location through PDR. The PDR approach
suggests that a step can be expressed as a distance and an angle referring to the
previous state, i.e., the current location is determined by the current displacement
and previous location. Thus, the location of the $k$th step, $P_k(x_k, y_k)$, can be expressed as
$$P_k(x_k, y_k) = \begin{bmatrix} x_k \\ y_k \end{bmatrix} = \begin{bmatrix} x_{k-1} \\ y_{k-1} \end{bmatrix} + \lambda_k \begin{bmatrix} \sin(\alpha_k) \\ \cos(\alpha_k) \end{bmatrix}, \quad (10.11)$$
where $\lambda_k$ and $\alpha_k$ are the stride length and heading direction of the $k$th step. Next,
we introduce the step detection, stride length calculation, and heading direction
estimation of PDR.
When a pedestrian walks, the vertical acceleration presents periodic sine waves,
with each step represented by a local peak or valley in the acceleration. This pat-
tern enables step detection by recognizing these peaks and valleys in vertical accel-
eration. To counteract the impact of device tilts on sensor measurements, rotation
transformation needs to be performed to convert the accelerometer readings from the
local coordinate system (LCS) to the global coordinate system (GCS). The rotation
matrix $R$ can be calculated through several methods, including quaternions and sensor fusion [62, 63]. We utilized the getRotationMatrix() function in the SensorManager class to compute the rotation matrix from the cross-product of accelerometer and magnetometer measurements. The acceleration vector in the GCS, $A_t^G$, can then be determined as
$$A_t^G = R \cdot A_t^L, \quad (10.12)$$
where $A_t^L$ is the acceleration vector in the LCS. Subsequently, a valid step is defined as
$$\{a_t > a^{upper},\; a_{t+\Delta t} < a^{lower},\; 0.15\,\mathrm{s} < \Delta t < 0.6\,\mathrm{s}\}, \quad (10.13)$$
where $a_t$ is the vertical acceleration and $\Delta t$ is the time interval between the peak and the valley. We established the amplitude thresholds $a^{upper} = 1.0\,\mathrm{m/s^2}$ and $a^{lower} = -0.8\,\mathrm{m/s^2}$ in this study.
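A sketch of the peak-valley step detector of (10.13), using the thresholds given above; the input is assumed to be the globally referenced vertical acceleration sampled at 20 Hz.

```python
A_UPPER, A_LOWER = 1.0, -0.8     # m/s^2, amplitude thresholds from the text
FS = 20.0                        # sampling rate (Hz)

def detect_steps(a_vert):
    """Return sample indices of detected steps: a peak above A_UPPER followed by
    a valley below A_LOWER within 0.15 s < dt < 0.6 s, per (10.13)."""
    steps, t_peak = [], None
    for t, a in enumerate(a_vert):
        if a > A_UPPER:
            t_peak = t                               # candidate peak
        elif a < A_LOWER and t_peak is not None:
            dt = (t - t_peak) / FS
            if 0.15 < dt < 0.6:
                steps.append(t)
            t_peak = None
    return steps
```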
where $a_{max,k}$ and $a_{min,k}$ denote the maximum and minimum vertical acceleration values during the $k$th step, and $\tau_k$ is a coefficient that can be specified for different subjects. Due to the influence of gravity, the variance of acceleration for steps taken on stairs is typically greater than that for steps taken on flat ground, which causes errors in the stride estimation on stairs. To improve the step length calculation, we adjust the value of $\tau_k$ based on the region where the user is located using (10.15). The region information is obtained through the floor detection method.
$$\tau_k = \begin{cases} \rho \cdot \tau, & \text{if stairs} \\ \tau, & \text{otherwise.} \end{cases} \quad (10.15)$$
Here, $\rho$ denotes a scale factor used to compensate for the step length calculation on stairs. It is recommended that the value of $\rho$ range from 0.5 to 0.8, and we empirically set $\tau = 0.43$ and $\rho = 0.6$ in this study. In fact, the exact value of $\rho$ is not strictly required in our scheme, since a correction will be made both when detecting entering and exiting stairs (to be described in Sect. 10.4.2.6).
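A compact sketch combining the region-dependent coefficient of (10.15) with the PDR update (10.11); the stride-length model of (10.14) is not reproduced above, so the stride value is passed in as an argument.

```python
import math

TAU, RHO = 0.43, 0.6           # coefficient and stair scale factor used in this study

def tau_k(region):
    """(10.15): shrink the stride coefficient when the user is on stairs."""
    return RHO * TAU if region == "stairs" else TAU

def pdr_update(x, y, stride, heading):
    """(10.11): advance the 2D position by one detected step."""
    return x + stride * math.sin(heading), y + stride * math.cos(heading)
```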
The accuracy of heading direction in PDR is crucial because the main source of
error comes from distortions of direction. The accelerometer/magnetometer orientation $\alpha^m$ is obtained from the getOrientation() function in the SensorManager class. For the iOS platform, $\alpha^m$ can be retrieved from the CLHeading class in the CoreLocation frame-
work [64, 65]. The orientation calculated by integrating the gyroscope reading tends
to slowly drift away from the actual orientation, while the orientation derived from
the accelerometer/magnetometer can be easily distorted by surrounding electronic
devices. Thus, the common practice is to fuse these two measurements based on
certain criteria rather than relying on a single angle source. A typical orientation
fusion can be expressed as follows [8].
$$\alpha_k = \gamma \cdot (\alpha_{k-1} + \omega_k \Delta t) + (1 - \gamma) \cdot \alpha_k^m, \quad (10.16)$$
where $\omega$ is the gyroscope reading and $\gamma$ is a coefficient that determines the fusion proportion, with its value ranging between 0 and 1. A larger value of $\gamma$ indicates a stronger influence of the angle calculated by the gyroscope in the direction update.
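A one-line complementary fusion consistent with (10.16) as reconstructed above; $\gamma = 0.99$ is the value used later for approach (e), and the exact formulation should be checked against [8].

```python
def fuse_heading(alpha_prev, omega, alpha_m, dt, gamma=0.99):
    """Blend the gyro-propagated heading with the accelerometer/magnetometer heading.
    A larger gamma gives the gyroscope more influence, as described in the text."""
    return gamma * (alpha_prev + omega * dt) + (1.0 - gamma) * alpha_m
```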
The PF is activated when the user starts localization (e.g., presses the localization button). Since the user's location is not given, the PF first acquires the current floor plan based on our floor detection, and then $N_0$ particles are uniformly dispersed over the entire map. The attribute of the $i$th particle at the $k$th step, including 2D coordinates, heading, weight, and cluster number, is as follows.
where $P_k^{(i)} = (x_k^{(i)}, y_k^{(i)})$ denotes the 2D location and $\theta_k^{(i)}$ denotes the heading direction. We assume that the orientation measured by the accelerometer/magnetometer sensors is Gaussian-distributed around the true orientation and thus generate $\theta_0^{(i)} \sim \mathcal{N}(\alpha^m, \sigma^{ori})$ as the initial heading of the $i$th particle. $w_k^{(i)}$ and $c_k^{(i)}$ stand for the particle weight and cluster label, respectively, and they are initialized as $1/N_0$ and $-1$.
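A sketch of this initialization: $N_0$ particles spread uniformly over the walkable floor plan, headings drawn from a Gaussian around the measured orientation, uniform weights, and cluster label $-1$. The floor-plan sampler is left abstract and its name is hypothetical.

```python
import numpy as np

def init_particles(sample_xy, n0, alpha_m, sigma_ori, rng=None):
    """sample_xy(n) is assumed to return n uniformly sampled (x, y) points on the floor plan."""
    rng = rng if rng is not None else np.random.default_rng()
    xy = sample_xy(n0)                                  # (n0, 2) positions
    theta = rng.normal(alpha_m, sigma_ori, size=n0)     # initial headings
    w = np.full(n0, 1.0 / n0)                           # uniform weights
    c = np.full(n0, -1, dtype=int)                      # cluster label, unassigned
    return xy, theta, w, c
```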
In this subsection, the location $P^{(i)}$, heading $\theta^{(i)}$, and weight $w^{(i)}$ of the particles are updated. Whenever a step is detected, the step length and direction are calculated by the PDR and fed into the PF. The particles propagate based on the current state and PDR estimation. In addition, Gaussian errors with zero mean and standard deviations $\sigma^l$ and $\sigma^o$ are respectively added to the step length and heading direction updates to simulate the effect of measurement uncertainty and noise, as well as to avoid the loss of diversity among the particles.
The weight of the particles is determined by the system evaluation func-
tion, with the commonly used evaluation parameters including direction or dis-
tance [68, 69]. We apply Gaussian distribution to calculate the weight of the particles,
as follows [70].
$$w_k^{(i)} = w_{k-1}^{(i)} \cdot \frac{1}{\sigma\sqrt{2\pi}} \cdot \exp\!\left(-\frac{\left(\Delta x_k^{(i)} - \Delta x_k\right)^2 + \left(\Delta y_k^{(i)} - \Delta y_k\right)^2}{2\sigma^2}\right), \quad (10.18)$$
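A sketch of one predict/weight iteration: particles are propagated with the PDR step plus the Gaussian noise terms $\sigma^l$ and $\sigma^o$ mentioned above, and weights are updated with the Gaussian likelihood of (10.18). Taking the reference displacement $(\Delta x_k, \Delta y_k)$ as the PDR-estimated step vector, and driving the particle headings with the PDR heading change, are our reading of the (truncated) surrounding explanation.

```python
import numpy as np

def pf_step(xy, theta, w, stride, d_heading, pdr_dx, pdr_dy,
            sigma_l, sigma_o, sigma, rng):
    """Propagate particles with one PDR step and reweight them via (10.18)."""
    n = len(w)
    l = stride + rng.normal(0.0, sigma_l, n)                 # noisy step lengths
    theta = theta + d_heading + rng.normal(0.0, sigma_o, n)  # noisy heading update
    dx, dy = l * np.sin(theta), l * np.cos(theta)            # particle displacements
    xy = xy + np.stack([dx, dy], axis=1)
    # (10.18): Gaussian weight on the deviation from the PDR-estimated displacement.
    err2 = (dx - pdr_dx) ** 2 + (dy - pdr_dy) ** 2
    w = w * np.exp(-err2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    return xy, theta, w / w.sum()
```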
We use clustering to group the surviving particles and find the centroid of each
cluster. With the clustering results, we (1) confirm the convergence of particles, (2)
optimize the location estimation from multiple modes of particle distribution (to be
described in Sect. 10.4.2.4), and (3) adjust the number of particles dynamically to reduce the computational burden without sacrificing performance (to be described in Sect. 10.4.2.5). Before explaining these, we introduce the clustering algorithm.
We utilized mean shift as the clustering algorithm [71]. Mean shift is a non-
parametric and centroid-based technique that defines a region around each data point
and moves the center (or mean) of that region toward the densest part of the region
until it converges to the local maximum. The motivation for using mean shift is that
it is simple, fast, and can delineate arbitrarily shaped clusters and count the number
of clusters automatically, which is well suited for the dynamic and irregularly shaped
particle cloud. In the proposed scheme, mean shift evaluates the similarity between
particles based on their location in the orthogonal coordinate system. In particular,
normalization is not performed as we want to retain the unit of features to express
the real distance. The main parameter of mean shift is the bandwidth $B$, which is set as 3 m in our scheme. This is a relatively large value, which implies that the spatial separation between particles may need to approximate the distance across a room for them to be classified into distinct clusters. The purpose of utilizing mean shift is to describe the particle distribution and discover dispersed particle clusters rather than dividing a converged particle cloud into several clusters. Therefore, a slightly larger value of $B$ is recommended.
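The clustering step can be reproduced with scikit-learn's MeanShift, which determines the number of clusters automatically; the 3 m bandwidth is taken from the text, and no normalization is applied so distances stay in metres.

```python
from sklearn.cluster import MeanShift

def cluster_particles(xy, bandwidth=3.0):
    """Cluster surviving particle positions (n, 2); return labels, centroids, and the cluster count."""
    ms = MeanShift(bandwidth=bandwidth).fit(xy)
    return ms.labels_, ms.cluster_centers_, len(ms.cluster_centers_)
```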
In the initialization phase, particles are evenly distributed across the state space
area. As new steps are detected, the particle cloud converges to where the user might be. The early particles are meaningless until the filter has gathered over several iterations to represent the user's possible location. The clustering results can be exploited to describe the current particle distribution: the more dispersed the particles are, the more clusters there are and the more distant the centroids are from each other. Therefore, we
assume that the PF has converged enough to provide valid location information when
only one cluster exists or the largest cluster’s weight exceeds 80% of the total weight.
In PF, the estimate of user location is obtained by taking the weighted average of the
surviving particle’s location. Mathematically, they can be expressed as follows.
$$\begin{bmatrix} x_k \\ y_k \end{bmatrix} = \begin{bmatrix} \dfrac{\sum_i^{n_s} w_k^{(i)} x_k^{(i)}}{\sum_i^{n_s} w_k^{(i)}} \\[2ex] \dfrac{\sum_i^{n_s} w_k^{(i)} y_k^{(i)}}{\sum_i^{n_s} w_k^{(i)}} \end{bmatrix}, \quad (10.19)$$
10.4.2.5 Resampling
One drawback of the SIS method is the degeneracy of weight, where the importance
weights concentrate on a few particles while the majority of particles have weights
close to 0 after multiple iterations. Resampling is a common solution to handle this
issue which ignores the particles with low weights and multiplies the particles with
high weights. However, resampling causes the particles to lose diversity, resulting
in sample impoverishment [14, 21, 72]. A typical approach to handle this issue is to
implement resampling only at certain iterations [28]. Hence, instead of resampling
every iteration, we only perform it when (1) five iterations have passed since the last
resampling, or (2) the number of surviving particles is less than $N_p/5$, where $N_p$ is the current maximum number of particles, and its value is dynamically adjusted
based on the clustering result.
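A sketch of this resampling policy: resampling is triggered only every fifth iteration or when fewer than $N_p/5$ particles survive. Systematic resampling is used here as one common choice; the chapter does not fix the scheme, and particles eliminated by the map constraint are assumed to carry zero weight.

```python
import numpy as np

def maybe_resample(xy, theta, w, n_p, steps_since_resample, rng):
    """Resample only when 5 iterations have passed or survivors < N_p / 5.
    w is assumed normalized; returns updated state and the new counter value."""
    survivors = np.count_nonzero(w > 0.0)
    if steps_since_resample < 5 and survivors >= n_p / 5:
        return xy, theta, w, steps_since_resample + 1
    # Systematic resampling of n_p particles from the current weighted set.
    positions = (rng.random() + np.arange(n_p)) / n_p
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), len(w) - 1)
    return xy[idx], theta[idx], np.full(n_p, 1.0 / n_p), 0
```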
Theoretically, a large number of particles can reduce the variance of the estimated
posterior distribution, leading to more accurate state estimation. However, the PF is a
computationally intensive method, and increasing the number of particles increases
the computational cost of the algorithm. This problem is magnified in real-time tasks
and on the mobile platform. Therefore, balancing the number of particles with com-
putational efficiency is important for optimizing the performance of a PF. We assume
that fewer particles can still achieve good localization results when the particles are
concentrated in one area, such as a closed corridor, while if the particle distribution
is dispersed, more particles are needed to explore the feasible paths. Thus, the pro-
posed scheme dynamically adjusts the number of particles $N_p$ based on the number of clusters to achieve good performance with lower overhead, as given in (10.20), where $n_{cluster}$ is the number of clusters and $N_c$ stands for the number of particles assigned to one cluster. In the initialization phase, there can be many clusters due to the dispersion of particles. To prevent generating too many particles, the maximum value of $N_p$ is set to $15 \times N_c$. A large number of particles $N_0$ is only used at the first iteration, since the PF has to cover the interesting state space areas. Then, $N_p$ is determined by (10.20).
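Since (10.20) itself is not reproduced above, the following one-liner only encodes our reading of the surrounding text: the particle budget grows with the cluster count and is capped at $15 \times N_c$; $N_c = 150$ is the value given later in the CN-matching discussion.

```python
def particle_budget(n_cluster, n_c=150, cap=15):
    """Assumed form of (10.20): N_p = min(n_cluster, cap) * N_c."""
    return min(n_cluster, cap) * n_c
```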
Here, $f_j$ and $P_j^{cn}$ respectively indicate the floor level and location of the CN, $Type_j$ is the floor transition type (i.e., stairs or elevator), $D_j$ is the transition direction, $\theta_j^{cn}$ stands for the possible direction range, and $n^{cn}$ is the total number of CNs. Because the structure of the facilities restricts the direction of user movement (e.g., an elevator having only one exit direction), each CN is assigned a $\theta^{cn}$ that is used for initializing the heading of the regenerated particles.
Consider an example of CN matching-based location correction. A user takes an elevator from the first floor to the third floor, and the floor decision algorithm returns the first floor as the previous region and the third floor as the current region, with the mode of vertical transportation being recognized as the elevator. Therefore, the CNs with $f = 3$, $Type = \text{elevator}$, and $D = \text{ascending}$ are matched for location
correction. Location correction is performed according to the following criteria.
• If the PF has not converged, then $N_c$ particles are generated around each matched CN. Note that we established CNs near each vertical transportation facility, thus at least one CN exists for matching.
• If the PF has converged, the CN closest to the estimated user location is treated as the main CN, and then $n_{cluster} \times N_c$ particles are generated near this CN. The remaining CNs are considered sub-CNs, and $(n_{cluster} - 1) \times N_{sub}$ particles are generated near each sub-CN. We set $N_c = 150$ and $N_{sub} = 15$; thus sub-CNs will be identified as tiny clusters and do not affect the location estimation.
In this way, we not only extend PF to 3D scenarios but also correct particle states
using information from matched CNs. Additionally, this method accelerates particle
convergence, because the collision detection algorithm can easily eliminate incorrect
clusters due to the narrow transition zone. Furthermore, small clusters generated
based on the sub-CN provide an opportunity for rectification when the PF converges
to the incorrect place: the correct cluster can grow after the other cluster disappears
by colliding with the wall.
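A sketch of this regeneration rule, assuming each CN is represented as a simple dict with its floor, transition type, direction, and location; the 0.5 m spawn spread is an assumption, and heading initialization from $\theta^{cn}$ is omitted for brevity.

```python
import numpy as np

N_C, N_SUB = 150, 15    # particles for the main CN and for each sub-CN (from the text)

def regenerate_at_cns(cns, floor, t_type, direction, converged, est_xy, n_cluster, rng):
    """Return regenerated particle positions after a confirmed floor transition."""
    matched = [c for c in cns
               if c["floor"] == floor and c["type"] == t_type and c["dir"] == direction]
    spawn = lambda xy, n: rng.normal(0.0, 0.5, (n, 2)) + np.asarray(xy)  # assumed 0.5 m spread
    if not converged:                        # N_c particles around every matched CN
        return np.vstack([spawn(c["xy"], N_C) for c in matched])
    main = min(matched, key=lambda c: np.linalg.norm(np.asarray(c["xy"]) - est_xy))
    parts = [spawn(main["xy"], n_cluster * N_C)]
    parts += [spawn(c["xy"], (n_cluster - 1) * N_SUB) for c in matched if c is not main]
    return np.vstack(parts)
```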
Note that while there are variants of the PF to improve the performance, the
optimal strategy remains an open question. Our focus is not on providing the most
accurate localization solution, but rather on offering sustainable and reliable long path
tracking services with minimal human efforts and limited measurement data. Finally,
the 2D location and the floor number obtained from the floor decision algorithm are
combined to represent the user’s location in a multi-floor scenario.
In this section, we presented the effectiveness of our suggested method using var-
ious experiments. The data was collected via an Android app on a Samsung Note
10+ with a barometer and IMU sensors, all at a 20 Hz sampling rate. We approach
indoor multi-floor localization in two stages: (1) entry and movement inside the
building, and (2) initiating localization. The first determines the initial floor upon
entry using entrance data or other technologies, while our DL-based method tracks
the user’s floor without limiting their actions. During the second stage, the smart-
phone is assumed to be directed forward relative to the user’s body. To reflect the
performance under real-world usages, the tester exhibited complex mobility patterns
during the experiments, including different walking speeds and unconventional stair
navigation. To demonstrate the performance of our Seq2Seq scheme, we used the
MLP model proposed in [29] for comparison. In this context, F# indicates floors
above ground and B# denotes those below; for instance, F5 is the fifth floor.
When a step is detected, the Seq2Seq model first predicts the step action, and then the
floor decision algorithm calculates the floor number and user’s vertical movement
information based on the step action, barometer reading, and relative pressure map.
The floor decision algorithm describes a step using one of the following classes: a
specific floor, “Stairs up,” “Stairs down,” “Elevator up,” or “Elevator down.” A flat
step refers to a step on a flat floor in this section.
The accuracy rate (AR) is adopted to evaluate the accuracy of floor number
calculation as follows.
$$AR_{FN} = \frac{\#\{\hat{f}_i \mid \hat{f}_i = f_i\}}{n_{floor}} \times 100\%, \quad (10.22)$$
where $\#\{\hat{f}_i \mid \hat{f}_i = f_i\}$ represents the count of steps for which the predicted floor number $\hat{f}_i$ is identical to the actual floor number $f_i$, and $n_{floor}$ indicates the total number of steps whose actual label is "Normal" (i.e., a floor number). $AR_{FN}$ quantifies the proportion of correctly identified floor numbers out of all the steps on a flat floor. Since steps in staircases and elevators do not represent an exact floor number, they are excluded from the computation of $AR_{FN}$.
There were 9 floor detection experiments conducted in Sung-deok Hall, Jilli Hall,
and Cho Man-sik Memorial Hall at Soongsil University. Each floor of the buildings
had a height of about 3.0–3.5 m and was equipped with both elevators and stairs.
Table 10.2 provides a detailed description of the paths and activities performed during
the experiments, where C, T, S, and P stand for calling, typing, swinging, and pocket
cases, respectively, while $\xrightarrow{E}$ and $\xrightarrow{S}$ represent elevators and stairs. For example, F1 (C) $\xrightarrow{E}$ F2 (C-P) means the user goes upstairs from F1 to F2 by elevator, during which he finishes a phone call and puts his smartphone in his pocket. Tables 10.3, 10.4 and 10.5 present the evaluation confusion matrices and $AR_{FN}$ scores for floor detection
using MLP and Seq2Seq models. From Table 10.3, the Seq2Seq model accurately
recognized all elevator steps and misclassified some stairs and floor steps. On the
other hand, from Table 10.4, the MLP model outperforms Seq2Seq in correctly-
identifying floor steps. However, the comparison of Tables 10.3 and 10.4 shows
the relative purity of results for the Seq2Seq model. For example, Seq2Seq’s false
negatives for “Floor” were exclusively misidentified as “Stairs,” with no instances of
Table 10.3 Confusion matrix for floor transition detection using Seq2Seq

Recognized action    Ground truth action
                     Stairs (%)    Elevator (%)    Floor (%)
Stairs               97.08         0               6.58
Elevator             0             100             0
Floor                2.92          0               93.42
Table 10.4 Confusion matrix for floor transition detection using MLP

Recognized action    Ground truth action
                     Stairs (%)    Elevator (%)    Floor (%)
Stairs               94.36         1.71            3.93
Elevator             0             99.64           0.36
Floor                4.94          0.12            94.94
confusion with “Elevator.” This contrasts with the MLP model which presents a more
complex confusion matrix, with instances of misclassification between all classes.
Table 10.5 shows the $AR_{FN}$ scores of the floor calculation in the three experimental
buildings. The Seq2Seq model yields a total accuracy of over 90%, indicating that
the majority of estimated floor numbers are consistent with the actual floor numbers
under conditions of complex user activity. Although some errors exist, they primarily
occur when the user enters and exits the transition zone or during changes in activity.
These false positive and false negative errors result in a delay of $n_{wait}$ steps but do not
cause inaccuracies in the computation of the floor level. In a scenario of unrestricted
user activity, our goal is not to guarantee perfect accuracy in step action detection,
but rather to prevent these potential errors from leading to incorrect floor number
calculations. On the other hand, the AR scores of the MLP model are noticeably lower
than those of the Seq2Seq model, indicating that the unstable step action recognition
reflected in Table 10.4 has a considerable impact on floor calculation. This implies
that the Seq2Seq model demonstrates significantly better stability when dealing with
noisy data compared to the MLP model.
$$AR_{Loc} = \frac{\#\{d_k \mid d_k \le \epsilon\}}{n} \times 100\%, \quad (10.24)$$
where $P_k = (x_k, y_k)$ and $\hat{P}_k = (\hat{x}_k, \hat{y}_k)$ indicate the estimated location and actual location of the $k$th step, respectively, and $d_k$ indicates the Euclidean distance between them. $\#\{d_k \mid d_k \le \epsilon\}$ represents the count of steps where $d_k$ is less than or equal to $\epsilon$, and $n$ is the total number of steps. Furthermore, the root-mean-square error (RMSE) is used to quantify the overall deviation of the estimated locations.
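The evaluation metrics can be computed as below; since the RMSE definition is not reproduced above, the standard root-mean-square of the per-step Euclidean errors is assumed.

```python
import numpy as np

def evaluate(p_est, p_true, eps=2.0):
    """AR_Loc per (10.24) and an assumed standard RMSE over the per-step errors (m)."""
    d = np.linalg.norm(np.asarray(p_est) - np.asarray(p_true), axis=1)
    ar_loc = 100.0 * np.mean(d <= eps)
    rmse = float(np.sqrt(np.mean(d ** 2)))
    return ar_loc, rmse
```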
We collected 1,000 step data for the experiment. To demonstrate the performance
of our scheme, we performed seven approaches to calculate the location, which are: (a)
the proposed scheme, (b) the proposed scheme with MLP, (c) PF (1k particles) with
CN matching, (d) PF (1k particles), (e) PDR with CF, (f) calibrated PDR with CF,
and (g) PDR with Acc & Mag. The setup of each approach is described as follows.
(a) The PF with CN matching-based location correction and dynamic adjustment of
particle numbers via (10.20).
(b) The PF with CN matching-based location correction and dynamic adjustment of
particle numbers via (10.20). Notably, the DL model utilized in floor detection
is MLP.
(c) The PF with CN matching-based location correction uses a fixed number of
1,000 (1k) particles generated in the resampling phase, instead of using (10.20).
(d) Conventional PF generates a fixed number of 1k particles in the resampling
phase. When a floor transition occurs, the particle information from the previous
region is used directly.
(e) Step locations are calculated with (10.11). The step length is calculated by
(10.14) and (10.15), and the heading direction is calculated using (10.16), where
$\gamma = 0.99$.
(f) Step locations are calculated in the same way as (e). In addition, whenever a floor
transition is detected, the location of matched CN is set as the current location
to correct the location [29].
(g) Step locations are calculated using (10.11). The step length is calculated
by (10.14) and (10.15), and the heading direction is calculated from the
accelerometer and magnetometer sensors, i.e., $\alpha^m$.
The vertical movement and altitude information were obtained from the floor
detection. Because traditional PDR cannot function without an initial state, the start location of (e), (f), and (g) was manually annotated. Furthermore, due to the probabilistic nature of the PF, slight variations can occur in its computational results each time it
is run. Hence, the scores of (a), (b), (c), and (d) are obtained from the average of ten
separate computations.
Figure 10.9 presents the results of the long path tracking. Before analyzing the experimental results, there are some notes on the visualization of the tracking trajectory. To illustrate the details between different floors and areas, we plot the actual location of the user on each floor's floor plan. The start and end points are marked by a navy square and a diamond, respectively. The orange circles represent CNs, labeled "S" for
stairs and “E” for elevators. Only CNs identified as main are depicted. Each grid
on the map equates to 2 m of space, and text blocks and arrows indicate floor tran-
sition types and directions. For clarity, only results from (a), (c), (d), and (e) were
illustrated in Fig. 10.9. Since the estimated locations before particle convergence are
meaningless and could potentially hinder visual comprehension, the results for PF
methods are plotted after convergence is confirmed.
In Fig. 10.9, we consider a realistic trajectory consisting of a sequence of
movements through various complex sections within the building, as follows.
• F6 activities: Start from room 613 → go downstairs through the right stairwells. Since convergence had not yet been reached, $N_p$ particles are generated around each matched CN.
• F4 activities: Enter rooms 404 and 407 consecutively → move along the corridor → return to F6 via elevator. In this segment, S5 was matched as the main CN.
• F6 activities: Move along the corridor → go downstairs via elevator. In this segment, E4 was matched as the main CN.
• F5 activities: Move along the corridor → enter rooms 525 and 524 consecutively → go downstairs through the staircase near the open corridor. In this segment, E1 and S2 were matched as the main CNs.
• F3 activities: Move along the right side corridor → enter rooms 311 and 329 consecutively → reach the exit → detected as leaving the building. In this segment, S3 was matched as the main CN.
Note that the user’s walking trajectory may vary between rooms due to the presence
of obstacles (e.g., tables). The first floor transition occurred at the 32nd step, and
convergence was confirmed at the 42nd step for the PF with CN matching and at the
150th step for the PF without CN matching.
Figure 10.9 shows PDR with CF tracking well initially, but deviating over time.
PF (1k particles) offered an enhanced result by eliminating cumulative errors in long
path tracking through boundary constraints, but faced issues: (1) Prone to failure: It
failed in 6 out of 10 trials due to incorrect particle convergence, (2) Slow conver-
gence: In successful trials, correct location was achieved after roughly 150 steps,
and (3) Inaccurate localization: It incorrectly entered an adjacent room (310) when
entering room 311 on F3, rendering it inaccurate and impractical in real applications.
Fortunately, these issues are overcome by our solutions. All ten instances of PF (1k parti-
cles) with CN matching successfully located the user. Particularly, there were two
instances where convergence to incorrect locations was observed, and both were rec-
tified using our CN matching-based location correction. Additionally, it detected all
rooms correctly. Moreover, the proposed scheme demonstrated performance compa-
rable to that of PF (1k particles) with CN matching, i.e., achieving successful tracking
throughout, convergence at approximately the 42nd step, and correct detection of all
rooms.
Table 10.6 also shows the AR and RMSE values for the approaches. Since we con-
ducted multiple computations for PF approaches, values in Table 10.6 were obtained
from the average of separate computations. Furthermore, because the convergence
speed is environment-dependent and unrelated to estimated location accuracy, the
AR score and RMSE loss of PF approaches were calculated when the convergence
is confirmed, i.e., at the 42nd step. This implies that slower convergence in the case
of PF without CN matching would yield a lower AR score. Furthermore, because the
estimated locations before convergence are random and would distort the informative
value of the RMSE loss, the RMSE loss for PF without CN matching is calculated
from the 150th step. In Table 10.6, PDR with Acc & Mag achieved an AR value of
8.1% within a 2.0 m error boundary, illustrating the inaccuracies in the conventional
PDR methods using consumer-grade sensors. PDR with CF and its calibrated ver-
sion improved results by fusing orientations, but performance was still inadequate
due to inherent path deviation when IMU data is used over long trajectories. On
the other hand, the PF methods show a better performance. Among them, PF (1k
particles) without CN matching obtained low AR scores due to slow convergence.
By comparing the RMSE values, it can be observed that even excluding the factor
of convergence speed, the PF without CN matching still underperforms compared to
the PF with CN matching. Moreover, our proposed scheme’s performance is better
than the proposed scheme with MLP and is comparable to PF (1k particles) with CN
matching, which obtained an AR score of 96.7% within the error boundary $\epsilon = 2.0$ m.
Furthermore, the average particle count per iteration, $N_{average}$, can be computed as
$$N_{average} = \frac{\sum_{i=42}^{1000} N_{p,i}}{n_{conv}}, \quad (10.26)$$
where $N_{p,i}$ is the particle number of the $i$th step and $n_{conv}$ is the number of steps after convergence. By computing (10.26), we obtain an $N_{average}$ of 198.7, signifying our
scheme’s comparable performance to the PF using 1k particles, but with less than
1/5 of particles. Overall, the results show our floor detection method’s benefits for
2D localization in multi-floor scenarios, and confirm the proposed scheme’s superior
performance and computational efficiency.
In this chapter, we contend that the merit of a localization system should not be solely
evaluated based on location accuracy, and a robust system should exhibit long-term
stability and the capability to efficiently process and operate on a limited amount
of measurement data. We propose an indoor multi-floor localization scheme that
leverages only a smartphone’s IMU and barometer sensors. Our scheme consists of
two components: DL-based floor detection and PF with clustering. Our scheme is
designed to facilitate indoor localization without relying on the infrastructure and
calculate the user’s location without a given initial state. We conducted multiple
extensive experiments in typical university buildings to evaluate the proposed floor
detection and multi-floor indoor localization. The experimental results show the
promising performance of our scheme. The DL-based floor detection accurately
tracked the floor number and efficiently extracted vertical movement information
under a variety of user activities. The indoor multi-floor long-path tracking scheme
achieved an average localization accuracy of over 96% within a 2 m error boundary
with a limited number of particles in the PF.
Our DL-based floor detection not only tracks the floor level but also extracts the
vertical movement information of a step. The floor level can be used to extend a
2D localization to the 3D application, and the vertical movement features are par-
ticularly useful for probabilistic methods such as the PF. Furthermore, the proposed
CN matching-based location correction also holds value within some infrastructure-
dependent systems. For instance, CN is well-suited to serve as a substitute for anchor
nodes in areas such as stairwells that lack adequate signal coverage.
While our scheme performs well in typical medium-sized buildings, its efficiency
may be challenged in large, open spaces (e.g., airports) due to the lack of map
constraints. Magnetic field information, providing universal absolute location data,
could be a solution, as suggested by recent research [47, 74, 75]. In future work, we
will optimize DL models and floor decision algorithms for better vertical movement
data, and accommodate various ways of carrying smartphones to make our scheme
applicable to more scenarios.
Acknowledgements This work was supported by the National Research Foundation of Korea
(NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00251595).
References
1. Lin C, Shin Y (2023) Multi-floor indoor localization scheme using a Seq2Seq-based floor
detection and particle filter with clustering. IEEE Access 11:66094–66112
2. Poulose A, Han DS (2020) UWB indoor localization using deep learning LSTM networks.
Appl Sci 10(18):6290–6312
3. Zhao H, Cheng W, Yang N, Qiu S, Wang Z, Wang J (2019) Smartphone-based 3D indoor
pedestrian positioning through multi-modal data fusion. Sensors 19(20):4554–4573
4. Chen Z, Zhu Q, Soh Y (2016) Smartphone inertial sensor-based indoor localization and tracking
with iBeacon corrections. IEEE Trans Industr Inf 12(4):1540–1549
5. Nguyen TLN, Vy TD, Kim KS, Lin C, Shin Y (2021) Smartphone-based indoor tracking in
multiple-floor scenarios. IEEE Access 9:141048–141063
6. Brovko T, Chugunov A, Malyshev A, Korogodin I, Petukhov N, Glukhov O (2021) Com-
plex Kalman filter algorithm for smartphone-based indoor UWB/INS navigation systems. In:
Proceedings of Ural symposium on biomedical engineering, radioelectronics and information
technology (USBEREIT), pp 0280–0284
7. Tan P, Tsinakwadi TH, Xu Z, Xu H (2022) Sing-Ant: RFID indoor positioning system using
single antenna with multiple beams based on LANDMARC algorithm. Appl Sci 12(13):6751–
6765
8. Madgwick SOH, Harrison AJL, Vaidyanathan R (2011) Estimation of IMU and MARG ori-
entation using a gradient descent algorithm. In: Proceedings of IEEE international conference
on rehabilitation robotics, pp 1–7
9. Mahony R, Hamel T, Pflimlin JM (2008) Nonlinear complementary filters on the special
orthogonal group. IEEE Trans Autom Control 53(5):1203–1218
10. Xie L, Tian J, Ding G, Zhao Q (2017) Holding-manner-free heading change estimation for
smartphone-based indoor positioning. In: Proceedings of IEEE 86th vehicular technology
conference (VTC-Fall), pp 1–5
11. Jiménez AR, Seco F, Prieto JC, Guevara J (2010) Indoor pedestrian navigation using an
INS/EKF framework for yaw drift reduction and a foot-mounted IMU. In: Proceedings of
workshop on positioning, navigation and communication, pp 135–143
12. Wang C, Liang H, Geng X, Zhu M (2014) Multi-sensor fusion method using Kalman filter
to improve localization accuracy based on android smart phone. In: Proceedings of IEEE
international conference on vehicular electronics and safety, pp 180–184
13. Jiawei C, Wenchao Z, Dongyan W, Xiaofeng S (2022) Research on indoor constraint location
method of mobile phone aided by magnetic features. In: Proceedings of IEEE international
conference on indoor positioning and indoor navigation (IPIN), pp 1–7
14. Racko J, Brida P, Perttula A, Parviainen J, Collin J (2016) Pedestrian dead reckoning with
particle filter for handheld smartphone. In: Proceedings of IEEE international conference on
indoor positioning and indoor navigation (IPIN), pp 1–7
15. Fetzer T, Ebner F, Bullmann M, Deinzer F, Grzegorzek M (2018) Smartphone-based indoor
localization within a 13th century historic building. Sensors 18(12):4095–4126
16. Pipelidis G, Tsiamitros N, Gentner C, Ahmed D, Prehofer C (2019) A novel lightweight
particle filter for indoor localization. In: Proceedings of IEEE international conference on
indoor positioning and indoor navigation (IPIN), pp 1–8
17. De Cock C, Joseph W, Martens L, Trogh J, Plets D (2021) Multi-floor indoor pedestrian dead
reckoning with a backtracking particle filter and Viterbi-based floor number detection. Sensors
21(13):4565–4593
18. Wang X, Li T, Sun S, Corchado J (2017) A survey of recent advances in particle filters and
remaining challenges for multitarget tracking. Sensors 17(12):2707–2727
19. Qian J, Pei L, Ma J, Ying R, Liu P (2015) Vector graph assisted pedestrian dead reckoning
using an unconstrained smartphone. Sensors 15(3):5032–5057
20. Wu Y, Zhu HB, Du QX, Tang SM (2019) A survey of the research status of pedestrian dead
reckoning systems based on inertial sensors. Int J Autom Comput 16(1):65–83
21. Ristic B, Arulampalam S, Gordon N (2003) Beyond the Kalman filter: particle filters for
tracking applications. Artech house
22. Ye H, Gu T, Tao X, Lu J (2014) B-Loc: scalable floor localization using barometer on smart-
phone. In: Proceedings of IEEE international conference on mobile Ad Hoc and sensor systems,
pp 127–135
23. Yi C, Choi W, Jeon Y, Liu L (2019) Pressure-pair-based floor localization system using
barometric sensors on smartphones. Sensors 19(16):3622–3640
24. Ichikari R, Ruiz L, Kourogi M, Kurata T, Kitagawa T, Yoshii S (2015) Indoor floor-level
detection by collectively decomposing factors of atmospheric pressure. In: Proceedings of
IEEE international conference on indoor positioning and indoor navigation (IPIN), pp 1–11
25. Ye HB, Gu T, Tao XP, Lv J (2015) Infrastructure-free floor localization through crowdsourcing.
J Comput Sci Technol 30(6):1249–1273
26. Wang Q, Fu M, Wang J, Luo H, Sun L, Ma Z, Li W, Zhang C, Huang R, Li X, Jiang Z, Huang
Y, Xia M (2023) Recent advances in floor positioning based on smartphone. Measurement
214:112813–112836
27. Willemsen T, Keller F, Sternberg H (2014) Concept for building a MEMS based indoor local-
ization system. In: Proceedings of IEEE international conference on indoor positioning and
indoor navigation (IPIN), pp 1–10
28. Nurminen H, Ristimäki A, Ali-Löytty S, Piché R (2013) Particle filter and smoother for indoor
localization. In: Proceedings of IEEE international conference on indoor positioning and indoor
navigation (IPIN), pp 1–10
29. Lin C, Shin Y (2022) Deep learning-based multifloor indoor tracking scheme using smartphone
sensors. IEEE Access 10:63049–63062
30. Nilsson JO, Gupta AK, Händel P (2014) Foot-mounted inertial navigation made easy. In:
Proceedings of IEEE international conference on indoor positioning and indoor navigation
(IPIN), pp 24–29
31. Ryu U, Ahn K, Kim E, Kim M, Kim B, Woo S, Chang Y (2013) Adaptive step detection algo-
rithm for wireless smart step counter. In: Proceedings of international conference on information
science and applications (ICISA), pp 1–4
32. Zhang Y, Zhu Z, Wang S (2018) Multi-condition constraint adaptive step detection method
based on the characteristics of gait. In: Proceedings of ubiquitous positioning, indoor navigation
and location-based services (UPINLBS), pp 1–5
33. Lee JH, Shin B, Kim SLJH., Kim C, Lee T, Park J (2014) Motion based adaptive step length
estimation using smartphone. In: Proceedings of IEEE international symposium on consumer
electronics (ISCE), pp 1–2
34. Shin S, Park C, Kim J, Hong H, Lee J (2007) Adaptive step length estimation algorithm using
low-cost MEMS inertial sensors. In: Proceedings of IEEE sensors applications symposium, pp
1–5
35. Abadleh A, Al-Hawari E, Alkafaween E, Al-Sawalqah H (2017) Step detection algorithm for
accurate distance estimation using dynamic step length. In: Proceedings of IEEE international
conference on mobile data management (MDM), pp 324–327
36. Weinberg H (2002) Using the ADXL202 in pedometer and personal navigation applications.
Analog Devices AN-602 application note 2(2):1–6
37. Jaworski W, Wilk P, Zborowski P, Chmielowiec W, Lee A, Kumar A (2017) Real-time 3D
indoor localization. In: Proceedings of IEEE International conference on indoor positioning
and indoor navigation (IPIN), pp 1–8
38. Ouyang G, Abed-Meraim K (2021) Analysis of magnetic field measurements for mobile local-
isation. In: Proceedings of IEEE international conference on indoor positioning and indoor
navigation (IPIN), pp 1–8
39. Haque F, Dehghanian V, Fapojuwo AO (2017) Sensor fusion for floor detection. In: Proceedings
of IEEE annual information technology, electronics and mobile communication conference
(IEMCON), pp 134–140
40. Zhao M, Qin D, Guo R, Wang X (2020) Indoor floor localization based on multi-intelligent
sensors. ISPRS Int J Geo Inf 10(1):6–22
41. Li Y, Gao Z, He Z, Zhang P, Chen R, El-Sheimy N (2018) Multi-sensor multi-floor 3D
localization with robust floor detection. IEEE Access 6:76689–76699
42. Boim S, Even-Tzur G, Klein I (2021) Height difference determination using smartphones based
accelerometers. IEEE Sens J 22(6):4908–4915
43. Ye H, Gu T, Zhu X, Xu J, Tao X, Lu J, Jin N (2012) FTrack: infrastructure-free floor localization
via mobile phone sensing. In: Proceedings of IEEE international conference on pervasive
computing and communications, pp 2–10
44. Itzik K, Yaakov L (2019) Step-length estimation during movement on stairs. In: Proceedings
of mediterranean conference on control and automation (MED), pp 518–523
45. Gao B, Yang F, Cui N, Xiong K, Lu Y, Wang Y (2022) A federated learning framework for
fingerprinting-based indoor localization in multibuilding and multifloor environments. IEEE
Internet Things J 10(3):2615–2629
46. Rihan N, Abdelaziz M, Soliman S (2022) A hybrid deep-learning/fingerprinting for indoor
positioning based on IEEE P802.11az. In: Proceedings of international conference on
communications, signal processing, and their applications (ICCSPA), pp 1–6
47. Abid M, Compagnon P, Lefebvre G (2021) Improved CNN-based magnetic indoor positioning
system using attention mechanism. In: Proceedings of IEEE international conference on indoor
positioning and indoor navigation (IPIN), pp 1–8
48. Feigl T, Kram S, Woller P, Siddiqui R, Philippsen M, Mutschler C (2019) A bidirectional
LSTM for estimating dynamic human velocities from a single IMU. In: Proceedings of IEEE
international conference on indoor positioning and indoor navigation (IPIN), pp 1–8
49. Wang Q, Luo H, Ye L, Men A, Zhao F, Huang Y, Ou C (2019) Pedestrian heading estimation
based on spatial transformer networks and hierarchical LSTM. IEEE Access 7:162309–162322
50. Klein I, Asraf O (2020) StepNet-deep learning approaches for step length estimation. IEEE
Access 8:85706–85713
51. Kim Y, Lee S, Lee S, Cha H (2012) A GPS sensing strategy for accurate and energy-efficient
outdoor-to-indoor handover in seamless localization systems. Mob Inf Syst 8(4):315–332
52. Yu M, Xue F, Ruan C, Guo H (2019) Floor positioning method indoors with smartphone’s
barometer. Geo-Spat Inf Sci 22(2):138–148
53. Wang L, Dong Z, Pei L, Qian J, Liu C, Liu D, Liu P (2015) A robust context-based heading esti-
mation algorithm for pedestrian using a smartphone. In: Proceedings of international technical
meeting of the satellite division of the institute of navigation (ION GNSS+), pp 2493–2500
54. Milette G, Stroud A (2012) Professional android sensor programming. Wiley
55. Sutskever I, Vinyals O, Le Q (2014) Sequence to sequence learning with neural networks.
Advances in neural information processing systems 27
56. Tanigawa M, Luinge H, Schipper L, Slycke P (2008) Drift-free dynamic height sensor using
MEMS IMU aided by MEMS pressure sensor. In: Proceedings of workshop on positioning,
navigation and communication, pp 191–196
57. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural
networks. In: Proceedings of international conference on artificial intelligence and statistics,
pp 249–256
58. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
59. Muralidharan K, Khan A, Misra A, Balan R, Agarwal S (2014) Barometric phone sensors: more
hype than hope!. In: Proceedings of workshop on mobile computing systems and applications,
pp 1–6
60. Android Developer. SensorEvent (2023). https://developer.android.com/reference/android/
hardware/SensorEvent
61. Apple Developer. CoreMotion (2023). https://developer.apple.com/documentation/
coremotion
62. Kang W, Han Y (2014) SmartPDR: smartphone-based pedestrian dead reckoning for indoor
localization. IEEE Sens J 15(5):2906–2916
63. Valenti R, Dryanovski I, Xiao J (2015) Keeping a good attitude: a quaternion-based orientation
filter for IMUs and MARGs. Sensors 15(8):19302–19330
64. Apple Developer. CoreLocation (2023). https://developer.apple.com/documentation/
corelocation
65. Poulose A, Senouci B, Han D (2019) Performance analysis of sensor fusion techniques for
heading estimation using smartphone sensors. IEEE Sens J 19(24):12369–12380
66. Kitagawa G (1993) A Monte Carlo filtering and smoothing method for non-Gaussian nonlinear
state space models. In: Proceedings of US-Japan joint seminar on statistical time series analysis,
pp 110–131
67. Doucet A, Johansen AM (2009) A tutorial on particle filtering and smoothing: fifteen years
later. In: Handbook of nonlinear filtering, vol 12, issue 3, pp 656–704
Chapter 11
An Indoor Wi-Fi Localization Algorithm Using BP Neural Network
11.1 Introduction
Wi-Fi positioning technology utilizes existing indoor access points (APs), and it has
advantages of simple system deployment, no requirement of additional hardware
equipment, and large coverage area. However, Wi-Fi signals exhibit significant volatility due to scattering, reflection, and diffraction, which directly affects the accuracy of Wi-Fi positioning. Existing research has shown that machine learning methods, such as support vector machines (SVM), artificial neural networks (ANN), back propagation (BP) neural networks, and convolutional neural networks (CNN), can suppress the effect of Wi-Fi signal fluctuation on localization [1].
To achieve a more accurate and adaptable RSSI-based ranging model, the RSSI
data is transformed by translation and scaling to reduce its fluctuation (see Sect. 11.3).
Y. Lin (B) · K. Yu
School of Environmental Science and Spatial Informatics, China University of Mining and
Technology, Xuzhou 221116, China
e-mail: [email protected]
K. Yu
e-mail: [email protected]
Following this, the transformed RSSI data and a BP neural network are employed
to construct the ranging model, termed GTBPD for simplicity (see Sect. 11.4). The
GTBPD ranging model has three main advantages: (1) Distance estimation can be
performed using the Wi-Fi RSSI signal received by a smartphone, without the need
for specialized equipment. (2) Unlike most existing ranging models that estimate
the distance between the receiver and the transmitter and rely on a path loss model
(PLM), which requires pre-deployment of the transmitter, our proposed GTBPD
ranging model estimates the distance between any two receivers. Once the GTBPD
model is trained with RSSI data from the smartphone, it can estimate the distance
between two indoor locations using just the smartphone collected RSSI data at those
locations. (3) The BP neural network, grounded in deep learning, constructs the
distance measurement model, which excels in nonlinear fitting and effectively adapts
to RSSI fluctuations in distance estimation. Additionally, a new localization algo-
rithm based on GTBPD model is introduced, which is named GTBPD-LSQP, and is
highly adaptable to different indoor environments (see Sect. 11.5). Before proceeding
further on the details of the proposed ranging model and the localization algorithm,
we first briefly study the fundamentals of Wi-Fi localization in the following section.
(1) Offline phase: firstly, the reference points are deployed in the location area of
indoor environment, and the positions of the points are precisely measured. A
device (e.g. a smartphone) is used to collect the RSSI of a number of APs at each
reference point, so a fingerprint is formed as a vector of RSSI, labeled by the
reference point position. Then, using the fingerprints of all the reference points,
an RSSI offline database is built and stored in a server or processing center.
(2) Online phase: when the pedestrian moves in the location area, the smartphone
carried by the pedestrian collects the RSSI of each of the same APs in the
offline phase and sends the online RSSI vector to the server. The server then
compares the online RSSI vector with the RSSI vectors of the database and uses
a positioning algorithm to estimate the current location of the user and returns
the estimated coordinates to the user.
In the fingerprint-based approach, the weighted K-nearest neighbor (WKNN) algorithm estimates the user position as a weighted combination of the positions of the k selected reference points:
$$[\hat{x}_u, \hat{y}_u]^{\mathrm T} = \sum_{i=1}^{k} w_i \, [x_i, y_i]^{\mathrm T} \qquad (11.1)$$
Here $[\hat{x}_u, \hat{y}_u]^{\mathrm T}$ is the estimated user position, $[x_i, y_i]^{\mathrm T}$ is the position of the ith reference point out of the k reference points selected, and $w_i$ is the ith reference point's weight, which is determined based on the RSSI Euclidean distance [3]. In addition, in order to overcome the shortcomings of the WKNN algorithm, Shin et al. introduced the Enhanced Weighted K-Nearest Neighbor (EWKNN) algorithm [4], and a Bayesian localization algorithm was introduced in [5], aiming to improve accuracy.
In ranging-based positioning, the user position is related to the distance estimates to (at least) three APs through
$$(\hat{x}_u - x_i)^2 + (\hat{y}_u - y_i)^2 = \hat{d}_i^2, \quad i = 1, 2, 3 \qquad (11.2)$$
Here, $\hat{d}_i$ ($i = 1, 2, 3$) is the distance estimate from the ith AP to the receiver, and $(x_i, y_i)$ is the position of the ith AP. Figure 11.2 illustrates the basic principle of the algorithm in the absence of distance
measurement error, where the three distance circles intersect at one point which is
the true location of the user. However, in reality, there are usually errors in distance
measurements. As a consequence, as shown in Fig. 11.3, such an intersection at a
point will not happen, but two different situations would occur, and solving Eq. (11.2)
will produce an estimate of the user’s position. To reduce the effect of ranging errors,
redundant distance observations related to more APs can be used. Linear least-squares
algorithm and optimization algorithms can be utilized to determine the user’s position
[1].
Angle-based positioning method utilizes the geometric relationship between the APs
and the receiver to estimate position, as shown in Fig. 11.4. Unlike ranging-based
method, the measurement is not distances, but angles. θi is the angle between the
positive x-axis and the direction from the ith AP to the user (smartphone). And θi can
be measured by the Angle of Arrival (AoA) method [9] or the Angle of Departure
(AoD) method [10]. The relationship between θi and the user position (x̂u , ŷu ) is
given as:
ŷu − yi
tan θi = (11.3)
x̂u − xi
[Figure: ranging-based positioning geometry; APs at (x_1, y_1), (x_2, y_2), (x_3, y_3) with distance estimates d̂_1, d̂_2, d̂_3 whose circles intersect at the user position (x̂_u, ŷ_u)]
where (xi , yi ) is the position of the ith AP. If the measurements are made at M APs,
according to (11.3), the following equation is generated:
[Fig. 11.4: angle-based positioning geometry; AP_i at (x_i, y_i), user at (x̂_u, ŷ_u), angle θ_i measured from the X-axis]
$$\begin{bmatrix} y_1 - x_1 \tan\theta_1 \\ y_2 - x_2 \tan\theta_2 \\ \vdots \\ y_M - x_M \tan\theta_M \end{bmatrix} = \begin{bmatrix} -\tan\theta_1 & 1 \\ -\tan\theta_2 & 1 \\ \vdots & \vdots \\ -\tan\theta_M & 1 \end{bmatrix} \begin{bmatrix} \hat{x}_u \\ \hat{y}_u \end{bmatrix} \qquad (11.4)$$
Then the linear least-squares algorithm can be used to determine the unknown
position (x̂u , ŷu ).
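A minimal sketch of this angle-based least-squares solution of (11.4) is given below; the AP coordinates and angles are made up for illustration, and the solver call is NumPy's standard least-squares routine rather than anything specific to the chapter.

```python
import numpy as np

def aoa_position(ap_xy, theta):
    """Solve (11.4): for each AP, -tan(theta_i)*x + y = y_i - x_i*tan(theta_i)."""
    t = np.tan(theta)
    A = np.column_stack([-t, np.ones_like(t)])      # M x 2 coefficient matrix
    b = ap_xy[:, 1] - ap_xy[:, 0] * t               # M-vector of constants
    est, *_ = np.linalg.lstsq(A, b, rcond=None)     # least-squares [x_u, y_u]
    return est

aps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 8.0]])
# Angles from each AP towards a user at roughly (4, 3), with a little measurement noise
angles = np.arctan2(3.0 - aps[:, 1], 4.0 - aps[:, 0]) + np.random.normal(0, 0.01, 3)
print(aoa_position(aps, angles))
```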
for indoor localization applications. Feng et al. [18] presented a timely and compre-
hensive review of the most interesting deep learning methods for Wi-Fi fingerprint
recognition, with the goal of identifying the most effective neural networks under
various localization evaluation metrics.
When utilizing RSSI for positioning, averaging multiple RSSI samples is a common
practice to mitigate the impact of signal fluctuations. However, as demonstrated
in Fig. 11.5, repeated RSSI measurements from the same AP at the same location
reveal that Wi-Fi signals are highly vulnerable to external environmental factors.
Signal fluctuations can reach 10 dBm, significantly degrading the accuracy of an RSSI-based ranging model. To reduce the effect of RSSI fluctuations, we transform the RSSI vector by translation and scaling, as follows.
The RSSI vector of the AP signal received at the ith location point is converted
using z-score standardization by (11.5), to mitigate the effects of signal variations.
$$TR_i = \frac{RSSI_i - v_i}{\sigma_i}, \quad i = 1, 2, \ldots \qquad (11.5)$$
Here $RSSI_i$ represents the RSSI vector received at the ith location point, which can be described as:
$$RSSI_i = \left[ r_{i,1}, r_{i,2}, \ldots, r_{i,M} \right] \qquad (11.6)$$
where M is the total number of APs in the indoor location area, and $r_{i,q}$ ($q = 1, 2, \ldots, M$) represents the RSSI of the qth AP signal received at the ith location point. $v_i$ and $\sigma_i$ are the mean and the standard deviation of the elements of $RSSI_i$.
[Fig. 11.5: RSSI (dBm) received from a single AP at a fixed location over 100 s, fluctuating between roughly -50 and -42 dBm]
Z-score standardization employs the mean and standard deviation of all compo-
nents within the vector, making the data comparable across different dimensions and
allowing for more reliable extraction of data characteristics [19]. After standardiza-
tion, each element in the RSSI vector adheres to a standard normal distribution with
a mean of 0 and a standard deviation of 1. This process diminishes discrepancies
among corresponding elements caused by noise and multipath interference, while
preserving the essential characteristics of the RSSI vector. Then a ranging model
based on this transformed RSSI vector is proposed.
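A minimal sketch of the translation-and-scaling step in (11.5), using the mean and standard deviation of the components of each RSSI vector (names illustrative):

```python
import numpy as np

def transform_rssi(rssi, eps=1e-9):
    """Z-score standardization of one RSSI vector, as in (11.5):
    subtract the vector's own mean and divide by its standard deviation."""
    rssi = np.asarray(rssi, dtype=float)
    return (rssi - rssi.mean()) / (rssi.std() + eps)

r1 = [-48., -55., -63., -70., -100.]   # first collection at a point
r2 = [-52., -58., -66., -74., -100.]   # second collection, shifted by a few dBm
print(transform_rssi(r1))
print(transform_rssi(r2))              # after the transform the two vectors nearly coincide
```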
We develop a new ranging model using a BP neural network and the transformed
RSSI vector. The number of hidden layers of BP network can be regarded as either
multiple or single. While increasing the number of hidden layers might slightly
enhance estimation accuracy, it also raises the risk of overfitting, where the network
performs well on training data but poorly on online test data. Furthermore, a multi-
layer hidden network structure is more complex, leading to longer training time. To
create a training time-efficient ranging model, we employ a BP neural network with
one hidden layer, as illustrated in Fig. 11.6. Satisfactory estimation accuracy can be
achieved with a sufficient number of nodes in a single hidden layer.
The differential transformed RSSI vector between the ith and jth location points can be calculated by ($i = j$ means the same location):
$$TR_{ij} = TR_i - TR_j \qquad (11.7)$$
Here $TR_i$ and $TR_j$ are calculated by (11.5). Then, each element in the $TR_{ij}$ vector is squared, and the result is defined as $TR^2_{ij}$:
$$TR^2_{ij} = \left\{ \mathrm{Input}_1^{(ij)}, \ldots, \mathrm{Input}_M^{(ij)} \right\} \qquad (11.8)$$
[Fig. 11.6: structure of the BP neural network ranging model; M input nodes Input_1^(ij), ..., Input_M^(ij) (the vector TR²_ij), a single hidden layer with nodes H_1^(ij), ..., H_L^(ij), input-to-hidden weights W¹ and hidden-to-output weights W², and one output node producing the distance estimate D̂_ij]
$$TR^2_{ij} = \left( \left[ \frac{r_{i,1} - v_i}{\sigma_i} - \frac{r_{j,1} - v_j}{\sigma_j} \right]^2, \ldots, \left[ \frac{r_{i,M} - v_i}{\sigma_i} - \frac{r_{j,M} - v_j}{\sigma_j} \right]^2 \right) \qquad (11.9)$$
The number of nodes L in the hidden layer is set as
$$L = \left\lfloor \sqrt{M + O} \right\rfloor + a \qquad (11.10)$$
Here ⌊·⌋ is the rounding-down operation, M represents the number of nodes in the input layer, equivalent to the total number of APs involved, and O represents the number of nodes in the output layer. Given that the model's output is solely the distance, O is set to 1. The parameter a is a positive integer constant, typically ranging from 1 to 10 [22]. In this study, the value of a is determined using data from previous extensive experiments.
In Fig. 11.6, $H_k^{(ij)}$ ($k = 1, 2, \ldots, L$) is the value of the kth node in the hidden layer, $\theta_k^1$ represents the bias of the kth node in the hidden layer, and $\theta^2$ represents the bias of the node in the output layer. In fact, the values of the weights and the biases are given randomly at first. Then $H_k^{(ij)}$ and $\hat{D}_{ij}$ are calculated as follows:
$$\begin{cases} H_k^{(ij)} = f\left( \sum_{q=1}^{M} \left( W_{qk}^1 \times \mathrm{Input}_q^{(ij)} \right) - \theta_k^1 \right), & k = 1, 2, \ldots, L \\ \hat{D}_{ij} = f\left( \sum_{k=1}^{L} \left( W_k^2 \times H_k^{(ij)} \right) - \theta^2 \right) \end{cases} \qquad (11.11)$$
Here, $f(x)$ is the activation function, which is set to $f(x) = (\arctan(x) + 1)^{-1}$ from the input layer to the hidden layer, and to $f(x) = x$ from the hidden layer to the output layer. Then, $H_k^{(ij)}$ and $\hat{D}_{ij}$ become:
$$\begin{cases} H_k^{(ij)} = \left\{ \arctan\left( \sum_{q=1}^{M} \left( W_{qk}^1 \times \mathrm{Input}_q^{(ij)} \right) - \theta_k^1 \right) + 1 \right\}^{-1} \\ \hat{D}_{ij} = \sum_{k=1}^{L} \left( W_k^2 \times H_k^{(ij)} \right) - \theta^2 \end{cases} \qquad (11.12)$$
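To make the forward pass of (11.9) and (11.12) concrete, a minimal NumPy sketch is shown below; the weights are randomly initialized toy values and all names are illustrative.

```python
import numpy as np

def build_input(tr_i, tr_j):
    """TR^2_ij per (11.9): element-wise squared difference of two transformed RSSI vectors."""
    return (np.asarray(tr_i) - np.asarray(tr_j)) ** 2

def forward_distance(x, W1, theta1, W2, theta2):
    """Forward pass of the single-hidden-layer ranging network, per (11.12)."""
    pre = W1.T @ x - theta1                  # hidden pre-activations
    h = 1.0 / (np.arctan(pre) + 1.0)         # activation f(x) = (arctan(x) + 1)^(-1)
    return float(W2 @ h - theta2)            # linear output node: distance estimate

M, L = 8, 5                                  # toy sizes: 8 APs, 5 hidden nodes
rng = np.random.default_rng(0)
# Small random weights keep arctan(pre) well above -1, away from the activation's singularity
W1, theta1 = 0.05 * rng.normal(size=(M, L)), 0.05 * rng.normal(size=L)
W2, theta2 = 0.05 * rng.normal(size=L), 0.05 * rng.normal()

tr_i, tr_j = rng.normal(size=M), rng.normal(size=M)   # two transformed RSSI vectors
print(forward_distance(build_input(tr_i, tr_j), W1, theta1, W2, theta2))
```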
BP network needs input samples and output samples to train the values of weights
and biases. The RSSI data collected from reference points in the location area of
interest is used to train the ranging model. Assume that η reference points are laid
out in the location area, indexed by i, j = 1, 2, . . . , η in (11.12) and the total number
of training samples, denoted by S, equals η × η. To ensure that the weights and biases
in the network can adapt to different input TR2ij , the BP neural network undergoes
iterative training using the BP algorithm [23]. This process updates the values of the
weights and biases so that the loss function E, defined by:
$$E = \frac{1}{S} \sum_{i=1}^{\eta} \sum_{j=1}^{\eta} \left( \hat{D}_{ij} - D_{ij} \right)^2 \qquad (11.13)$$
is finally minimized. Here, Dij is the true distance from the ith location point to the
jth location point:
$$D_{ij} = \sqrt{ (X_i - X_j)^2 + (Y_i - Y_j)^2 } \qquad (11.14)$$
where (Xk , Yk ) is the true location of the kth reference point. The final values of the
network’s weights and biases are established when the loss function E falls below a
threshold or the maximum number of iterations is reached, thereby completing the
construction of the ranging model. Notably, we perform 100 iterations of network
training for each given value of a using pre-measured data. The loss function values
after 100 iterations are presented in Table 11.1. The training results indicate that the
loss function is minimized when a equals 10. Consequently, in this chapter, the BP
neural network constructs the ranging model with a set to 10. However, the initial
weights and biases in the network are assigned randomly, which can impact the
accuracy of the ranging model. Suboptimal initial values may cause the loss function
E to settle at a local minimum rather than the global minimum [24]. Therefore,
optimizing the initial weights and biases is essential for enhancing the accuracy of
the BP network-based ranging model.
To optimize these initial values, accelerate network convergence, and achieve
the global optimal solution, this chapter employs the GA for optimizing BP neural
networks. While other algorithms, such as particle swarm optimization and simulated
annealing, can also optimize the initial weights and biases, GA offers distinct advan-
tages. Firstly, GA operates using a coding scheme and can optimize multiple param-
eters simultaneously, making it highly operable. Secondly, GA iteratively optimizes
parameters based on probabilistic transfer rules, which provides superior global opti-
mization capabilities. Thus, we utilize GA to optimize the initial weights and biases
in the BP neural network, thereby constructing a high-precision ranging model. GA,
a metaheuristic algorithm inspired by natural selection, belongs to the broader cate-
gory of evolutionary algorithms [25]. The process of multi-objective optimization
using GA involves six steps, as outlined below.
(1) Initialization
Firstly, the chromosome representing an individual needs to be defined; it is set as a string of binary characters with length C:
$$C = (M \times L + L + L \times O + O) \times 4 \qquad (11.15)$$
Here, O stands for the number of output layer nodes, L stands for the number of hidden
layer nodes, and M stands for the number of input layer nodes. We represent every
4 characters with a value between −7 and 7, with the first bit indicating whether the
number is positive or negative, to simplify calculations and accelerate optimization.
The chromosome is divided into 4 parts, as illustrated in Fig. 11.7. Specifically,
Part 1 contains 4ML characters and represents the values in the weight vector W1 .
Part 2 consists of 4L characters, representing the bias values θk1 of the hidden layer
(k = 1, 2, . . . , L). Part 3 includes 4LO characters, representing the values in the weight vector W2. Part 4 has 4O characters, representing the bias value $\theta^2$ of the output layer.

Table 11.1 The value of loss function E when a takes different values
a    1       2       3       4       5       6       7       8       9       10      11      12      13
E    0.0171  0.0141  0.0163  0.0140  0.0146  0.0143  0.011   0.0137  0.0134  0.0102  0.0118  0.0127  0.0133
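Under the encoding just described (4 binary characters per value, with a sign bit and values in [-7, 7]), a chromosome can be decoded into the network's initial weights and biases. The sketch below is illustrative; the exact bit-to-value convention (sign bit followed by a 3-bit magnitude) is an assumption consistent with the description above.

```python
import numpy as np

def decode_gene(bits):
    """4 bits -> integer in [-7, 7]; first bit is the sign, remaining 3 bits the magnitude."""
    sign = -1 if bits[0] == 1 else 1
    return sign * (4 * bits[1] + 2 * bits[2] + bits[3])

def decode_chromosome(chrom, M, L, O=1):
    """Split a binary chromosome of length (M*L + L + L*O + O)*4 into W1, theta1, W2, theta2."""
    vals = np.array([decode_gene(chrom[i:i + 4]) for i in range(0, len(chrom), 4)], float)
    W1 = vals[:M * L].reshape(M, L)                       # part 1: input-to-hidden weights
    theta1 = vals[M * L:M * L + L]                        # part 2: hidden biases
    W2 = vals[M * L + L:M * L + L + L * O].reshape(L, O)  # part 3: hidden-to-output weights
    theta2 = vals[M * L + L + L * O:]                     # part 4: output bias
    return W1, theta1, W2, theta2

M, L = 3, 2
rng = np.random.default_rng(1)
chrom = rng.integers(0, 2, size=(M * L + L + L * 1 + 1) * 4)   # random toy chromosome
print([part.shape for part in decode_chromosome(chrom, M, L)])
```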
(2) Fitness
At the start, each character value of the chromosome is randomly generated, creating
an individual that represents all the weights and biases of the BP neural network.
Let the population size be B, meaning there are B chromosomes in total. For the
weights and biases represented by the bth chromosome (b = 1, 2, . . . , B), the estimated distances of all input samples are obtained using Eq. (11.12). Using the data collected at the reference points as sample data, the estimated distance vector $\hat{R}_b$ for all input samples under the bth chromosome is defined as:
$$\hat{R}_b = \left[ \hat{G}_{11}^b, \hat{G}_{12}^b, \ldots, \hat{G}_{21}^b, \hat{G}_{22}^b, \ldots, \hat{G}_{\eta\eta}^b \right], \quad b = 1, 2, \ldots, B \qquad (11.16)$$
Here Ĝijb (i, j = 1, 2, . . . , η) is the estimated distance of the input sample TR2ij under
the bth chromosome. Then the fitness of the bth individual is calculated as follows:
$$F_b = \sqrt{ \sum_{i=1}^{\eta} \sum_{j=1}^{\eta} \left( \hat{G}_{ij}^b - D_{ij} \right)^2 } \qquad (11.17)$$
Here Dij is defined in (11.14). The smaller the individual’s fitness, the more accurate
the prediction, indicating better initial values for the network’s weights and biases
[11].
Then the average fitness of the population is calculated by:
$$\bar{F} = \frac{ \sum_{b=1}^{B} F_b }{ B } \qquad (11.18)$$
(3) Selection
The next generation of new individuals is generated by mutation and crossover, and roulette wheel selection [26] is employed to decide whether to retain the resulting new individuals. The probability that the bth individual is selected to remain is given by:
$$P_b = \frac{ F_b^{-1} }{ \sum_{b=1}^{B} F_b^{-1} } \qquad (11.19)$$
$$Q_b = \sum_{j=1}^{b} P_j \qquad (11.20)$$
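A compact sketch of the fitness computation in (11.17) and the roulette-wheel probabilities in (11.19) and (11.20) is given below; how the spin result is then used in the retention step is only loosely modeled here, and the numbers are toy values.

```python
import numpy as np

def fitness(pred_dists, true_dists):
    """F_b per (11.17): root of the summed squared distance errors."""
    return float(np.sqrt(np.sum((np.asarray(pred_dists) - np.asarray(true_dists)) ** 2)))

def roulette_select(F):
    """Per (11.19)-(11.20): smaller fitness -> higher selection probability."""
    inv = 1.0 / np.asarray(F, dtype=float)
    P = inv / inv.sum()                                 # selection probabilities P_b
    Q = np.cumsum(P)                                    # cumulative probabilities Q_b
    return int(np.searchsorted(Q, np.random.rand()))    # index of the selected individual

F = [12.3, 8.7, 20.1, 9.9]                              # fitness of a 4-individual population
picks = [roulette_select(F) for _ in range(1000)]
print(np.bincount(picks, minlength=len(F)) / 1000.0)    # the fittest individual (index 1) is picked most often
```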
(4) Mutation
(5) Crossover
[Figure: roulette wheel selection, showing the selection probabilities P_1, P_2, ..., P_B and the cumulative probabilities Q_1, Q_2, ..., Q_B]
[Figure: mutation operation; a bit of the bth individual's binary string is flipped to produce a new individual]
Fig. 11.10 The bth individual is crossed with the (b + 1)th individual
The crossover operation exchanges a segment of the binary strings of two parent individuals, resulting in two new individuals. Each individual engages in crossover with only one other individual. Subsequently, after B/2 crossover operations, a new population is formed.
(6) Iteration
Continue repeating steps (2) through (5) until either F is below a threshold or the
predefined number of iterations is reached. Following this, select the values of weights
and biases represented by the chromosome with the minimum fitness value in the
population as the initial values for the BP neural network’s weights and biases, used
to construct the ranging model.
Using measured data from the indoor environment, the initial values for the
ranging model’s weights and biases are determined through GA. Subsequently, the
ranging model undergoes iterative training via the BP algorithm until the loss func-
tion E falls below a threshold or the predefined number of iterations is reached. The
flowchart illustrating the construction of our proposed GTBPD model is depicted in
Fig. 11.11. The training data’s input samples TR2ij are calculated using Eq. (11.9),
and the output samples of the training data {Dij } are calculated using Eq. (11.14).
Upon completion of the offline phase, the ranging model is established and utilized
for position determination during the online phase.
During the online phase, the RSSI vector collected at the target point undergoes translation and scaling according to (11.5). Subsequently, the input vectors of the GTBPD model from the target point to all reference points are computed using (11.9). The established GTBPD model is then employed to derive the estimated distance vector $\hat{\mathbf d} = [\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_\eta]$ from the target point to the η reference points.
[Fig. 11.11: flowchart of GTBPD model construction; start, then the GA loop (calculate the average fitness by (11.18), execute selection, mutation and crossover, and repeat until the number of iterations or the fitness requirement is reached), followed by updating the network weights and biases with the back-propagation training algorithm until the number of iterations or the loss-function requirement on E is satisfied, completing the construction of the GTBPD ranging model]
In practice, some of the distance estimates, particularly large ones, are unreliable. To mitigate this, a distance threshold θd is employed to determine the acceptability of each distance estimate. The threshold θd is primarily determined by the dimensions of the indoor environment. If θd is too large, unreliable estimates may be retained; conversely, setting it too small may leave too few estimated distances for accurate position determination. The impact of θd will be assessed using experimental data. Excluding estimated distances greater than θd yields the new estimated distance vector $\hat{\boldsymbol\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_\varepsilon]$, while the corresponding true distance vector is denoted by $\tilde{\mathbf d} = [\tilde{d}_1, \tilde{d}_2, \ldots, \tilde{d}_\varepsilon]$, where ε is the number of retained reference points. Then the linear least-squares estimator is employed to calculate the initial position estimate of the target.
Specifically, squaring both sides of the distance equations yields:
$$\begin{cases} (\hat{x}_0 - \tilde{x}_1)^2 + (\hat{y}_0 - \tilde{y}_1)^2 = \hat{\xi}_1^2 \\ \quad\vdots \\ (\hat{x}_0 - \tilde{x}_\varepsilon)^2 + (\hat{y}_0 - \tilde{y}_\varepsilon)^2 = \hat{\xi}_\varepsilon^2 \end{cases} \qquad (11.21)$$
Here, (x̂0 , ŷ0 ) represents the initial estimated coordinates of the target point, (x̃i , ỹi )
is the true position of the ith reference point, and ξ̂i (i = 1,2, . . . , ε) is the estimated
distance from the target point to the ith reference point. By subtracting the last
equation from each of the other equations in (11.21) and rearranging the resulting
equations, we obtain:
$$A\mathbf{p}_0 = \mathbf{b} \qquad (11.22)$$
Here b is a constant vector, p0 is the initial estimated position vector, and A is a constant matrix; they are defined as:
$$A = \begin{bmatrix} 2(\tilde{x}_1 - \tilde{x}_\varepsilon) & 2(\tilde{y}_1 - \tilde{y}_\varepsilon) \\ 2(\tilde{x}_2 - \tilde{x}_\varepsilon) & 2(\tilde{y}_2 - \tilde{y}_\varepsilon) \\ \vdots & \vdots \\ 2(\tilde{x}_{\varepsilon-1} - \tilde{x}_\varepsilon) & 2(\tilde{y}_{\varepsilon-1} - \tilde{y}_\varepsilon) \end{bmatrix} \qquad (11.23)$$
$$\mathbf{b} = \begin{bmatrix} \tilde{x}_1^2 - \tilde{x}_\varepsilon^2 + \tilde{y}_1^2 - \tilde{y}_\varepsilon^2 - \hat{\xi}_1^2 + \hat{\xi}_\varepsilon^2 \\ \tilde{x}_2^2 - \tilde{x}_\varepsilon^2 + \tilde{y}_2^2 - \tilde{y}_\varepsilon^2 - \hat{\xi}_2^2 + \hat{\xi}_\varepsilon^2 \\ \vdots \\ \tilde{x}_{\varepsilon-1}^2 - \tilde{x}_\varepsilon^2 + \tilde{y}_{\varepsilon-1}^2 - \tilde{y}_\varepsilon^2 - \hat{\xi}_{\varepsilon-1}^2 + \hat{\xi}_\varepsilon^2 \end{bmatrix} \qquad (11.24)$$
$$\mathbf{p}_0 = \begin{bmatrix} \hat{x}_0 \\ \hat{y}_0 \end{bmatrix} \qquad (11.25)$$
Therefore, the initial position estimate obtained with the least squares estimator is:
$$\mathbf{p}_0 = (A^{\mathrm T} A)^{-1} A^{\mathrm T} \mathbf{b} \qquad (11.26)$$
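A minimal sketch of this linear least-squares initial estimate in (11.22) through (11.26), with made-up reference-point coordinates and distances:

```python
import numpy as np

def ls_initial_position(ref_xy, dists):
    """Initial position estimate p0 from (11.23), (11.24) and (11.26).

    ref_xy : (eps, 2) coordinates of the retained reference points
    dists  : (eps,)   estimated distances from the target to those points
    """
    x, y, d = ref_xy[:, 0], ref_xy[:, 1], np.asarray(dists, float)
    A = np.column_stack([2 * (x[:-1] - x[-1]), 2 * (y[:-1] - y[-1])])            # (11.23)
    b = x[:-1]**2 - x[-1]**2 + y[:-1]**2 - y[-1]**2 - d[:-1]**2 + d[-1]**2       # (11.24)
    p0, *_ = np.linalg.lstsq(A, b, rcond=None)                                   # (11.26)
    return p0

refs = np.array([[0.0, 0.0], [6.0, 0.0], [6.0, 8.0], [0.0, 8.0]])
true = np.array([2.5, 3.5])
d = np.linalg.norm(refs - true, axis=1) + np.random.normal(0, 0.2, 4)   # noisy ranges
print(ls_initial_position(refs, d))                                     # close to (2.5, 3.5)
```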
To refine this initial estimate, the sequential quadratic programming (SQP) algorithm is then employed [27], which is a very effective algorithm for nonlinear constrained optimization problems. The position estimates given in (11.26) are used as initial position guesses for the SQP algorithm.
The nonlinear distance error equations can be described as:
$$\begin{cases} v_1 = \sqrt{ (x - \tilde{x}_1)^2 + (y - \tilde{y}_1)^2 } - \hat{d}_1 \\ \quad\vdots \\ v_\varepsilon = \sqrt{ (x - \tilde{x}_\varepsilon)^2 + (y - \tilde{y}_\varepsilon)^2 } - \hat{d}_\varepsilon \end{cases} \qquad (11.27)$$
Here {gi ≤ 0} are the constraints, and {gi } are defined as:
$$g_i = \begin{cases} x_{\min} - x, & i = 1 \\ y_{\min} - y, & i = 2 \\ x - x_{\max}, & i = 3 \\ y - y_{\max}, & i = 4 \\ -\hat{d}_{i-4}, & 5 \le i \le \varepsilon + 4 \\ \hat{d}_{i-(\varepsilon+4)} - \theta_d, & \varepsilon + 5 \le i \le 2\varepsilon + 4 \end{cases} \qquad (11.30)$$
where (xmax , ymax ) represents the maximum coordinate values in the independent
coordinate system, while (xmin ,ymin ) represents the minimum coordinate values.
Subsequently, the initial position estimate and the distance estimates obtained from
the GTBPD ranging model are employed as the initial values of the parameter vector:
$$\mathbf{u}_0 = \left[ \hat{x}_0, \hat{y}_0, \hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_\varepsilon \right]^{\mathrm T} \qquad (11.31)$$
Next, the Taylor expansion is used to simplify the objective function F(u) of the
nonlinear constrained problem at the kth iteration into a quadratic function:
$$F(\mathbf{u}) = \frac{1}{2} [\mathbf{u} - \mathbf{u}_k]^{\mathrm T} \nabla^2 F(\mathbf{u}_k) [\mathbf{u} - \mathbf{u}_k] + \nabla F(\mathbf{u}_k) [\mathbf{u} - \mathbf{u}_k] \qquad (11.32)$$
$$\text{s.t.} \quad \nabla g_i(\mathbf{u}_k)^{\mathrm T} [\mathbf{u} - \mathbf{u}_k] + g_i(\mathbf{u}_k) \le 0, \quad i = 1, 2, \ldots, 2\varepsilon + 4$$
Here ∇ is the gradient (derivative) operator. uk is the value of the parameter vector
u at the kth iteration. Define
$$\begin{cases} S_k = \mathbf{u} - \mathbf{u}_k \\ H_k = \nabla^2 F(\mathbf{u}_k) \\ C_k = \nabla F(\mathbf{u}_k) \\ A_k = \left[ \nabla g_1(\mathbf{u}_k), \nabla g_2(\mathbf{u}_k), \ldots, \nabla g_{2\varepsilon+4}(\mathbf{u}_k) \right]^{\mathrm T} \\ B_k = -\left[ g_1(\mathbf{u}_k), g_2(\mathbf{u}_k), \ldots, g_{2\varepsilon+4}(\mathbf{u}_k) \right]^{\mathrm T} \end{cases} \qquad (11.33)$$
Then (11.32) can be rewritten compactly as
$$F(\mathbf{u}) = \frac{1}{2} S_k^{\mathrm T} H_k S_k + C_k^{\mathrm T} S_k, \quad \text{s.t.} \; A_k S_k \le B_k \qquad (11.34)$$
According to the active set method, take the case where the constraint inequalities in Eq. (11.34) hold with equality for the active constraints, collected as $A_k' S_k = B_k'$. This leads to the Lagrangian problem
$$\min_{S_k, \lambda} \; L(S_k, \lambda) = \frac{1}{2} S_k^{\mathrm T} H_k S_k + C_k^{\mathrm T} S_k + \lambda^{\mathrm T} \left( A_k' S_k - B_k' \right) \qquad (11.36)$$
Setting the derivatives of (11.36) with respect to $(S_k, \lambda)$ to zero yields a system of linear equations; solving it gives $S_k$, which in turn yields the optimized estimate of u at the kth iteration. The process repeats until (11.36) satisfies a convergence condition or the predefined number of iterations is reached.
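Rather than hand-coding the QP iteration above, a sketch of the refinement step can lean on SciPy's sequential least-squares programming solver (method "SLSQP"), which plays the role of the SQP iteration described in (11.32) through (11.36). The objective below (the sum of the squared residuals v_i of (11.27)) and the way the box and threshold constraints of (11.30) are expressed as bounds are an interpretation of the chapter's formulation, not necessarily its exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def refine_position(p0, ref_xy, d_hat, bounds_xy, theta_d):
    """Refine (x, y) and the distance variables with SLSQP, starting from the LS estimate."""
    eps = len(d_hat)

    def objective(u):
        x, y, d = u[0], u[1], u[2:]
        v = np.hypot(x - ref_xy[:, 0], y - ref_xy[:, 1]) - d   # residuals v_i of (11.27)
        return np.sum(v ** 2)

    (xmin, xmax), (ymin, ymax) = bounds_xy
    # Box constraints on (x, y) and 0 <= d_i <= theta_d correspond to g_i <= 0 in (11.30)
    bounds = [(xmin, xmax), (ymin, ymax)] + [(0.0, theta_d)] * eps
    u0 = np.concatenate([p0, d_hat])                            # initial vector as in (11.31)
    res = minimize(objective, u0, method="SLSQP", bounds=bounds)
    return res.x[:2]

refs = np.array([[0.0, 0.0], [6.0, 0.0], [6.0, 8.0], [0.0, 8.0]])
d_hat = np.array([4.3, 5.0, 5.7, 5.2])                          # GTBPD-style distance estimates
print(refine_position(np.array([2.4, 3.6]), refs, d_hat, ((0, 7), (0, 10)), theta_d=7.0))
```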
In summary, the functional block diagram depicted in Fig. 11.12 outlines the
proposed GTBPD-LSQP localization algorithm, which involves three main phases:
1. Offline RSSI Transformation: Initially, stable features of the RSSI vector are extracted by the translation-and-scaling transformation.
2. Offline Ranging Model Construction: Subsequently, a ranging model is
constructed using a BP neural network. The initial values of the network’s weights
and biases are optimized through GA.
3. Online Phase: The RSSI vector obtained at the target point undergoes transfor-
mation. Utilizing the established ranging model, distances from the target point
to each reference point are computed. Position determination is then executed
based on the least-squares estimator and SQP algorithm [28].
Expanding upon these steps, the algorithm enables robust and accurate indoor localization by effectively addressing signal fluctuations and leveraging the predictive
localization by effectively addressing signal fluctuations and leveraging the predictive
capabilities of neural networks. Through the integration of optimization techniques
and iterative algorithms, it ensures precise positioning even in challenging indoor
environments.
We conducted experiments in the academic office buildings No. 4 and No. 5 at the
China University of Mining and Technology (CUMT), to evaluate the localization
accuracy of our proposed algorithm. Specifically, we established experimental fields
within these buildings: two fields measuring 7 m * 10 m were arranged on the first
and second floors of the No. 5 academic office building. Notably, these fields were
situated in the lobby areas adjacent to the stairs, characterized by complex building
structures and a high volume of foot traffic. Additionally, we set up a 9 m * 15 m
experimental field in a discussion hall located in the No. 4 academic office building.
In both office buildings, APs were evenly distributed to facilitate data services for
the campus network.
Figure 11.13 provides visual representations of the field conditions during data
collection. The left image depicts the first experimental field, the middle image
showcases the second experimental field, and the right image illustrates the third
experimental field. It’s worth noting that these experimental areas are frequently
traversed by students, leading to fluctuations in RSSI due to the movement of people.
These experimental setups enable us to evaluate the performance of our algorithm
in real-world scenarios characterized by complex indoor environments and dynamic
human activities.
The size of every fingerprint database is n * M, with n denoting the number of
reference points and M representing the total number of APs scanned in the indoor
location area. An AP that has been scanned previously may not be scanned next
time. The RSSI value corresponding to an un-scanned AP at a reference point is set
as −100 dBm, in order to make sure that the array length of the fingerprint data for each reference point in the fingerprint database is the same.
Fig. 11.14 Layout of reference and test points at three experimental fields
We conduct the scanning and collection of Wi-Fi data via a MI 5 smartphone once per second. The scanning is repeated 10 times at each reference point, and the average of the 10 collected RSSI values for each AP is taken as that AP's RSSI value. In this way, the fingerprint of each reference point, corresponding to the vector RSSIi, is obtained.
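A short sketch of this database construction (per-AP averaging of repeated scans, with -100 dBm for APs missing from a scan); the scan data structure, a dictionary per scan, is an assumption for illustration only.

```python
import numpy as np

MISSING_RSSI = -100.0   # value used when an AP is not seen in a scan

def build_fingerprint(scans, all_aps):
    """Average repeated scans into one fingerprint vector ordered by the global AP list.

    scans   : list of dicts {ap_id: rssi_dBm}, one dict per scan
    all_aps : list of all AP identifiers observed anywhere in the location area
    """
    mat = np.array([[scan.get(ap, MISSING_RSSI) for ap in all_aps] for scan in scans])
    return mat.mean(axis=0)   # fingerprint vector RSSI_i of length M

scans = [{"ap1": -48, "ap2": -63}, {"ap1": -50, "ap3": -71}]   # two toy scans
print(build_fingerprint(scans, ["ap1", "ap2", "ap3"]))
```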
We set up independent coordinate systems in each of the three experimental fields.
The left image of Fig. 11.14 shows the arrangement of reference and test points in
the first and second indoor experimental fields. Each of these two experimental fields
contains 117 reference points and 54 online test points which are evenly distributed.
The right image of Fig. 11.14 shows the layout of reference and test points in the
third experimental field, where 187 reference points and 120 test points were laid
out.
In the first experimental field, a total of 163 AP signals were received, and when
establishing the BP neural network, the number of hidden layer nodes was set to 22.
In the second experimental field, a total of 210 AP signals were received, and the
number of hidden layer nodes was set to 24. And in the third experimental field, a
total of 200 AP signals were received, and then the number of hidden layer nodes
was also set to 24. As mentioned before, the determination of the number of hidden
layer nodes is based on (11.10) and the analysis of the experimental results.
RSSI vectors were collected twice at three location points. In Fig. 11.15, the left images show the original RSSI vectors of the two collection sessions, while the right images display the transformed RSSI (TR) vectors obtained after applying translation and scaling. In Fig. 11.15, the horizontal axis
represents the IDs of the various APs, with a total of 40 APs scanned at each location
point. If an AP scanned during the first collection is not detected during the second
collection, the corresponding RSSI value for that AP is set to −100 dBm. This
analysis allows us to assess the consistency and stability of the RSSI data across
different collection sessions and ascertain the effectiveness of the translation and
scaling process in mitigating signal fluctuations and enhancing the reliability of the
collected data.
The length of the dashed line in Fig. 11.15 indicates the discrepancy between the
two RSSI values collected at the same location point with the same AP. From the
graphs (a), (c), and (e) in Fig. 11.15, it is evident that the two RSSI values for the
same AP at the same location point can differ by approximately 10 dBm. An AP
that can be scanned the first time might not be able to be scanned the second time.
If the environment and other conditions, such as the smartphone hardware, remained unchanged, the RSSI values received from the same AP at the same location would be equal. However, the environment typically changes over time, and the RSSI values received from the same AP at the same point may also vary across phones due to device heterogeneity. Hence, RSSI variation is unavoidable, and this deviation can have a significant impact on the accuracy of fingerprint-based Wi-Fi localization.
Therefore, the RSSI vector is translated and scaled to minimize the impact. As can
be observed from the three plots (b), (d), and (f) in Fig. 11.15, the RSSI vectors
collected twice at the three points have a better match after translation and scaling.
Here, we use the mean relative absolute deviation (MRAD) to evaluate the degree
of match between two vectors, which is calculated by:
$$\mathrm{MRAD} = \frac{1}{V} \sum_{i=1}^{V} \frac{ \left| R_{1,i} - R_{2,i} \right| }{ \left| R_{1,i} + R_{2,i} \right| / 2 } \qquad (11.39)$$
Here $R_{1,i}$ denotes the ith element of the vector from the first collection, $R_{2,i}$ denotes the ith element of the vector from the second collection, and V is the length of the vectors.
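The MRAD of (11.39) can be computed directly; the following small sketch uses toy numbers in which the second collection is an offset-and-gain distortion of the first, a situation the (11.5) transform removes exactly.

```python
import numpy as np

def mrad(r1, r2, eps=1e-12):
    """Mean relative absolute deviation between two equal-length vectors, per (11.39)."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    return float(np.mean(np.abs(r1 - r2) / (np.abs(r1 + r2) / 2 + eps)))

raw1 = np.array([-48., -55., -63., -70.])
raw2 = 1.1 * raw1 - 3.0                         # same point, with offset and gain drift
tr = lambda r: (r - r.mean()) / r.std()         # the (11.5) transform
print(mrad(raw1, raw2))                         # noticeable MRAD on the raw vectors
print(mrad(tr(raw1), tr(raw2)))                 # essentially zero after the transform
```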
At the same location, when comparing the MRAD between original RSSI vectors
collected at two different times and the MRAD between the TR vectors obtained
after z-score normalization of the corresponding RSSI vectors, significant reduc-
tions in MRAD are observed. Table 11.2 illustrates these MRAD values between
the original RSSI vectors and the MRAD between the TR vectors at three distinct
positions. The results demonstrate a substantial decrease in MRAD when utilizing
TR vectors. Following the translation and scaling of the RSSI vectors, the MRAD
of the TR vectors at all three points is notably reduced, typically ranging from one
eighth to one tenth of the MRAD before transformation. This indicates a remarkable
improvement in matching the TR vectors derived from translated and scaled RSSI
vectors collected at different times. Consequently, this transformation effectively
[Fig. 11.15 panels: (a) first point raw RSSI vector, (b) first point transformed RSSI vector, (c) second point raw RSSI vector, (d) second point transformed RSSI vector, (e) third point raw RSSI vector, (f) third point transformed RSSI vector; each panel plots RSSI (dBm) or TR against AP ID for the first and second collections]
Fig. 11.15 Original RSSI vector and the vector after translation and scaling
mitigates the fluctuation effects of RSSI, thereby preserving relatively stable RSSI
vector feature information while minimizing redundancy.
Firstly, we make use of the measured data to check if the GA that we have designed can
provide better initial weights and biases for our proposed ranging model. Figure 11.16
depicts the alteration process of the minimum fitness value within the population
under 100 iterations for the GA optimization by respectively employing the finger-
print data of reference points from three experimental fields. It can be observed from
Fig. 11.16 that the minimum fitness values in the three experimental fields via the GA
optimization network decrease along with the increase of the number of iterations.
The smaller the fitness value is, the better the corresponding initialized values of the
network weights and biases. The outcomes in Fig. 11.16 demonstrate that the GA
has the ability to optimize our ranging model.
Then three ways of constructing the ranging model are considered. Table 11.3
shows the mean absolute error (MAE) of the estimated distances from all test points
to all reference points for the different construction methods of the ranging model.
Note that Site i (i = 1, 2, 3) denotes the ith experimental field. BPD represents the model in which the original RSSI vector is used directly as the input sample data, meaning $\{\mathrm{Input}_1, \ldots, \mathrm{Input}_M\} = \{[r_{i,1} - r_{j,1}]^2, \ldots, [r_{i,M} - r_{j,M}]^2\}$ in (11.9), and the network weights and biases are not initialized by GA. TBPD denotes the model in
which the transformed RSSI vector is used as the input sample data, but the network
is not initialized by GA. GTBPD represents the model in which the transformed
RSSI vector is used as the input data and the network is initialized by GA, i.e. the
ranging model we proposed.
Figure 11.17 illustrates the cumulative distribution function (CDF) of distance
estimation errors across three experimental fields using ranging models derived
from three distinct approaches. Notably, the GTBPD model exhibits superior perfor-
mance, surpassing the other two models by a significant margin. This underscores
the efficacy of the RSSI transformation and network initialization. To illustrate the
ranging performance of the GTBPD, let’s examine the percentage of distance esti-
mation errors within 3 m: For the first and second experimental fields, the GTBPD
model achieves approximately 76% accuracy within 3 m, representing a notable 10%
enhancement over the alternative ranging models. In the third experimental field, the GTBPD model keeps about 71.8% of distance estimation errors within 3 m, while the TBPD and BPD models report 62.2% and 57.5% respectively, indicating the clear superiority of the GTBPD approach.
Table 11.3 provides further insight into the performance discrepancies, specifi-
cally through the MAE of distance estimation. Through translation and scaling of
RSSI, the MAE experiences reductions of 0.165, 0.401, and 0.318 m across the
[Fig. 11.16: minimum fitness value (m) in the population versus the number of evolutions for (a) the first, (b) the second and (c) the third experimental field]
Table 11.3 Mean absolute error of distance estimation under different ranging models
Models           GTBPD    TBPD     BPD
MAE (m)  Site 1  2.070    2.726    2.891
         Site 2  2.000    2.605    3.006
         Site 3  2.285    2.647    2.965
three experimental fields respectively. This indicates the efficacy of the RSSI vector
transformation method in extracting pertinent feature information from RSSI signals,
thereby mitigating their inherent fluctuations. Moreover, the utilization of GA opti-
mization significantly enhances the accuracy of the ranging model. Table 11.3 illus-
trates that compared to the TBPD model, the MAE of the GTBPD model experiences
reductions of 0.656, 0.605, and 0.362 m in the three experimental fields respectively.
This underscores the effectiveness of network initialization facilitated by GA. Addi-
tionally, the MAE of distance estimation using the proposed GTBPD model consis-
tently hovers around 2 m across diverse indoor environments. This indicates robust
performance and the model’s adaptability to varied indoor settings.
Based on the experimental field’s dimensions, the distance threshold θd is set between
3 and 7 m for experimental analysis. Table 11.4 presents the statistics of positioning
errors for three experimental sites with different distance thresholds. The results
indicate that optimal positioning accuracy is achieved with θd = 4 m for the first
experimental site, θd = 5 m for the second site, and θd = 4 m for the third site.
Notably, the positioning accuracy tends to diminish as the θd increases or decreases.
However, the disparity in accuracy among different thresholds within the range is
minimal. Hence, the algorithm’s sensitivity to the selection of θd is low, suggesting
a preference for θd = 4 m in similar scenarios. Nonetheless, selecting an appropriate
threshold θd should be based on the location area’s size and experimental findings.
Once the GTBPD ranging model for the location area of interest is established, it
becomes applicable for position determination using RSSI data at the target point.
In each of the three experimental fields, a random test point was selected. Different
ranging models established for these fields were then utilized to predict distances from
the selected test points to all reference points within their respective experimental
fields. Scatter plots of predicted versus true distances for these test points are depicted
in Fig. 11.18. Ideally, when the predicted distances match the true distances from the
[Fig. 11.17: CDF (%) of distance estimation errors (m) for the GTBPD, TBPD and BPD ranging models, for (a) the first, (b) the second and (c) the third experimental field]
test point to each reference point, scatter points in Fig. 11.18 align precisely along
the solid blue line.
From Fig. 11.18, it is apparent that scatter points generally cluster near the blue
line. By eliminating distances exceeding the distance threshold θd along with their
corresponding reference points, the positions of test points are determined using the
least squares estimator and SQP. Additionally, the positions of all test points within
the localization area were estimated using four algorithms introduced in Sect. 11.2:
WKNN algorithm, Bayesian localization algorithm, EWKNN algorithm, and the
GA-ANN-based localization algorithm proposed by [13], denoted as GA-ANN.
Comparisons were made with the proposed GTBPD-LSQP algorithm. The least-
squares estimator based on the GTBPD ranging model is denoted as GTBPD-LS
algorithm.
Tables 11.5, 11.6 and 11.7 present the positioning errors of six algorithms across
the three experimental fields. k values ranging from 6 to 14 were explored to analyze
the impact on the performance of the WKNN algorithm. The findings underscore
the superior performance of our proposed GTBPD-LSQP algorithm, consistently
exhibiting the lowest mean error and RMSE across all experimental fields. Firstly,
the GTBPD-LS algorithm marginally outperforms the GA-ANN algorithm, yielding
accuracy improvements of 0.126, 0.046, and 0.085 m for the respective experimental
fields. Secondly, the GTBPD-LSQP algorithm surpasses the GTBPD-LS algorithm
in all experimental fields, showcasing accuracy enhancements of 0.239, 0.139, and
0.361 m respectively. This improvement is achieved through iterative optimization,
employing the estimate derived from the GTBPD-LS algorithm as the initial estimate,
albeit at the expense of increased computational complexity. Furthermore, when
compared against the other four algorithms, the proposed GTBPD-LSQP algorithm
significantly outperforms them. For instance, compared to the best-case scenario
of the WKNN algorithm, the GTBPD-LSQP algorithm achieves improvements of
0.92, 1.28, and 1.038 m across the three experimental fields, highlighting its robust
performance.
[Fig. 11.18: scatter plots of predicted distance (m) versus real distance (m) from the selected test points to the reference points in the experimental fields]
In Fig. 11.19, the cumulative distribution of positioning errors across three distinct
experimental fields is depicted for six algorithms. Notably, the WKNN algorithm
achieves its minimal RMSE value for k = 12 in the first field, k = 10 in the second
field, and k = 12 in the third field. Examining Fig. 11.19 reveals that when the
positioning error threshold is set at 3 m, the CDF of positioning errors for each
algorithm varies across the experimental fields. For the GTBPD-LSQP algorithm, the
CDF of positioning errors is approximately 85.5, 83.5, and 61.7% for the three fields
respectively. The GTBPD-LS algorithm demonstrates CDF values of about 72.7,
74.5, and 52.1%. The GA-ANN algorithm exhibits CDF values around 74.1, 68.5, and
50.0%. EWKNN shows CDF values of 63.6, 67.3, and 54.2%. WKNN demonstrates
CDF values of 52.7, 61.8, and 54.2%. Lastly, the Bayesian algorithm presents CDF
values of 52.7, 45.5, and 42.5%. The CDF analysis reveals that the performance of the
GTBPD-LSQP algorithm surpasses that of the other five algorithms. This outcome
aligns with the earlier assessment of RMSE, reinforcing the superior performance
of our proposed approach across diverse indoor environments.
Our GTBPD-LSQP algorithm demonstrates remarkable adaptability to various
indoor settings, consistently outperforming the four existing algorithms considered
in the evaluation. This robust performance underscores its effectiveness in accurately
localizing targets, even amidst challenging indoor conditions.
[Fig. 11.19: CDF (%) of positioning errors (m) for the Bayesian, WKNN, EWKNN, GA-ANN, GTBPD-LS and GTBPD-LSQP algorithms, for (a) the first, (b) the second and (c) the third experimental field]
11.7 Conclusion
References
1. Lin Y, Yu K, Hao L, Wang J, Bu J (2022) An indoor Wi-Fi localization algorithm using ranging
model constructed with transformed RSSI and BP neural network. IEEE Trans Commun
70(3):2163–2177
2. Zhang H, Wang Z, Xia W, Ni Y, Zhao H (2022) Weighted adaptive KNN algorithm with
historical information fusion for fingerprint positioning. IEEE Wirel Commun Lett 11(5):1002–
1006
3. Xia S, Liu Y, Yuan G, Zhu M, Wang Z (2017) Indoor fingerprint positioning based on Wi-Fi:
an overview. ISPRS Int J Geo Inf 6(5):135–160
4. Shin B, Lee JH, Lee T, Kim HS (2012) Enhanced weighted K-nearest neighbor algorithm
for indoor Wi-Fi positioning systems. In: 2012 8th international conference on computing
technology and information management (NCM and ICNIT), pp 574–577
5. Binghao L (2006) Indoor positioning techniques based on wireless LAN. In: 1st IEEE
international conference on wireless broadband and ultra wideband communications, pp 1–6
6. Jawad HM, Jawad AM, Nordin R, Gharghan SK, Abdullah NF, Ismail M, Abu-AlShaeer MJ
(2019) Accurate empirical path-loss model based on particle swarm optimization for wireless
sensor networks in smart agriculture. IEEE Sens J 20(1):552–561
7. Zhou B, Wu Z, Chen Z, Liu X, Li Q (2023) Wi-Fi RTT/Encoder/INS-based robot indoor
localization using smartphones. IEEE Trans Veh Technol 72(5):6683–6694
8. Zhou H, Liu J (2022) An enhanced RSSI-based framework for localization of bluetooth devices.
In: 2022 IEEE international conference on electro information technology (eIT), pp 296–304
9. Zhou T, Xu K, Shen Z, Xie W, Zhang D, Xu J (2022) AoA-based positioning for aerial intelligent
reflecting surface-aided wireless communications: an angle-domain approach. IEEE Wirel
Commun Lett 11(4):761–765
10. Huang C, Zhuang Y, Liu H, Li J, Wang W (2020) A performance evaluation framework for
direction finding using BLE AoA/AoD receivers. IEEE Internet Things J 8(5):3331–3345
11. Song X, Fan X, Xiang C, Ye Q, Liu L, Wang Z, Fang G (2019) A novel convolutional
neural network based indoor localization framework with WiFi fingerprinting. IEEE Access
7(1):110698–110709
12. Mehmood H, Tripathi NK (2013) Optimizing artificial neural network-based indoor positioning
system using genetic algorithm. Int J Digital Earth 6(2):158–184
13. Ma L, Sun Y, Zhou M, Xu Y (2010) WLAN indoor GA-ANN positioning algorithm via
regularity encoding optimization. In: 2010 international conference on communications and
intelligence information security, pp 261–265
14. Zhu W, Zeng Z, Yang Q, Zhao X, Zhang J (2021) Research on indoor positioning algorithm
based on BP neural network. J Phys Conf Ser 1–6
15. Zhang W, Liu K, Zhang W, Zhang Y, Gu J (2016) Deep neural networks for wireless localization
in indoor and outdoor environments. Neurocomputing 194(1):279–287
16. Roy P, Chowdhury C (2021) A survey of machine learning techniques for indoor localization
and navigation systems. J Intell Rob Syst 101(3):63–70
300 Y. Lin and K. Yu
17. Shang S, Wang L (2022) Overview of WiFi fingerprinting-based indoor positioning. IET
Commun 16(7):725–733
18. Feng X, Nguyen KA, Luo Z (2022) A survey of deep learning approaches for WiFi-based
indoor positioning. J Inf Telecommun 6(2):163–216
19. Hu W, Liang J, Jin Y, Wu F, Wang X, Chen E (2018) Online evaluation method for low frequency
oscillation stability in a power system based on improved XGboost. Energies 11(11):3238–3248
20. Shafi I, Ahmad J, Shah SI, Kashif FM (2006) Impact of varying neurons and hidden layers
in neural network architecture for a time frequency application. In: 2006 IEEE international
multitopic conference, pp 188–193
21. Shen H, Wang Z, Gao C, Qin J, Yao F, Xu W (2008) Determining the number of BP neural
network hidden layer units. J Tianjin Univ Technol 24(5):13–20
22. Cui X, Yang J, Li J, Wu C (2020) Improved genetic algorithm to optimize the Wi-Fi indoor
positioning based on artificial neural network. IEEE Access 8(1):74914–74921
23. Wang J, Wen Y, Gou Y, Ye Z, Chen H (2017) Fractional-order gradient descent learning of BP
neural networks with Caputo derivative. Neural Netw 89(1):19–30
24. Wu C, Wang H (2016) BP neural network optimized by improved adaptive genetic algorithm.
Electron Des Eng 24(24):29–33
25. Garg H (2016) A hybrid PSO-GA algorithm for constrained optimization problems. Appl Math
Comput 274(1):292–305
26. Lipowski A, Lipowska D (2012) Roulette-wheel selection via stochastic acceptance. Physica
A 391(6):2193–2196
27. Yu K, Guo YJ (2008) Improved positioning algorithms for nonline-of-sight environments.
IEEE Trans Veh Technol 57(4):2342–2353
28. Yu K, Sharp I, Guo YJ (2009) Ground-based wireless positioning. Wiley
Chapter 12
Intelligent Indoor Positioning Based on
Wireless Signals
12.1 Introduction
Nowadays, indoor positioning technology has penetrated into all aspects of people’s
lives and has a wide range of applications in medical care, logistics, manufacturing,
retail, emergency rescue and other fields. However, due to the complexity and diver-
sity of the indoor environment, there are still some challenges for indoor positioning,
mainly including low positioning accuracy and low calibration efficiency. Therefore,
researchers are constantly developing emerging indoor positioning technologies to
solve the problems [1, 2].
Intelligent indoor positioning is the process of navigating and localizing users or objects in the indoor environment. It is mainly divided into a number of stages,
as shown in Fig. 12.1. First, users need to collect data through smart devices, and the
collected data may include wireless signal data [3] such as Wireless Fidelity (WiFi),
Bluetooth Low Energy (BLE), Ultra-WideBand (UWB), and built-in sensor data [4]
such as acceleration, heading angle, and geomagnetic data. Then, according to the
requirements, the server needs to select the required data set for preprocessing and
apply the appropriate positioning algorithm to learn and train the data. Finally, the
system applies the trained model to realize the position estimation and navigation of
the user.
Data collection can be carried out mainly in two different ways: professional surveyor-based and crowdsourcing-based methods. In the first method, professional surveyors collect the required data at predefined reference points to obtain labeled data with location information. Crowdsourcing-based methods distribute the workload of data acquisition to a massive number of ordinary mobile device users. This not only saves measurement cost but also allows the database to be updated adaptively as the environment changes. Crowdsourcing relies on feedback from users, who may intentionally or unintentionally enter wrong location labels into the database, or refuse to provide location information due to privacy and security concerns; hence, one of the biggest challenges of this method is the labeling problem.
Indoor positioning algorithms can be mainly divided into sensor-based and wire-
less signal-based positioning algorithms. The sensor-based positioning method is
mainly based on the data of the built-in sensors of smart devices to estimate the
trajectory through Simultaneous Localization And Mapping (SLAM) [22] or Pedes-
trian Dead Reckoning (PDR) [23] algorithms. The wireless signal-based positioning
algorithms can be divided into ranging-based and fingerprinting-based methods. The ranging-based methods mainly use Time of Arrival (ToA) [5], Time Difference of Arrival (TDoA) [6], and Angle of Arrival (AoA) [7] measurements to realize location estimation through trilateration, which requires knowing the locations of at least three base stations and imposes stringent time-synchronization requirements. The fingerprinting-based methods are mainly based on signal strength, which is location-dependent due to the multipath effect.
Based on the above research, researchers have provided a variety of technical solu-
tions for indoor positioning, but due to the complexity of indoor positioning tasks,
there are still many challenges to be solved. We discuss the major challenges of
indoor positioning as follows.
Due to the complexity of indoor scenes, many factors affect signal transmission and sensor measurements. Wireless signals such as WiFi, BLE, and RFID are highly susceptible to multipath fading, obstacles, human motion, and interfering devices, which introduce noise during propagation and cause the received signal to deviate from the transmitted signal. In addition, the sensors built into smart devices may suffer from cumulative errors. These factors can have a great impact on positioning accuracy.
In most existing work, experiments are carried out with specific mobile devices within a specific, small experimental area, and in such settings the algorithms can often achieve good positioning performance. However, in complex large buildings, an algorithm may be subject to technical limitations and signal interference, so the positioning accuracy often falls short of expectations and the stability and robustness are poor.
Calibration Efforts
In the actual positioning scene, the collection and labeling of fingerprint data are very
time-consuming and laborious, so it is often very difficult to construct a fine-grained
fingerprint database. At the same time, once the indoor environment changes and
the fingerprint data set is not updated immediately, there will be differences between
the radio map and the testing data, which will significantly affect the performance
of positioning algorithms.
In the process of crowdsourcing data collection, users may not provide the location
label of the data due to privacy and security issues, so the scarcity of labeled data
sets often occurs in fingerprint positioning tasks.
The fingerprint positioning method is easy to implement and has low cost, so it
has gradually become the mainstream trend of indoor positioning technology. The
fingerprint positioning method is composed of the offline phase and the online phase,
and its process is shown in Fig. 12.2. In the offline phase, the surveyor needs to collect
the signal strength from different Access Points (APs) at each Reference Point (RP) to
represent the fingerprint information of the RP, and the fingerprint information of all
RPs constitutes the fingerprint database of the region. In the online phase, according
to the fingerprint information provided by the user, the appropriate pattern-matching
algorithm is selected to match with the fingerprint database information to estimate
the target user’s position. In this section, we mainly introduce and summarize the
fingerprint positioning technology based on the framework of machine learning.
According to the type of data provided, we can use three types of training methods
to build the location model: supervised learning, unsupervised learning and semi-
supervised learning.
Supervised learning requires a fingerprint database with rich labels, which requires
intensive data collection efforts. If professional surveyor-based methods are used
to collect data, supervised learning techniques can be used to build a localization
model.
$$\Pr(o \mid l_i) = \prod_{j=1}^{k} \Pr\!\left(RSS_j \mid AP_j, l_i\right), \qquad (12.1)$$
where $\Pr(RSS_j \mid AP_j, l_i)$ is the probability that $AP_j$ has the signal strength measurement $RSS_j$ at location $l_i$. In the online phase, a posterior distribution over all the locations is computed using the Bayes rule:
$$\Pr(l_i \mid o^*) = \frac{\Pr(o^* \mid l_i)\,\Pr(l_i)}{\sum_{j} \Pr(o^* \mid l_j)\,\Pr(l_j)},$$
where $o^*$ is a new observation and $\Pr(l_i)$ encodes prior knowledge about where a user may be. In [28], the authors first use Bayesian estimation to calculate the conditional a posteriori probability, and then use the maximum a posteriori probability criterion to obtain the target location. Researchers in [27] have proposed a Bayesian-based particle filter method for indoor localization.
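To make the probabilistic matching concrete, the short Python sketch below evaluates (12.1) and the Bayes-rule posterior for a toy radio map; the Gaussian per-AP model, the numerical values, and the uniform prior are illustrative assumptions rather than anything specified in this chapter.

```python
import numpy as np

# Illustrative radio map: mean and std of RSS (dBm) per AP at each candidate location l_i.
rss_mean = np.array([[-45.0, -60.0, -70.0],   # location l_1
                     [-55.0, -50.0, -65.0],   # location l_2
                     [-70.0, -62.0, -48.0]])  # location l_3
rss_std = np.full_like(rss_mean, 4.0)         # assumed measurement noise per AP
prior = np.full(rss_mean.shape[0], 1.0 / rss_mean.shape[0])  # uniform prior Pr(l_i)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def posterior(observation):
    # Likelihood Pr(o | l_i) = prod_j Pr(RSS_j | AP_j, l_i), as in (12.1).
    likelihood = np.prod(gaussian_pdf(observation, rss_mean, rss_std), axis=1)
    # Posterior via the Bayes rule, normalised over all candidate locations.
    unnorm = likelihood * prior
    return unnorm / unnorm.sum()

obs = np.array([-52.0, -51.0, -66.0])          # new RSS observation o*
post = posterior(obs)
print("posterior:", post, "-> MAP location index:", int(np.argmax(post)))
```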
The deterministic positioning algorithms mostly use similarity criteria, such as the minimum Euclidean distance criterion and cosine similarity, to match the user's fingerprint with the fingerprint database. The user's location is then determined by the locations of the RPs whose fingerprints best match the target user's fingerprint, and the overall process is shown in Fig. 12.3.
The KNN [11] adopts the Euclidean distance to determine the k nearest fingerprints of an unknown fingerprint. The SVM algorithm classifies reference points by finding a set of hyperplanes in the fingerprint data set of N reference points, and the authors in [12] use SVM to determine the area of the object. The RF builds an ensemble learning model consisting of a large number of decision trees; each decision tree predicts a class, and the most common class is taken as the final prediction. In [25], the authors design an enhanced fingerprint pattern-matching algorithm using an RF regression model.
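The deterministic matching step can be illustrated with a weighted KNN (WKNN) sketch in Python; the toy fingerprint database, the query, and the choice of K are illustrative assumptions.

```python
import numpy as np

def wknn_locate(db_rss, db_coords, query_rss, k=3, eps=1e-6):
    """Weighted K-nearest-neighbour fingerprint matching.

    db_rss    : (N, M) RSS fingerprints of N reference points over M APs
    db_coords : (N, 2) coordinates of the reference points
    query_rss : (M,)   online RSS measurement of the target
    """
    d = np.linalg.norm(db_rss - query_rss, axis=1)      # Euclidean distances
    idx = np.argsort(d)[:k]                             # k closest fingerprints
    w = 1.0 / (d[idx] + eps)                            # inverse-distance weights
    return (w[:, None] * db_coords[idx]).sum(axis=0) / w.sum()

# Toy example (all values illustrative)
db_rss = np.array([[-40, -70], [-60, -50], [-70, -45]], dtype=float)
db_coords = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0]])
print(wknn_locate(db_rss, db_coords, np.array([-58.0, -52.0]), k=2))
```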
Compared with traditional machine learning methods, deep learning algorithms have
more efficient data analysis and processing capabilities, and once the models are
trained with enough labeled data, the learned features can be used to make predictions
on unknown data.
Typical deep learning algorithms such as Convolutional Neural Networks (CNN) are generally composed of multiple convolutional layers and fully connected layers and are usually used to process grid-structured data. As shown in Fig. 12.4, a set of RSS fingerprint sequences is first fed into the neural network, where convolution operations extract fingerprint feature information; the features are then flattened and passed through the fully connected layers, and finally the position coordinates $(X, Y)$ are obtained [14]. Wireless signals vary not only with the distance from the target but also over time, so the Recurrent Neural Network (RNN) can be introduced to process time-series data, and Fig. 12.5 shows the localization process based on the RNN model. The input to the RNN is an RSS vector $r_t$ and the hidden state vector $h_{t-1}$ from the previous time step; after RNN training, it outputs a new hidden state vector $h_t$ that contains information from the previous and current input data. The hidden state feature vector $h_t$ is then fed into a multiple regression function to obtain the current position estimate $p_t$ [29]. However, a single CNN or RNN model may suffer from vanishing or exploding gradients during training. Therefore, researchers in [31] propose a spatial-temporal positioning algorithm combining a Residual Network (ResNet) and LSTM, in which the ResNet extracts the spatial state information of the signal and the LSTM extracts the temporal state information. By combining the two improved deep learning networks, positioning accuracy is greatly improved.
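For concreteness, a compact PyTorch sketch of a CNN-style fingerprint regressor in the spirit of Fig. 12.4 is given below; the layer sizes and the number of APs are illustrative assumptions and do not reproduce the architectures of [14] or [31].

```python
import torch
import torch.nn as nn

class RssCnnRegressor(nn.Module):
    """1-D CNN that maps an RSS fingerprint sequence to (X, Y) coordinates."""
    def __init__(self, num_aps: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8, 64), nn.ReLU(),
            nn.Linear(64, 2),          # (X, Y) position
        )

    def forward(self, rss):           # rss: (batch, num_aps)
        x = rss.unsqueeze(1)          # -> (batch, 1, num_aps)
        return self.head(self.features(x))

model = RssCnnRegressor()
pred = model(torch.randn(4, 64))      # 4 random fingerprints -> 4 position estimates
print(pred.shape)                      # torch.Size([4, 2])
```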
Graph Neural Network (GNN) is a model that uses graph-structured data as input
to make predictions or solve classification tasks. We can represent the signal strength
and position information of beacons as a graph structure, where the positions repre-
sent the nodes in the graph and the links between them represent the edges, and then
transform the fingerprint data into new features and input them into the GNN for posi-
tioning, as shown in Fig. 12.6. In [32], the authors use a graph regression approach
to predict the location coordinates, which is compared with several existing GNN
models. The authors of [34] propose a scheme to convert fingerprints into graphs
by geometric methods and then use a Graph Sample and Aggregate (GraphSAGE)
estimator for localization, which allows multiple wireless signals to be used as fin-
gerprint features at the same time. Fingerprint-based indoor positioning methods
usually have the problem of low feature discrimination of discrete signal finger-
prints or high time cost of continuous signal fingerprint acquisition. To solve this
problem, the authors in [33] propose a collaborative indoor positioning framework
based on Graph Attention (GAT). The framework first constructs an adaptive graph representation from multiple discrete signal fingerprints collected by several users, which effectively models the relationship between collaborative fingerprints. Then, using GAT as the basic unit, a deep network with a residual structure and a hierarchical attention mechanism is designed to extract and aggregate features from the constructed graphs for collaborative localization.
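To make the graph-based idea concrete, the numpy sketch below performs one round of mean-aggregation message passing over a fingerprint graph; the similarity-based adjacency construction and all dimensions are illustrative assumptions and are far simpler than the GraphSAGE and GAT models cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                     # 6 fingerprint nodes, 4 RSS features each

# Build an illustrative graph: connect nodes whose fingerprints are similar.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
A = (dist < np.median(dist)).astype(float)      # adjacency from RSS similarity
np.fill_diagonal(A, 1.0)                        # add self-loops

W = rng.normal(scale=0.5, size=(4, 4))          # weight matrix (random stand-in for a trained one)

def message_passing_layer(X, A, W):
    """Mean-aggregate neighbour features, then apply a linear transform and a ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    H = (A @ X) / deg
    return np.maximum(H @ W, 0.0)

H1 = message_passing_layer(X, A, W)
print(H1.shape)                                 # (6, 4): updated node embeddings
```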
In recent years, Federated Learning (FL) has been widely used in the field of
deep learning, which is a way to train models without exchanging raw data. FL
models can be trained locally on the respective data, and then the model parameters
are uploaded to the server for aggregation, which helps to solve the privacy and
security issues of indoor positioning [37]. Figure 12.7 illustrates a basic model of FL in indoor localization. First, each client uses its private fingerprint database to train a local positioning model and uploads the model to the central server; the central server then aggregates the uploaded local models to generate a new global model and sends the new model parameters back to the clients to update their local models, and this process is iterated to obtain the final positioning model [38, 39]. The authors in [40] propose Monte Carlo (MC) dropout to reduce the communication overhead and improve the computational efficiency of FL in localization.
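The server-side aggregation step in Fig. 12.7 can be sketched as a weighted parameter average in the style of FedAvg; the client parameter dictionaries and weights below are illustrative assumptions.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Aggregate local model parameters into a global model.

    client_params : list of dicts {layer_name: ndarray} uploaded by the clients
    client_sizes  : list of local fingerprint-set sizes, used as weights
    """
    total = float(sum(client_sizes))
    global_params = {}
    for name in client_params[0]:
        global_params[name] = sum(
            (n / total) * p[name] for p, n in zip(client_params, client_sizes)
        )
    return global_params

# Two illustrative clients with a single-layer model
c1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
c2 = {"w": np.array([3.0, 0.0]), "b": np.array([1.5])}
print(federated_average([c1, c2], client_sizes=[100, 300]))
```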
k-means clustering combined with a logical floor plan mapping method. Researchers
in [20] propose an unsupervised learning indoor localization model based on the memetic algorithm (MA), which first uses a global search algorithm to obtain an initial model and then uses a local optimization algorithm based on k-means to estimate the user location. The method builds an accurate indoor localization model using only unlabeled fingerprints by integrating the global search and local optimization algorithms into the MA, avoiding the need for position labels. In [21], researchers propose a method that automatically constructs and optimizes radio maps, in an unsupervised manner, from unlabeled fingerprint data acquired by random walks, which does not require a complex site survey.
Traditional supervised learning requires a large amount of labeled data for training,
but it is very difficult to obtain labeled data samples covering all indoor positioning
scenes. Unlabeled crowdsourced data is easy to obtain and has a wider coverage,
while unsupervised learning can extract features from unlabeled data. Therefore, mixing labeled and unlabeled data for localization in a semi-supervised manner is a good choice: features of large amounts of unlabeled data are first acquired through unsupervised training, and the model is then fine-tuned in a supervised way with a small amount of labeled data.
The AutoEncoder (AE) [18] is a neural network trained with an unsupervised learning method, composed of two main components: an encoder and a decoder. The encoder converts the input data into low-dimensional vectors, and the decoder maps the low-dimensional vectors back to the original input space for data reconstruction. The AE is often used for tasks such as dimensionality reduction, anomaly detection, and generative modeling, where the goal is to learn a compressed representation of the input data while preserving its salient features. Therefore, the AE can be used as the feature extractor of a semi-supervised model to extract the features of unlabeled data $X_u$, so as to obtain its feature representation $r$; the network model is then fine-tuned with a small amount of labeled data $X_L$ in downstream tasks to obtain the localization target $Y$. The process of the AE-based semi-supervised positioning system is shown in Fig. 12.8.
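A minimal PyTorch sketch of the AE-based pipeline of Fig. 12.8 is given below: an autoencoder is pretrained on unlabeled fingerprints X_u, and the encoder is then reused with a small regression head fine-tuned on labeled data X_L. All dimensions, data, and training settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_aps, latent = 64, 8

encoder = nn.Sequential(nn.Linear(num_aps, 32), nn.ReLU(), nn.Linear(32, latent))
decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, num_aps))

X_u = torch.randn(256, num_aps)                 # unlabeled crowdsourced fingerprints
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(50):                              # unsupervised reconstruction pretraining
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X_u)), X_u)
    loss.backward()
    opt.step()

# Supervised fine-tuning with a small labeled set (X_L, Y_L)
head = nn.Linear(latent, 2)                      # maps features r to position Y
X_L, Y_L = torch.randn(32, num_aps), torch.randn(32, 2)
opt2 = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
for _ in range(50):
    opt2.zero_grad()
    loss = nn.functional.mse_loss(head(encoder(X_L)), Y_L)
    loss.backward()
    opt2.step()
print(head(encoder(X_L[:2])))                    # estimated positions for two samples
```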
There are many types of AE, such as the Stacked AutoEncoder (SAE) [35], the Denoising AutoEncoder (DAE), and the Variational AutoEncoder (VAE) [36], and researchers have proposed many different AE-based semi-supervised indoor positioning solutions to deal with various problems of indoor positioning tasks.
In [43], the authors utilize the SAE to extract high-level features from unlabeled
crowdsourced data to improve localization performance during classification. RSS-
based indoor positioning technology is particularly vulnerable to security threats.
The authors in [42] propose a semi-supervised deep learning security enhancement
framework combining DAE and CNN, aiming to achieve security enhancement with-
out affecting the effectiveness and efficiency of indoor positioning. The model first
uses DAE components to denoise the abrupt RSS values to reduce the impact of
malicious attacks and then uses CNN to perform fingerprint matching. Researchers
in [44] propose a semi-supervised learning model based on the Variational AutoEn-
coder (VAE). During unsupervised learning, the VAE is used as the feature extractor
to learn the latent distribution of the original input, and then the labeled data is used
to train the predictor.
In order to alleviate the scarcity of labeled data, researchers in [41] propose a
centralized indoor localization method based on pseudo-labels. This scheme uses
unlabeled fingerprint data collected by mobile crowdsourcing to generate pseudo-
labels and combines them with a small amount of labeled data to reduce the burden of
labeling fingerprints. The AP locations in the environment may change over time,
causing deviations in the fingerprint database and affecting the location performance.
In [16], the authors propose a crowdsourced indoor location method based on ensem-
ble learning, which can automatically identify altered APs in the crowdsourced data
and update the database.
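The pseudo-label idea mentioned above can be sketched as follows: a simple model trained on the small labeled set predicts positions for unlabeled crowdsourced fingerprints, and only the most confident predictions are added back to the training set. The KNN predictor and the neighbour-distance confidence rule below are illustrative assumptions, not the scheme of [41].

```python
import numpy as np

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(20, 5))                 # small labeled fingerprint set
y_lab = rng.normal(size=(20, 2))                 # corresponding positions
X_unl = rng.normal(size=(200, 5))                # unlabeled crowdsourced fingerprints

def knn_predict(Xtr, ytr, Xte, k=3):
    """Return KNN position predictions and the mean neighbour distance as a crude confidence."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    pred = ytr[idx].mean(axis=1)
    conf = d[np.arange(len(Xte))[:, None], idx].mean(axis=1)
    return pred, conf

# 1) Generate pseudo-labels for the unlabeled data and keep only the most confident 30 %.
pseudo_y, conf = knn_predict(X_lab, y_lab, X_unl)
keep = conf < np.percentile(conf, 30)

# 2) Retrain (here: simply enlarge the reference set) with labeled plus pseudo-labeled data.
X_aug = np.vstack([X_lab, X_unl[keep]])
y_aug = np.vstack([y_lab, pseudo_y[keep]])
print("training set grew from", len(X_lab), "to", len(X_aug), "fingerprints")
```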
The inertial sensor-based positioning method can produce cumulative errors or trajectory drift over time, and wireless signal-based positioning is prone to environmental interference. By fusing different positioning trajectory information, their advantages can complement each other.
The authors in [48] use an enhanced particle filter to fuse PDR, GPS and WiFi fin-
gerprints, and propose a three-step tracking and matching algorithm to obtain crowd-
sourced radio maps, achieving high-precision outdoor-indoor seamless localization.
In order to solve the problem of trajectory fragmentation caused by crowdsourcing
data, the authors in [49] propose a robust iterative trace merging algorithm based
on WiFi access points as signal markers to merge a large number of trajectories,
and further improve the accuracy of crowdsourcing indoor positioning by removing
trajectory outliers through the enhanced matching algorithm. In [50], the researchers
propose a graph-based SLAM framework that adaptively establishes WiFi-based
edges and GPS-based edges as prior information for corresponding positions in IMU
trajectory estimation to achieve trajectory fusion and generate crowdsourced radio
maps. After fusion, the generated crowdsourced radio map can be used for other
users to conduct WiFi-based fingerprint positioning, reducing the cost of building a
fingerprint database.
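A bare-bones particle filter illustrating the fusion principle is sketched below: particles are propagated with PDR step and heading increments and reweighted by a WiFi fingerprint position fix. The motion noise, measurement model, and number of particles are illustrative assumptions and do not reproduce the designs of [48]-[50].

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
particles = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(N, 2))   # initial position particles
weights = np.full(N, 1.0 / N)

def pdr_predict(particles, step_len, heading, noise=0.15):
    """Propagate particles with one PDR step (length in metres, heading in radians)."""
    move = np.array([np.cos(heading), np.sin(heading)]) * step_len
    return particles + move + rng.normal(scale=noise, size=particles.shape)

def wifi_update(particles, weights, wifi_fix, sigma=2.0):
    """Reweight and resample particles using a WiFi fingerprint position fix."""
    d2 = np.sum((particles - wifi_fix) ** 2, axis=1)
    w = weights * np.exp(-0.5 * d2 / sigma ** 2)
    w /= w.sum()
    # Systematic resampling keeps the particle set from degenerating.
    idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(N)) / N)
    idx = np.minimum(idx, N - 1)
    return particles[idx], np.full(N, 1.0 / N)

for step in range(5):                       # five illustrative steps walking east
    particles = pdr_predict(particles, step_len=0.7, heading=0.0)
    wifi_fix = np.array([0.7 * (step + 1), 0.0])      # simulated fingerprint fix
    particles, weights = wifi_update(particles, weights, wifi_fix)

print("fused position estimate:", particles.mean(axis=0))
```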
In this section, we provide examples from traditional machine learning, deep learning and crowdsourcing for indoor positioning, which were conducted in our laboratory.
Our system transmits narrow-band wireless signals (ZigBee) and applies software defined radio techniques to extract RSS and timestamps. We deploy the system in two indoor scenarios in the INF building at the University of Bern, in which the distribution maps of reference points and APs are shown in Fig. 12.10a and c, respectively. We compare the performance of the Fusion KNN-RF, Fusion WKNN, RSS WKNN, and DTDOA WKNN algorithms
in the four experiments shown in Table 12.1. KNN-RF is the proposed algorithm
integrating the KNN and RF model, and Fusion indicates the fused RSS and DTDOA,
in which DTDOA is the differential time difference of arrival information. We refer
to our previous work [25] for more details.
(Fig. 12.11: CDFs of the positioning errors (m) of DTDOA WKNN, RSS WKNN, Fusion WKNN, and Fusion KNN-RF (K = 9) in experiments (a) M1, (b) M2, (c) M3, and (d) M4.)
CDFs of positioning accuracy are shown in Fig. 12.11a–d and the mean positioning errors are summarized in Table 12.2. KNN-RF outperforms traditional KNN. As shown in Table 12.2, Fusion KNN-RF achieves an accuracy of 1.61 m, outperforming Fusion WKNN with an improvement of 19.1%. With the same input information, the performance of Fusion KNN-RF is significantly better than that of Fusion WKNN thanks to the RF regression. The proposed Fusion KNN-RF also significantly outperforms RSS WKNN by fusing the DTDOA-RSS features and adopting KNN-RF for pattern matching. As shown in Table 12.2, Fusion KNN-RF with a mean accuracy of 1.61 m outperforms the RSS-based fingerprinting by 36.1%. The results in Fig. 12.11a–d also indicate that Fusion KNN-RF significantly outperforms the traditional RSS-based fingerprinting.
According to the experimental results, the deep learning localization algorithms are significantly better than the traditional machine learning localization algorithm represented by KNN. This is because the KNN algorithm can only use the original features of the data, while deep learning algorithms can process and analyze fingerprint data more efficiently through neural networks. Because CNN can effectively capture the spatial characteristics of the input data through convolution, the positioning accuracy of CNN is improved by 14% compared with MLP. At the same time, it can be observed that, on the basis of CNN, ResNet reduces the positioning error by 0.41 m. This is because the information in a plain CNN is propagated forward layer by layer, which may give rise to vanishing gradients, whereas the residual connections in ResNet allow information to be transmitted directly. Therefore, the model can train a deeper network and improve positioning accuracy.
Indoor positioning based on crowdsourcing data labels RSS values with the loca-
tions on the merged crowdsourcing traces to generate a radio map. Compared with the
traditional radio maps created by site surveying, crowdsourcing radio maps are normally noisy. Hence, it is still challenging to locate users with high accuracy based on
crowdsourcing indoor positioning. Moreover, the accuracy of positioning is further
limited due to the sparse coverage of the crowdsourcing radio map.
In our previous work [49], we designed a crowdsourcing indoor positioning system named WiFi-RITA. In WiFi-RITA, we merge massive noisy user traces
to recover indoor walking paths, in which massive noisy user traces are large quan-
tities of short traces with uncertain rotation errors. In WiFi-RITA, traces are merged
by iteratively translating and rotating relying on the ubiquitous signal-marks of WiFi
access points. WiFi-RITA positioning further improves positioning accuracy based
on noisy crowdsourcing radio maps with limited coverage, in which we design a
multivariate Gaussian model to generate a grid-based radio map and an enhanced
matching algorithm to improve our fingerprinting accuracy. Then, we fuse PDR and
the fingerprinting based on a particle filter to further improve performance, especially
in the uncovered areas of the radio map.
To evaluate WiFi-RITA positioning, we conduct a set of comprehensive experiments in the scenarios of Fig. 12.14 (6656 m²) and Fig. 12.15 (8372 m²), covering both trace merging and user positioning. For trace merging, 10 users randomly walk along the predefined paths in the two scenarios, as shown in Figs. 12.14a and 12.15a. We use Huawei Mate8, Mate9 and P10 phones for data collection with different phone placements, i.e., in coat pockets, in bags, in trouser pockets, and held freely in hands. The data collection duration is about 10 h, with around 1 h for each user. For locating users, the users walk on five predefined paths shown as the green lines in Figs. 12.14b–f and 12.15b–f.
Figure 12.16 shows the accuracy of the merged traces, in which the median accuracy achieves 1.3 m for Scenario 1 and 1.0 m for Scenario 2.
(Fig. 12.16: CDFs of the grid accuracy (in meters) of the merged traces in Scenario 1 and Scenario 2.)
(Figure: CDFs of the positioning performance of PDR+WiFi, WiFi (MVG-OR), and PDR on the test paths.)
12.6 Conclusion
References
12. Abbas HA, Boskany NW, Ghafoor KZ, et al(2021) Wi-Fi based accurate indoor localization
system using SVM and LSTM algorithms. In: 2021 IEEE 22nd international conference on
information reuse and integration for data science (IRI), pp 416–422
13. Seok KY, Lee J H (2018) Deep learning based fingerprinting scheme for wireless positioning. In:
International conference on artificial intelligence in information and communication (ICAIIC),
pp 312–314. https://doi.org/10.1109/ICAIIC48513.2020.9065054
14. Ibrahim M, Torki M, ElNainay M (2018) CNN based indoor localization using RSS time-
series. In: 2018 IEEE symposium on computers and communications (ISCC), pp 01044–01049.
https://doi.org/10.1109/ISCC.2018.8538530
15. Wang B, Chen Q, Yang LT et al (2016) Indoor smartphone localization via fingerprint crowd-
sourcing: challenges and approaches. IEEE Wirel Commun 23(3):82–89. https://doi.org/10.
1109/MWC.2016.7498078
16. Yang J, Zhao X, Li Z (2019) Crowdsourcing indoor positioning by light-weight automatic
fingerprint updating via ensemble learning. IEEE Access 7:26255–26267. https://doi.org/10.
1109/ACCESS.2019.2901736
17. Ouyang RW, Wong AKS, Lea CT et al (2013) Indoor location estimation with reduced cal-
ibration exploiting unlabeled data via hybrid generative/discriminative learning. IEEE Trans
Mobil Comput 11(11):1613–1626. https://doi.org/10.1109/TMC.2011.193
18. Fontaine J, Ridolfi M, Van Herbruggen B et al (2020) Edge inference for UWB ranging
error correction using autoencoders. IEEE Access 8:139143–139155. https://doi.org/10.1109/
ACCESS.2020.3012822
19. Wu C, Yang Z, Liu Y, Xi W (2013) WILL: wireless indoor localization without site survey.
IEEE Trans Parallel Distrib Syst 24(4):839–848. https://doi.org/10.1109/TPDS.2012.179
20. Jung S, Moon B, Han D (2015) Unsupervised learning for crowdsourced indoor localization
in wireless networks. IEEE Trans Mobil Comput 15(11):2892–2906
21. Trogh J, Joseph W, Martens L, Plets D (2019) An unsupervised learning technique to optimize
radio maps for indoor localization. Sensors 19(4):752. https://doi.org/10.3390/s19040752
22. Dong Y, Yan D, Li T et al (2022) Pedestrian gait information aided visual inertial SLAM for
indoor positioning using handheld smartphones. IEEE Sens J 22(20):19845–19857. https://
doi.org/10.1109/JSEN.2022.3203319
23. Wu C, Yang Z, Liu Y (2014) Smartphones based crowdsourcing for indoor localization. IEEE
Trans Mobil Comput 14(2):444–457. https://doi.org/10.1109/TMC.2014.2320254
24. Calderoni L, Ferrara M, Franco A et al (2015) Indoor localization in a hospital environment
using random forest classifiers. Expert Syst Appl 42(1):125–134
25. Li Z, Braun T, Zhao X et al (2018) A narrow-band indoor positioning system by fusing time
and received signal strength via ensemble learning. IEEE Access 6:9936–9950. https://doi.org/
10.1109/ACCESS.2018.2794337
26. Bozkurt S, Elibol G, Gunal S, Yayan U (2015) A comparative study on machine learning
algorithms for indoor positioning. In: International symposium on innovations in intelligent
systems and applications (INISTA)
27. Seshadri V, Zaruba GV, Huber M (2005) A bayesian sampling approach to in-door localization
of wireless devices using received signal strength indication. In: Third IEEE international
conference on pervasive computing and communications, pp 75–84
28. Chai X, Yang Q (2007) Reducing the calibration effort for probabilistic indoor location esti-
mation. IEEE Trans Mobil Comput 6:649–662. https://doi.org/10.1109/TMC.2007.1025
29. Khassanov Y, Nurpeiissov M, Sarkytbayev A et al (2021) Finer-level sequential wifi-based
indoor localization. IEEE/SICE Int Symp Syst Int (SII) 2021:163–169
30. Poulose A, Han DS (2021) Feature-based deep LSTM network for indoor localization using
UWB measurements. In: International conference on artificial intelligence in information and
communication (ICAIIC), pp 298–301. https://doi.org/10.1109/ICAIIC51459.2021.9415277
31. Wang R, Luo H, Wang Q et al (2020) A spatial-temporal positioning algorithm using residual
network and LSTM. IEEE Trans Instrum Meas 69(11):9251–9261. https://doi.org/10.1109/
TIM.2020.2998645
32. Fu Y, Xiong X, Liu Z, Chen X, Liu Y, Fu Z (2022) A GNN-based indoor localization method
using mobile RFID platform. In: International conference on smart and sustainable technologies
(SpliTech), pp 1–6. https://doi.org/10.23919/SpliTech55088.2022.9854370
33. He T, Niu Q, Liu N (2023) GC-LOC: a graph attention based framework for collaborative
indoor localization using infrastructure-free signals. Proc ACM Interact Mobil Wearable and
Ubiquitous Technol 6(4):1–27
34. Luo X, Meratnia N (2022) A geometric deep learning framework for accurate indoor local-
ization. In: IEEE 12th international conference on indoor positioning and indoor navigation
(IPIN), pp 1–8. https://doi.org/10.1109/IPIN54987.2022.9918118
35. Song X (2019) A novel convolutional neural network based indoor localization framework
With WiFi fingerprinting. IEEE Access 7:110698–110709. https://doi.org/10.1109/ACCESS.
2019.2933921
36. Chidlovskii B, Antsfeld L (2019) Semi-supervised variational autoencoder for WiFi indoor
localization. In: International conference on indoor positioning and indoor navigation (IPIN),
pp 1–8. https://doi.org/10.1109/IPIN.2019.8911825
37. Nagia N, Rahman MT, Valaee S (2022) Federated learning for WiFi fingerprinting. In:
IEEE international conference on communications, pp 4968–4973. https://doi.org/10.1109/
ICC45855.2022.9838945
38. Liu Y, Li H, Xiao J, Jin H (2019) FLoc: fingerprint-based indoor localization system under a
federated learning updating framework. In: International conference on mobile Ad-Hoc and
sensor networks (MSN), pp 113–118. https://doi.org/10.1109/MSN48538.2019.00033
39. Wu P, Imbiriba T, Park J, Kim S, Closas P (2021) Personalized federated learning over non-
IID data for indoor localization. In: International workshop on signal processing advances in
wireless communications (SPAWC), pp 421–425
40. Park J et al (2022) Federated learning for indoor localization via model reliability with dropout.
IEEE Commun Lett 26(7):1553–1557. https://doi.org/10.1109/LCOMM.2022.3170878
41. Li W, Zhang C, Tanaka Y (2020) Pseudo label-driven federated learning-based decentralized
indoor localization via mobile crowdsourcing. IEEE Sens J 20(19):11556–11565. https://doi.
org/10.1109/JSEN.2020.2998116
42. Ye Q, Fan X, Bie H, Puthal D, Wu T, Song X, Fang G (2023) SE-LOC: security-enhanced indoor
localization with semi-supervised deep learning. IEEE Trans Netw Sci Eng 10(5):2964–2977.
https://doi.org/10.1109/TNSE.2022.3174674
43. Khatab ZE, Gazestani AH, Ghorashi SA et al (2021) A fingerprint technique for indoor localiza-
tion using autoencoder based semi-supervised deep extreme learning machine. Signal Process
181:107915
44. Qian W, Lauri F, Gechter F (2021) Supervised and semi-supervised deep probabilistic models
for indoor positioning problems. Neurocomputing 435:228–238
45. Luo J, Zhang C, Wang C (2020) Indoor multi-floor 3D target tracking based on the multi-sensor
fusion. IEEE Access 8:36836–36846. https://doi.org/10.1109/ACCESS.2020.2972962
46. Kong X, Wu C, You Y, Yuan Y (2023) Hybrid indoor positioning method of BLE and PDR
based on adaptive feedback EKF with low BLE deployment density. IEEE Trans Instrum Meas
72:1–12. https://doi.org/10.1109/TIM.2022.3227957
47. Yu Y, Chen R, Chen L et al (2021) H-WPS: hybrid wireless positioning system using
an enhanced wi-fi FTM/RSSI/MEMS sensors integration approach. IEEE Int Things J
9(14):11827–11842
48. Li Z, Zhao X, Hu F, Zhao Z, Villacrés JLC, Braun T (2019) SoiCP: a seamless outdoor-indoor
crowdsensing positioning system. IEEE Int Things J 6(5):8626–8644
49. Li Z, Zhao X, Zhao Z, Braun T (2021) WiFi-RITA positioning: enhanced crowdsourcing
positioning based on massive noisy user traces. IEEE Trans Wirel Commun 20(6):3785–3799.
https://doi.org/10.1109/TWC.2021.3053582
50. Gu Y, Zhou C, Wieser A, Zhou Z (2018) Trajectory estimation and crowdsourced radio map
establishment from foot-mounted IMUS, wi-fi fingerprints, and GPS positions. IEEE Sens J
19(3):1104–1113. https://doi.org/10.1109/JSEN.2018.2877804
Chapter 13
High Precision Positioning Algorithms
Based on Improved Sparse Bayesian
Learning in MmWave MIMO Systems
13.1 Introduction
Positioning methods can be broadly divided into ranging-based and non-ranging approaches. Ranging-based positioning characterizes the target node by measuring various parameters and then determines the target node's position through geometric relationships. Non-ranging positioning, on the other hand,
utilizes various relationships between known location nodes and target nodes, such
as node coverage, signal transmission characteristics, wireless fingerprints, and other
related features to ascertain the position of unknown nodes. Research has demon-
strated that range-based positioning generally achieves higher accuracy, particularly
when based on propagation delay measurements. In range-based positioning, the
target device observes signals from one or more reference transmitters, then param-
eter estimation methods are used to determine position-related parameters such as
distance and angle, and finally, the device’s position is calculated based on these
estimated parameters.
In communication and positioning systems, millimeter-wave (mmWave) tech-
nology offers significant advantages due to its inherent characteristics. First, the
high frequency of mmWave leads to significant losses when encountering obstacles,
resulting in limited scattering and making line-of-sight (LOS) propagation domi-
nant, thereby creating sparse channels. Second, the short wavelength of mmWave
allows for the integration of a large number of antennas into a small space, providing
high angular resolution. Additionally, mmWave’s larger available bandwidth offers
higher delay resolution [1]. Consequently, mmWave can be employed for high-
precision positioning. Accurate estimation of positioning parameters is crucial for
precise target location determination. Typically, positioning parameters include the
angle of arrival (AOA) and time delay. Joint estimation of AOA and delay enables
a single receiver to determine the target position, reducing system overhead and
enhancing efficiency. Traditional subspace methods, such as multi-signal classifica-
tion (MUSIC) [2, 3] and estimation of signal parameters via rotational invariant
techniques (ESPRIT) algorithm [4], are widely used for joint AOA and delay
estimation.
In recent years, compressed sensing (CS) techniques using sparse representation
have emerged as a novel approach to address parameter estimation problems [5, 6].
Sparse Bayesian learning (SBL) is a recent parameter estimation method based on
the CS concept. The traditional SBL method uses a fixed grid search to estimate
parameters for signal reconstruction, but this approach often results in a mismatch
between parameters and the grid, leading to larger estimation errors. References [7,
8] propose the off-grid SBL (OGSBL) method to address grid mismatch, replacing
grid points with off-grid interval values to reduce estimation errors of the traditional
SBL method. However, the OGSBL method requires first-order Taylor expansion
to approximate off-grid interval values, leading to greater approximation errors and
higher algorithm complexity. Additionally, some works improve the SBL method
for different scenarios.
To further enhance SBL positioning methods in mmWave systems, we propose
a new two-dimensional adaptive grid refinement method and a joint AOA and time
delay estimation method based on an improved SBL algorithm. The contributions
of the proposed method are summarized as follows: (1) To mitigate the performance
degradation of traditional subspace methods under low SNR and a low number of
snapshots, we formulate AOA and time delay estimation as an SBL problem. (2) To
where $L$ denotes the number of paths, $*$ represents convolution, $x(t)$ represents the transmitted signal, $\beta_l$ is the equivalent channel gain of the $l$-th path, and $w_m(t)$ is noise.
After the Fourier transform, the frequency-domain signal received at the $m$-th antenna can be expressed as
$$y_m(f) = \left( \sum_{l=1}^{L} \beta_l\, e^{-j2\pi m \phi_l}\, e^{-j2\pi f \tau_l} \right) \cdot x(f) + w_m(f), \qquad (13.2)$$
where $a(\phi_l) = \left[1, e^{-j2\pi\phi_l}, \ldots, e^{-j2\pi(M-1)\phi_l}\right]^T \in \mathbb{C}^{M\times 1}$ represents the antenna array response corresponding to the angle $\theta_l$.
We assume that orthogonal frequency division multiplexing (OFDM) modulation with $N$ subcarriers is used. Define $\Delta f$ as the subcarrier interval; then the $n$-th subcarrier operates at $n\Delta f$ Hz. Therefore, the received signal on each subcarrier can be expressed as
$$y = \sum_{l=1}^{L} \beta_l\, q(\phi_l, \tau_l) + w, \qquad (13.5)$$
where $y \in \mathbb{C}^{MN\times 1}$, $q(\phi_l, \tau_l) = \operatorname{vec}\{a(\phi_l)\, b^T(\tau_l)\} \in \mathbb{C}^{MN\times 1}$, $\operatorname{vec}\{\cdot\}$ refers to vectorization, that is, transforming a matrix into a one-dimensional column vector, $b(\tau_l) = \left[1 \cdot x(0),\; e^{-j2\pi\Delta f \tau_l} \cdot x(\Delta f),\; \cdots,\; e^{-j2\pi(N-1)\Delta f \tau_l} \cdot x((N-1)\Delta f)\right]^T \in \mathbb{C}^{N\times 1}$ represents the frequency-domain steering vector pointing to the delay $\tau_l$, and $w = [w_1, w_2, \ldots, w_{MN}]^T \in \mathbb{C}^{MN\times 1}$ represents the additive zero-mean complex Gaussian noise with covariance $\sigma^2 I$. Our goal is first to estimate $\phi_l$, $\tau_l$ and $\beta_l$ from $y$, and then the user's position $p$ is estimated from the obtained $\phi_l$ and $\tau_l$.
As one of the two important subspace algorithms, the MUSIC algorithm was proposed by Schmidt in [1]. The algorithm can perform high-resolution AOA estimation. By exploiting the orthogonality between the signal subspace and the noise subspace, a spatial spectrum function is constructed and searched for peaks, and the peak directions are regarded as the estimated AOAs.
The array response or steering vector corresponding to the input signal forms the signal subspace, and to eliminate the noise component, the signal subspace should be orthogonal to the noise subspace. The orthogonality satisfies $\alpha(\theta)^H Q_n = 0$, where $\alpha(\theta)$ is the steering vector and $Q_n$ is the matrix of noise-subspace eigenvectors. Therefore, the pseudo-spectrum provided by MUSIC is given by the following formula:
$$P_{\text{MUSIC}}(\theta) = \frac{1}{\alpha^H(\theta)\, Q_n Q_n^H\, \alpha(\theta)}. \qquad (13.6)$$
These algorithms also require a large number of snapshots to accurately capture the
signal or noise subspace. Therefore, their performance degrades significantly when
the number of snapshots is small or the SNR is low.
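As a rough numerical illustration of the pseudo-spectrum in (13.6), the numpy sketch below simulates a uniform linear array with two sources and searches the MUSIC spectrum for peaks; the array size, source angles, SNR, and search grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, snapshots, spacing = 8, 200, 0.5                 # ULA, half-wavelength element spacing
true_angles_deg = np.array([-20.0, 30.0])           # two illustrative sources

def steering(theta_deg):
    phase = -2j * np.pi * spacing * np.sin(np.deg2rad(theta_deg))
    return np.exp(phase * np.arange(M)).reshape(-1, 1)   # (M, 1) steering vector

# Simulate snapshots: two uncorrelated sources plus additive noise.
A = np.hstack([steering(a) for a in true_angles_deg])
S = rng.normal(size=(2, snapshots)) + 1j * rng.normal(size=(2, snapshots))
noise = 0.1 * (rng.normal(size=(M, snapshots)) + 1j * rng.normal(size=(M, snapshots)))
Y = A @ S + noise

R = Y @ Y.conj().T / snapshots                       # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(R)                 # ascending eigenvalues
Qn = eigvecs[:, : M - 2]                             # noise subspace (M - 2 smallest)

grid = np.linspace(-90.0, 90.0, 721)
P = [1.0 / np.real(steering(a).conj().T @ Qn @ Qn.conj().T @ steering(a)).item()
     for a in grid]

# Pick the two spectral peaks as the AOA estimates.
peaks = [i for i in range(1, len(grid) - 1) if P[i] > P[i - 1] and P[i] > P[i + 1]]
top2 = sorted(peaks, key=lambda i: P[i])[-2:]
print("estimated AOAs (deg):", sorted(grid[i] for i in top2))
```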
In iterative algorithms, a constraint ensures that the estimate $\hat{s}$ is $K$-sparse, while the data-fidelity term keeps $\hat{s}$ consistent with the measurements $y$, leading to the optimization problem [5]:
$$\hat{s} = \arg\min_{s} \|y - Qs\|_2^2 \quad \text{s.t.} \quad \|s\|_0 \le K.$$
Each iteration consists of two operations, i.e., a gradient-descent step followed by hard thresholding, where $\hat{s}^{(i)}$ is the estimate at the $i$-th iteration and $T(\cdot)$ is the thresholding operator. Several algorithms belonging to this category are the Iterative Hard Thresholding (IHT) algorithm [9], the Fast Iterative Shrinkage Thresholding Algorithm (FISTA), and Basis Pursuit Denoising (BPDN). The performance of BPDN and IHT increases with the number of antenna elements.
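A minimal sketch of the iterative hard thresholding recursion described above is given below; the random measurement matrix, the sparsity level, and the conservative step-size choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, K = 40, 100, 3
Q = rng.normal(size=(m, n)) / np.sqrt(m)             # measurement / dictionary matrix
s_true = np.zeros(n)
s_true[rng.choice(n, K, replace=False)] = 3.0 * rng.normal(size=K)
y = Q @ s_true + 0.01 * rng.normal(size=m)

def hard_threshold(v, K):
    """Keep only the K largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-K:]
    out[idx] = v[idx]
    return out

mu = 1.0 / np.linalg.norm(Q, 2) ** 2                 # conservative gradient step size
s_hat = np.zeros(n)
for _ in range(300):                                 # gradient step, then hard thresholding
    s_hat = hard_threshold(s_hat + mu * Q.T @ (y - Q @ s_hat), K)

print("recovered support:", np.flatnonzero(s_hat))
print("true support     :", np.flatnonzero(s_true))
```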
The Bayesian frame of reference is utilized for statistical sparse recovery, which treats the signal vector in a probabilistic manner. In the Maximum-a-Posteriori (MAP) procedure, the estimate of $s$ is given as
$$\hat{s}_{\text{MAP}} = \arg\max_{s} f(s \mid y) = \arg\max_{s} f(y \mid s)\, f(s),$$
where $f(s)$ is the prior distribution of $s$ and it is modeled such that $f(s)$ reduces
with magnitude of s. For instance, the Expectation–Maximization (EM) algorithm
[10] is an iterative method used to solve Maximum Likelihood (ML) estimation prob-
lems where some information is missing or unknown. EM estimates unobserved data
from incomplete observed data through expectation and maximization steps. Given
that the EM algorithm provides a general method rather than a specific solution, there
is ongoing debate about whether it qualifies as a true algorithm. The EM algorithm
is based on the concept of having complete but unobserved data and incomplete but
observed data. The space-alternating generalized expectation–maximization (SAGE)
algorithm was first introduced in [11]. Each iteration of the SAGE algorithm essen-
tially corresponds to an iteration of the EM algorithm. SAGE significantly reduces
complexity as the number of parameters increases by decomposing the optimization
problem into several simpler sub-problems. The EM algorithm struggles to optimize
the likelihood function given the high dimensionality required for the maximization
step due to the large number of model parameters. SAGE uses an initial rough esti-
mation for the zeroth iteration, and its performance heavily depends on the quality
of these initial estimates.
Bayesian Compressed Sensing (BCS) and Sparse Bayesian Learning (SBL) belong to the Gaussian-prior family, where the prior on each coefficient is modeled as Gaussian with a variance parameterized by a hyper-parameter that can be estimated from the data using the EM algorithm or ML estimation.
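For intuition, the following numpy sketch runs a generic EM-style sparse Bayesian learning loop for a linear model, following the standard relevance-vector-machine updates; it is not the improved SBL algorithm developed later in this chapter, and the noise precision is assumed known.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, K = 50, 120, 4
Q = rng.normal(size=(m, n)) / np.sqrt(m)             # dictionary / measurement matrix
s_true = np.zeros(n)
s_true[rng.choice(n, K, replace=False)] = 2.0 * rng.normal(size=K)
y = Q @ s_true + 0.05 * rng.normal(size=m)

alpha = np.ones(n)          # per-coefficient precision hyperparameters
beta = 400.0                # noise precision (1 / sigma^2), assumed known here

for _ in range(100):
    # E-step: Gaussian posterior over s with covariance Sigma and mean mu.
    Sigma = np.linalg.inv(beta * Q.T @ Q + np.diag(alpha))
    mu = beta * Sigma @ Q.T @ y
    # M-step: update each precision; a large alpha_i prunes coefficient i towards zero.
    alpha = 1.0 / (mu ** 2 + np.diag(Sigma))

print("largest recovered coefficients:", np.argsort(np.abs(mu))[-K:])
print("true support                  :", np.flatnonzero(s_true))
```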
13.3.4 Discussion
There are many techniques for DOA estimation, but there is a trade-off between computation time and resolution. The two conventional methods, MUSIC and ESPRIT, provide high resolution but require highly complicated mathematical calculations, especially at a low signal-to-noise ratio. Statistical sparse recovery techniques such as SBL also provide high resolution at high SNR but may produce false peaks at low SNR. At this stage, deep learning, as a subset of machine learning, can also be used to learn the mapping between received signals and channels by training neural networks instead of performing heavy calculations, but it is limited by training data and computational overhead. An important problem of statistical sparse recovery algorithms is the grid mismatch that occurs when the true values do not lie on the grid points, while the existing off-grid algorithms incur a large computational complexity. Therefore, this chapter will introduce sparse Bayesian learning and its improved algorithm in detail.
For $R$ users,
$$Q(\phi, \tau) = \left[ q(\phi_1, \tau_1), \ldots, q(\phi_{RK}, \tau_{RK}) \right] \in \mathbb{C}^{MN \times RK}, \qquad (13.12)$$
where
$$\Sigma = \left( Q^H(\phi, \tau)\, Q(\phi, \tau) + A^{-1} \right)^{-1}, \qquad (13.17)$$
$$\alpha^{(j+1)} = \arg\max_{\alpha}\, L\!\left(\xi^{(j+1)}, \alpha, \phi^{(j)}, \tau^{(j)} \,\middle|\, \xi^{(j+1)}, \alpha^{(j)}, \phi^{(j)}, \tau^{(j)}\right). \qquad (13.22)$$
Then we substitute (13.16) and (13.20) into the iterative formula to get a closed-form solution of $\xi, \alpha, \phi, \tau$ through updating.
For $\xi$ and $\alpha_k$, the surrogate function can be simplified to a convex function, so a closed-form solution can be obtained; $\xi$ is updated as
$$\xi^{(j+1)} = \frac{MN + v}{\chi + \Phi\!\left(\xi^{(j)}, \alpha^{(j)}, \phi^{(j)}, \tau^{(j)}\right)}, \qquad (13.25)$$
where $\Phi(\xi, \alpha, \phi, \tau) = \operatorname{tr}\!\left( Q(\phi, \tau)\, \Sigma\, Q^H(\phi, \tau) \right) + \| y - Q(\phi, \tau)\, \mu \|_2^2$.
For $\alpha_k$, it is updated as
$$\alpha_k^{(j+1)} = \frac{\varepsilon}{\rho + \left[ \operatorname{diag}\!\left( \Xi\!\left(\xi^{(j+1)}, \alpha^{(j)}, \phi^{(j)}, \tau^{(j)}\right) \right) \right]_k}, \quad \forall k, \qquad (13.26)$$
where $\Xi(\xi, \alpha, \phi, \tau) = \Sigma + \mu \mu^H$.
For $\phi_k$ and $\tau_k$, since the surrogate function is non-convex and it is difficult to find the global optimal solution, we use the exact block MM algorithm in [8] to update $\phi$ and $\tau$; that is, we apply the gradient update to the surrogate function as
$$\phi^{(j+1)} = \phi^{(j)} + \eta_\phi \cdot L'\!\left(\phi_k^{(j)}\right), \qquad (13.27)$$
$$\tau^{(j+1)} = \tau^{(j)} + \eta_\tau \cdot L'\!\left(\tau_k^{(j)}\right), \qquad (13.28)$$
where $\eta$ is the step size of the backtracking search, and $L'(\phi_k)$ and $L'(\tau_k)$ are the derivatives of the objective function with respect to $\phi_k$ and $\tau_k$, respectively.
The traditional SBL method operates on a fixed grid. However, the actual AOA
and delay may not necessarily align with the given grid points, resulting in larger
estimation errors. To address this issue, we propose an enhanced SBL algorithm that
incorporates a novel two-dimensional adaptive grid refinement technique into the
traditional SBL framework. The key aspects of this method are twofold:
Firstly, we address the fixed-grid problem in sparse estimation. For the fixed
parameter grid in system modeling, we transform the fixed grid into an adjustable
grid, where the grid points are treated as adjustable parameters. This allows the mesh
fineness to be adjusted based on different accuracy requirements.
Secondly, to distinguish from general high-density grids, we selectively refine only
the areas around grid points that may contain actual values, significantly improving
the algorithm’s efficiency.
Figure 13.2 illustrates the grid refinement process. The specific refinement method
is as follows:
(1) We build a two-dimensional grid and assume that the abscissa of the grid represents the angle domain and the ordinate represents the delay domain. Meanwhile, let $\delta_\phi$ and $\delta_\tau$ denote the grid intervals of the angle domain and the delay domain, respectively.
(2) At the $j$-th iteration, the grid region that may contain the true values of the AOA and TOA is updated to a new grid spanning $\left[\hat{\phi}_k^{(j)} - 2\delta_\phi^{(j)},\; \hat{\phi}_k^{(j)} + 2\delta_\phi^{(j)}\right]$ and $\left[\hat{\tau}_k^{(j)} - 2\delta_\tau^{(j)},\; \hat{\tau}_k^{(j)} + 2\delta_\tau^{(j)}\right]$, where $\hat{\phi}_k^{(j)}$ and $\hat{\tau}_k^{(j)}$ represent the estimated angle and delay values at the $j$-th iteration, respectively.
(3) Then, in the next iteration, let $\delta_\phi^{(j+1)} = \delta_\phi^{(j)}/\zeta$ and $\delta_\tau^{(j+1)} = \delta_\tau^{(j)}/\zeta$, where $\zeta$ is the refinement interval. We hope to obtain a fine grid by slowly narrowing the grid interval over several refinement levels. A too large $\zeta$ will result in the new grid not containing the true values, while a too small $\zeta$ will make the mesh refinement take too long, resulting in a large number of iterations. Therefore, $\zeta$ needs to be set to a reasonable value (a sketch of this refinement loop follows the list).
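The numpy sketch below illustrates the refinement loop of steps (1)-(3): at each iteration the angle and delay intervals are divided by ζ and a new local grid is built around the current estimates. The estimator inside the loop is a placeholder that simply snaps to the grid point nearest the (known) true value; it stands in for the SBL update and is purely illustrative.

```python
import numpy as np

def refine_grid(phi_hat, tau_hat, d_phi, d_tau, zeta=2.0, points=5):
    """Build a refined local grid spanning [estimate - 2*delta, estimate + 2*delta]
    in both the angle and delay domains, with the intervals shrunk by zeta."""
    d_phi, d_tau = d_phi / zeta, d_tau / zeta
    phi_axis = np.linspace(phi_hat - 2 * d_phi, phi_hat + 2 * d_phi, points)
    tau_axis = np.linspace(tau_hat - 2 * d_tau, tau_hat + 2 * d_tau, points)
    return phi_axis, tau_axis, d_phi, d_tau

# Toy loop: the "estimator" snaps to the grid point closest to the true values.
phi_true, tau_true = 0.213, 1.37e-7
phi_hat, tau_hat = 0.0, 0.0
d_phi, d_tau = 0.5, 1e-6                     # coarse initial grid intervals
for _ in range(10):
    phi_axis, tau_axis, d_phi, d_tau = refine_grid(phi_hat, tau_hat, d_phi, d_tau)
    phi_hat = phi_axis[np.argmin(np.abs(phi_axis - phi_true))]
    tau_hat = tau_axis[np.argmin(np.abs(tau_axis - tau_true))]

print(f"phi_hat = {phi_hat:.4f} (true {phi_true}), tau_hat = {tau_hat:.3e} (true {tau_true:.3e})")
```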
We combine the grid refinement method with the SBL framework to obtain the improved SBL algorithm. The specific idea of the proposed algorithm is as follows:
(1) For the first iteration, i.e., $j = 1$, create a rough two-dimensional grid $(\phi, \tau) = \{(\phi_k, \tau_k),\; k = 1, \ldots, K\}$ around the possible true values. However, the original grid should not be too rough, to avoid large errors.
(2) Use the grid points at this time to obtain $Q(\phi, \tau)$ in (13.9); then the hyperparameters $\hat{\xi}, \alpha, \hat{\phi}, \hat{\tau}$ are updated with the MM algorithm.
(3) Recalculate $Q(\phi, \tau)$ using the estimated $\hat{\xi}, \alpha, \hat{\phi}, \hat{\tau}$, and obtain $\mu$. Then calculate the average power of each grid point in line $k$ of $\beta$. Finally, the user position is obtained from the estimated angles and delays as
$$p = \frac{1}{L} \sum_{l=1}^{L} \left( s + c \cdot \tau_l \begin{bmatrix} \cos(\theta_l) \\ \sin(\theta_l) \end{bmatrix} \right). \qquad (13.30)$$
l=1
is, M = 16, N = 256, K = 10, the complexity of the improved SBL algorithm is
also less than that of the ESPRIT algorithm. In summary, the proposed improved
SBL algorithm also has certain advantages in algorithm complexity.
Fig. 13.3 Positioning accuracy comparison of the conventional MUSIC algorithm and the MUSIC
algorithm combined with the adaptive grid refinement method
Fig. 13.6 The RMSE of position estimation varies with SNR in LOS condition
In this section, we validate the proposed algorithm and other classic algorithms using
a localization test dataset [16] for 5G large-scale MIMO collected by K. Gao and H.
Wang, comparing the performance of different algorithms in two typical scenarios. A
server equipped with an Intel Xeon E5-2626 v2 CPU and a GeForce GTX 1080 GPU
was used to run the algorithms with the dataset. A Nikon DTM-352C Total Station
was utilized for site surveying and positioning-point labeling. Two typical 5G NR
positioning scenarios were selected for dataset experiments, as shown in Figs. 13.8
and 13.9.
Scenario 1 is an indoor office hall in a new building at the Chinese Academy of
Sciences (CAS) in Beijing, China. Five ISAC gNBs operate at 3.5 GHz with 100
MHz bandwidth and 40 W power. These five gNBs are suspended on plastic holders
2.4 m above the ground. For simulation, there is a random floating height of 0.1 m
to prevent coplanarity. The UE, acting as a receiver, is mounted on a marked liftable
cart 1.2 m above the ground to simulate a 1.8 m tall person holding a mobile phone.
Scenario 2 represents a typical urban canyon environment, where it is difficult
for a UE to access sufficient satellites for positioning. The UEs are located on a
low-rise platform between two high-rise buildings. The dataset can be downloaded from IEEE DataPort at https://doi.org/10.21227/jsat-pb50.
Fig. 13.7 Comparison of positioning results with clock offset and without clock offset
Fig. 13.8 A typical indoor positioning scenario in an office hall: a photograph and b electromagnetic simulation results
Fig. 13.9 A typical outdoor positioning scenario in an urban canyon: a photograph and b electromagnetic simulation results
In scenario 1, we face a challenging indoor environment characterized by numerous obstacles, walls, and signal interference sources, as well as areas where signal reflection and attenuation are significant due to various structures. The complexity of this environment leads to an increase in positioning errors,
posing greater challenges for algorithms to accurately estimate positions. In this envi-
ronment, positioning algorithms need to overcome signal obstruction and reflection
problems caused by multiple obstacles, as well as interference from other electronic
devices. These factors work together, resulting in the cumulative error distribution
function (CDF) chart (as shown in Fig. 13.10) showing higher error rates and larger
error fluctuations. Compared to the complexity of scenario 1, scenario 2 describes a
simpler and more controllable outdoor environment. This is usually an open space
between buildings or a setting with minimal obstacles and fewer signal interference
sources. In such an environment, signal propagation is more direct and predictable,
which enables positioning algorithms to estimate positions more accurately, as the
number of error sources is greatly reduced. Due to the simplicity of scenario 2, the localization algorithms can better utilize signal strength and quality, thereby improving the accuracy of localization. Therefore, the error CDF curves of the different algorithms in this outdoor environment (shown in Fig. 13.11) show lower errors and more stable performance.
In scenario 1, the indoor environment, the localization error distributions of all algorithms show larger error values, whereas the performance in the outdoor environment, scenario 2, is better than in scenario 1. This indicates that the proposed method, followed by OGSBL, ESPRIT, and MUSIC, achieves improved performance in the outdoor scenario. The CDF curve of the proposed method reaches a higher CDF value at a given error, meaning that it can achieve high positioning accuracy even in the presence of significant outdoor environmental challenges.
Compared with the indoor scenario, the positioning error of all algorithms in the
outdoor scenario generally decreases, indicating that the outdoor scenario provides
conditions that are favorable for accurate localization. The proposed method main-
tains the best performance in both environments, but its advantages are more
pronounced in the outdoor scenario. Through this analysis, we can understand the
performance of each algorithm in two different localization scenarios more accu-
rately. The proposed method has shown excellent performance in both scenarios,
especially in the outdoor scenario, where its advantages are more pronounced.
13.7 Summary
In this chapter, we first explore the applications of the classical MUSIC and ESPRIT
algorithms in mmWave MIMO systems. The MUSIC algorithm is renowned for its
excellent resolution of signal subspaces, performing particularly well under high
SNR conditions. However, its performance may be limited when SNR is low and
there are few snapshots. In contrast, the ESPRIT algorithm, by recursively decom-
posing the signal subspace, reduces dependence on the noise subspace, demonstrating
robustness in the case of low SNR and a limited number of snapshots.
Although the MUSIC and ESPRIT algorithms exhibit remarkable performance
under certain conditions, they may experience performance degradation in extreme
scenarios such as low SNR and a limited number of snapshots. To address this issue,
we introduce a novel approach that leverages the joint sparsity of angle and delay in
mmWave systems, formulating the estimation of angle and delay as a Sparse Bayesian
Learning (SBL) problem. To tackle the challenges posed by traditional SBL algo-
rithms, including grid mismatch and the high complexity of the OGSBL algorithm,
we design a new two-dimensional adaptive grid refinement method. This method
treats fixed grid points as adjustable parameters within a given range, gradually
reducing the grid area with each iteration. By integrating the adaptive grid refinement
method into the SBL framework, we propose an improved SBL algorithm to enhance
the accuracy of Angle of Arrival (AOA), time delay, and Mobile Station (MS) posi-
tion estimation. Simulation results demonstrate that under the condition of a single
snapshot, the proposed algorithm achieves higher positioning accuracy compared to
other algorithms. Tests on the real dataset also show that the algorithm maintains high positioning accuracy even in the presence of complex environmental challenges, providing a reliable solution for high-precision positioning.
References
1. Garcia N, Wymeersch H, Slock DTM (2018) Optimal precoders for tracking the AoD and AoA
of a mmWave path. IEEE Trans Signal Process 66(21):5718–5729
2. Guo Z, Wang X, Heng W (2017) Millimeter-Wave channel estimation based on 2-D beamspace
MUSIC method. IEEE Trans Wirel Commun 16(8):5384–5394
3. Yin D, Zhang F (2020) Uniform linear array MIMO radar unitary root MUSIC angle estimation.
In: 2020 Chinese automation congress (CAC), pp 578–581
4. Zhang J, Haardt M (2017) Channel estimation and training design for hybrid multi-carrier
mmWave massive MIMO systems: the beamspace ESPRIT approach. In: 2017 25th European
signal processing conference (EUSIPCO), pp 385–389. https://doi.org/10.23919/EUSIPCO.
2017.8081234
5. Talaei F, Dong X (2019) Hybrid mmWave MIMO-OFDM channel estimation based on the
multi-band sparse structure of channel. IEEE Trans Commun 67(2):1018–1030. https://doi.
org/10.1109/TCOMM.2018.2871448
6. Lee J, Gil G, Lee YH (2016) Channel estimation via orthogonal matching pursuit for hybrid
MIMO systems in millimeter wave communications. IEEE Trans Commun 64(6):2370–2386
7. Yang Z, Xie L, Zhang C (2013) Off-grid direction of arrival estimation using sparse bayesian
inference. IEEE Trans Signal Process 61(1):38–43
8. Dai J, Liu A, Lau VKN (2018) FDD massive MIMO channel estimation with arbitrary 2D-array
geometry. IEEE Trans Signal Process 66(10):2584–2599
9. Stoeckle C et al (2015) DoA estimation performance and computational complexity of subspace
and compressed sensing-based methods. In: WSA 2015; 19th international ITG workshop on
smart antennas, pp 1–6
10. Moon TK (1996) The expectation-maximization algorithm. IEEE Signal Process Mag
13(6):47–60
11. Fleury BH et al (1999) Channel parameter estimation in mobile radio environments using the
SAGE algorithm. IEEE J Sel Areas Commun 17(3):434–450
12. Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process
56(6):2346–2356
13. Wipf DP, Rao BD (2004) Sparse bayesian learning for basis selection. IEEE Trans Signal
Process 52(8):2153–2164. https://doi.org/10.1109/TSP.2004.831016
14. Tipping ME (2001) Sparse bayesian learning and the relevance vector machine. J Mach Learn
Res 1(3):211–244
15. Zhao S et al (2021) A new TOA localization and synchronization system with virtually
synchronized periodic asymmetric ranging network. IEEE Internet Things J 8(11):9030–9044
16. Gao K, Wang H, Lv H, Liu W (2022) Toward 5G NR high-precision indoor positioning via
channel frequency response: a new paradigm and dataset generation method. IEEE J Sel Areas
Commun 40(7):2233–2247. https://doi.org/10.1109/JSAC.2022.3157397
Chapter 14
UWB Non-line-of-Sight Propagation
Identification and Localization

J. Wang
Eacon Mining Technology, Beijing 100000, P. R. China
e-mail: [email protected]

K. Yu
School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, P. R. China
e-mail: [email protected]
14.1 Introduction
In outdoor environments (e.g., city streets, deserts, and sea surfaces), the positioning
accuracy of the Global Navigation Satellite System (GNSS) can reach meter or even
sub-decimeter levels depending on the specific system used. However, in indoor
scenarios, GNSS signal loss and attenuation are significant due to blockage by
building roofs, walls, and other obstacles, making it impossible to obtain contin-
uous and accurate position information. Therefore, developing indoor positioning
technology is essential to meet the demand for indoor location services in the era of
the “Internet of Things” with the help of continuously innovating wireless communi-
cation technology. Among existing indoor wireless positioning technologies, ultra-
wideband (UWB) stands out for its strong signal penetration, high anti-interference
capability, and high time resolution, achieving centimeter-level positioning accuracy.
However, UWB location accuracy can be significantly reduced by non-line-of-sight
(NLOS) propagation in complex indoor environments. Thus, identifying and miti-
gating NLOS errors is crucial for improving UWB positioning accuracy in such harsh
conditions.
According to several key models of wireless positioning technology, such as angle
of arrival (AOA) [1], time of arrival (TOA), time difference of arrival (TDOA) [2], and
received signal strength indication (RSSI) [3], various traditional indoor positioning
algorithms have been developed. Foy introduced a Taylor-series algorithm based
on TDOA as early as 1976 [4], which improves the position estimation of unknown
tags by solving the local least squares solution of the TDOA measurement error and
iterating towards the true position. In 1990, Fang proposed a straightforward localiza-
tion algorithm that directly uses four TDOA measurements to estimate the position
of an unknown tag [5]. Chan and Ho introduced a non-iterative localization algo-
rithm in 1994, implemented using TDOA and maximum likelihood estimation, and
demonstrated it could reach the Cramer-Rao lower bound in the small error region
[6]. These classical localization algorithms are effective in specific applications and
generally achieve good localization accuracy in LOS environments. However, they
fail to account for the impact of NLOS conditions on TDOA measurements, which
significantly reduces their accuracy. Therefore, in real indoor settings, it is crucial to
suppress the measurement errors caused by NLOS propagation. NLOS error suppres-
sion techniques can be broadly classified into two categories. One approach involves
first identifying NLOS propagation, processing the identified NLOS measurements,
and then determining the tag’s position based on the processed data. The other
approach focuses on directly optimizing the measurement information or the algo-
rithm itself. This chapter will concentrate on the first category of methods and the
associated experiments.
In multi-resolution analysis, L2 (R) = ⊕Wj (j ∈ Z), i.e., the signal space L2 (R)
is decomposed into the orthogonal sum of all subspaces Wj (j ∈ Z) according to
different scale factors j, where ⊕ denotes the direct sum operation, {W j } are the
wavelet subspaces of wavelet function ΨH (t) and Z is the integer set. On this basis,
we consider further subdivision of the wavelet subspace Wj . First, we define a new
space $U_j^n$ ($n \in \mathbb{Z}^+$), where $\mathbb{Z}^+$ is the positive integer set and n refers to the ordinal number of each node of the decomposition tree:
$$\begin{cases} U_j^0 = V_j \\ U_j^1 = W_j \end{cases}, \quad j \in \mathbb{Z} \tag{14.5}$$
where Vj is the scaling space. According to the multiresolution analysis [9], the
scaling space $V_j$ satisfies:
$$V_{j+1} = V_j \oplus W_j, \quad j \in \mathbb{Z} \tag{14.6}$$
Thus, according to (14.5) and (14.6), $U_{j+1}^0$ can be expressed as:
$$U_{j+1}^0 = V_{j+1} = V_j \oplus W_j = U_j^0 \oplus U_j^1, \quad j \in \mathbb{Z} \tag{14.7}$$
$$U_{j+1}^n = U_j^{2n} \oplus U_j^{2n+1}, \quad j \in \mathbb{Z},\ n \in \mathbb{Z}^+ \tag{14.8}$$
As the scaling factor j increases, the spatial resolution of the corresponding wavelet
basis function becomes higher. The division of the subspace Wj is represented by
a binary tree as shown in Fig. 14.1, which is the wavelet packet decomposition
tree. The figure shows a three-level decomposition as an example; the number of levels is mainly determined by the specific signals and experimental requirements.
Fig. 14.1 The wavelet packet decomposition tree. Here Wj is decomposed into three layers as an example

In this chapter, we utilize “Haar” wavelets to decompose h'(t) up to the fifth layer. Figure 14.2 displays colored coefficient images for the root nodes of the wavelet decomposition tree for h'(t) across three UWB signal propagation channels. The
coefficient distributions at the root node vary among the propagation channels after
one-dimensional wavelet packet decomposition. Larger color scales indicate higher
coefficients at the root node, reflecting stronger signals at that layer. In the LOS
propagation channel (Fig. 14.2a), the root node’s coefficient distribution is highly
concentrated, suggesting dominance of the first arrival path of the UWB signal. In
the HNLOS propagation channel (Fig. 14.2c), the root node’s coefficient distribu-
tion is more scattered, indicating significant influence from NLOS and pronounced
multipath effects, resulting in signal distortion. The SNLOS propagation channel
(Fig. 14.2b) exhibits intermediate characteristics between the LOS and HNLOS
scenarios.
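As a rough illustration of this preprocessing step, the following sketch shows how a five-level Haar wavelet packet decomposition of a CIR magnitude sequence could be produced with the PyWavelets library; the synthetic `cir` array and the stacking of the level-5 node coefficients into a matrix are illustrative assumptions, not the exact procedure used in this chapter.

```python
# Minimal sketch (not the book's exact pipeline): a five-level Haar wavelet
# packet decomposition of a CIR magnitude sequence using PyWavelets. The `cir`
# array is a synthetic placeholder for one measured channel impulse response.
import numpy as np
import pywt

rng = np.random.default_rng(0)
cir = np.abs(rng.standard_normal(1024))     # stand-in for |h'(t)| samples

# Build the wavelet packet decomposition tree (Haar wavelet, five levels).
wp = pywt.WaveletPacket(data=cir, wavelet="haar", mode="symmetric", maxlevel=5)

# Stack the coefficient vectors of all level-5 nodes row by row; rendering this
# matrix as a colour image gives the kind of coefficient image shown in Fig. 14.2.
nodes = wp.get_level(5, order="natural")
coeff_image = np.vstack([node.data for node in nodes])
print(coeff_image.shape)                    # (nodes at level 5, coefficients per node)
```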
Convolutional neural network
In comparison to other machine learning techniques, convolutional neural networks
(CNNs) excel in feature learning, especially with grayscale matrix inputs (e.g.,
images), offering stability and simplicity. A typical CNN architecture comprises
convolutional layers, pooling layers, and fully connected layers, with adjustments to
layer counts and hyperparameters based on specific tasks. Figure 14.3 illustrates the
standard structure of a CNN, where feature map sizes decrease progressively across
layers, culminating in outputs from the fully connected layer.
The initial step involves data input. Following the acquisition of colored coefficient
images for root nodes from the wavelet packet decomposition tree across different
signal propagation channels, we refrain from direct usage as input for the CNN model.
Instead, we first engage in data preprocessing. Initially, a segment of the acquired
image database is set aside as training data for the CNN. The remaining data serves
as the validation and test sets for subsequent evaluation. All participating images are
then converted into the tfrecords data format, a binary file type containing sequences of byte strings. These converted images are in RGB color mode with a pixel size of 128 × 128.

Fig. 14.2 a, b, and c are the colored coefficient images in the LOS, SNLOS, and HNLOS propagation channels, respectively. The number of decomposition layers was set to 5
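A minimal sketch of this serialization step is given below, assuming the coefficient images are stored as encoded image files on disk; the feature names, file paths, and example call are illustrative assumptions, while the label encoding (0 = SNLOS, 1 = HNLOS, 2 = LOS) follows the convention used later in this section.

```python
# Sketch of serializing the 128 x 128 RGB coefficient images into a TFRecord
# file. The feature names, file paths, and the label encoding
# (0 = SNLOS, 1 = HNLOS, 2 = LOS) are illustrative assumptions.
import tensorflow as tf

def _bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value: int) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_tfrecord(image_paths, labels, out_path="train.tfrecords"):
    """Serialize (image, label) pairs into a single TFRecord file."""
    with tf.io.TFRecordWriter(out_path) as writer:
        for path, label in zip(image_paths, labels):
            raw = tf.io.read_file(path).numpy()        # encoded PNG/JPEG bytes
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": _bytes_feature(raw),
                "label": _int64_feature(label),
            }))
            writer.write(example.SerializeToString())

# Example call with hypothetical file names:
# write_tfrecord(["los_0001.png", "hnlos_0001.png"], [2, 1])
```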
The next phase involves the convolution layer, which fundamentally entails a
specialized matrix operation between input images and a convolution kernel. This
operation multiplies each weight (i.e., pixel value) in the convolution kernel f by the corresponding pixel value in the input image I and sums the results (considering only one channel in the calculation) as follows:
$$G(i, j) = (I * f)[i, j] = \sum_{m}\sum_{n} f(m, n)\, I(i - m, j - n) \tag{14.11}$$
In this layer, m and n represent the row and column indices of the convolution kernel
matrix respectively, while i and j represent the row and column indices of the output
feature map of the layer respectively. Three primary hyperparameters are essential
here: filter size, strides, and padding. Convolutional kernels typically consist of pixel
arrays with odd rows and columns. The filter size is determined by the number of input
channels (3 in this chapter) and the number of output channels (equal to the number
of convolutional kernels). Strides indicate the distance each convolution kernel slides
during the operation. Larger strides theoretically lead to more significant loss of input
image features. To mitigate feature loss after several convolution operations, we pad
the edges of the feature map with empty pixels, known as padding. Padding includes
two categories: “Same” (ensuring the size of the feature map remains unchanged
after padding) and “Valid” (indicating no padding is applied).
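For concreteness, a direct NumPy implementation of the single-channel convolution in Eq. (14.11) with "Same" zero padding might look as follows; the toy image and averaging kernel are purely illustrative.

```python
# Direct NumPy implementation of the single-channel convolution in Eq. (14.11),
# G(i, j) = sum_m sum_n f(m, n) I(i - m, j - n), using "Same" zero padding so
# the output keeps the input size. The toy image and kernel are illustrative.
import numpy as np

def conv2d_same(I: np.ndarray, f: np.ndarray) -> np.ndarray:
    kh, kw = f.shape                       # odd kernel dimensions assumed
    ph, pw = kh // 2, kw // 2
    Ip = np.pad(I, ((ph, ph), (pw, pw)))   # "Same" padding with empty (zero) pixels
    f_flip = f[::-1, ::-1]                 # flip the kernel: correlation -> convolution
    G = np.zeros_like(I, dtype=float)
    for i in range(I.shape[0]):
        for j in range(I.shape[1]):
            G[i, j] = np.sum(Ip[i:i + kh, j:j + kw] * f_flip)
    return G

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 "image"
f = np.ones((3, 3)) / 9.0                      # 3 x 3 averaging kernel
print(conv2d_same(I, f))
```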
Following the convolution operation, instead of directly applying pooling as
shown in the basic CNN structure in Fig. 14.3, we introduce an activation func-
tion layer. This layer’s primary function is to nonlinearly map the output of the
convolutional layer, allowing the CNN model to fit complex functions effectively.
In this chapter, the rectified linear unit (ReLU) function is employed due to its fast
convergence rate and straightforward gradient calculation:
$$f(p_{i,j}) = \max\{0, v_{i,j}\} \tag{14.12}$$
where vi,j is the pixel value of the feature map at position (i, j).
The subsequent pooling layer compresses the input feature images (termed feature
downsampling) to simplify network computational complexity and extract key
features. Pooling generally involves two types: max pooling and average pooling.
Max pooling selects the maximum value within each local region, while average
pooling computes the average value. In this chapter, we employ max pooling,
which requires setting several hyperparameters akin to the convolutional layer. These
include the pooling window size and the strides of the pooling window.
To enhance the CNN model’s training speed and generalization capability, a batch
normalization (BN) layer is incorporated. Unlike other normalization methods like
local response normalization (LRN), BN normalizes each layer’s input to maintain
fixed mean and variance within a defined range. This normalization helps stabilize
gradient descent, leading to improved training speed and generalization ability of the CNN model. A mini-batch $\Omega$ of size m is denoted as $\Omega = \{a_1, a_2, \ldots, a_m\}$, where $a_i$ denotes the input activation parameter. The essence of BN can be expressed as:
$$\begin{cases} \dfrac{1}{m}\sum_{i=1}^{m} a_i \rightarrow \mu_\Omega \\ \dfrac{1}{m}\sum_{i=1}^{m} (a_i - \mu_\Omega)^2 \rightarrow \sigma_\Omega^2 \\ \dfrac{a_i - \mu_\Omega}{\sqrt{\sigma_\Omega^2 + \varepsilon}} \rightarrow \hat{a}_i \\ \mathrm{BN}_{\gamma,\beta}(a_i) \equiv \gamma \hat{a}_i + \beta \rightarrow y_i \end{cases} \tag{14.13}$$
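A compact NumPy version of the mini-batch transform in Eq. (14.13) is shown below; the gamma, beta, and epsilon values are illustrative defaults rather than values taken from the chapter.

```python
# Minimal NumPy version of the mini-batch transform in Eq. (14.13): normalize
# the batch to zero mean and unit variance, then rescale with gamma and shift
# with beta. The gamma, beta, and epsilon values are illustrative defaults.
import numpy as np

def batch_norm(a: np.ndarray, gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> np.ndarray:
    mu = a.mean(axis=0)                       # mu_Omega
    var = a.var(axis=0)                       # sigma_Omega^2
    a_hat = (a - mu) / np.sqrt(var + eps)     # normalized activations
    return gamma * a_hat + beta               # y_i = BN_{gamma, beta}(a_i)

batch = np.random.randn(32, 8)                # a mini-batch of 32 activations
print(batch_norm(batch).mean(axis=0).round(6))   # approximately zero per feature
```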
The fully connected layer then maps its input to an output vector as:
$$G = W^T x + b \tag{14.14}$$
where W T is the weight matrix, x denotes the input of the fully connected layer,
b is the bias, and G is the output column vector of the fully connected layer. The
function of the Softmax layer is to map L real numbers on the range of (−∞, +∞) to L real numbers within the range of (0, 1). Thus, the probability that a certain ODWPA-converted test image X belongs to category l is denoted as:
$$p(X, l) = \mathrm{softmax}(g_l) = \frac{e^{g_l}}{\sum_{l=1}^{L} e^{g_l}}, \quad \text{subject to} \quad \sum_{l=1}^{L} p(X, l) = 1 \tag{14.15}$$
where gl is the lth element in the column vector G and L is the total number of
categories to be identified. Therefore, in the case of three channel categories, LOS,
soft NLOS (SNLOS) and hard NLOS (HNLOS), the final classification result of the
image can be denoted as:
$$P(X) = \arg\max_{l}\{p(X, l)\} = \begin{cases} 0, & X \in \text{SNLOS} \\ 1, & X \in \text{HNLOS} \\ 2, & X \in \text{LOS} \end{cases} \tag{14.16}$$
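To tie these building blocks together, the following Keras sketch assembles convolution, ReLU, max pooling, batch normalization, a fully connected layer, and a three-way softmax into a small classifier for 128 × 128 RGB inputs; the filter counts, layer depths, and optimizer settings are illustrative assumptions and do not reproduce the exact CNN_A or CNN_B architectures.

```python
# A generic Keras sketch of the building blocks described above: convolution,
# ReLU activation, max pooling, batch normalization, a fully connected layer,
# and a three-way softmax over SNLOS/HNLOS/LOS. The filter counts, layer
# depths, and optimizer settings are illustrative assumptions only.
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(input_shape=(128, 128, 3), num_classes=3) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(16, kernel_size=3, strides=1, padding="same"),  # "Same" padding
        layers.ReLU(),                                                # Eq. (14.12)
        layers.MaxPooling2D(pool_size=2, strides=2),                  # max pooling
        layers.BatchNormalization(),                                  # Eq. (14.13)
        layers.Conv2D(32, kernel_size=3, strides=1, padding="same"),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),                          # Eq. (14.14)
        layers.Dense(num_classes, activation="softmax"),              # Eqs. (14.15)-(14.16)
    ])
    return model

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```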
Two experimental sites are chosen, representing typical indoor scenarios: a teaching
building and an underground car park at China University of Mining and Tech-
nology. For each scenario, UWB devices are strategically positioned following guide-
lines from [11, 12]. Figure 14.4 illustrates the layout of UWB devices at these two
experimental sites.
During the training phase, 70% of the data collected from scenario #1 were allo-
cated as the training dataset for the five CNNs. The learning rate regulates how quickly
CNN parameters are updated during training, while batch size determines the number
of samples processed per training iteration. Consistency between predicted classifica-
tion labels and true labels is quantified using loss metrics. To ensure fair comparison,
identical learning rates and batch sizes were set for all CNN models. The smaller
the loss value, the closer the predicted label is to the real label. When it stabilizes
around a certain value, it means that the training of the model has reached conver-
gence. To monitor the loss variations during training of the five CNNs, we utilized
Tensorboard for visualization, as depicted in Fig. 14.5. Clearly, the losses of the two
self-built CNN models quickly approach zero and then stabilize, indicating rapid
convergence. In comparison, AlexNet requires more epochs to converge. ResNet
and Inception_v3 exhibit prolonged training periods before reaching a stable state, with final steady-state losses around 5.9 and 0.7, respectively.

Fig. 14.5 Training loss versus epochs for CNN_A, CNN_B, and AlexNet (top, 300 epochs) and for ResNet and Inception_v3 (bottom, up to 5 × 10^4 epochs)
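As an illustration of such a common training setup, the sketch below reads the TFRecord file back, applies the same batch size and learning rate, and logs the loss for TensorBoard; the file name, learning rate, batch size, and epoch count are assumed values, and build_classifier refers to the earlier sketch.

```python
# Illustrative training setup shared by all classifiers: the same batch size and
# learning rate for every model, with the loss logged for TensorBoard. The
# TFRecord file name, learning rate, batch size, and epoch count are assumed
# values; build_classifier is the function from the previous sketch.
import tensorflow as tf

def parse_example(serialized):
    feature_spec = {
        "image_raw": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_image(parsed["image_raw"], channels=3,
                               expand_animations=False)
    image = tf.image.resize(image, [128, 128]) / 255.0
    return image, parsed["label"]

train_ds = (tf.data.TFRecordDataset("train.tfrecords")
            .map(parse_example)
            .shuffle(1000)
            .batch(32))                     # identical batch size for all models

model = build_classifier()                  # defined in the previous sketch
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=300,
          callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs/cnn_a")])
```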
Next, the CNNs underwent testing. We saved the trained UWB signal propagation
channel classifiers individually, and the remaining 30% of the dataset served as the
test set to evaluate each classifier’s performance. We assessed the classifiers’ ability to
identify three categories of UWB signal propagation channels using precision, recall,
and F1-score metrics [16]. Table 14.3 compares the performance of CNN_A, CNN_
B, AlexNet (at Epochs = 300), ResNet, and Inception_v3 (at Epochs = 50,000)
after reaching convergence in training. Both self-built models demonstrated superior
identification across all three propagation channels compared to the classical models.
Notably, CNN_A achieved an average precision and recall of 100%. Furthermore,
CNN_A exhibited 1.6, 17.4, and 2.8% higher average precision compared to AlexNet,
ResNet, and Inception_v3, respectively. Additionally, performance across the three
categories varied significantly, even within the same classifier. For instance, the
identification recalls of AlexNet on HNLOS and LOS is 2.5% lower than that on
SNLOS; the identification precision of ResNet on HNLOS is 8.4% lower than that
on LOS; and the recall of Inception_v3 on LOS is 5.3% higher than that on HNLOS.
Table 14.3  The precision and recall (%) of five classifiers in scenario #1

           CNN_A               CNN_B               AlexNet             ResNet              Inception_v3
           Precision  Recall   Precision  Recall   Precision  Recall   Precision  Recall   Precision  Recall
  SNLOS    100.0      100.0    100.0      100.0    97.6       100.0    85.2       78.9     98.9       96.8
  HNLOS    100.0      100.0    98.9       100.0    97.5       97.5     77.1       92.6     96.7       94.7
  LOS      100.0      100.0    100.0      98.9     100.0      97.5     85.5       74.7     95.9       100.0
  Mean     100.0      100.0    99.6       99.6     98.4       98.3     82.6       82.1     97.2       97.2

To clearly illustrate the identification performance of the five classifiers, Fig. 14.6 presents a box plot of their F1-scores. CNN_A demonstrates significantly superior identification performance across all three channel categories compared to the other classifiers. On average, CNN_A achieves F1-scores approximately 0.4, 1.7, 22.1, and 3.0% higher than CNN_B, AlexNet, ResNet, and Inception_v3, respec-
tively. Despite its extensive training period (50,000 epochs), ResNet exhibits the
poorest identification performance across the three channel categories. Additionally,
we conducted experiments to verify if increasing the number of training samples
impacts classifier performance. We collected an additional 3000 UWB signal wave-
forms under various signal propagation channels in scenario #1. However, the results
show only marginal performance improvement, ranging from 0.1 to 0.6%. There-
fore, we decided against adding more training samples in subsequent experiments.
Table 14.4 details the time consumption of the five CNNs during training and testing
phases. Notably, “training time consumption” refers to the duration from the start
of training until convergence. CNN_A has the shortest total time consumption at
38.793 s. The self-built CNN models also exhibit shorter training times: 17.155 s
for one and 8.947 s for the other, as observed in Fig. 14.5. Conversely, ResNet and
Inception_v3 require longer training times to achieve convergence due to their larger
number of convolutional kernels, neurons, and layers. While the training phase can
be conducted offline, the time consumed during testing directly impacts the real-
time performance of propagation channel identification for localization. Based on
comprehensive evaluation and analysis, we selected the UWB signal propagation
classifier based on CNN_A to proceed with the next phase of the experiment.
To further validate the superiority, robustness, and generalizability of the selected
classifier, we combined UWB signal waveforms collected from scenario #1 and
scenario #2 to create a new mixed scenario. This approach allows the classifier
built in this mixed scenario to handle NLOS identification across multiple sub-
scenarios, eliminating the need for repeated modeling efforts.

Fig. 14.6 The F1-score of five classifiers in scenario #1. The green diamonds indicate the mean, the red dashes indicate the median, and short black solid lines indicate the minimum or maximum

Table 14.4  The time of five CNNs for training and testing

  Phase      CNN_A       CNN_B       AlexNet     ResNet      Inception_v3
  Training   17.155 s    8.947 s     29.161 s    > 24 h      > 24 h
  Testing    21.638 s    85.688 s    29.929 s    37.272 s    50.576 s
  Total      38.793 s    94.635 s    59.090 s    > 24 h      > 24 h

The identification
performance of our proposed method across three experimental scenarios was also
compared with several existing methods. In general, Support Vector Machine (SVM)
excels in handling problems with a small number of samples, nonlinearity, and high-
dimensional features, but it may lack model stability. K-Nearest Neighbors (KNN)
offers higher accuracy but is computationally intensive. Random Forest (RF) can
manage high-dimensional problems but may be prone to overfitting. It’s important
to note that the choice of feature sets used for training SVM, RF, and KNN can
significantly impact their identification performance. We adopted specific feature
sets based on previous research for each method to maintain consistency. For SVM,
we utilized {maximum amplitude, mean excess delay} as described in [17]; for RF,
{standard deviation, skewness, kurtosis} as detailed in [18]; and for KNN, {root-
mean-square delay spread, mean excess delay, kurtosis, maximum amplitude, skew-
ness, rise time} as outlined in [17, 18]. Additionally, to demonstrate the superiority
of our proposed method, we evaluated and analyzed the performance of a CNN-
based scheme without employing ODWPA. In this alternative approach, referred to
as CNN_A’, we utilized raw CIR data as input instead of the colored coefficients
images processed by ODWPA. This strategy allows us to validate the effectiveness
of our approach and compare it comprehensively against existing methods.
Table 14.5 presents the identification performance of six methods in scenario
#1, highlighting precision and recall values above 90%. Both CNN-based methods
(CNN_A and CNN_A’) exhibit higher precision and recall compared to the other
four methods. Importantly, our proposed method successfully identifies all three
UWB signal propagation channels, a feat not achieved by previous approaches. SVM
(“rbf”), SVM (“linear”), and RF exhibit poorer performance on certain UWB signal
propagation channels, resulting in lower average precision and recall. For example,
SVM (“linear”) achieves only 62.3% precision for HNLOS, and RF achieves only
63.0% recall for HNLOS. This suggests these methods may struggle to effectively
differentiate between the three UWB signal propagation channels using the chosen
feature sets, leading to inadequate classifiers that lack sensitivity to specific propa-
gation channel categories. Figure 14.7 displays the F1-score of the six methods in
scenario #1, demonstrating that our proposed method achieves an average F1-score
of 1.0, indicating significantly superior overall identification performance compared
to the other methods.
Table 14.6 and Fig. 14.8 present the precision, recall, and corresponding F1-
score of the six methods in scenario #2, an underground car park with complex
conditions due to parked cars. In Table 14.6, RF achieves average precision and
recall of 97.0 and 96.6% respectively in scenario #2, slightly lower than those of
the proposed method. However, overall, the average precision, recall, and F1-score
of all methods in scenario #2 are lower compared to scenario #1, except for RF.
Notably, the proposed method maintains the highest average precision and recall
at 98.1%, while SVM (“rbf”) and SVM (“linear”) both achieve average precision
and recall below 80%. Regarding the anomalous performance of RF in scenario #2,
where it outperforms other methods, we speculate it may relate to the specific training
feature set used ({standard deviation, skewness, kurtosis}). RF might be particularly
adept at identifying UWB signal propagation channels in scenario #2 due to these
features. Nonetheless, the proposed method, CNN_A’, and RF exhibit significantly
better overall identification performance compared to the other three methods, as
depicted in Fig. 14.8.
Table 14.6 and Fig. 14.9 depict the precision, recall, and corresponding F1-score of
the six methods in the mixed scenario, aimed at assessing the stability and adaptability
of the classifier. In Table 14.6, only the proposed method achieves both average
precision and recall above 90%. SVM (“linear”) exhibits the lowest average precision
and recall at 69.6 and 64.1%, respectively. Figure 14.9 illustrates that the proposed
method consistently achieves a significantly higher average F1-score across the three
UWB signal propagation channels compared to the other methods. This highlights the
robustness and effectiveness of the proposed method in handling mixed scenarios and
its superior performance in accurately identifying UWB signal propagation channels.
Figure 14.10 illustrates the identification accuracy of various methods across
three scenarios, evaluated using accuracy as the metric, which measures the ratio
of correctly classified samples to the total number of samples. Both CNN-based
methods (CNN_A and CNN_A’) consistently achieve the highest accuracy across
all scenarios, except for CNN_A’ being slightly lower than RF in scenario #2, high-
lighting the effectiveness of CNN and ODWPA. Overall, the accuracy of nearly
all methods declines from scenario #1 to the mixed scenario, with SVM (“linear”)
experiencing the largest decrease of approximately 36%. In contrast, the proposed
method maintains an overall accuracy consistently above 90%, with a decrease of
about 7% across scenarios. This underscores the robustness and autonomous learning
capability of the proposed method compared to other approaches. It’s important to
note that each method identifies a single sample within milliseconds, making them
suitable for real-time identification and localization tasks. However, the proposed
method has only been tested in two individual scenarios and a simple hybrid scenario.
For more challenging indoor environments (e.g., large shopping malls) and complex
hybrid scenarios involving multiple distinct scenarios, a more sophisticated NLOS
identification scheme may be necessary.
Fig. 14.9 The F1-score of six methods in the mixed scenario

Fig. 14.10 The identification accuracy (%) of six methods in scenario #1, scenario #2, and the mixed scenario
Method Description
Since the ranging errors in LOS and NLOS scenarios have different sources and
properties, in order to make the ranging values closer to the real values, we consider
modeling the ranging errors in these two environments separately.
Newton’s method, gradient descent method and least squares method are usually
used for curve fitting. In particular, the least squares method is widely used because
it has the advantages of low complexity, less resource consumption, and globally
optimal solution, so we consider using the least squares method for curve fitting.
Assuming that there is a set of sample data points to be fitted {(xi , yi ), i = 1,2, . . . , n},
and the fitting function is y = ϕ(x); then the sum of the squares of the deviations of
this curve from each data point is minimized, and the mathematical expression is as
follows:
$$\min \sum_{i=1}^{n} \delta_i^2 = \min \sum_{i=1}^{n} \left(\phi(x_i) - y_i\right)^2 \tag{14.17}$$
where xi is the ranging value in the LOS/NLOS environment, and yi is the abso-
lute or relative error corresponding to xi . In addition, when choosing a function
model, the distribution of the data to be fitted should be analyzed first, and then
the interpretability and rationality of the function model should also be considered.
The common nonlinear functions for fitting include polynomial function, Gaussian
function and exponential function. Based on the data characteristics and previous
research results, high-order polynomial and exponential function are used to model
the relative error of ranging in LOS/NLOS environments, respectively.
The high-order polynomial function model is described as:
$$y = a_0 + a_1 x + \cdots + a_k x^k \tag{14.18}$$
Substituting the sample data points into (14.18) gives the compact matrix form:
$$Y = XA \tag{14.19}$$
where $Y = [y_1, y_2, \ldots, y_n]^T$, $X = \begin{bmatrix} 1 & x_1 & \cdots & x_1^k \\ 1 & x_2 & \cdots & x_2^k \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^k \end{bmatrix}$, and $A = [a_0, a_1, \ldots, a_k]^T$ is the vector of polynomial coefficients to be determined, whose least squares solution is:
$$A = \left(X^T X\right)^{-1} X^T Y \tag{14.20}$$
The exponential function model is described as:
$$y = a e^{bx} \tag{14.21}$$
where {a, b} are the two model parameters to be determined. Taking logarithms on
both sides of Eq. (14.21) yields:
$$\hat{y} = bx + C \tag{14.22}$$
where $\hat{y} = \ln y$ and $C = \ln a$. Substituting $\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$ into Eq. (14.22) yields a linear equation of compact form:
$$L = KB \tag{14.23}$$
where $L = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]^T$, $K = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$, and $B = [C, b]^T$. The least squares solution to (14.23) is given by:
$$B = \left(K^T K\right)^{-1} K^T L \tag{14.24}$$
The goodness of fit is evaluated with the coefficient of determination (R-square):
$$R\text{-}square = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST} \tag{14.25}$$
where SSE is the sum of squares of the difference between the fitted value ŷi = ϕ(xi )
and the corresponding original value yi . SST is the sum of squares of the difference
between the original value and its mean. That is:
$$SSE = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{14.26}$$
$$SST = \sum_{i=1}^{n} \left(y_i - \frac{1}{n}\sum_{i=1}^{n} y_i\right)^2 \tag{14.27}$$
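A compact sketch of this fitting and evaluation procedure is given below: a third-order polynomial is fitted by ordinary least squares as in Eqs. (14.19) and (14.20), the exponential model is fitted through the log-linearization of Eqs. (14.22)-(14.24), and both are scored with R-square; the (x, y) data are synthetic placeholders rather than the measured ranging errors.

```python
# Sketch of the fitting and evaluation procedure of Eqs. (14.17)-(14.27): a
# third-order polynomial fitted by ordinary least squares, an exponential model
# y = a*exp(b*x) fitted through the log-linearization of Eq. (14.22), and both
# scored with R-square. The (x, y) data are synthetic placeholders, not the
# measured ranging errors.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.5, 14.0, 24)                                 # ranging values, synthetic
y = 2.0 + 0.8 * x - 0.03 * x**2 + rng.normal(0, 0.3, x.size)   # relative errors, synthetic

def r_square(y_true, y_fit):
    sse = np.sum((y_true - y_fit) ** 2)                # Eq. (14.26)
    sst = np.sum((y_true - y_true.mean()) ** 2)        # Eq. (14.27)
    return 1.0 - sse / sst                             # Eq. (14.25)

# Polynomial model: build X of Eq. (14.19) and solve Y = XA in the LS sense.
X = np.vander(x, N=4, increasing=True)                 # columns [1, x, x^2, x^3]
A, *_ = np.linalg.lstsq(X, y, rcond=None)              # Eq. (14.20)
y_poly = X @ A

# Exponential model: linearize as ln(y) = ln(a) + b*x (Eq. 14.22) and solve.
K = np.column_stack([np.ones_like(x), x])
B, *_ = np.linalg.lstsq(K, np.log(np.clip(y, 1e-9, None)), rcond=None)   # Eq. (14.24)
y_exp = np.exp(B[0]) * np.exp(B[1] * x)

print("R-square (polynomial): ", round(r_square(y, y_poly), 4))
print("R-square (exponential):", round(r_square(y, y_exp), 4))
```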
Experimental Analysis
This experiment is carried out in the underground car park of the fifth teaching building on the Nanhu Campus of China University of Mining and Technology (CUMT). The experimental equipment is the Time Domain P450 module.
In both LOS and NLOS environments (obstacles with wooden boards, iron sheets
or pedestrians), ranging data were collected at 28 locations separated by 50 cm. At
each location, collection was repeated 500 times, to form a group of data. Among
the 28 groups of data, 24 groups were used for modeling and 4 groups were used for
model validation. Gross errors (outliers) in each group of ranging data were first eliminated, then the root mean square value of each group was computed and the corresponding absolute and relative errors were calculated.
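A minimal sketch of this per-location processing is shown below; the true distance, the synthetic measurements, and the 3-sigma rule used to discard gross errors are illustrative assumptions.

```python
# Sketch of the per-location processing: from one group of repeated range
# measurements, discard gross errors, take the RMS range, and compute the
# absolute and relative errors. The true distance, the synthetic measurements,
# and the 3-sigma rejection rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
true_dist = 6.0                                        # metres, illustrative
ranges = true_dist + 0.15 + rng.normal(0, 0.05, 500)   # 500 repeated measurements

mu, sigma = ranges.mean(), ranges.std()
clean = ranges[np.abs(ranges - mu) < 3 * sigma]        # simple gross-error rejection

rms_range = np.sqrt(np.mean(clean ** 2))               # root mean square range
abs_err = rms_range - true_dist                        # absolute error
rel_err = 100.0 * abs_err / true_dist                  # relative error in %
print(f"RMS range {rms_range:.3f} m, abs err {abs_err:.3f} m, rel err {rel_err:.2f} %")
```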
First, the ranging error in the LOS environment is modeled and validated for
analysis. According to the results, the error model based on the third-order polynomial
function in the LOS environment can be established, as shown in Fig. 14.11.
The polynomial function can fit the trend of the relative error curve better.
By observing this relative error curve and according to the characteristics of the
exponential function, the transformed exponential function model is considered:
The fitting principle is similar to those applied to Eqs. (14.20) and (14.21). Thus, the functional expressions for the two models are obtained as:
The evaluation metrics of model fitting errors are shown in Table 14.7.
Fig. 14.11 The relative error curve and the polynomial function fitting curve in the LOS environment (relative error in %)
Table 14.7 Evaluation metrics of the model for fitting the ranging error in LOS
Error model ϕ1 (x) ϕ2 (x)
SSE 35.2 45.79
R-square 0.9245 0.9147
It can be seen from Table 14.7 that the polynomial function-based model ϕ1(x) has lower SSE and higher R-square than the exponential function-based model ϕ2(x),
so the former is considered as the ranging error compensation model in the LOS
environment. Next, the ϕ1 (x) model is validated using the remaining ranging data,
and the validation results are shown in Table 14.8.
From Table 14.8, after the relative error estimation by the ϕ1 (x) model and then
the correction of the ranging values, the ranging accuracy is significantly improved,
by 38% on average. Thus, the polynomial function-based error model ϕ1 (x) can
significantly improve the ranging accuracy in the LOS environment.
In summary, the polynomial function-based error model ϕ1 (x) and the exponential
function-based error model f2 (x) are selected for error compensation of ranging in
LOS and NLOS environments, respectively. They will be used in the localization
experiments to improve the localization accuracy as discussed in the next subsection.
Method Description
Because the Kalman filter performs well in localization and tracking problems, this chapter integrates the Chan algorithm, a traditional UWB localization algorithm, with the Kalman filter. Before proceeding further, the original obser-
vation information needs to be processed to reduce ranging error, in order to improve
the accuracy of localization. The proposed algorithm mainly consists of three stages.
(1) Stage I: Acquire the original ranging information and then utilize the UWB
signal propagation classifier for NLOS identification.
(2) Stage II: According to the recognition results in (1), the proposed error model
is utilized to compensate the errors of LOS/NLOS ranging values respectively,
so as to obtain the corrected ranging information.
(3) Stage III: Based on the corrected ranging information in (2), the Chan algorithm
is used to obtain the initial position estimate of the target tag, and then the
Kalman filter is used to obtain the final position estimate of the target tag.
Stages I and II have been elaborated and experimentally analyzed earlier, and
the specific implementation details of Chan-Kalman localization algorithm in
stage III will be presented next. Assuming that the position of the target tag
determined by the Chan algorithm at moment k is $(x_{t,k}, y_{t,k})$, the position of the ith (i = 1, 2, …, m) base station is $(x_i, y_i)$, and the estimated position of the target tag at moment k is $(\hat{x}_{t,k}, \hat{y}_{t,k})$, i.e., the state vector at moment k is $X_k = [\hat{x}_{t,k}, \hat{y}_{t,k}]^T$. Then, according to the Kalman filter, the state equation of the system is:
$$X_k = A_0 X_{k-1} + W_{k-1}$$
where A0 = [1, 0; 0, 1] is the state transfer matrix, the system control matrix is
0, and Wk−1 is the process noise vector whose covariance matrix is a diagonal
matrix denoted by Q. The measurement equation of the system is:
Zk = H0 Xk + Vk (14.33)
where Zk is the tag position vector [xt,k , yt,k ]T determined by the Chan algorithm
at the kth moment, and Vk is the measurement noise vector whose covariance
matrix is also a diagonal matrix denoted by R. The observation matrix is H0 =
[1,0;0,1]. The implementation of the Kalman filter is realized as follows:
$$X_k^- = A_0 X_{k-1} \tag{14.34}$$
$$P_k^- = A_0 P_{k-1} A_0^T + Q \tag{14.35}$$
$$K_k = P_k^- H_0^T \left(H_0 P_k^- H_0^T + R\right)^{-1} \tag{14.36}$$
$$X_k = X_k^- + K_k \left(Z_k - H_0 X_k^-\right) \tag{14.37}$$
$$P_k = (I - K_k H_0) P_k^- \tag{14.38}$$
where $X_k^-$ and $P_k^-$ denote the predicted state vector and its covariance matrix, and $K_k$ is the Kalman gain.
With the above three stages, the estimated position of the tag can finally be made
as accurate and reliable as possible.
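A minimal sketch of stage III is given below: the per-epoch position fixes produced by the Chan algorithm (after NLOS identification and ranging-error compensation) are smoothed with the Kalman filter described above; the noise covariances Q and R, the initialization, and the synthetic input fixes are illustrative assumptions.

```python
# Minimal sketch of stage III: the per-epoch position fixes from the Chan
# algorithm (obtained after NLOS identification and ranging-error compensation)
# are smoothed with the Kalman filter described above. The noise covariances
# Q and R, the initialization, and the synthetic input fixes are illustrative
# assumptions.
import numpy as np

def kalman_smooth(chan_positions: np.ndarray,
                  q: float = 1e-2, r: float = 25.0) -> np.ndarray:
    A0 = np.eye(2)                        # state transition matrix
    H0 = np.eye(2)                        # observation matrix
    Q = q * np.eye(2)                     # process noise covariance (assumed)
    R = r * np.eye(2)                     # measurement noise covariance (assumed)
    x = chan_positions[0].astype(float)   # initialize from the first Chan fix
    P = np.eye(2)
    smoothed = [x.copy()]
    for z in chan_positions[1:]:
        x_pred = A0 @ x                   # state prediction
        P_pred = A0 @ P @ A0.T + Q        # covariance prediction
        K = P_pred @ H0.T @ np.linalg.inv(H0 @ P_pred @ H0.T + R)   # Kalman gain
        x = x_pred + K @ (z - H0 @ x_pred)                          # state update
        P = (np.eye(2) - K @ H0) @ P_pred                           # covariance update
        smoothed.append(x.copy())
    return np.array(smoothed)

# Toy usage: noisy fixes along one straight segment of the ABCD trajectory (cm).
rng = np.random.default_rng(7)
truth = np.column_stack([np.linspace(100, 1100, 50), np.full(50, 100.0)])
fixes = truth + rng.normal(0, 15, truth.shape)
print(kalman_smooth(fixes)[:3])
```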
Experimental Analysis
The site used in this localization experiment is a LOS/NLOS hybrid environment. The mobile tag is Tag 104; the positions of the four base stations BS 100, BS 101, BS 102, and BS 103 are (0, 0), (1500 cm, 0), (1500 cm, 700 cm), and (0, 700 cm), respectively; and rectangle ABCD is the
preset trajectory of the mobile tag, in which the positions of A, B, C, D are (100 cm,
100 cm), (1100 cm, 100 cm), (1100 cm, 600 cm), and (100 cm, 600 cm), respectively,
as shown in Fig. 14.13.
First, static localization experiments are conducted. The mobile tag Tag 104 is
placed at the four known points A, B, C, and D successively, and the position is solved 1000 times at each point. Therefore, the RMSE of the Chan algorithm, Chan-
Kalman algorithm and the proposed algorithm (improved Chan-Kalman algorithm
based on NLOS recognition) for location solving at different points can be obtained,
as shown in Table 14.9.
Table 14.9  RMSE (cm) of different algorithms for position solving at four points A, B, C and D

                            A       B      C       D
  Chan algorithm            30.1    9.4    14.9    28.9
  Chan-Kalman algorithm     28.2    6.9    13.7    28.4
  The proposed algorithm    18.1    4.8    5.7     21.9

From these experimental results, regardless of which point the mobile tag is at, the localization accuracy of the Chan-Kalman algorithm is always higher than that of the Chan algorithm, with an average improvement of about 10.7%; the largest improvement, about 26.6%, occurs when the mobile tag is at point B. Since both the Chan algorithm and the Chan-Kalman algorithm use the raw ranging information to compute the position solution, this demonstrates the gain provided by the Kalman filter. The average localization accuracy of the proposed algorithm at the four points is 12.6 cm, which is about 32.8% better than the other two algorithms on average, indicating that the proposed algorithm benefits considerably from using the error-compensated ranging information.
Next, the dynamic localization experiment is carried out. The mobile tag Tag 104
is moved slowly along the preset trajectory ABCDA. The tag trajectories estimated by the Chan algorithm, the Chan-Kalman algorithm, and the proposed algorithm are shown in Fig. 14.14.
As can be seen from the figure, the trajectory estimated by the proposed algorithm
is the closest to the real one, and the estimated trajectory is very smooth, almost unaf-
fected by the ranging error. The average errors in the X-coordinate and Y-coordinate are about 10.7 and 8.9 cm, respectively, which are much lower than those of the other two algorithms.
14.4 Conclusions
Aiming at the problem that UWB indoor localization technology is greatly affected
by NLOS propagation in complex environments, this chapter carries out a detailed
study on UWB non-line-of-sight propagation identification and localization. First, an
NLOS identification method based on ODWPA and CNN is proposed for improving
Fig. 14.14 The real trajectory of the mobile tag and the trajectories estimated by the Chan algorithm, the Chan-Kalman algorithm, and the proposed algorithm (X and Y coordinates in cm)
References
1. Shao H, Zhang X, Wang Z, Wang Z (2007) An efficient closed-form algorithm for aoa based
node localization using auxiliary variables. IEEE Wirel Commun 14(4):90–96
2. Alameda-Pineda X, Horaud R (2014) A geometric approach to sound source localization from
time-delay estimates. IEEE/ACM Trans Audio, Speech, Lang Process 22(6):1082–1095
3. Gezici S (2008) A survey on wireless position estimation. Wireless Pers Commun 44:263–282
4. Foy WH (1976) Position-location solutions by Taylor-series estimation. IEEE Trans Aerosp
Electron Syst 2:187–194
5. Fang BT (1990) Simple solutions for hyperbolic and related position fixes. IEEE Trans Aerosp
Electron Syst 26(5):748–753
6. Chan YT, Ho KC (1994) A simple and efficient estimator for hyperbolic location. IEEE Trans
Signal Process 42(8):1905–1915
7. Cui Z, Gao Y, Hu J, Tian S, Cheng J (2020) LOS/NLOS identification for indoor UWB posi-
tioning based on Morlet wavelet transform and convolutional neural networks. IEEE Commun
Lett 25(3):879–882