2019 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 30 Sept. - 3 Oct. 2019, Pisa, Italy

Semi-supervised Variational Autoencoder for WiFi Indoor Localization
Boris Chidlovskii and Leonid Antsfeld
Naver Labs Europe
Meylan 38240, France
[email protected]

Abstract—We address the problem of indoor localization based on WiFi signal strengths. We develop a semi-supervised deep learning method able to train a prediction model from a small set of annotated WiFi observations and a massive set of non-annotated ones. Our method is based on the variational autoencoder deep network. We complement the network with an additional component of structural projection able to further improve the localization accuracy in a complex, multi-building and multi-floor environment. We consider several different network compositions which combine the classification and regression subtasks to achieve optimal performance. We evaluate our method on the public UJI-IndoorLoc dataset and show that the proposed method maintains state-of-the-art localization accuracy with a very limited amount of annotated data.

Index Terms—WiFi based indoor localization, semi-supervised learning, variational auto-encoder, UJI-IndoorLoc dataset

I. INTRODUCTION

An increasing number of real-world applications require accurate localization for the quality of their services. While outdoor positioning is addressed by satellite positioning systems like GPS, in indoor environments, with no direct line of sight to the satellites, accurate positioning requires alternative solutions.

With WiFi infrastructure being widely deployed in public indoor environments, such as airports, shopping malls and office buildings, WiFi-based localization is receiving a lot of attention. Many different techniques have been proposed to take advantage of the existing WiFi infrastructure to provide accurate indoor positioning [17], [21]. Most of these techniques, however, require additional information that must be provided either by the facility administrator or obtained during an off-line data collection or calibration phase.

For example, knowing the exact locations of the stationary WiFi Access Points (APs) makes it possible to use trilateration techniques to pinpoint the device or user location. While such information can be obtained in a controlled environment, like an airport, in practice it is nearly impossible to know the exact locations of WiFi APs in a shopping mall, where every shop manages its own infrastructure.

WiFi fingerprinting is a well-established approach that tries to overcome this problem. It only requires the WiFi received signal strengths (RSS) from the available APs at a set of predefined locations, and does not need to know the locations of the emitting APs. It first proceeds by off-line data collection, where arrays of WiFi signal strengths (a.k.a. fingerprints) are recorded at the predefined locations. All the recordings together constitute a WiFi radio-map, which is then used online for real-time localization. To provide a good location estimate, the fingerprints should be densely recorded and annotated with exact coordinates.

Fingerprinting is today considered the state of the art, but the need for a dense and up-to-date radio map represents a serious challenge, since creating and maintaining such a map is time consuming. Moreover, hardware variance may have a significant impact on the position accuracy, as different devices used to collect WiFi signals often show different levels of consistency. Other signal attenuation factors, such as physical obstructions, the presence of people, and multi-path, should also be taken into consideration [17].

Similarly to other domains, machine learning (ML) techniques can help cope with some of the factors behind noisy, multivariate WiFi data. Indeed, various ML methods have proven successful for localization tasks [4], [21]. These methods build prediction models that map a vector of WiFi signal strengths to a location, where the location can be a coordinate, a room identifier, a building or floor number, etc.

One important issue when deploying ML methods is reducing the WiFi annotation effort. While it is relatively easy to collect unlabeled WiFi data, e.g. by crowdsourcing, it is significantly more expensive and tedious to annotate the data with exact locations. Semi-supervised learning is a paradigm where both labeled and unlabeled data are used to build accurate prediction models [2]. Such a semi-supervised setting is well suited to WiFi data collection, where one or more equipped devices can combine low-cost collection of non-annotated WiFi data with a limited annotation effort. Several semi-supervised methods [5], [14], [18], [20] have shown their efficiency in reducing the annotation needed for accurate WiFi-based localization.

978-1-7281-1788-1/19/$31.00 © 2019 IEEE

Authorized licensed use limited to: IFSTTAR. Downloaded on December 28,2023 at 20:20:01 UTC from IEEE Xplore. Restrictions apply.

In recent years, Deep Learning (DL) has achieved remarkable progress in many domains, and some attempts have been made to apply DL to the indoor localization task [1], [13]. However, like many deep learning frameworks, they often require a large amount of annotated data. In this paper, we take advantage of recent advances in deep learning and semi-supervised learning [8] and apply them to the WiFi localization task.

We propose a new deep learning method, based on the Variational Auto-Encoder (VAE), that significantly reduces the need for labeled data. It can combine a small amount of labeled data with a large unlabeled dataset to build an accurate predictor for the localization task. By adapting the VAE lower bounds, the encoder in our deep network, beyond mapping the data to the latent variables, plays the additional role of a classifier (or a regressor) for the available labeled data, while the decoder plays a regularization role on both labeled and unlabeled data. We test our approach on the public UJI-IndoorLoc data set [16] and show that keeping as little as 5% of annotated data does not penalize the accuracy, with a localization error that outperforms the state-of-the-art approaches.

The main contributions of this work are summarized as follows:
1) We propose a new semi-supervised method based on the Variational Auto-Encoder for the WiFi-based indoor localization task. The method is able to train a prediction model from a small set of annotated WiFi observations (expensive to collect) complemented by a massive set of non-annotated WiFi observations (easy to collect).
2) We test our approach on a public dataset and show its efficiency in considerably reducing the data annotation effort while preserving state-of-the-art localization accuracy.
3) We test our deep network under a number of real-world scenarios, where we show that our approach is able to cope with both a small set of annotated data and missing signals from some APs.

The remainder of this paper is organized as follows. Section II discusses the related work and state-of-the-art methods in WiFi-based localization. Section III describes the problem, and Section IV introduces the semi-supervised methods based on variational autoencoders. Section V presents the deep network architecture and discusses a number of alternatives and one important extension. Section VI reports experimental results and ablation analysis conducted on the public UJI-IndoorLoc dataset. Section VII concludes the paper.

II. RELATED WORK

The recent work of F. Zafari et al. [21] provides a detailed overview of the practical challenges of indoor localization and of the various systems proposed in the literature between 1997 and 2018. Below we present the elements of the state of the art that are relevant to our work.

A. Trilateration

Trilateration can be used to localize an agent or a device if the locations of three or more nearby Access Points are known. In reality, however, this information is rarely available, and various methods were proposed to estimate it. Moreover, since a complex indoor environment often contains obstacles (such as walls, doors, etc.) and frequently moving objects (such as furniture, passing people), the effect of signal multi-path makes trilateration very challenging.

In [6], the authors proposed a principled way to estimate the AP locations, in order to use this information for further trilateration. The idea is that the change in signal strength indirectly reflects the direction the signal comes from. In a local environment, the signal strength usually increases towards the AP, even if the signal is obstructed by an obstacle. Based on this observation, the authors developed a gradient-based algorithm that estimates the AP locations. While their approach improves on the state-of-the-art results, its median positioning accuracy of 33 m is too coarse for further trilateration in most practical settings. SAIL [9] uses a single AP and dead-reckoning techniques to estimate an agent location. A median error of 2.3 m is reported; however, the method uses Channel State Information (CSI) and Time of Flight (ToF) information, which are unfortunately not readily accessible to commercial smartphone applications.

Machine learning methods have been found to be a strong alternative to trilateration; below we discuss three categories of ML methods, going from supervised to semi-supervised and unsupervised ones.

B. Supervised methods

Supervised ML methods for WiFi localization have been widely studied, see S. Xia et al. [17]. We only mention [4], where the authors successfully use a Support Vector Machine (SVM) with RSS measurements to classify the target zone. Experiments in real-world environments, at the University of Technology of Troyes (UTT), composed of 17 rooms, and in a hospital building consisting of 21 rooms spread over three floors, showed around 90% classification accuracy.

A large comparison of the two main approaches to WiFi-based indoor localization, trilateration and fingerprinting, was undertaken in [11]. The paper concluded that a large amount of geo-labeled WiFi RSS data is required for ML-based fingerprinting methods to work well. Moreover, the paper raised the issue of reducing the data collection and annotation effort; we address it in the following section.

C. Semi-supervised methods

The principle of combining unlabeled and labeled data is well established in ML, and several ML principles were adopted to reduce the offline WiFi data collection. First, mapping high-dimensional signal strength data onto a two-dimensional manifold, using only a small sample of key points with known locations, was proposed in [14]. While experiments in a relatively small environment showed promising results, the results in an open area proved to be less accurate and were not reported.

Another approach to faster radio-map construction was recently proposed in [3]. This was achieved by allowing larger distances between successive fingerprints and by using adaptive path-loss model interpolation to estimate the locations


of fingerprints in between. The authors report that reducing the construction effort by 85% still leads to the same positioning accuracy as with a complete radio-map. In [20], Y. Yuan et al. introduced an efficient fingerprint training method using semi-supervised learning, reporting an 80% reduction in time cost while guaranteeing the localization accuracy.

Ghourcian et al. [5] used Channel State Information (CSI) with semi-supervised learning to train a model for room-level localization. The system does not require the user to carry any device; instead, it uses the disturbance of the pervasive WiFi signal between a receiver and a transmitter. The idea is to learn the distortion of the CSI to classify which room the user is located in and whether a room is empty, thus allowing the detection of an unwanted intrusion.

Another semi-supervised method for the localization of a moving smartphone robot was proposed in [18]. First, pseudo-labels are obtained for the unlabeled data using Laplacian Embedded Regression Least Square. Next, during the learning phase, two decoupled balancing parameters individually weight the labeled and the pseudo-labeled data. The algorithm was tested in a small, controlled environment, an area of 3 m × 4 m with 9 APs. The authors report that the algorithm estimated locations more accurately with fewer labeled data than state-of-the-art semi-supervised algorithms.

D. Unsupervised methods

Reducing the annotation effort to zero is addressed by fully unsupervised methods. Yu et al. [12] proposed constructing the radio-map by WiFi fingerprint crowd-sourcing. Instead of annotating WiFi data, they first use known door locations as anchor points. They then crowd-source WiFi information together with smartphone sensor measurements (accelerometer, gyroscope and compass) to automatically construct a radio map at the anchor points. The method was tested in an office environment with frequent room doors, where a weighted kNN method was used on the automatically built radio-map. A median accuracy of less than 3 m was reported. At the same time, the method was not tested when these conditions are not satisfied, for example in an environment with long hallways and few doors.

Another attempt at building a radio-map without any labeling effort is suggested in [7]. The authors build a WiFi radio-map from scratch, using only unlabeled crowd-sourced fingerprints. To estimate the locations of the unlabeled fingerprints, the method arranges the fingerprint sequences to fit the inner structure of a building, which is assumed to be known in advance. While this approach showed good results in a controlled experimental setup, such as an office building, its behaviour in a different setup, with a different building structure [15], is hard to predict.

Both unsupervised methods presented above make a number of important assumptions; in practice, they are likely to be combined with other methods to reach a good level of robustness.

E. Deep Methods

A comprehensive survey of recent Deep Learning research in the mobile and wireless networking domain is provided by Zhang et al. [22]. The authors of [13] were the first to apply a Deep Neural Network (DNN) with an autoencoder directly on WiFi RSS data, for floor and building classification. The method was tested on the publicly available UJI-IndoorLoc dataset, with reported results comparable to the state of the art. Adege et al. [1] combine a DNN with linear discriminant analysis (LDA) for signal noise cleaning, in order to localize a WiFi user. The method was tested on both classification and regression problems.

F. Variational Auto-Encoders

A generic framework for semi-supervised learning with generative models based on the Variational Auto-Encoder (VAE) was introduced in [8]. It improves the quality of predictions for classification problems in semi-supervised settings, and its performance was shown to be on par with the state of the art in various application domains. In a recent study [10], the authors combined VAEs with a Deep Reinforcement Learning (DRL) framework for various “smart city” scenarios. As a use case, the method was benchmarked on an indoor localization task based on Bluetooth signals, with results 23% better than a traditional supervised DRL.

III. PROBLEM STATEMENT

In the WiFi localization problem, we have multivariate data representing WiFi received signal strength (RSS) values from the APs. Let $x_{i,j}$ denote the $i$-th measurement value of the $j$-th AP. Furthermore, let $x_i$ denote a multivariate measurement vector $x_i = [x_{i,1}, \ldots, x_{i,m}]$, where $m$ is the number of WiFi APs. The measurement matrix is denoted $X = \{x_1, \ldots, x_n\}$, where $x_i \in \mathbb{R}^m$ and $n$ is the number of observed measurements.

The RSS data can be labeled by placing the device at known locations. We denote the set of labels (ground truth) by $Y = \{y_1, \ldots, y_n\}$, where $y_i$ can be a categorical variable, such as the building number, a room identifier or the floor level, or the absolute coordinates $(\mathrm{Lat}_i, \mathrm{Lon}_i, \mathrm{El}_i) \in \mathbb{R}_+^3$ giving the latitude, longitude and elevation of a point. In the first case we face a classification problem; in the second case we work in the regression setting. Finally, $D_T$ denotes the full data set, $D_T = \{(x_i, y_i)\}$, $i = 1, \ldots, n$.

A general approach is to learn the best predictor from the training data set $D_T$. The challenge is to find an appropriate transformation of the raw measurement $x$ to efficiently learn a reliable predictor for $y$. This can be described as finding a mapping $F$ to a latent variable $z \in \mathbb{R}^k$, such that $F: x \to z$, $k < m$. The optimal predictor can then be defined as a function of $z$ with parameter $\theta$, denoted $f_\theta(z) = \arg\max_y p(y \mid z, \theta)$.

We pay particular attention to the setting where only a small fraction of the samples has label (ground truth) information. Let $D_T^L$ and $D_T^U$ denote the part of the data set with labels and the part without labels, respectively. Our primary goal is to develop a deep learning-based approach to predict the location


where $|D_T^L| \ll |D_T^U|$. In such a semi-supervised setting, we assume that $D_T^L$ comes from exactly the same distribution as $D_T^U$. Our semi-supervised approach fully exploits $D_T^U$ to learn the non-linear mapping function $F$ and obtain a robust feature representation of $x$ for a robust predictor $f_\theta$.

IV. VARIATIONAL AUTOENCODER

For the semi-supervised setting presented above, we adopt two deep models, M1 and M2, based on variational autoencoders (VAE) [8]. Each data point $x_i$ is mapped into a latent variable vector $z$. The distribution of labeled data is represented by $\tilde{p}_l(x, y)$, while unlabeled data are represented by $\tilde{p}_u(x)$.

The first model, denoted M1, is the latent-feature discriminative model; it is created using two parametric distributions:

$$p(z) = \mathcal{N}(z \mid 0, I); \qquad p_\theta(x \mid z) = f_1(x; z, \theta), \tag{1}$$

where $p(z)$ is a Gaussian distribution with mean vector $0$ and identity covariance matrix $I$. The function $f_1(x; z, \theta)$ is a nonlinear likelihood function with parameter $\theta$ for the latent variables $z$, based on a deep neural network.

The second model, denoted M2, is the generative semi-supervised model, which generates data using a latent class variable $y$ in addition to the latent variables $z$; it is defined as follows:

$$p(y) = \mathrm{Cat}(y \mid \pi); \qquad p(z) = \mathcal{N}(z \mid 0, I); \qquad p_\theta(x \mid y, z) = f_2(x; y, z, \theta), \tag{2}$$

where $\mathrm{Cat}(y \mid \pi)$ is a categorical (multinomial) distribution with a vector of probabilities $\pi$ whose elements sum to 1. If no label is available for a data point, the unknown label $y$ is treated as a latent variable in addition to $z$.

It was shown in [8] that both the M1 and M2 models admit lower-bound objectives. To formulate these objectives, a fixed-form distribution $q_\phi(z \mid x)$ with parameters $\phi$ is introduced to approximate the posterior distribution $p(z \mid x)$. For all latent variables in the models, an inference deep neural network is introduced to generate a distribution of the form $q_\phi(\cdot)$.

For the M1 model, $q_\phi(z \mid x)$ is defined as a Gaussian network for inferring the latent variables $z$:

$$q_\phi(z \mid x) = \mathcal{N}\big(z \mid \mu_\phi(x), \operatorname{diag}(\sigma_\phi^2(x))\big), \tag{3}$$

where $\mu_\phi(x)$ is the vector of means, $\sigma_\phi(x)$ is the vector of standard deviations, and $\operatorname{diag}(\cdot)$ builds a diagonal matrix from a vector.

For the M2 model, an inference network is used for the latent variables $z$ and $y$, with Gaussian and categorical distributions, respectively:

$$q_\phi(z \mid y, x) = \mathcal{N}\big(z \mid \mu_\phi(y, x), \operatorname{diag}(\sigma_\phi^2(x))\big); \qquad q_\phi(y \mid x) = \mathrm{Cat}(y \mid \pi_\phi(x)), \tag{4}$$

where $\pi_\phi(x)$ is a vector of probabilities parameterized by $\phi$.

The lower bound for M1 combines the expected reconstruction log-likelihood with the Kullback-Leibler divergence between the encoding $q_\phi$ and the prior distribution $p_\theta$:

$$\log p_\theta(x) \geq \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big[q_\phi(z \mid x) \,\|\, p_\theta(z)\big] = -\mathcal{J}(x), \tag{5}$$

where KL is the Kullback-Leibler divergence between two distributions $q$ and $p$, $\mathrm{KL}(q \| p) = \sum_i q_i \log \frac{q_i}{p_i}$.

For the M2 model, the latent space includes the variables $z$ and $y$, and two cases should be considered. The first one deals with the subset of labeled data $(x, y)$:

$$\log p_\theta(x, y) \geq \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid y, z) + \log p_\theta(y) + \log p_\theta(z) - \log q_\phi(z \mid x, y)\big] = -\mathcal{L}(x, y). \tag{6}$$

The second case deals with unlabeled data $x$, where $y$ is treated as a latent variable and the resulting lower bound is:

$$\log p_\theta(x) \geq \mathbb{E}_{q_\phi(y, z \mid x)}\big[\log p_\theta(x \mid y, z) + \log p_\theta(y) + \log p_\theta(z) - \log q_\phi(y, z \mid x)\big] = -\mathcal{U}(x). \tag{7}$$

The bound on the marginal likelihood of the whole dataset is then a sum over labeled and unlabeled data:

$$\mathcal{J}(x, y) = \sum_{(x, y) \sim \tilde{p}_l(x, y)} \mathcal{L}(x, y) + \sum_{x \sim \tilde{p}_u(x)} \mathcal{U}(x). \tag{8}$$

If $y$ is a categorical variable, a classification loss on the labeled data $(x, y)$ is added to the above function, and the optimized objective becomes:

$$\mathcal{J}^\alpha = \mathcal{J}(x, y) + \alpha \, \mathbb{E}_{\tilde{p}_l(x, y)}\big[-\log q_\phi(y \mid x)\big], \tag{9}$$

where $\alpha$ controls the trade-off between the contributions of the generative and discriminative models in the learning process. During training of both the M1 and M2 models, the stochastic gradient of $\mathcal{J}^\alpha$ is computed on each mini-batch and used to update the generative parameters $\theta$ and the variational parameters $\phi$.

If $y$ is multivariate (the location coordinates), the classification loss in (9) is replaced with a regression loss, which in our case is the mean squared error $\mathbb{E}_{\tilde{p}_l(x, y)} \|y - q_\phi(y \mid x)\|^2$.

V. NETWORK ARCHITECTURE

The architecture of the VAE-based localization system is presented in Figure 1. It includes the encoder, the decoder and the classifier; the latent variables $z$ are represented by their mean $\mu$ and variance $\sigma$ values.

Fig. 1. VAE-based architecture for WiFi-based localization.
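As a concrete reading of the objective, the sketch below evaluates the per-sample terms of the M1 bound (5) together with the regression counterpart of the extra term in (9), which is the configuration used for coordinate prediction. It is a minimal pure-Python illustration, not the authors' TensorFlow code. It assumes a diagonal Gaussian encoder that outputs the mean $\mu$ and the log-variance $\log \sigma^2$, a standard normal prior, and a unit-variance Gaussian decoder, so that $-\log p_\theta(x \mid z)$ reduces to half the squared reconstruction error up to an additive constant. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL[N(mu, diag(exp(log_var))) || N(0, I)], summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def reconstruction_nll(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """-log p(x|z) for an assumed unit-variance Gaussian decoder, constant dropped."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))


def sample_loss(x: Sequence[float], x_hat: Sequence[float],
                mu: Sequence[float], log_var: Sequence[float],
                y: Optional[Sequence[float]] = None,
                y_hat: Optional[Sequence[float]] = None,
                alpha: float = 1.0) -> float:
    """J(x) of (5) using one decoded sample x_hat, plus alpha * squared error if labeled."""
    loss = reconstruction_nll(x, x_hat) + gaussian_kl(mu, log_var)
    if y is not None and y_hat is not None:
        # Regression counterpart of the extra term in (9): squared error between
        # the true coordinates y and the predicted coordinates y_hat.
        loss += alpha * sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return loss
```

In a full model, `mu` and `log_var` would be produced by the encoder, `x_hat` by the decoder applied to a sample drawn with `reparameterize`, and `y_hat` by the regression head; a mini-batch objective would sum `sample_loss` over labeled samples (with `y`) and unlabeled samples (without it) before a gradient step on $\theta$ and $\phi$. Predicting the log-variance rather than $\sigma$ itself is a common choice that keeps $\sigma$ positive without explicit constraints.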


A. Training details

The semi-supervised VAE network is implemented using the TensorFlow library 1.13 with CUDA 9.1. Applied to the UJI-IndoorLoc dataset [16], the encoder takes as input 520 WiFi RSS values, all normalized. The encoder, decoder and classifier are all implemented as three fully connected layers with ReLU activation, of sizes 512, 256 and $|z| = 128$. In the loss function (9), the classification/regression loss is weighted with $\alpha = 1$.

For all models presented in the previous section, we train the network using stochastic gradient descent (Adam optimizer) with an initial learning rate of 0.01, a batch size of 64, batch normalization, and a stopping criterion of 20,000 learning iterations. Dropout with probability 0.5 is applied to all layers in the network. The localization error on the test set is used as the evaluation metric, and results averaged over 5 runs are reported.

Initially we considered several alternative compositions of the encoder-decoder pairs. In the first composition, we train a single encoder-decoder network on the coordinate triples (Lat, Lon, El). Unfortunately, training one single encoder-decoder proves to be a poor solution, as the network weights cannot capture the complex dependencies among the coordinates.

In the second composition, we independently train three encoder-decoders on the Lat, Lon and El coordinates, where the encoder outputs are concatenated in the input to the latent variables. We therefore train three encoders and three decoders, one encoder-decoder pair for each of the three coordinates (see Figure 1).

Finally, as the third composition, we considered a two-step pipeline specific to the multi-building nature of the UJI-IndoorLoc dataset. Three regression models were trained separately for the three UJI buildings. Which of the three models is used is decided in the first step by a building classifier, trained with the building identifiers also available in the dataset. All three per-building regression models are more accurate than a single regression model. Unfortunately, perfect building classification cannot be achieved. Even a small building classification error of 0.43% undermines the advantage of the more accurate per-building models and makes the total error higher than in the second composition.

In the following, we consider the second network composition, with three independent encoder-decoders.

B. Structured projection

In this section, we introduce an additional component which turns out to be critical in regression-based localization. Conventional regression methods often ignore the structure of the output variables and therefore face the problem of predictions outside the target space. Indeed, in the UJI case, a number of predictions fall outside the indoor building space. We therefore look for structured regression methods [18] which guarantee that predictions fit the feasibility space. A naive solution assumes access to an accurate location map; any location prediction is then first tested for being inside the feasibility space, and corrected if the test fails. For a fair comparison to the state-of-the-art methods, we do not assume any map and rely only on the training set for the possible corrections.

The method that turns out to be robust in the semi-supervised setting is based on a weighted neighbourhood projection. For each location prediction, we consider the top $N_r$ nearest neighbours in the available annotated set. The projection is given by the weighted sum of these neighbours, where the weights are calculated as the inverse of the distances between the prediction and the corresponding neighbours, normalized to sum to one. This projection therefore belongs to the convex hull defined by the $N_r$ neighbours. Figure 2 shows an example of the structured projection, where location predictions (in red) are projected onto the convex hull (here, segments) of the $N_r = 2$ closest annotated neighbours (in blue).

The structured projection works well when all neighbours are topologically close and the convex hull they define is part of the feasibility space. However, if the neighbours are topologically distant (for example, located in different buildings), the error caused by the projection can increase. To minimize this risk, we consider rather small values of $N_r$. For the UJI case, we found empirically that $N_r = 2$ gives the best accuracy.

Fig. 2. Example of structured projection for $N_r = 2$: a) projection method; b) prediction projections on UJI Building 1.

VI. EVALUATION

We run a series of experiments to evaluate the performance of the VAE-based localization network and to compare it to state-of-the-art semi-supervised methods. In all experiments, we test the methods on the public UJI-IndoorLoc dataset used in the IPIN’16 Indoor Localization challenge [16].

The UJI-IndoorLoc dataset covers a surface of 108,703 m² in 3 buildings of the University Jaume I campus, Castelló, Spain. The buildings have 4 or 5 floors; see Figure 3, where the buildings are colored in red, blue and green. 933 different places (reference points) appear in the dataset. In total, 21,049 sample points have been collected, of which 19,938 were used for training and 1,111 for testing. Testing samples were taken 4 months after the training samples to ensure dataset independence. 520 different Access Points appear in the database. Data were collected by 20 users with 25 different mobile device models. Figure 4 (left) shows how 933 APs are distributed over the three UJI campus buildings. Received Signal Strength (RSS) values from APs vary between -110 (no signal) and -21 decibels (a very strong


Fig. 5. Localization error under the annotation reduction.

C ∈ {1, 10, 102 , 103 }, γ ∈ {10−2 , . . . , 102 } for SVR,

and N ∈ {1, 2, 3, 5, 10} for kNN. The later is optionally
combined with the denoising autoencoder (AU) applied
to the signal strength values.
2) VAE: M1 and M2 models adopted for the independent
coordinate regressions as described in Section IV.
3) VAE-SP: M1 and M2 methods, extended with the struc-
Fig. 3. UJI Jaume I University campus buildings included in the indoor tured projection (SP) as described in Section V-B.
localization dataset.
Table I reports the localization error for all the methods on
the UJI-IndoorLoc dataset, using the public train/test split. In
this fully supervised setting, all methods report a comparable
performance, with the localization error being inferior to 6
m. The best performance of 4.65 m is achieved by VAE-M2
method complemented with the structured projection.

Fig. 4. left) AP distribution over three UJI buildings; right) RSS value histogram.

signal); Figure 4.right plots the histogram of the RSS values in the dataset.

The localization error is defined as the Euclidean distance between the estimated and the actual coordinates of a test point. The mean of the localization errors over the test set is reported as the performance measure of a method.

A. Evaluation results
We test our methods and compare them to state-of-the-art semi-supervised methods that have proved their efficiency [5], [18]–[20]:
1) SVR and kNN: as the baseline, we consider the semi-supervised Support Vector Regression (SVR) [2] and k-nearest neighbour regression (kNN). The optimal hyper-parameters were determined through a 5-fold inner cross-validation. Specifically, we searched

B. Semi-supervised learning
Beyond fully supervised learning, we are interested in testing the methods under a more realistic condition, where locations are available for only a small set of WiFi observations. This corresponds to the typical scenario in which WiFi data is collected by navigating through the working space, with a few observations annotated at known locations.
We obtain such a semi-supervised setting by randomly sampling the UJI-IndoorLoc training set, with the sampling ratio varying from 0.1% to 100%; the remaining part is considered non-annotated. This lets us test the robustness of all methods to a shortage of annotated observations. We report average values over 5 sampling experiments.
Figure 5 shows the effect of a low annotation ratio on the state-of-the-art methods and on ours. As the figure shows, all methods cope reasonably well with a modest annotation reduction, when 50% or more of the data are annotated. However, when the reduction becomes substantial, the VAE-based methods are more robust than the other methods.
Figure 6 gives more details on the impact of annotation reduction. It reports the cumulative distribution function (CDF) curves for the VAE-M2-SP method as the ratio of labeled data varies from 100% to 0.1%. As the figure shows, the method copes well with the reduction, which can be as low


TABLE I
COMPARISON TO THE STATE-OF-THE-ART METHODS IN THE SUPERVISED SETTING.

Method     SVR    kNN    AU-kNN   VAE-M1   VAE-M2   VAE-M1-SP   VAE-M2-SP
RMSE (m)   5.41   5.98   5.47     5.32     5.21     4.73        4.65

Fig. 7. Localization error under both annotation and AP reduction.

Fig. 6. CDF of the localization error for different ratios of annotated observations.

as 5%. Unfortunately, in all cases we observe a long-tail phenomenon: the probability of an error above 20 m exceeds 0.01.
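The evaluation measure and the CDF analysis above can be made concrete with a short sketch. The NumPy code below is a minimal illustration under stated assumptions and is not the authors' implementation. It assumes 2D coordinates in metres, and the helper names (`localization_errors`, `empirical_cdf`, `tail_probability`) and the toy coordinates are hypothetical. It computes three things:

- the per-point Euclidean error and its mean, the performance measure used in this section;
- points on an empirical CDF curve such as those in Figure 6;
- the tail probability of errors above 20 m.

```python
import numpy as np


def localization_errors(pred_xy: np.ndarray, true_xy: np.ndarray) -> np.ndarray:
    """Euclidean distance (metres) between estimated and actual coordinates, per test point."""
    return np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy), axis=1)


def empirical_cdf(errors: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Share of test points whose error is at most each threshold (one CDF curve)."""
    errors = np.asarray(errors)
    return np.array([(errors <= t).mean() for t in thresholds])


def tail_probability(errors: np.ndarray, threshold: float = 20.0) -> float:
    """Empirical probability that the localization error exceeds `threshold` metres."""
    return float((np.asarray(errors) > threshold).mean())


if __name__ == "__main__":
    # Hypothetical toy coordinates, not taken from UJI-IndoorLoc.
    true_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 5.0], [3.0, 4.0]])
    pred_xy = np.array([[3.0, 4.0], [10.0, 0.0], [0.0, 30.0], [3.0, 1.0]])
    err = localization_errors(pred_xy, true_xy)  # [5, 0, 25, 3]
    print("mean error:", err.mean())  # 8.25
    print("CDF at 1/5/20 m:", empirical_cdf(err, np.array([1.0, 5.0, 20.0])))  # [0.25 0.75 0.75]
    print("P(error > 20 m):", tail_probability(err))  # 0.25
```

A single large error (25 m here) is enough to push the mean well above the median. This is why the CDF and the tail probability are reported alongside the mean error.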
Annotation ratio as density. The percentage of available annotated data can serve as an indicator of the density at which the ground truth is collected. In the UJI dataset, about 110K m² have been covered with 19K measurements taken at 933 reference points, so that one reference point covers on average 116 m². Note that the dataset includes on average about 21 measurements per reference point; therefore, each measurement is at zero distance from about 20 others. In the semi-supervised setting, annotating only a small part of the data changes these distances considerably. For example, with 1% of annotated data, the average distance to the closest annotated measurement is 4.02 m.
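The annotation-sampling protocol (random sampling at a given ratio, averaged over 5 runs) and the resulting distance to the closest annotated measurement can be sketched as below. This is a hedged NumPy illustration, not the authors' code, and rests on these assumptions:

- Coordinates are planar and in metres.
- The distance is measured only for non-annotated observations. This is our reading of the text, not a confirmed detail.
- The helper names and the 10 x 10 grid of reference points are hypothetical.

It will not reproduce the 4.02 m figure reported for UJI-IndoorLoc.

```python
import numpy as np


def sample_annotated(n_obs: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Mark a random `ratio` share of the observations as annotated (at least one)."""
    n_annot = max(1, int(round(ratio * n_obs)))
    mask = np.zeros(n_obs, dtype=bool)
    mask[rng.choice(n_obs, size=n_annot, replace=False)] = True
    return mask


def mean_distance_to_closest_annotated(xy: np.ndarray, annotated: np.ndarray) -> float:
    """Average distance from each non-annotated observation to its nearest annotated one.

    Brute-force pairwise distances: O(N * M) memory, fine for toy sizes only.
    """
    unlabeled = xy[~annotated]
    if len(unlabeled) == 0:  # everything is annotated
        return 0.0
    anchors = xy[annotated]
    d = np.linalg.norm(unlabeled[:, None, :] - anchors[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical layout: 10 x 10 grid of reference points 10 m apart,
    # with 21 co-located measurements per point (no positional noise).
    gx, gy = np.meshgrid(np.arange(10) * 10.0, np.arange(10) * 10.0)
    ref_points = np.column_stack([gx.ravel(), gy.ravel()])
    xy = np.repeat(ref_points, 21, axis=0)
    for ratio in (1.0, 0.1, 0.01):
        # Average over 5 random samplings, mirroring the protocol in the text.
        runs = [mean_distance_to_closest_annotated(xy, sample_annotated(len(xy), ratio, rng))
                for _ in range(5)]
        print(f"annotated {ratio:.0%}: mean distance {np.mean(runs):.2f} m")
```

Because measurements are co-located at reference points, a non-annotated measurement is at zero distance whenever any measurement at the same point is annotated. This is why the distance stays small until the ratio becomes very low. For the full training set of about 19K records, a spatial index such as SciPy's `cKDTree` would avoid the quadratic memory of the brute-force distance matrix used here.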
Fig. 8. t-SNE projection before and after VAE, colored by a) Floor and b) Building.

C. Absent Access Points
It is common in real-world applications that some WiFi access points stop emitting, due to contractual, technical or legal issues. We therefore run experiments that simulate the absence of some APs at test time. We vary the ratio of available access points from 1.0, when all of them are available at test time, to 0.5, when only half of the APs are available. We consider the re-training scenario, where the model is retrained as a function of the available APs: we retain only those APs from the training set that are available at test time.
We test the VAE-based deep network under AP reduction coupled with the annotation reduction considered in the previous subsection. Figure 7 is a 2D plot reporting the localization error of the VAE-M2-SP method under both annotation and AP reduction. The figure shows that the method is robust to the absence of multiple APs at test time. With 50% of the APs available, the localization error is 5.12 m for 100% and 6.43 m for 5% of annotated observations. This is comparable with errors of 4.65 m and 6.12 m, respectively, when 100% of the APs are available.

D. VAE-based Reconstruction
We complete our analysis with the VAE reconstruction capabilities. Figure 8 shows the t-SNE projection of WiFi observations before and after reconstruction with the VAE. The reconstructed data show better separability than the original data, which makes the localization tasks (classification for floor/building or regression for Lat, Lon, El coordinates) easier. The figure demonstrates the benefit of VAE reconstruction on the examples of the Floor and Building attributes provided in the dataset. By coloring the VAE projection with


different attributes, a better separability can be seen for the Floor attribute in Figure 8.a and for the Building attribute in Figure 8.b.

Conclusions. The conducted experiments show that the VAE-based methods are well suited for WiFi-based localization in the semi-supervised setting. Properly re-designed for a regression task and extended with the structured projection, the generative model VAE-M2 can use as little as 1–5% of annotated data without a severe loss of localization accuracy. Ablation studies also show that the methods are able to generate latent representations of WiFi data that are useful for better understanding, analysis and visualization. The analysis also shows that reducing the labeled part further, below 1%, leads to a significant performance drop, although a smaller one than for the state-of-the-art methods.

Future work. Above, we tested the VAE-based methods under different scenarios, including a small annotated WiFi dataset and the absence of AP signals at test time. In the future, we intend to study other cases of mismatch between the training and test data, in particular the appearance of new APs at test time. Another interesting direction is continuous localization (i.e., tracking) in semi-supervised settings.

VII. CONCLUSION
We presented a semi-supervised VAE-based method for WiFi indoor localization that makes accurate predictions using a model learned from a small set of annotated WiFi observations and a massive set of non-annotated observations. We show that the proposed method achieves state-of-the-art performance with a reduced amount of annotations. We complement the architecture with an additional structured projection component that further improves the localization accuracy in locations with a complex topology, such as multi-building and multi-floor settings. We evaluate our method on the public UJI-IndoorLoc dataset.

REFERENCES
[1] A. Belay Adege, H.-P. Lin, G. Berie Tarekegn, and S.-S. Jeng. Applying deep neural network (DNN) for robust indoor localization in multi-building environment. Applied Sciences, 8:1062, June 2018.
[2] K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In Proc. AAAI Conf. on Innovative Applications (IAAA), pages 4670–4677, 2017.
[3] J. Bi, Y. Wang, Z. Li, S. Xu, J. Zhou, M. Sun, and M. Si. Fast radio map construction by using adaptive path loss model interpolation in large-scale building. Sensors, 19(3):712, 2019.
[4] A. Chriki, H. Touati, and H. Snoussi. SVM-based indoor localization in wireless sensor networks. In Proc. IWCMC, pages 1144–1149, June 2017.
[5] N. Ghourchian, M. Allegue-Martínez, and D. Precup. Real-time indoor localization in smart homes using semi-supervised learning. In Proc. AAAI Conf. on AI in Innovation, pages 4670–4677, 2017.
[6] D. Han, D. G. Andersen, M. Kaminsky, K. Papagiannaki, and S. Seshan. Access point localization using local signal strength gradient. In S. B. Moon, R. Teixeira, and S. Uhlig, editors, Passive and Active Network Measurement, volume 5448, pages 99–108. Springer Berlin Heidelberg.
[7] S. H. Jung and D. Han. Automated construction and maintenance of Wi-Fi radio maps for crowdsourcing-based indoor positioning systems. IEEE Access, 6:1764–1777, 2018.
[8] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling. Semi-supervised learning with deep generative models. In Proc. NIPS, pages 3581–3589, 2014.
[9] A. T. Mariakakis, S. Sen, J. Lee, and K.-H. Kim. SAIL: Single access point-based indoor localization. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '14, pages 315–328. ACM.
[10] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J.-S. Oh. Semi-supervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet of Things Journal, 5(2):624–635.
[11] E. Mok and G. Retscher. Location determination using WiFi fingerprinting versus WiFi trilateration. Journal of Location Based Services, 1(2):145–159, 2007.
[12] Y. Ning, X. Chenxian, W. Yinfeng, and F. Renjian. A radio-map automatic construction algorithm based on crowdsourcing. Sensors, 16(4), 2016.
[13] M. Nowicki and J. Wietrzykowski. Low-effort place recognition with WiFi fingerprints using deep learning. Pages 575–584, 2017.
[14] T. Pulkkinen, T. Roos, and P. Myllymäki. Semi-supervised learning for WLAN positioning. In Proc. ICANN, pages 355–362, 2011.
[15] S.-H. Jung, B.-C. Moon, and D. Han. Unsupervised learning for crowdsourced indoor localization in wireless networks. IEEE Transactions on Mobile Computing, 15(11):2892–2906, 2016.
[16] J. Torres-Sospedra, R. Montoliu, A. Martínez-Usó, J. P. Avariento, T. J. Arnau, M. Benedito-Bordonau, and J. Huerta. UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems. In International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 261–270.
[17] S. Xia, Y. Liu, G. Yuan, M. Zhu, and Z. Wang. Indoor fingerprint positioning based on Wi-Fi: An overview. ISPRS Int. J. Geo-Information, 6(5):135, 2017.
[18] J. Yoo and K. H. Johansson. Semi-supervised learning for mobile robot localization using wireless signal strengths. In International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 1–8, 2017.
[19] A. S. Yoon, T. Lee, Y. Lim, D. Jung, P. Kang, D. Kim, K. Park, and Y. Choi. Semi-supervised learning with deep generative models for asset failure prediction. CoRR, abs/1709.00845, 2017.
[20] Y. Yuan, L. Pei, C. Xu, Q. Liu, and T. Gu. Efficient WiFi fingerprint training using semi-supervised learning. In Proc. UPINLBS, pages 148–155, 2014.
[21] F. Zafari, A. Gkelias, and K. K. Leung. A survey of indoor localization systems and technologies. CoRR, abs/1709.01015, 2017.
[22] C. Zhang, P. Patras, and H. Haddadi. Deep learning in mobile and wireless networking: A survey. CoRR, abs/1803.04311, 2018.