Predicting Average Localization Error of Underwater Wireless Sensors via Decision Tree Regression and Gradient Boosted Regression
1 Introduction
More than 80% of the ocean has yet to be explored. To investigate this uncharted
territory, sets of sensors are deployed underwater. To analyze the data acquired
from these sensors, we need to know their exact locations. Computing a location
is comparatively easy in terrestrial areas. Underwater, however, it is difficult
to calculate the exact location of a sensor because of the specific
characteristics of the Underwater Wireless Sensor Network. As a result, every
localization technique determines the location with some error. The 'Prediction
of Average Localization Error in WSNs' dataset provides the average localization
error for different features of the Underwater Wireless Sensor Network. Different
SVM (Support Vector Machine) algorithms have been used to predict the average
localization error [1]. This prediction, though, can be made more precise. In
this paper, we predict the average localization error more precisely, using
different strategies to minimize the Root Mean Square Error.
2 Related Work
This section discusses several techniques that attempt to improve localization
accuracy. A number of schemes have been proposed to determine the location of
sensors. These localization schemes can be broadly divided into two categories:
range-based and range-free [2]. Range-free schemes do not use any distance or
angle measurements. The centroid scheme, DV-Hop, and density-aware hop-count
localization are examples of range-free schemes. In range-based schemes, accurate
distance or angle measurements are needed to estimate the location of sensors.
There are different types of range-based schemes. In [3], the authors addressed
the node localization problem, which is made difficult by the large number of
parameters to be estimated and the non-linear relationship between the
measurements and the parameters, and proposed a Bayesian algorithm for node
localization in Underwater Wireless Sensor Networks. Their algorithm builds on an
existing importance sampling method referred to as incremental correlation. In
[4], the authors proposed two localization techniques: an Adaptive Neural Fuzzy
Inference System (ANFIS) and an Artificial Neural Network (ANN). The ANN was
hybridized with Particle Swarm Optimization (PSO), the Gravitational Search
Algorithm (GSA), and the Backtracking Search Algorithm (BSA). In indoor and
outdoor environments, the hybrid GSA-ANN achieved mean absolute distance
estimation errors of 0.02 m and 0.2 m, respectively. Another common technique
relies on the Received Signal Strength (RSS) indicator. In [5], the authors
surveyed machine learning techniques for localization in WSNs using the Received
Signal Strength Indicator. Decision Tree, Support Vector Machine (SVM),
Artificial Neural Network, and Bayesian Node Localization were covered in that
paper. In [6], the authors discussed the maximum likelihood (ML) estimator for
the localization of mobile nodes. They then optimized the estimator for ranging
measurements exploiting the Received Signal Strength, investigated the
performance of the derived estimator in Monte-Carlo simulations, and compared it
with the simple Least Squares (LS) method and with RSS fingerprinting. Another
technique for improving RSS-based localization is discussed in [7], where the
authors proposed the use of weighted multilateration techniques to gain
robustness with respect to inaccuracy. The techniques are based on the standard
hyperbolic and circular positioning algorithms. In [8], the authors preferred a
range-based algorithm over a range-free one. They proposed a Bayesian formulation
of the ranging problem as an alternative to inverting the path-loss formula,
which reduced the gap with the more complex range-free method. In [9], the
authors likewise preferred a range-based algorithm over a range-free one. They
proposed a two-step algorithm with reduced complexity: in the first phase, they
exploited nodes to estimate the unknown RSS and TOA (Time of Arrival) model
parameters, and in the second phase, they combined a hybrid TOA/RSS range
estimator with an iterative least squares procedure to obtain the unknown
position. In [10], a localization scheme based on RSS was established to
determine the location of an unknown sensor from a set of anchor nodes.
Apart from RSS-based techniques, there are other ways to determine the location
of sensors. A mathematical model was proposed in [11] that requires one beacon
node and at least three static sensors. One beacon node at six different
positions can determine the location of the static sensors using the
Cayley-Menger determinant, but the sensors' plane needs to be parallel to the
water surface. For the non-parallel situation, the authors updated their proposed
model in [12]. In [13], the mathematical model of [11] was updated again to
determine the location of mobile sensors. Further, in [14], a new mathematical
model was developed to determine the location of a single mobile sensor using the
sensor's mobility. Another technique, named IF-Ensemble, was proposed in [15] for
Wi-Fi indoor localization environments by analyzing RSSs. Finally, a technique
for node localization based on a Kernel Extreme Learning Machine with Hop-count
Quantization was proposed in [16].
3 Methodology
The aim of this research is to predict the average localization error using
different machine learning algorithms and to compare the results with the
previous best result. In our study, we used secondary quantitative data acquired
by others; the dataset was generated using Modified Cuckoo Search simulations.
We did not conduct any experiments to manipulate variables because the dataset
was already observational. The dataset comprises four attributes that are used
directly to achieve the research's main goal. All four features provide
generalizable knowledge to validate the research goal, as is usual in
quantitative research. Fig. 1 depicts the workflow of the proposed methodology.
Fig. 1. Workflow of Proposed Methodology
The dataset utilized in this paper is "Average Localization Error (ALE) in sensor
node localization process in WSNs" [17]. We used the entire dataset, with a total
of 107 instances and six attributes, all of which are quantitative. The dataset
contains no missing values, as it was already observational data. To predict the
Average Localization Error (ALE), four variables (Anchor ratio, Transmission
range, Node density, and Iterations) were used as input variables, and the
average localization error was used as the output variable. The remaining
attribute (the standard deviation value) was dropped in the pre-processing step
because our study focuses only on predicting localization errors.
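As a minimal sketch of the selection step described above, the following Python snippet keeps the four input variables and the ALE output and ignores the standard deviation column. The column names (`anchor_ratio`, `trans_range`, `node_density`, `iterations`, `ale`, `sd_ale`) and the two sample rows are placeholders chosen for illustration; they are assumptions, not headers or values taken from the dataset.

```python
import csv
import io

# Hypothetical column names; the headers of the real file may differ.
INPUT_COLUMNS = ["anchor_ratio", "trans_range", "node_density", "iterations"]
TARGET_COLUMN = "ale"

# Two made-up rows standing in for the real CSV file.
SAMPLE_CSV = """anchor_ratio,trans_range,node_density,iterations,ale,sd_ale
30,15,100,40,0.9,0.3
15,20,200,70,0.6,0.2
"""


def load_features_and_target(csv_text):
    """Return (X, y): the four inputs per row and the ALE target.

    Any other column, such as the standard deviation, is ignored.
    """
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        X.append([float(row[name]) for name in INPUT_COLUMNS])
        y.append(float(row[TARGET_COLUMN]))
    return X, y


X, y = load_features_and_target(SAMPLE_CSV)
print(X)
print(y)
```

With the real file, `SAMPLE_CSV` would be replaced by the file contents and the column names adjusted to match its header row.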
The anchor ratio (AR), the first column of the dataset, is the number of anchor
nodes relative to the total number of sensors in the network; it contains numeric
values. The transmission range column also contains numeric values, measured in
meters, and represents the transmission range of a sensor. The node density
attribute indicates how densely the activity nodes are connected. The fourth
column represents the iterations, that is, how many times the sensor readings
were taken. The average localization error is the disparity between the actual
and estimated coordinates of the unknown nodes.
We summarized the entire dataset using descriptive statistics, which provide the
total count, mean, median, and mode, as well as the standard deviation, variance,
and minimum and maximum values.
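The summary statistics listed above can be computed per column as in the following sketch, which uses only Python's standard `statistics` module. The `describe` helper and the sample values are illustrative assumptions; the numbers are placeholders, not dataset values.

```python
import statistics


def describe(values):
    """Summarize one numeric column with the statistics listed above."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),          # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "max": max(values),
    }


# Made-up placeholder values for a single column.
node_density = [100, 100, 200, 300]
summary = describe(node_density)
for name, value in summary.items():
    print(name, value)
```

Note that `stdev` and `variance` use the sample (N - 1) definition; `pstdev` and `pvariance` would give the population versions.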
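Pearson's Correlation Coefficient between an input feature and the output
feature is
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
Where, X = input feature
Y = output feature
The range of Pearson's Correlation Coefficient is $-1 \le r \le +1$.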
Table 2. Pearson's Correlation Coefficient between Input Variable and Output Variable
Iterations -0.400394
We can see that the highest correlation coefficient exists between node density
and average localization error. The average localization error and iterations
have the second highest correlation coefficient. The average localization error
has the lowest correlation coefficients with anchor ratio and transmission range.
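For readers who want to reproduce such coefficients, the sketch below computes Pearson's correlation coefficient between one input variable and the output variable in plain Python. The function name `pearson_r` and the sample values are assumptions for illustration; the numbers are made up and are not dataset values.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (n times the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (n times each variance).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder values: the output falls as the input grows.
iterations = [10, 20, 30, 40, 50]
ale = [1.0, 0.9, 0.7, 0.6, 0.3]
r = pearson_r(iterations, ale)
print(round(r, 4))
```

A negative coefficient, such as the one reported for Iterations in Table 2, means the average localization error tends to decrease as that input increases.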
3.3 Machine Learning Model
The aim of this research is to predict the average localization error as closely
as possible from the anchor ratio, transmission range, node density, and
iterations. Machine learning algorithms are divided into three categories:
supervised learning, unsupervised learning, and semi-supervised learning. There
are two types of supervised machine learning models: Classification and
Regression. Regression is used when we want to predict a continuous dependent
variable from several independent variables. In our dataset, the "average
localization error" variable is such a continuous dependent variable. Two
regression models have been used to predict the average localization error.
One of the most often used supervised learning techniques is the Decision Tree.
A decision tree can be used for both Classification and Regression. It is a
tree-structured hierarchical model with three sorts of nodes: root, interior, and
leaf. The complete sample is represented by the root node, which can be further
divided into sub-nodes. Interior nodes are decision nodes that carry the
attributes of decision rules. Decision nodes have several branches, which end in
leaf nodes, and the outcome is represented by the leaf nodes. A decision tree
divides the data at each node into subsets. To make a prediction, an instance
travels down the tree according to the decision rules, and the average of the
dependent variable values in the leaf node it reaches is returned as the
prediction. In Decision Tree Regression, we use variance to choose the splits.
$$\mathrm{Variance} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2$$
Where, $y_i$ = label for an instance
$N$ = number of instances
$\mu$ = mean, given by $\frac{1}{N}\sum_{i=1}^{N} y_i$
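To illustrate how variance can drive the choice of a split, the following sketch scans the candidate thresholds of a single feature and picks the one that minimizes the size-weighted variance of the two resulting subsets. This is a simplified, single-feature illustration under our own assumptions, not the exact procedure used in the experiments; the helper names and the sample values are hypothetical.

```python
def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_variance) for the best split on feature x.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit variance, threshold is None.
    """
    best = (None, variance(y))
    candidates = sorted(set(x))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [t for f, t in zip(x, y) if f <= threshold]
        right = [t for f, t in zip(x, y) if f > threshold]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(y)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up placeholder values, not taken from the dataset.
node_density = [100, 100, 200, 200, 300, 300]
ale = [1.0, 1.2, 0.6, 0.8, 0.5, 0.7]
threshold, weighted = best_split(node_density, ale)
print(threshold, round(weighted, 4), round(variance(ale), 4))
```

On these placeholder values the split between 100 and 200 is selected, because it leaves the two subsets with the smallest combined variance.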
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{n}}$$
Where, RMSE = Root Mean Square Error
$y_i$ = observed value
$\hat{y}_i$ = predicted value
$n$ = number of observations
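The RMSE definition above translates directly into a few lines of Python. The sketch below is a plain implementation with made-up observed and predicted values chosen so that the result is easy to check by hand.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Made-up values: the errors are 0, 0, 0, 2, so RMSE = sqrt(4 / 4) = 1.0.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted))
```

Because the errors are squared before averaging, a single large error raises the RMSE more than several small ones of the same total size.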
4.4 Hyperparameter Tuning
Decision Tree Regression and Gradient Boosted Regression have hyperparameters
such as maximum-depth, minimum-samples-split, minimum-samples-leaf,
minimum-weight-fraction-leaf, and maximum-leaf-nodes that can be tuned to get
better results. The most influential hyperparameter is maximum-depth;
regularizing it can address both overfitting and underfitting of Decision Tree
Regression and Gradient Boosted Regression. In this research, the maximum depth
was varied from 1 to 5 and the resulting outputs were compared.
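The effect of the maximum depth can be illustrated with a small, self-contained regression tree. The sketch below is a simplified stand-in written for this illustration, not the implementation used in the experiments; the function names and the six data rows are assumptions, and the values are made up. It grows a tree for each maximum depth from 1 to 5 and reports the RMSE on the training rows.

```python
import math


def variance_sum(values):
    """Sum of squared deviations from the mean (N times the variance)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def build_tree(rows, targets, max_depth):
    """Grow a regression tree greedily, stopping at max_depth."""
    leaf = {"value": sum(targets) / len(targets)}
    if max_depth == 0 or len(set(targets)) == 1:
        return leaf
    best = None
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
            right = [i for i, r in enumerate(rows) if r[feature] > threshold]
            score = (variance_sum([targets[i] for i in left])
                     + variance_sum([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:  # no feature has two distinct values left
        return leaf
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left],
                           [targets[i] for i in left], max_depth - 1),
        "right": build_tree([rows[i] for i in right],
                            [targets[i] for i in right], max_depth - 1),
    }


def predict(tree, row):
    """Follow the decision rules down to a leaf and return its mean."""
    while "value" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


def rmse(observed, predicted):
    errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(errors) / len(observed))


# Made-up placeholder rows: (node density, iterations) -> ALE.
X = [[100, 20], [100, 60], [200, 20], [200, 60], [300, 20], [300, 60]]
y = [1.2, 1.0, 0.8, 0.6, 0.7, 0.5]

results = {}
for depth in range(1, 6):
    tree = build_tree(X, y, depth)
    results[depth] = rmse(y, [predict(tree, row) for row in X])
    print(depth, round(results[depth], 4))
```

The training error can only shrink as the depth grows, which is why it is not a reliable guide on its own: a fair comparison of depths would measure RMSE on data held out from training. In practice a library implementation, for example scikit-learn's `DecisionTreeRegressor` with its `max_depth` parameter, would normally replace this hand-written tree.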
RMSE (m)
0.234 0.20 0.14 0.235 0.241 0.189 0.164 0.140 0.143 0.137 0.139
0 7 3 1 2 7 9 7 9 3
In previous research, the best output came from Range-Standardization SVR with
four input variables, where the Root Mean Square Error was 0.147 m. In this
research, the best output came from Gradient Boosted Regression with the last two
variables, where the Root Mean Square Error is 0.1379 m at a maximum depth of 1,
which is better than in the previous research. This best output was obtained with
only two input variables (Node density and Iterations), which will make further
research more convenient.
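The size of the improvement follows from simple arithmetic on the two RMSE values quoted above. The short sketch below computes the absolute and relative reduction; the variable names are ours.

```python
# RMSE values quoted in the text, in meters.
previous_rmse = 0.147   # Range-Standardization SVR, four input variables
current_rmse = 0.1379   # Gradient Boosted Regression, maximum depth 1

absolute_gain = previous_rmse - current_rmse
relative_gain = absolute_gain / previous_rmse

print(f"absolute: {absolute_gain:.4f} m, relative: {relative_gain:.1%}")
```

This gives a reduction of about 0.0091 m, or roughly 6.2% relative to the previous best RMSE.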
5 Conclusion
References
1. Singh, A., Kotiyal, V., Sharma, S., Nagar, J. & Lee, C. C. A Machine Learning
Approach to Predict the Average Localization Error with Applications to Wireless
Sensor Networks. IEEE Access 8, 208253–208263 (2020).
2. Chandrasekhar, V., Seah, W. K., Choo, Y. S. & Ee, V. Localization in Underwater
Sensor Networks-Survey and Challenges. in 1st ACM international workshop on
Underwater networks 33–40 (2006). doi:10.1145/1161039.1161047.
3. Morelande, M. R., Moran, B. & Brazil, M. Bayesian node localisation in wireless
sensor networks. in 2008 IEEE International Conference on Acoustics, Speech and
Signal Processing (2008). doi:10.1109/ICASSP.2008.4518167.
4. Gharghan, S. K., Nordin, R. & Ismail, M. A wireless sensor network with soft
computing localization techniques for track cycling applications. Sensors
(Switzerland) 16, (2016).
5. Ahmadi, H. & Bouallegue, R. Exploiting machine learning strategies and RSSI for
localization in wireless sensor networks: A survey. in 2017 13th International
Wireless Communications and Mobile Computing Conference (IWCMC) (2017).
doi:10.1109/IWCMC.2017.7986447.
6. Waadt, A. E., Kocks, C., Wang, S., Bruck, G. H. & Jung, P. Maximum
Likelihood Localization Estimation based on Received Signal Strength. in 2010
3rd International Symposium on Applied Sciences in Biomedical and
Communication Technologies (2010). doi:10.1109/ISABEL.2010.5702817.
7. Tarrío, P., Bernardos, A. M. & Casar, J. R. Weighted least squares techniques for
improved received signal strength based localization. Sensors 11, 8569–8592
(2011).
8. Coluccia, A. & Ricciato, F. RSS-Based localization via bayesian ranging and
iterative least squares positioning. IEEE Communications Letters 18, 873–876
(2014).
9. Coluccia, A. & Fascista, A. Hybrid TOA/RSS range-based localization with self-
calibration in asynchronous wireless networks. Journal of Sensor and Actuator
Networks 8, (2019).
10. Nguyen, T. L. N. & Shin, Y. An efficient rss localization for underwater wireless
sensor networks. Sensors (Switzerland) 19, (2019).
11. Rahman, A., Muthukkumarasamy, V. & Sithirasenan, E. Coordinates
determination of submerged sensors using cayley-menger determinant. in
Proceedings - IEEE International Conference on Distributed Computing in Sensor
Systems, DCoSS 2013 466–471 (2013). doi:10.1109/DCOSS.2013.62.
12. Rahman, A. & Muthukkumarasamy, V. Localization of Submerged Sensors with a
Single Beacon for Non-Parallel Planes State. in 2018 Tenth International
Conference on Ubiquitous and Future Networks (ICUFN) (IEEE, 2018).
doi:10.1109/ICUFN.2018.8437041.
13. Rahman, Md. M., Tanim, K. M. & Nisher, S. A. Coordinates Determination of
Submerged Mobile Sensors for Non parallel State using Cayley-Menger
Determinant. in 2021 International Conference on Information and
Communication Technology for Sustainable Development (ICICT4SD) 25–30
(IEEE, 2021). doi:10.1109/ICICT4SD50815.2021.9396837.
14. Rahman, M. M. Coordinates Determination of Submerged Single Mobile Sensor
Using Sensor’s Mobility. in 2021 International Conference on Electronics,
Communications and Information Technology (ICECIT), 14–16 September 2021,
Khulna, Bangladesh (2021). doi:10.1109/ICECIT54077.2021.9641096.
15. Bhatti, M. A. et al. Outlier detection in indoor localization and Internet of Things
(IoT) using machine learning. Journal of Communications and Networks 22, 236–
243 (2020).
16. Wang, L., Er, M. J. & Zhang, S. A Kernel Extreme Learning Machines Algorithm
for Node Localization in Wireless Sensor Networks. IEEE Communications
Letters 24, 1433–1436 (2020).
17. Singh, A. Average Localization Error (ALE) in sensor node localization process
in WSNs Data Set. UCI Machine Learning Repository, viewed 31 December
2022, <https://archive.ics.uci.edu/ml/> (2021).