
Predicting Average Localization Error of Underwater Wireless Sensors via Decision Tree Regression and Gradient Boosted Regression

Md. Mostafizur Rahman1, Sumiya Akter Nisher2

Department of Computer Science and Engineering
East West University, Dhaka, Bangladesh
[email protected]
[email protected]

Abstract. Underwater communication in underwater Wireless Sensor
Networks remains an active research area. In the underwater environment,
determining the exact location of the sensors is a major challenge, and
accurate sensor location data is critical for meaningful underwater
research. Several algorithms have been proposed to localize sensors, but
certain characteristics of underwater Wireless Sensor Networks introduce
localization error. In this paper, comparative results for predicting the
Average Localization Error (ALE) are presented: Decision Tree Regression
and Gradient Boosted Regression have been applied with a minimal set of
input variables to find the optimal result. Alongside the algorithm
simulations, hyperparameter tuning and variable selection have been
performed. Of the two approaches, Gradient Boosted Regression
outperformed the previous approaches, with an RMSE of 0.1379 m.

Keywords: Average Localization Error, Decision Tree Regression,


Gradient Boosted Regression, Wireless Sensor Networks.

1 Introduction

More than 80% of the ocean has yet to be explored. To investigate this
uncharted territory, sets of sensors are deployed underwater. When the data
acquired from these sensors is analyzed, the exact location of each sensor
must be known. Computing location is comparatively easy in terrestrial
settings; however, calculating the exact location of underwater sensors is
difficult due to specific characteristics of the Underwater Wireless Sensor
Network. As a result, every localization technique determines the location
with some error. The 'Prediction of Average Localization Error in WSNs'
dataset provides the average localization error for different features of the
Underwater Wireless Sensor Network. Different SVM (Support Vector Machine)
algorithms were previously used to predict the average localization error [1].
This prediction, however, can be made more precise. In this paper, the average
localization error is predicted more precisely, and different strategies are
used to minimize the Root Mean Square Error.

2 Related Work

In this section, several techniques have been discussed. Through these techniques,
attempts have been made to improve localization accuracy. A number of location
schemes have been proposed to determine the localization of sensors. The
localization schemes can be broadly divided into two categories: one is range-
based and the other is range-free [2]. Range-free schemes do not use any
distance or angle measurements; the Centroid scheme, DV-Hop, and density-aware
hop-count localization fall into this category. Range-based schemes require
accurate distance or angle measurements to estimate the location of sensors,
and they come in several types. In [3], node localization is difficult because
of the large number of parameters and the non-linear relationship between the
measurements and the parameters; the authors proposed a Bayesian algorithm for
node localization in Underwater Wireless Sensor Networks that extends an
existing importance sampling method by means of incremental correlation. In
[4], two localization techniques were proposed: the Adaptive Neuro-Fuzzy
Inference System (ANFIS) and the Artificial Neural Network (ANN).
The ANN was hybridized with Particle Swarm Optimization (PSO), the
Gravitational Search Algorithm (GSA), and the Backtracking Search Algorithm
(BSA); the hybrid GSA-ANN achieved mean absolute distance-estimation errors of
0.02 m indoors and 0.2 m outdoors. Another common technique uses the Received
Signal Strength Indicator (RSSI). In [5], a survey was conducted of machine
learning techniques for localization in WSNs using the Received Signal
Strength Indicator; Decision Tree, Support Vector Machine (SVM), Artificial
Neural Network, and Bayesian node localization were covered. In [6], the
authors discussed the maximum likelihood (ML) estimator
for localization of mobile nodes. After that, they optimized the estimator for
ranging measurements exploiting the Received Signal Strength. Then they
investigated the performance of the derived estimator in Monte-Carlo simulations
and they compared it with the simple Least Squares (LS) method and exploited
RSS (Received Signal Strength) fingerprint. Another technique for improving
Received Signal Strength-based localization is discussed in [7], where
weighted multilateration techniques are proposed to gain robustness with
respect to ranging inaccuracy; the underlying techniques are standard
hyperbolic and circular positioning algorithms. In [8], a range-based
algorithm was preferred over range-free algorithms: a Bayesian formulation of
the ranging problem, as an alternative to inverting the path-loss formula,
reduced the gap with the more complex range-free methods. In [9], a
range-based algorithm was again preferred; a reduced-complexity two-step
algorithm was proposed in which the first phase exploits nodes to estimate the
unknown RSS and TOA (Time of Arrival) model parameters, and the second phase
combines a hybrid TOA/RSS range estimator with an iterative least squares
procedure to obtain the unknown position. A localization scheme based on RSS
has also been established to
determine the location of an unknown sensor from a set of anchor nodes [10].
Apart from RSS techniques there are also some other techniques to determine the
localization of sensors. A mathematical model has been proposed in [11] where
one beacon node and at least three static sensors are needed: a single beacon
node measured from six different positions can determine the location of the
static sensors using the Cayley-Menger determinant, but the sensor plane needs
to be parallel to the water surface. For the non-parallel situation, the model
was updated in [12]. In [13], the mathematical model of [11] was updated again
to determine the localization of mobile sensors. Further, in [14], a new
mathematical model was developed to determine the location of a single mobile
sensor using the sensor's mobility. Another technique, named IF-Ensemble, was
proposed in [15] for Wi-Fi indoor localization environments by analyzing RSSs.
Yet another technique, Kernel Extreme Learning Machine based on Hop-count
Quantization, was proposed for node localization in [16].

3 Methodology

The aim of this research is to analyze the average localization error by
applying different machine learning algorithms that can predict it precisely,
and to compare the outputs with the previous best output. In our study, we
used secondary quantitative data acquired by others; Modified Cuckoo Search
simulations were used to generate this dataset. We did not undertake any
studies to change variables because the dataset was already observational.
The dataset comprises four valuable attributes that directly serve the
research's main goal. All four features provide generalizable knowledge to
validate the research goal, just as they do in quantitative research. Fig. 1
depicts the workflow of the proposed methodology.
Fig. 1. Workflow of Proposed Methodology

3.1 Dataset Description

The dataset utilized in this paper is "Average Localization Error (ALE) in sensor
node localization process in WSNs [17]." We have used the entire dataset with a
total of 107 instances and six attributes, where all attributes represent quantitative
data. The dataset contains no missing values, as it is observational data.
To predict the Average Localization Error (ALE), only four variables (anchor
ratio, transmission range, node density, and iterations) have been used as
input variables, and the average localization error has been used as the
output variable. Another attribute (the standard deviation value) was dropped
in the pre-processing step, as our study focuses only on the localization error.
The ratio of anchor nodes to the total number of sensors in the network is
known as the anchor ratio (AR); it is the first column of the dataset and
contains numeric values. The transmission range attribute also contains
numeric values, measured in meters, which represent the transmission range of
a sensor. The node density attribute indicates how densely the active nodes
are connected. The fourth column represents the iterations, i.e., how many
times the sensor readings were taken. The disparity between the actual and
estimated coordinates of unknown nodes is known as the average localization
error.
We have summarized the entire dataset using descriptive statistics, which
provide the total count, mean, median, and mode, along with the standard
deviation, variance, and minimum and maximum values.

The dataset is described in Table 1.

Table 1. Description of Dataset

                        Valid  Mismatched  Missing  Mean   Std. dev.   Min   25%   50%   75%   Max
Anchor ratio             107       0          0     20.5     6.71       10    15    18    30    30
Transmission range       107       0          0     17.9     3.09       12    15    17    20    25
Node density             107       0          0     160      70.9      100   100   100   200   300
Iterations               107       0          0     47.9     24.6       14    30    40    70   100
Average localization
error                    107       0          0     0.989    0.415     0.3   0.6   0.97  1.2   2.5
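The summary in Table 1 can be reproduced with pandas. The sketch below uses a small illustrative sample; the column names follow the paper, but the rows are placeholders rather than the real UCI data.

```python
import pandas as pd

# Illustrative stand-in for the ALE dataset; these five rows are
# placeholders, not the actual 107 instances.
df = pd.DataFrame({
    "anchor_ratio":       [10, 15, 18, 30, 30],
    "transmission_range": [12, 15, 17, 20, 25],
    "node_density":       [100, 100, 100, 200, 300],
    "iterations":         [14, 30, 40, 70, 100],
    "ale":                [0.3, 0.6, 0.97, 1.2, 2.5],
})

# describe() reports count, mean, std, min, the quartiles, and max,
# the same quantities summarized in Table 1.
summary = df.describe()
```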
3.2 Variable Importance
In this paper, before applying the machine learning algorithms, the
relationship between each input variable and the output variable has been
evaluated. The dataset has four input variables (anchor ratio, transmission
range, node density, and iterations) and one output variable (average
localization error). To evaluate the relationship between each input variable
and the output variable, Pearson's correlation coefficient has been applied.

So, Pearson's correlation coefficient between two variables x and y is

    r_xy = [n Σxy − (Σx)(Σy)] / √([n Σx² − (Σx)²][n Σy² − (Σy)²])

where x is an input feature and y is the output feature.

The range of Pearson's correlation coefficient:

    0 < r_xy < 1    (if the two features have a positive relation)
    −1 < r_xy < 0   (if the two features have a negative relation)
Table 2 shows Pearson's correlation coefficient between each input variable
and the output variable.

Table 2. Pearson's Correlation Coefficient between Input Variables and Output Variable

Average localization error

Anchor ratio -0.074997

Transmission range 0.109309

Node density -0.645927

Iterations -0.400394

We can see that the strongest correlation (in absolute value) is between node
density and the average localization error, with iterations having the second
strongest. Anchor ratio and transmission range have the weakest correlations
with the average localization error.
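As a minimal sketch, the formula above can be implemented directly with NumPy; the sample values below are illustrative only.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient, following the paper's formula."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((n * np.sum(x ** 2) - np.sum(x) ** 2) *
                  (n * np.sum(y ** 2) - np.sum(y) ** 2))
    return num / den

# Toy check on a perfectly negative linear relation (illustrative values).
r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # -1.0
```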
3.3 Machine Learning Model
The aim of this research is to predict the average localization error as
closely as possible from the anchor ratio, transmission range, node density,
and iterations. Machine learning algorithms are divided into three categories:
supervised learning, unsupervised learning, and semi-supervised learning.
There are two types of supervised machine learning models for prediction:
classification and regression. Regression is used when we want to predict a
continuous dependent variable from several independent variables. In our
dataset, the "average localization error" variable is such a continuous
dependent variable. Two regression models have been used to forecast the
average localization error.

1. Decision Tree Regression


2. Gradient Boosted Regression

Decision Tree Regression

The Decision Tree is one of the most widely used supervised learning
techniques and can be applied to both classification and regression. It is a
tree-structured hierarchical model with three sorts of nodes: root, internal,
and leaf. The root node represents the complete sample and can be further
divided into sub-nodes. Internal nodes are decision nodes that carry the
attributes of the decision rules; their branches lead to further nodes, and
the outcomes are represented by leaf nodes. A decision tree splits each node
into subsets to partition the data. To produce a prediction, it traverses the
tree according to the attribute values and estimates the output as the average
of the dependent-variable values in the reached leaf. In Decision Tree
Regression, variance is used as the splitting criterion.
    Variance = (1/N) Σ_{i=1}^{N} (y_i − µ)²

where y_i is the label of instance i, N is the number of instances, and µ is
the mean, given by µ = (1/N) Σ_{i=1}^{N} y_i.
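As a sketch, scikit-learn's DecisionTreeRegressor implements this kind of variance-based splitting (its default squared-error criterion minimizes within-node variance). The data below is synthetic and merely stands in for the real (node density, iterations) features.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (node density, iterations) -> ALE;
# the real values come from the UCI dataset, not this generator.
rng = np.random.default_rng(0)
X = rng.uniform([100, 14], [300, 100], size=(80, 2))
y = 2.0 - 0.004 * X[:, 0] - 0.005 * X[:, 1] + rng.normal(0, 0.05, size=80)

# max_depth is the regularizing hyperparameter tuned in Section 4.4.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)
preds = tree.predict(X[:5])
```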

Gradient Boosted Regression

The Gradient Boosting algorithm is frequently used to capture nonlinear
relationships in models built on tabular datasets. When a machine learning
model has poor predictive power, gradient boosting can help improve its
quality. Gradient Boosted Regression (GBR) is an iterative process that
improves the predictive value of the model at every learning step, and it can
generalize the model by dealing with missing values and outliers. The primary
goal of gradient boosting is to improve the performance of a predictive model
by boosting weak learners so as to minimize the loss function (the measure of
the difference between the predicted and actual target values). The algorithm
starts by training a decision tree; at each subsequent iteration, a new weak
tree is fitted to the errors of the ensemble so far. By combining multiple
weak models in this way, Gradient Boosting builds a stronger model and
minimizes bias error.
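This stagewise procedure is available as scikit-learn's GradientBoostingRegressor, sketched below on synthetic stand-in data (the feature values are illustrative, not the real UCI instances).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data, not the real UCI values.
rng = np.random.default_rng(1)
X = rng.uniform([100, 14], [300, 100], size=(80, 2))
y = 2.0 - 0.004 * X[:, 0] - 0.005 * X[:, 1] + rng.normal(0, 0.05, size=80)

# Each boosting stage fits a shallow tree to the residuals of the
# ensemble so far, gradually reducing the squared-error loss.
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=1, random_state=0)
gbr.fit(X, y)
```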

4 Experiment and Analysis

4.1 Implementation Environment


To analyze the dataset, the Anaconda3 platform was employed to run the code in
Python 3.8 with the usual machine learning packages. The processor is an Intel
Core i5-8265U (1.6 GHz base frequency, up to 3.9 GHz), the RAM is 4 GB, and
the operating system is Windows 10 (64-bit).

4.2 Splitting Dataset


We have collected our data from the UCI Machine Learning Repository website.
The algorithms have been evaluated by splitting the dataset in two: 80% of the
data for training and 20% for testing.
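The 80/20 split can be sketched with scikit-learn's train_test_split; the arrays below are placeholders with the dataset's shape (107 rows, four inputs), not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays matching the dataset's shape: 107 rows, 4 inputs.
X = np.zeros((107, 4))
y = np.zeros(107)

# 80% train / 20% test, as in the paper; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```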

4.3 Calculation Formula For Output


The Root Mean Square Error (RMSE) has been applied to analyze the output of
the two machine learning algorithms on this dataset.

    RMSE = √( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)² )

where y_i is the observed value, ŷ_i is the predicted value, and n is the
number of observations.
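The formula above translates directly into a few lines of NumPy; the sample values are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error, as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Toy example: only the last prediction is off, by 2.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```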
4.4 Hyperparameter Tuning
Decision Tree Regression and Gradient Boosted Regression have several
hyperparameters, such as maximum-depth, minimum-samples-split,
minimum-samples-leaf, minimum-weight-fraction-leaf, and maximum-leaf-nodes,
that can be tuned for better results. The most influential hyperparameter is
maximum-depth, whose regularization can address the overfitting and
underfitting of both Decision Tree Regression and Gradient Boosted
Regression. In this research, the maximum depth has been varied from 1 to 5
and the resulting outputs compared.
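A sweep over maximum depth 1 to 5 can be sketched as below; the data is a synthetic stand-in for the UCI set, so the scores are illustrative, not the paper's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 107-instance dataset with the two
# best-correlated inputs (node density, iterations).
rng = np.random.default_rng(2)
X = rng.uniform([100, 14], [300, 100], size=(107, 2))
y = 2.0 - 0.004 * X[:, 0] - 0.005 * X[:, 1] + rng.normal(0, 0.05, size=107)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit one model per candidate depth and record its test RMSE.
scores = {}
for depth in range(1, 6):
    model = GradientBoostingRegressor(max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    scores[depth] = mean_squared_error(y_te, model.predict(X_te)) ** 0.5

best_depth = min(scores, key=scores.get)
```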

4.5 Result and Analysis


The dataset has been analyzed with two combinations of input variables for
each machine learning method. First, anchor ratio, transmission range, node
density, and iterations were all selected as input variables; then only node
density and iterations were selected, because these two variables have the
strongest correlation coefficients with the average localization error
variable, as described in Section 3. In addition, in each approach, the
maximum-depth hyperparameter has been fine-tuned. Table 3 shows the Root Mean
Square Error (RMSE) for each individual approach.

Table 3. Analysis of the Output from Different Approaches

                                           RMSE (m)
                                Max       Max       Max       Max       Max
                                Depth-1   Depth-2   Depth-3   Depth-4   Depth-5
Decision Tree  With 4
Regression     Variables        0.2353    0.2411    Null      Null      Null
               With Last 2
               Variables        0.1892    0.1647    0.2343    0.2244    0.2252
Gradient       With 4
Boosted        Variables        0.1409    0.1451    0.1437    0.1568    0.1715
Regression     With Last 2
               Variables        0.1379    0.1393    0.1520    0.1548    0.1585
Here, we can see that Gradient Boosted Regression with two input variables
(node density and iterations) gives the best output on this dataset, a Root
Mean Square Error of 0.1379 m.

4.6 Comparison With Previous Models


SVR with Scale-Standardization, Z-score Standardization, and Range-
Standardization [1] were all previously simulated on this dataset. The
comparison between the previous and proposed methods is shown in Table 4.

Table 4. Comparison between Previous Methods and Proposed Methods

           Previous Methods      Proposed Methods
                                 Decision Tree Regression      Gradient Boosted Regression
                                 With 4        With Last 2     With 4        With Last 2
                                 Variables     Variables       Variables     Variables
           S-     Z-     R-      Max     Max   Max     Max     Max     Max   Max     Max
           SVR    SVR    SVR     Dep-1   Dep-2 Dep-1   Dep-2   Dep-1   Dep-3 Dep-1   Dep-2
RMSE (m)   0.2340 0.207  0.147   0.2353  0.2411 0.1892 0.1647  0.1409  0.1437 0.1379 0.1393

In previous research, the best output came from Range-Standardization SVR with
four input variables, where the Root Mean Square Error was 0.147 m. In this
research, the best output comes from Gradient Boosted Regression with the last
two variables and a maximum depth of 1, where the Root Mean Square Error is
0.1379 m, which improves on the previous research. Because this best output is
obtained with only two input variables (node density and iterations), further
research will also be more convenient.

5 Conclusion

In the field of underwater sensor deployment, it is critical to accurately
predict the localization error. In this research article, we have predicted
the average localization error using two machine learning algorithms. Decision
Tree and Gradient Boosted Regression have been applied, their outputs have
been compared with those of previous methods, and better results have been
found. It can be concluded that this research will play a useful role in
predicting the Average Localization Error (ALE) in the future. Going forward,
we hope to collect more underwater sensor data in order to run these
algorithms and obtain even more efficient predictions of the localization
error.

References

1. Singh, A., Kotiyal, V., Sharma, S., Nagar, J. & Lee, C. C. A Machine Learning
Approach to Predict the Average Localization Error with Applications to Wireless
Sensor Networks. IEEE Access 8, 208253–208263 (2020).
2. Chandrasekhar, V., Seah, W. K., Choo, Y. S. & Ee, V. Localization in Underwater
Sensor Networks-Survey and Challenges. in 1st ACM international workshop on
Underwater networks 33–40 (2006). doi:10.1145/1161039.1161047.
3. Morelande, M. R., Moran, B. & Brazil, M. Bayesian Node Localisation in
Wireless Sensor Networks. in 2008 IEEE International Conference on Acoustics,
Speech and Signal Processing (2008). doi:10.1109/ICASSP.2008.4518167.
4. Gharghan, S. K., Nordin, R. & Ismail, M. A wireless sensor network with soft
computing localization techniques for track cycling applications. Sensors
(Switzerland) 16, (2016).
5. Ahmadi, H. & Bouallegue, R. Exploiting machine learning strategies and RSSI for
localization in wireless sensor networks: A survey. in 2017 13th International
Wireless Communications and Mobile Computing Conference (IWCMC) (2017).
doi:10.1109/IWCMC.2017.7986447.
6. Waadt, A. E., Kocks, C., Wang, S., Bruck, G. H. & Jung, P. Maximum
Likelihood Localization Estimation based on Received Signal Strength. in 2010
3rd International Symposium on Applied Sciences in Biomedical and
Communication Technologies (2010). doi:10.1109/ISABEL.2010.5702817.
7. Tarrío, P., Bernardos, A. M. & Casar, J. R. Weighted least squares techniques for
improved received signal strength based localization. Sensors 11, 8569–8592
(2011).
8. Coluccia, A. & Ricciato, F. RSS-Based localization via bayesian ranging and
iterative least squares positioning. IEEE Communications Letters 18, 873–876
(2014).
9. Coluccia, A. & Fascista, A. Hybrid TOA/RSS range-based localization with self-
calibration in asynchronous wireless networks. Journal of Sensor and Actuator
Networks 8, (2019).
10. Nguyen, T. L. N. & Shin, Y. An efficient RSS localization for underwater
wireless sensor networks. Sensors (Switzerland) 19, (2019).
11. Rahman, A., Muthukkumarasamy, V. & Sithirasenan, E. Coordinates
determination of submerged sensors using cayley-menger determinant. in
Proceedings - IEEE International Conference on Distributed Computing in Sensor
Systems, DCoSS 2013 466–471 (2013). doi:10.1109/DCOSS.2013.62.
12. Rahman, A. & Muthukkumarasamy, V. Localization of Submerged Sensors with a
Single Beacon for Non-Parallel Planes State. in 2018 Tenth International
Conference on Ubiquitous and Future Networks (ICUFN) (IEEE, 2018).
doi:10.1109/ICUFN.2018.8437041.
13. Rahman, Md. M., Tanim, K. M. & Nisher, S. A. Coordinates Determination of
Submerged Mobile Sensors for Non parallel State using Cayley-Menger
Determinant. in 2021 International Conference on Information and
Communication Technology for Sustainable Development (ICICT4SD) 25–30
(IEEE, 2021). doi:10.1109/ICICT4SD50815.2021.9396837.
14. Rahman, M. M. Coordinates Determination of Submerged Single Mobile Sensor
Using Sensor’s Mobility. in 2021 International Conference on Electronics,
Communications and Information Technology (ICECIT), 14–16 September 2021,
Khulna, Bangladesh (2021). doi:10.1109/ICECIT54077.2021.9641096.
15. Bhatti, M. A. et al. Outlier detection in indoor localization and Internet of Things
(IoT) using machine learning. Journal of Communications and Networks 22, 236–
243 (2020).
16. Wang, L., Er, M. J. & Zhang, S. A Kernel Extreme Learning Machines Algorithm
for Node Localization in Wireless Sensor Networks. IEEE Communications
Letters 24, 1433–1436 (2020).
17. Singh, A. Average Localization Error (ALE) in sensor node localization
process in WSNs Data Set. UCI Machine Learning Repository, viewed 31 December
2022, <https://archive.ics.uci.edu/ml/> (2021).
