A Real Time Indoor Localization Application
Amulya Yadav, Indian Institute of Technology Patna
Mentor: Dr. Bhaskar Krishnamachari, Viterbi School of Engineering
September 7, 2011
Abstract
This paper discusses the implementation of a real time indoor localization application. The algorithms and techniques used are described, and results showing how the variation of several parameters affects the accuracy of the system are presented. Further, several modifications to the algorithm are made and their effect on the system is discussed.
1 Introduction

This work was done as part of a summer internship project undertaken at the Autonomous Networks Research Group (ANRG) at the University of Southern California under Dr. Bhaskar Krishnamachari. The aim of the project was to develop an application for real time indoor localization. For this, a wireless sensor network (WSN) was employed. Some nodes formed the reference nodes (whose locations are predetermined) and one node formed the unknown node (whose location was to be ascertained). Wireless TelosB motes were used as both the reference nodes and the unknown node. A network of more than 50 motes was set up in the ceiling of our building, as shown in Fig 1[1]. This figure shows the floor map of the 4th floor of our building. The blue dots are the locations of the reference nodes. Any number of these could be used to form our network.

These motes were programmed to behave as stations. The unknown node was programmed to behave as a beacon, which broadcast a beacon packet into the network after a fixed interval of time. Each station would receive this beacon packet, process it appropriately, and send the processed information to a basestation node, where the localization algorithm would run to find the location of the node. The algorithm used to localize the unknown node was Ecolocation[2], a sequence based RF localization technique. Our application would therefore be able to track a person's movements inside a building if the person carried the beacon in hand. The output of the application was similar to what is seen on GPS devices.

Once an initial implementation of the algorithm was in place, efforts were made to improve the accuracy of the application. The changes tried on the basic algorithm were: introducing a smoothening factor, using hashing techniques to reduce the time spent searching for the correct sequence, using different metrics to find the distance between two sequences, and forming shortened sequences by not considering those reference nodes to which the beacon packet was not sent. The impact of these changes on the accuracy of the system was noted. Further, the parameters that play an important role in changing the accuracy of the system were identified. Their effect on the application was determined by a representative set of experiments in which these parameters were varied and the changes in accuracy were observed. At the end of this analysis, the parameter values for which the system worked best were deduced.

The rest of the paper is organized as follows: Section 2 describes the algorithms and techniques that were used in building the application, and presents pseudocode for them. Section 3 gives an initial implementation of the algorithm. Section 4 describes in detail the modifications that were tested on the algorithm. Section 5 discusses the parameters that play a role in the performance of the application. Section 6 describes the setting of the experiments that were performed on the application. Section 7 gives the results obtained from the experiments, and Section 8 gives some future directions in which the accuracy of the application can be improved further. Finally, the paper is concluded in Section 9.
2 Ecolocation

Ecolocation[3] is a sequence based RF localization algorithm. It determines the location of the unknown node by looking at rank sequences based on received signal strength (RSS). The key idea of Ecolocation is that the distance based rank order of the reference nodes constitutes a unique signature for different regions in the localization space. In this algorithm, we obtain the ordered sequence of reference nodes by ranking them on one-way RSS measurements between them and the unknown node. This measured sequence is then compared with the ideal distance based sequence for each location to determine which location's sequence is closest to our RSS sequence. The closeness, or distance, between any two sequences can be calculated with different metrics. The location whose sequence is the closest is the best estimate of the unknown node's location.
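The matching idea described above can be illustrated with a minimal Python sketch. It is not the project's C++ code: the reference node coordinates, candidate locations, RSS values, and the simple position-mismatch metric are all hypothetical choices made only for illustration.

```python
import math

def rank_sequence(values, descending=False):
    """Return the rank (1 = first) of each entry; equal values share a rank."""
    ordered = sorted(set(values), reverse=descending)
    position = {v: r + 1 for r, v in enumerate(ordered)}
    return tuple(position[v] for v in values)

def distance_sequence(point, refs):
    """Ideal sequence: rank reference nodes by Euclidean distance (nearest = 1)."""
    return rank_sequence([math.hypot(point[0] - rx, point[1] - ry) for rx, ry in refs])

def rss_sequence(rss):
    """Measured sequence: rank reference nodes by RSS (strongest = 1)."""
    return rank_sequence(rss, descending=True)

def mismatch(a, b):
    """Placeholder metric (an assumption): number of positions whose ranks differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def localize(rss, refs, candidates):
    """Return the candidate whose ideal sequence is closest to the measured one."""
    measured = rss_sequence(rss)
    return min(candidates, key=lambda p: mismatch(measured, distance_sequence(p, refs)))

refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]      # hypothetical reference nodes
candidates = [(2.0, 1.0), (8.0, 1.0), (1.0, 8.0)]  # hypothetical region centroids
rss = [-70.0, -52.0, -81.0]                        # second node hears the beacon best
estimate = localize(rss, refs, candidates)
```

Here the measured sequence ranks the second reference node first, so the candidate nearest to that node is selected. Any of the distance metrics discussed later in the paper could replace `mismatch`.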
2.1 Steps of Ecolocation
Before the localization process, a location sequence table (LST) is formed: the entire localization space is divided into regions, and a Euclidean distance based sequence is formed for every region. After this step, we have a large table in which there exists an injective mapping from the regions to the sequences.

The localization process is then initiated by the beacon node, which broadcasts a beacon packet to the entire WSN. This beacon packet is received by the reference nodes of the network, which calculate the RSS value of the incoming packet. The reference nodes collect these RSS measurements and send them in packets to the base station node. Based on these RSS values, a sequence is formed. Note that only RSS values from the same beacon packet are to be considered when forming a sequence. The sequence is formed on the assumption that the distance of a reference node from the unknown node increases with decreasing RSS value of the beacon packet. Thus, in the sequence, each reference node is given a rank. For reference nodes ranked at positions $i$ and $j$ in the ordered sequence, $R_i > R_j \Rightarrow d_i < d_j$ for all $i, j$, where $R_i$ and $d_i$ are the RSS measurement and the distance of the $i$th ranked reference node from the unknown node, respectively.
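As a rough illustration of the table built in the first step, the sketch below groups the points of a regular grid by their distance based sequence and stores one centroid per sequence. The grid approximation, the function names, and the two-node layout are assumptions made for this example only.

```python
import math
from collections import defaultdict

def get_sequence(point, refs):
    """Rank reference nodes by distance from point; equidistant nodes share a rank."""
    dists = [round(math.hypot(point[0] - rx, point[1] - ry), 9) for rx, ry in refs]
    ordered = sorted(set(dists))
    return tuple(ordered.index(d) + 1 for d in dists)

def build_lst(refs, x_range, y_range, step):
    """Group grid points by their sequence and return {sequence: region centroid}."""
    regions = defaultdict(list)
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    for iy in range(ny + 1):
        for ix in range(nx + 1):
            p = (x_range[0] + ix * step, y_range[0] + iy * step)
            regions[get_sequence(p, refs)].append(p)
    return {
        seq: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for seq, pts in regions.items()
    }

refs = [(0.0, 0.0), (4.0, 0.0)]  # two hypothetical reference nodes, A then B
lst = build_lst(refs, (0.0, 4.0), (0.0, 2.0), 1.0)
```

With two reference nodes the table holds three entries: the side nearer to A, the side nearer to B, and the points equidistant from both. Because the table is keyed by sequence, each sequence maps to exactly one region centroid.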
Though this relation is expected to hold in ideal conditions, it does not always hold in practice because of the multipath and shadowing effects of RF communication. Reference nodes that are far from the unknown node may measure greater RSS values than reference nodes that are nearer. Hence, the sequence that is formed might be a corrupted version of the original sequence.

The next step of the algorithm searches through the LST for the sequence closest to the one formed from the RSS values. The centroid of the region corresponding to that sequence is given by the algorithm as the unknown node's estimated location. We now discuss the steps of the algorithm in detail.

2.1.1 LST Construction

The first question that needs to be asked is how to divide the localization space into different regions such that every sequence is unique to a region. We did this by dividing the localization space into distinct regions using the perpendicular bisectors of the lines joining pairs of reference nodes. Assume that a localization space consists of n reference nodes. Consider any 2 reference nodes and draw the perpendicular bisector of the line joining them. This bisector partitions the entire localization space into 3 regions that are distinguished by their proximity to either reference node, as illustrated in Fig 2.

If we repeat this process for each pair of reference nodes, giving a total of $n(n-1)/2$ pairs, the bisectors divide the localization space into many regions of 3 different types: edges, vertices and faces. This subdivision of a 2D space into many regions is called an arrangement induced by that set of lines.

Now, for each region created by the arrangement induced by the set of perpendicular bisectors, determine the ordered sequence of the reference nodes' ranks based on their distances from it. Note that the location sequence of a given region is unique to that region. Further, if a region is represented by its centroid, then there is a one to one mapping between a location sequence and the centroid of the region that it represents. The order in which the ranks of the reference nodes are written in a location sequence is determined by a predefined order of reference node IDs.

For example, in Fig 2, if the predefined order of reference nodes is AB, then the sequence for the dark shaded region is 12 and the sequence for the light shaded region is 21. The sequence for the region represented by the edge is 11. If the predefined order were BA, the sequences would be reversed. A generic pseudocode of the LST construction algorithm is presented below:

Algorithm 1 LST Construction
Require: Location coordinates of reference nodes $(a_{x_i}, a_{y_i})$, $i = 0 \ldots n-1$.
         Boundaries of the localization space B.
Ensure: Location Sequence Table
  L = {l_i | i = 0 ... (n(n-1)/2 - 1)} ← BISECTORLINES({(a_xi, a_yi) | i = 0 ... n-1}, B)
  (FL, EL, VL) ← CONSTRUCTARRANGEMENT(L)
  for i ← 0 to (|VL| - 1) do
    Centroid[i] ← VL[i]
    Sequence[i] ← GETSEQUENCE(Centroid[i])
  end for
  for i ← |VL| to (|VL| + |EL| - 1) do
    Centroid[i] ← GETEDGECENTROID(EL[i])
    Sequence[i] ← GETSEQUENCE(Centroid[i])
  end for
  for i ← (|VL| + |EL|) to (|VL| + |EL| + |FL| - 1) do
    Centroid[i] ← GETFACECENTROID(FL[i])
    Sequence[i] ← GETSEQUENCE(Centroid[i])
  end for
  return Sequence, Centroid

BISECTORLINES takes in the locations of the reference nodes and the boundaries of the localization space as input and returns the set L of all pairwise perpendicular bisector lines within the boundaries of the localization space. Each line is represented by its intersection points on the left and right boundaries of the localization space.

CONSTRUCTARRANGEMENT constructs the arrangement, given a set of lines as input, and returns a doubly connected edge list that consists of a vertex list VL, an edge list EL, and a face list FL.

VL contains pointers to all vertices of the arrangement induced by the set L.

EL contains pointers to all edges of the arrangement induced by the set L.

FL contains pointers to all faces of the arrangement induced by the set L.

GETEDGECENTROID takes in an edge pointer as the input and returns the centroid of the edge.

GETFACECENTROID takes in a face pointer as the input and returns the centroid of the face.

GETSEQUENCE takes in the coordinates of a point in the localization space and returns the location sequence for that point with respect to the locations of the reference nodes.

2.1.2 RSS Sequence Formation

Now that the LST has been constructed, the actual Ecolocation begins. All reference nodes send the RSS values of received beacon packets to the basestation, where the rank sequence is formed based on the RSS values. This sequence is then compared with every sequence in the LST to find the closest sequence. This is how the algorithm works.

3 Initial Implementation

The code of the algorithm was implemented in C++. On the network side, motes were programmed in nesC. The GUI portion of the code, which displayed the output of the application, was initially written using OpenGL in C++. However, this OpenGL implementation could not handle high rates of streaming location coordinates, and thus the output from this program was blotchy and irregular. To rectify this, the source code of the Java based open source application LIVEGRAPH[4] was modified to fit our specific needs.

We will discuss the code in a bottom-up fashion, with the intricacies of the code presented first and the general picture presented later. An important goal in writing the code was robustness: if someone later wanted to add an extra feature to the algorithm, for example one using some other feature of the network, no major rewriting of the code should be required.

First, a packet coming in from a reference node is represented as a C structure with 6 fields:

Target id (tid): Every unknown node has a unique ID associated with it called the tid. In case there was only one person moving around, this field would remain constant.

Reference id (rid): rid represents the reference node from which the packet was sent to the basestation. Like the unknown nodes, all reference nodes are given a unique id.

Time at reference (tr): Represents the time at which the packet was sent from the reference node. This field was redundant in the current implementation but is likely to be used in future work.

Time at target (tt): Represents the time at which the packet reached the basestation. This field is also redundant in the current implementation.

Packet sequence number (psn): Every beacon that was broadcast had a psn which started from 0 and was incremented every time. To understand why this field was essential, consider the following scenario: say, at time t = 5, a beacon was broadcast. Ideally, we would want the RSS sequence that we form at the basestation to be formed from RSS values corresponding to the same beacon packet. By checking the psn, we can make sure that only packets with the same psn (there will be n such packets, where n is the number of reference nodes) are used in forming the sequence.

Received signal strength (rssi): This value gave the RSS value of the packet.

As these packets come to the basestation, they are put into a buffer, implemented as a C++ vector. This buffer stores all packets with the same psn in one row. So, whenever a packet with a new psn comes in, a new row is added to the buffer and the packet is stored in that row. Also, we set the maximum size of the buffer as a constant, so that the rows with the oldest psn are deleted in case the maximum size is exceeded.

There were several ways in which the algorithm could be implemented. One possible way was that whenever a new packet came in, the buffer would be sent for calculation of the location of the unknown node. But it was seen that, more often than not, a packet bearing a new psn came in before one full row of the buffer could be completely filled. This would lead to incorrect sequences being formed and the accuracy of the system suffering. To overcome this difficulty, the collection of packets into the buffer was placed in one thread, and a separate thread would wait for a fixed interval of time for the buffer to fill up and then automatically take the buffer for calculation.

Our implementation began with a call to the INITLST() function, which takes the list of coordinates of the reference nodes and creates the LST. In our implementation, which is slightly different from the normal implementation, we have a factor that determines how finely the localization space is divided into regions. The pseudocode of INITLST is described below:

Algorithm 2 INITLST
Require: Location coordinates of reference nodes (x_i, y_i) for each i = 1..n
         Boundaries of the localization space B = (x_max, y_max), (x_min, y_min)
Ensure: Location Sequence Table
  Maxgridsize = ((x_max - x_min) / granularity factor) × ((y_max - y_min) / granularity factor)
  for i = y_min to y_max do
    for j = x_min to x_max do
      Sequence = GETSEQ(j, i)
      if Sequence is already seen then
        Add (j, i) to old sequence's region
      else
        Add (j, i) to new sequence
      end if
    end for
  end for

GETSEQ(x,y): This function takes an (x,y) coordinate and calculates the Euclidean distance based rank sequence of all the reference nodes. It calculates the distance of each reference node from (x,y) and then ranks them in increasing order of distance.

Next, in our implementation, two separate threads run:

The first thread waits for incoming packets from the network and stores them in the buffer as they come. It also maintains the size of the buffer by deleting rows bearing old psn values, and performs some other bookkeeping tasks.

The second thread sleeps for a given amount of time, called CALC_TIME, and then takes the current buffer and sends it for processing by a function GETNEWSEQ. This function takes in the buffer, selects the oldest NUM_SAMPLES psn rows, and averages them to create one row containing averaged RSS values. This averaging is done to compensate for the inconsistencies of the network, as several packets do not reach the basestation due to congestion and severe delay. While averaging, care is taken that only valid RSS values are considered. If some spot in the buffer is empty because the packet corresponding to that place did not reach the buffer, that empty spot is not considered in averaging. Then, this averaged RSS sequence is used to create a rank sequence. Further, this sequence is searched for in the LST and the centroid corresponding to the best match is returned. The pseudocode of the function GETNEWSEQ is given below:

Algorithm 3 GETNEWSEQ
Require: Buffer B
Ensure: (x, y) Location coordinate
  HammDist = ∞
  Sequence = AVERAGE_BUFFER(B)
  NewSeq = GETSEQ(Sequence)
  for i = 0 to Size of LST do
    thisDist = HAMM(NewSeq, LST[i])
    if thisDist < HammDist then
      index = i
      HammDist = thisDist
    end if
  end for
  (x, y) = CENTROID(index, LST)
  return (x, y)

AVERAGE_BUFFER(B) takes the buffer, averages the oldest NUM_SAMPLES psn rows, and creates a single row which has averaged RSS values.

HAMM calculates the distance between two sequences based on a distance metric. This metric can be anything. In our implementation, HAMM(A, B), which calculates the distance between A and B, does the following, where n is the number of reference nodes:
  HmDist = HmDist + (20/A[i]) (A[i] - B[i]) for i = 1..n

CENTROID gives the centroid corresponding to the sequence from the LST.

4 Modifications to algorithm

A lot of problems showed up when this initial implementation was tested. It was observed that the output of the algorithm lagged behind a human walking around, thus creating the illusion that the application was not working in real time. Moreover, the output was seen to be jagged and not smooth. The output was found to dip at specific points on the floor map on a consistent basis. In this form, the application was of no practical use to anyone. Hence, a number of modifications were made to improve the algorithm.
4.1 Smoothening factor

When the initial implementation of the algorithm was tested, it was observed that the output was jagged. In order to decrease the jaggedness, a smoothening factor, termed α, was introduced that functions as follows:

$x_{new} = \alpha\, x_{new} + (1 - \alpha)\, x_{old}$
$y_{new} = \alpha\, y_{new} + (1 - \alpha)\, y_{old}$

This is a weighted average of the old and new coordinates. It ensures that the new location is not very far from the previous location, and it mitigates the effect that an erroneous solution has on the output. This should improve the smoothness of the location plot that we get.
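A small sketch of this weighted average follows, assuming that the weight α multiplies the newly computed coordinate and 1 − α the previous output. The function names and sample points are hypothetical.

```python
def smooth(new, old, alpha):
    """Blend the newly computed location with the previous output."""
    return (alpha * new[0] + (1.0 - alpha) * old[0],
            alpha * new[1] + (1.0 - alpha) * old[1])

def smooth_track(raw_points, alpha):
    """Apply the smoothening factor along a sequence of raw location estimates."""
    out = [raw_points[0]]
    for p in raw_points[1:]:
        out.append(smooth(p, out[-1], alpha))
    return out

# A jump of 10 units is approached gradually rather than in one step.
track = smooth_track([(0.0, 0.0), (10.0, 0.0), (10.0, 0.0)], 0.5)
```

With α = 0.5 the output moves halfway towards each new estimate, which shows both the smoothing and the lag that a small α introduces.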
4.2 Hashing

One major deficiency of the algorithm is that the size of the LST is very large. This large table is fully searched every time for the closest possible sequence. In order to overcome this inefficiency, a method was sought with which the time spent in searching the table would be minimized. Hashing seemed to be an obvious choice.

4.3 Distance metric

This is the method used in calculating the distance between any two sequences. It affects which sequence is chosen as the closest sequence and, eventually, the output of the algorithm. Three standard and one novel approach were employed and tested. The first one was the calculation of the Hamming distance as specified in Section 3. The rest of the approaches are discussed here. Given two location sequences $U = \{u_i\}$ and $V = \{v_i\}$, $1 \le i \le n$, where $u_i$ and $v_i$ are the ranks of reference nodes, the following two metrics are defined:

Spearman's Rank Order Correlation Coefficient

$\rho = 1 - \dfrac{6 \sum_{i=1}^{n} (u_i - v_i)^2}{n(n^2 - 1)}$

Kendall's Tau metric

In contrast to Spearman's coefficient, in which the correlation of exact ranks is calculated, this metric calculates the correlation between the relative ordering of ranks of the two sequences. It compares all the $n(n-1)/2$ possible pairs of ranks $(u_i, v_i)$ and $(u_j, v_j)$ to determine the number of matching and nonmatching pairs. A pair is matching, or concordant, if $u_i > u_j$ and $v_i > v_j$, or $u_i < u_j$ and $v_i < v_j$. It is nonmatching, or discordant, if $u_i > u_j$ and $v_i < v_j$, or $u_i < u_j$ and $v_i > v_j$. The correlation between the two sequences is then calculated from the numbers of matching and nonmatching pairs.
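The two rank correlation metrics can be sketched in Python as follows. Spearman's coefficient uses the standard formula. For Kendall's Tau the text does not preserve the exact expression, so the simple form (concordant − discordant) / (n(n−1)/2) is assumed here; a tie-corrected variant may have been used in practice.

```python
from itertools import combinations

def spearman(u, v):
    """Spearman's rank order correlation coefficient of two rank sequences."""
    n = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall_tau(u, v):
    """Kendall's tau, assumed form: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(u)), 2):
        s = (u[i] - u[j]) * (v[i] - v[j])
        if s > 0:
            concordant += 1      # both sequences order the pair the same way
        elif s < 0:
            discordant += 1      # the sequences order the pair oppositely
    pairs = len(u) * (len(u) - 1) // 2
    return (concordant - discordant) / pairs

same = spearman([1, 2, 3], [1, 2, 3])
reversed_tau = kendall_tau([1, 2, 3], [3, 2, 1])
```

Both functions return 1 for identical sequences and −1 for fully reversed ones, so a larger value means a closer match when searching the LST.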
Projection: This modification borrows an idea from Global Positioning System (GPS) applications. In GPS navigation, the application assumes that the vehicle can only move on a designated road. So, even if the vehicle moves off the road and goes somewhere else, the code shows the location of the vehicle as the point on the road closest to the vehicle. We incorporated the same idea in our code. A person was constrained to move only in the corridors. Hence, no matter where the person went, the point in the corridor closest to the estimated location was sent as output. This drastically improved the presentability of the output.
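One way to realise this projection is to model each corridor as a line segment and snap every estimate to the nearest point on any segment. The sketch below assumes that representation; the corridor coordinates and function names are hypothetical.

```python
import math

def project_onto_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a                              # degenerate segment
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))                 # clamp to stay on the segment
    return (ax + t * dx, ay + t * dy)

def snap_to_corridor(p, corridors):
    """Return the closest corridor point to the estimated location p."""
    candidates = [project_onto_segment(p, a, b) for a, b in corridors]
    return min(candidates, key=lambda q: math.hypot(q[0] - p[0], q[1] - p[1]))

# Hypothetical L-shaped corridor made of two centre-line segments.
corridors = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
snapped = snap_to_corridor((12.0, 5.0), corridors)
```

An estimate that falls inside a room next to the vertical corridor is moved onto that corridor's centre line, which is the behaviour described for the displayed output.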
There are many parameters that affect the accuracy of the algorithm; some are due to the algorithm itself and some are due to the method of implementation. All the major parameters and their effect on the algorithm are described below:

Number of reference nodes: As the number of reference nodes increases, a greater number of RSS packets is sent to the basestation node. For n reference nodes, n packets are sent to the basestation. Therefore, the sequences formed are also n numbers long. This increases the complexity of searching for the closest sequence in the LST. However, as the number of reference nodes increases, the accuracy of the algorithm is also expected to increase.

Beacon transmitting power: As the transmitting power of the beacon is increased, the beacon is able to reach a greater number of reference nodes. Therefore, a greater number of RSS packets reaches the basestation and more accurate sequences are formed. This leads to better localization. With lower transmitting power, sequences only contain valid RSS information from a very localized area, thereby decreasing the accuracy of localization. Further, as the transmitting power increases, so does the congestion in the network, and thus greater delay is experienced in receiving RSS packets at the basestation.

Reference node transmitting power: As the transmitting power of the reference node is increased, the RSS packets reach the basestation in fewer hops, which decreases the congestion in the network. Theoretically, a greater number of relay nodes should help in reducing the congestion that is caused by low transmitting power of the reference nodes.

Smooth factor: The smooth factor, also called α, is used in smoothening the output of localization in the following way:

$x_{new} = \alpha\, x_{new} + (1 - \alpha)\, x_{old}$
$y_{new} = \alpha\, y_{new} + (1 - \alpha)\, y_{old}$

As α is increased, so does the jaggedness of the output of the algorithm. α is based on the simple idea that a person's new location cannot be very far from the person's old location. As α is decreased, the smoothness of the output improves, but at the same time the output lags behind the moving person. This is because, as α decreases, the distance between the new location and the old location starts decreasing. Thus, it is important to choose a value of α which provides a sufficiently smooth output and at the same time maintains a considerable amount of speed.

Beacon transmit time: This is the interval at which the unknown node transmits beacons to the reference nodes. As the beacon transmit time decreases, more beacons reach the reference nodes in one second. Thus, the reference nodes send more RSS packets per second to the basestation. This allows the person's current location to be updated closer to real time. However, it also increases the congestion in the network because of the increased number of packets. Choosing the beacon transmit time is therefore an interesting tradeoff.

Number of samples to average: This quantity represents the number of PSNs whose RSS values are averaged. As this number increases, the accuracy of localization is expected to improve, because averaging compensates for the inadequacies of the network such as lost packets. If this number decreases, less data is averaged, the inaccuracy of the sequences formed increases, and the accuracy of localization decreases.

Buffer size: This represents the maximum number of PSNs which can be stored in the buffer at a given time. This size also signifies the duration for which we store information in the buffer. To understand this, suppose that beacons are transmitted at m beacons per sec and that the buffer size is n. Then the buffer holds n/m sec of data. As the buffer size increases, the accuracy of the algorithm is expected to increase. Due to an inefficient network, it is possible that RSS packets from some reference nodes do not reach the basestation, because of which the buffer has empty values in those locations. With time, as newer PSNs come in, it is highly likely that we get a full sequence of RSS values.

Calculation time: This time, called CALC_TIME, is the time after which the algorithm outputs the location of the person. After every CALC_TIME sec, the data in the buffer is averaged and a sequence is formed. As this time increases, the speed at which the algorithm localizes decreases, but the accuracy is expected to increase. This is another way in which we make up for the inadequacies of the network: by allowing a greater calculation time, we wait for more complete and accurate sequences to be formed.
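The interplay of buffer size, number of samples to average, and lost packets can be sketched as below. This is a simplified Python model, not the C++ vector used in the project; the class and method names are hypothetical, and reference ids are assumed to run from 0 to n − 1.

```python
class PacketBuffer:
    """Rows of RSS values keyed by psn; None marks a packet that never arrived."""

    def __init__(self, num_refs, max_rows):
        self.num_refs = num_refs
        self.max_rows = max_rows
        self.rows = {}                      # psn -> list of RSS values per rid

    def add(self, psn, rid, rssi):
        if psn not in self.rows:
            self.rows[psn] = [None] * self.num_refs
            while len(self.rows) > self.max_rows:
                del self.rows[min(self.rows)]   # drop the row with the oldest psn
        if psn in self.rows:                # the new row itself may have been dropped
            self.rows[psn][rid] = rssi

    def average_oldest(self, num_samples):
        """Average the oldest num_samples rows, skipping missing entries."""
        oldest = [self.rows[p] for p in sorted(self.rows)[:num_samples]]
        averaged = []
        for rid in range(self.num_refs):
            vals = [row[rid] for row in oldest if row[rid] is not None]
            averaged.append(sum(vals) / len(vals) if vals else None)
        return averaged

def buffer_duration_sec(max_rows, beacons_per_sec):
    """Seconds of data held by a full buffer of max_rows rows."""
    return max_rows / beacons_per_sec

buf = PacketBuffer(num_refs=3, max_rows=2)
buf.add(0, 0, -60.0)
buf.add(0, 1, -70.0)                        # packet from reference 2 was lost
buf.add(1, 0, -62.0)
buf.add(1, 1, -74.0)
buf.add(1, 2, -80.0)
averaged = buf.average_oldest(2)
```

The lost packet for the third reference node in the first row is simply skipped, so its average is taken from the second row alone. A larger buffer or a larger number of samples makes such gaps less likely to leave a reference node without any value.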
The setting of the experiments is described in this section. A network consisting of a given number of reference nodes was set up on the Tutornet network (the WSN that is installed in the ceiling of our building). Then, one unknown mote acting as the beacon was placed at a specific spot on the floor, and the algorithm was allowed to run for 60 sec. All the x and y values obtained over this time period were averaged, and the mean square error (MSE) was calculated.

The results gave some very interesting insights. One of them was the effect that α had on the output. Earlier, we had thought that α only had a smoothening effect on the system. But the results showed that α also made it difficult for the output to get back on the right track once an erroneous solution crept in. For example, if the value of α is 0.2 and an erroneous solution comes in, it would take 5 consecutive correct solutions to negate the effect that the erroneous solution had on the output. The graph of α vs MSE confirms this suspicion, as MSE increases again at very high values of α.

Out of all the distance metrics, Kendall's Tau was found to be the most stable, followed by the Hamming distance and lastly Spearman's method. In the graph of CALC_TIME vs MSE, the MSE is larger at low values, as expected, due to incomplete buffers at lower times. As CALC_TIME is increased, the MSE settles at a constant value. In the graph of Beacon Transmit Time vs MSE, MSE decreases as the Beacon Transmit Time is increased, due to reduced congestion in the network. Hence, the results verify what was hypothesized.

7 Related work
Over the past few years many solutions have been proposed for RF-only localization in wireless ad-hoc and sensor networks. They can be broadly classified into two main categories: range based and range free. Range based techniques estimate distances (ranges) from RSS measurements between the unknown node and the reference nodes and use them to triangulate the location of the unknown node. Range free techniques, on the other hand, estimate the location of the unknown node without determining the range. We discuss four selected localization techniques: proximity localization, centroid, approximate point in triangle, and maximum likelihood estimation. They were selected on the criterion that they use the RSS of RF signals to calculate the location estimate over a single hop.
7.1 Proximity localization

It is a simple localization scheme in which the location of the closest reference node, based on RSS measurements, is the unknown node's location estimate. It can be considered an extreme special case of Ecolocation where only the first ranking reference node is considered.
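As a brief sketch, proximity localization reduces to picking the reference node with the strongest RSS. The node ids, coordinates, and readings below are hypothetical.

```python
def proximity_estimate(rss_by_ref, ref_locations):
    """Return the location of the reference node with the strongest RSS."""
    best = max(rss_by_ref, key=rss_by_ref.get)
    return ref_locations[best]

ref_locations = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (0.0, 10.0)}  # hypothetical layout
rss_by_ref = {1: -70.0, 2: -52.0, 3: -81.0}                      # dBm readings
estimate = proximity_estimate(rss_by_ref, ref_locations)
```

Only the top-ranked node matters here, which is why the estimate can never be finer than the spacing of the reference nodes.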
7.2 Centroid

The authors of this technique propose a range free, proximity based solution for localization where the location estimate is the centroid of all the reference nodes which are in the proximity of the unknown node. The authors suggest an enhancement to this technique by adaptively placing reference nodes to minimize location error. We do not consider this enhancement, as it requires extra information gathering and processing.

7.3 Approximate point in triangle

T. He et al. propose a range free localization technique called approximate point in triangle (APIT), in which the RSS value at the unknown node is compared with RSS values at its neighbors, and based on this comparison a decision is made whether the unknown node location is inside various triangles formed by the reference nodes. This comparison test is done for all the locations in the location space and for all the triangles that can be formed by the reference nodes. The location estimate is the centroid of the locations which are in a maximum number of triangles. The accuracy of the location estimate also depends on the non reference node neighbor density of the unknown node.

7.4 Maximum likelihood estimation

Out of the many maximum likelihood location estimation (MLE) techniques proposed, we consider a simple, representative MLE technique. In this, the authors calculate the location which maximizes a likelihood function, which is based on the distance estimate and its standard deviation, using the gradient climbing method. All RF based MLE methods need good ranging techniques that use radio frequencies to estimate distances. This requires expensive ranging equipment, time consuming preconfiguration surveys of the location space, or both.

8 Future work

Some directions in which future work can take place in improving this application are:

The motes can be fitted with accelerometers so that they only transmit beacons when we are in motion. This would reduce the jaggedness that we get in our system. As our acceleration increases, so does the rate of transmitting beacons: when we are moving slowly, we do not need to find our location as often, and if we move fast, beacons should be transmitted more quickly because we are covering more distance. It is also felt that a simple accelerometer can be used to develop this application from scratch in a much simpler way.

Markov modeling can be used to improve the output. Our location at time t + 1 depends on our location at time t. Hence, a circle of radius s can be drawn around the person's current location as center. This s represents the maximum distance that a normal human being can cover in a short span of time. If our new location falls outside this circle, then the solution is erroneous and can be discarded. If the new point falls inside the circle, then the circle can be redrawn with the new centre.
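The future-work idea of discarding any estimate that lies outside a circle of radius s around the last accepted location can be sketched as follows. The function name, the sample track, and the value of s are hypothetical.

```python
import math

def gate(points, s):
    """Keep only estimates within distance s of the last accepted location."""
    accepted = [points[0]]
    for p in points[1:]:
        cx, cy = accepted[-1]
        if math.hypot(p[0] - cx, p[1] - cy) <= s:
            accepted.append(p)      # the circle is redrawn around the new point
        # otherwise the estimate is treated as erroneous and discarded
    return accepted

raw = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (2.0, 0.0)]
filtered = gate(raw, 1.5)
```

The outlier at (9, 9) is dropped, and the following estimate is still judged against the last accepted point. One caveat of this simple form is that it never recovers if the first point is itself wrong, so a practical version would likely need a reset rule.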
Acknowledgements
I would like to express my heartfelt gratitude to Dr. Bhaskar Krishnamachari, my mentor during this project. His insights and expertise in the matter were invaluable. Also, I would like to thank Suvil Singh Deora, doctoral candidate at ANRG. Without him, this project would have been a distant vision.
References
[1] http://www.enl.usc.edu
[2] Kiran Yedavalli and Bhaskar Krishnamachari, Sequence-Based Localization in Wireless Sensor Networks, IEEE Transactions on Mobile Computing, Vol. 7, No. 1, Jan 2008
[3] Kiran Yedavalli, Bhaskar Krishnamachari, Sharmila Ravula, and Bhaskar Srinivasan, Ecolocation: A Sequence Based Technique for RF Localization in Wireless Sensor Networks
[4] http://www.livegraph.org
[5] http://www.tinyos.net