Astar 2
Abstract—There are several shortest-path search algorithms, such as A-star, D-star and Dijkstra. These algorithms are widely used in automotive vehicles and mobile navigation systems. As the number of nodes increases considerably, shortest-path algorithms implemented in software produce heavy computational overhead. In this paper, in order to avoid this overhead, we propose a hardware model of the A-star algorithm for a shortest-path search engine. In particular, we propose an efficient hardware model based on a shift register and show simulation results in comparison with previous works.

Keywords - shortest-path search algorithm; A-star algorithm; shift register; priority queue; sorting

I. INTRODUCTION
Transportation infrastructure is a highly complex network that carries heavy traffic. For this reason, telematics has prospered in the automotive industry. More recently, it has been applied specifically to Global Positioning System (GPS) technology integrated with computers and mobile communications technology in automotive navigation systems.
One of the most important parts of an automotive navigation system is the shortest-path search algorithm. In general, as the number of nodes increases, shortest-path search algorithms implemented in software produce heavy computational overhead. Thus, the overall performance of the automotive navigation system can decrease. Moreover, this performance degradation ultimately affects the users of the navigation system, because the shortest-path search takes longer and the telematics operation is disturbed.
As mentioned above, a shortest-path algorithm implemented in software produces a large overhead. If the shortest-path algorithm is instead implemented in suitable hardware, the navigation system can avoid this overhead. Consequently, the overall performance of the navigation system can be increased.
The most important algorithm for the shortest-path problem is the A* algorithm, which finds a single-pair shortest path using a heuristic function to speed up the search. For this reason, we aim to design a hardware model of the A* algorithm.
Hardware implementations of the A* algorithm have been studied in previous research. Z. K. Baker and M. Gokhale investigate a small bubble sort core to produce the extract-min function [2], but its time complexity is O(N). A pipelined heap can also be adopted [5], but its time complexity is O(log N).
Compared to the previous work, the proposed architecture is based on a sorted shift register that acts like a priority queue. In particular, its time complexity is O(1). In Section II, we explain the A* algorithm and the point of improvement. The details of the proposed architecture are described in Section III, and Section IV shows simulation results. Finally, we conclude our work in Section V.

II. RELATED WORK
There are many shortest-path search algorithms, such as Dijkstra, Bellman-Ford and the A* algorithm [1, 6]. One of them, the A* algorithm, is commonly used in the transportation industry, because its flexibility allows it to reduce the computational time. Fig. 1 shows pseudo code of the A* algorithm.

A* (start, goal)
1. Closed set = the empty set
2. Open set = includes start node
3. G[start] = 0, H[start] = H_calc[start, goal]
4. F[start] = H[start]
5. While Open set is not empty
6.   do CurNode ← EXTRACT-MIN-F(Open set)
7.   if ( CurNode == goal ), then return BestPath
8.   For each Neighbor Node N of CurNode
9.     If ( N is in Closed set ), then Nothing
10.    else if ( N is in Open set ),
11.      calculate N’s G, H, F
12.      If ( G[N on the Open set] > calculated G[N] )
13.        RELAX(N, Neighbor in Open set, w)
14.        N’s parent=CurNode & add N to Open set
15.    else, then calculate N’s G, H, F
16.      N’s parent = CurNode & add N to Open

Figure 1. A* Algorithm pseudo code

Initial conditions are set in lines 1 through 4. The open set is defined as the set of already known nodes.
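As a rough software reference for the steps in Fig. 1, the loop can be sketched in Python as follows. This is a minimal sketch and not the authors' C model: the `graph` and `coords` dictionary formats, the straight-line heuristic standing in for H_calc, and all function names are assumptions made for illustration. The open set here is a binary heap (Python's `heapq`), not the sorted shift register proposed in this paper, and moving CurNode into the closed set after extraction is standard A* practice that Fig. 1 leaves implicit.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable.

    graph:  dict mapping node -> list of (neighbor, edge_cost)  (assumed format)
    coords: dict mapping node -> (x, y), used by the heuristic   (assumed format)
    """
    def h(n):
        # Stand-in for H_calc: straight-line distance to the goal (assumption).
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x1 - x2, y1 - y2)

    closed = set()                    # line 1: closed set starts empty
    g = {start: 0.0}                  # line 3: G[start] = 0
    parent = {start: None}
    open_heap = [(h(start), start)]   # lines 2 and 4: open set keyed by F

    while open_heap:                                   # line 5
        _, cur = heapq.heappop(open_heap)              # line 6: EXTRACT-MIN-F
        if cur in closed:
            continue                                   # stale duplicate entry
        if cur == goal:                                # line 7
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1], g[goal]
        closed.add(cur)
        for nei, w in graph.get(cur, []):              # line 8
            if nei in closed:                          # line 9
                continue
            new_g = g[cur] + w
            if new_g < g.get(nei, math.inf):           # lines 10-16
                g[nei] = new_g
                parent[nei] = cur
                heapq.heappush(open_heap, (new_g + h(nei), nei))
    return None, math.inf


# Tiny hypothetical map: three nodes on a line, with a costly shortcut A -> C.
graph = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0)], "C": []}
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
path, cost = a_star(graph, coords, "A", "C")
```

Instead of updating an entry in place (the RELAX step in line 13), this sketch pushes a second entry with the smaller F and skips the stale one when it is popped. That is a common software shortcut and differs from how a register-based open set would behave.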
The computational time satisfies (1). Nopen represents the number of nodes in the open set and Taccess is the memory access time. Taccess can be neglected, because it depends on the kind of memory used. It is part of the system's specification, so we are not interested in that term. Therefore, the performance is determined by the number of nodes in the open set, Nopen.
Reference [2] is a case of comparing all nodes. It proposes a bubble sort whose computational time is determined by Nopen, so its time complexity is O(Nopen). A heap can also be implemented [5], and its time complexity to sort is nearly O(log Nopen). Our proposed architecture shows better performance, O(1).
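To make the idea of a sorted shift register acting as a priority queue more concrete, the following Python class is a small behavioral sketch under stated assumptions. The class name, the method names and the F values are hypothetical and do not come from the paper. The model only mimics the behavior: in hardware, every cell would compare and shift in parallel, which is where the constant-time claim comes from, while the Python list operations below are not constant time.

```python
import bisect


class SortedShiftRegister:
    """Behavioral model of a fixed-size register array kept sorted by F."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = []   # ascending (f, node) pairs; cells[0] holds the minimum F

    def insert(self, f, node):
        """Insert an entry and return the entry pushed out of the register, if any."""
        bisect.insort(self.cells, (f, node))
        if len(self.cells) > self.capacity:
            # The entry with the largest F falls off the far end of the register.
            return self.cells.pop()
        return None

    def extract_min(self):
        """Remove and return the head entry (EXTRACT-MIN-F)."""
        return self.cells.pop(0)


# Two registers and four neighbor nodes with made-up F values.
reg = SortedShiftRegister(capacity=2)
dropped = [reg.insert(f, name) for f, name in
           [(7, "Nei 1-3"), (3, "Nei 1-1"), (9, "Nei 1-4"), (5, "Nei 1-2")]]
head = reg.extract_min()
```

In this run the register ends up holding the two entries with the smallest F, and the other two are returned by `insert`. A caller would presumably keep those displaced entries in memory so that a later sort can bring them back, but that handling is an assumption and is not modeled here.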
First, the start node is stored in the shift register, as in Fig. 3(a). The start node is the only element of the open set, so it is extracted as the CurNode. Because the CurNode has changed, the NeiNodeSearch & Calculation module computes the information of each neighbor node of the CurNode. Neighbor nodes are labeled Nei X-Y, where X denotes the Xth loop and Y is the rank in ascending order of F. In Fig. 3(b), four neighbor nodes are calculated. The two with the smallest F, Nei 1-1 and Nei 1-2, are stored, and the others, Nei 1-3 and Nei 1-4, are removed.
On the next loop, we can execute EXTRACT-MIN-F, because the shift register guarantees that it holds the element with the minimum value of F. Nei 1-1 is then extracted, and the NeiNodeSearch & Calculation module outputs the second-loop set of neighbor nodes, Nei 2-Y. Fig. 3(c) and (d) show possible arrangements of the shift register. Nei 1-2, which has the minimum value of F, and the Nei 2-Y nodes are compared to produce the candidate for the next current node. This node is guaranteed to be the current node for the next loop, because the next loop is not affected by the nodes removed in Fig. 3(b).
In other words, the two nodes in the shift register in Fig. 3(b) are candidates to become the CurNode during the next two loops, and the two nodes removed from the shift register can become the CurNode three loops later.
From this, we can infer that the number of registers determines the number of valid loops. The valid loops start from the moment the shift register becomes full of node information. Therefore, the structure in Fig. 2 is valid for up to 10 loops.
After 10 loops, however, this architecture is no longer reliable, so we need an additional module. This additional module has to sort the open set and update the sorted shift register. It is nearly the same as Fig. 2, but it has more registers and can access memory.
The number of nodes varies greatly depending on the map. For that reason, the architecture includes a memory of sufficient size. Not only is the format of the memory important, but reducing memory access is also a key to the implementation. The OpenListSort & Update module uses a linked list after the valid loops of the shift register. A linked list minimizes memory access, so it is suitable for our architecture.
Fig. 4 shows the top-level block diagram. A central controller controls each module, and the Bestpath module returns the shortest path. The SRAM stores node information and the open set. The two shaded modules in Fig. 4 manage the open set.

IV. SIMULATION RESULTS
To verify the design, we made maps with different numbers of nodes: 16, 64, 256 and 1024. They are quite reliable, because each node is located in rectangular coordinates and the cost (or distance) of each edge is calculated in that coordinate system. We designed an A* algorithm model in the C language.
Fig. 5 compares the approaches for different numbers of nodes.

The clock cycles to sort = Tsort + Taccess (3)

The clock cycles needed to sort are calculated as in (3). Tsort is the time to sort and Taccess is the time to access memory. The sort is executed after 10 loops. To simulate the proposed model, we randomly chose 15 pairs of start and goal nodes.
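A test setup of the kind described for the simulation (maps of 16, 64, 256 and 1024 nodes in rectangular coordinates, with 15 random start and goal pairs) could be generated along the following lines. This is only an illustrative sketch: the square-grid layout, the four-neighbor connectivity, the fixed random seed and the function names are all assumptions, since the paper does not specify how its maps are connected.

```python
import math
import random


def make_grid_map(num_nodes):
    """Build a square grid map; returns (graph, coords).

    graph:  node -> list of (neighbor, edge_cost)
    coords: node -> (x, y) position in rectangular coordinates
    """
    side = math.isqrt(num_nodes)
    if side * side != num_nodes:
        raise ValueError("num_nodes must be a perfect square")
    coords = {y * side + x: (x, y) for y in range(side) for x in range(side)}
    graph = {n: [] for n in coords}
    for n, (x, y) in coords.items():
        # Assumed connectivity: each node links to its four grid neighbors.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < side and 0 <= ny < side:
                m = ny * side + nx
                # Edge cost is the distance between the two coordinates.
                graph[n].append((m, math.dist(coords[n], coords[m])))
    return graph, coords


def pick_pairs(num_nodes, count=15, seed=0):
    """Choose `count` random (start, goal) pairs with start != goal."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_nodes), 2)) for _ in range(count)]


# One map per size used in the simulation, each with 15 start/goal pairs.
setups = {n: (make_grid_map(n), pick_pairs(n)) for n in (16, 64, 256, 1024)}
```

Fixing the seed keeps the start and goal pairs identical across runs, so that different open-set implementations can be compared on the same queries.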
Fig. 6 shows the memory fetch clock cycles for specific start and goal nodes. This simulation is carried out on a map with 256 nodes. The start node and goal node are randomly selected.

Figure 6. The comparison of clock cycles of memory fetch between heap and proposed architecture with specific node

To sort with a heap architecture, many comparisons must be executed between a parent node and its two child nodes, which causes more memory accesses. On the other hand, the proposed architecture needs only a short time to sort, O(1). Memory access occurs after 10 loops, and it is minimized by the linked-list architecture. The proposed architecture needs nearly half the memory access time. More memory accesses cause more power consumption. Accordingly, the proposed architecture is suitable for hardware implementation.
Fig. 7 shows simulation results for different numbers of shift registers. As the number of shift registers increases, the average clock cycles are reduced, because the number of registers determines the number of valid loops. The number of valid loops is the period of sorting the memory. The longer the period, the fewer times the memory needs to be sorted. Increasing the number of registers can therefore be expected to reduce sorting time and improve performance.

Figure 7. The comparison of the average clock cycles with variable number of shift registers

However, the relation between the number of shift registers and cost is a trade-off: the more registers are implemented, the more area is needed.

V. CONCLUSION AND FUTURE WORK
This paper presents a novel idea for designing a shortest-path search engine. The essential point of our work is reducing memory access and managing the open set efficiently. Frequent memory accesses cause large power consumption in the system. Thus we propose a sorted shift register, because it is well suited to hardware implementation.
The sorted shift register's EXTRACT-MIN-F has constant time complexity, O(1). Sorting is executed only after each run of valid loops, not every loop. These features make it possible to implement a pipelined architecture.
The linked-list data structure is also efficient for our hardware model. It minimizes memory access, so our architecture has the feature of low power consumption.
Our future work is to design the proposed architecture in actual hardware rather than as a model. If it is implemented in hardware, there are several ways to reduce cost. For example, the two modules managing the open set can be merged, because both of them are formed from an array of shift registers. Implementing a pipeline is another, since a pipelined architecture improves the system significantly. Also, [2], [3] and [4] all report that a parallel architecture improves performance efficiently. Our proposed architecture can also be implemented in parallel.

REFERENCES
[1] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, 2nd ed., The MIT Press, 2001, pp. 580-619.
[2] Z. K. Baker and M. Gokhale, “On the Acceleration of Shortest Path Calculations in Transportation Networks”, Proc. The Symposium on Field-Programmable Custom Computing Machines (FCCM’07), April 2007, pp. 23-34, doi: 10.1109/FCCM.2007.46
[3] I. Fernandez, J. Castillo, C. Pedraza, C. Sanchez and J. I. Martinez, “Parallel Implementation of The Shortest Path Algorithm on FPGA”, Proc. The Southern Conference on Programmable Logic, March 2008, pp. 245-248, doi: 10.1109/SPL.2008.4547768
[4] M. Tommiska and J. Skyttl, “Dijkstra’s Shortest Path Routing Algorithm in Reconfigurable Hardware”, in Lecture Notes in Computer Science (LNCS), vol. 2147/2001, Springer Berlin / Heidelberg, 2001, pp. 653-657
[5] A. Ioannou and M. Katevenis, “Pipelined Heap (Priority Queue) Management for Advanced Scheduling in High Speed Networks”, IEEE/ACM Transactions on Networking, vol. 15, issue 2, April 2007, pp. 450-461, doi: 10.1109/TNET.2007892882
[6] A. Patel, Amit’s A* Pages, http://theory.stanford.edu/~amitp/GameProgramming/