2009 Fifth International Joint Conference on INC, IMS and IDC

An Efficient Hardware Architecture of the A-star Algorithm for the Shortest Path Search Engine

Woo-Jin Seo, Seung-Ho Ok
School of Electrical Eng. & Computer Science, Kyungpook National University, Daegu, Korea
{swj82, wintiger}@ee.knu.ac.kr

Jin-Ho Ahn
Dept. of Electronic Engineering, Hoseo University, Chungcheongnam-do, Korea
[email protected]

Sungho Kang
Dept. of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
[email protected]

Byungin Moon
School of Electrical Eng. & Computer Science, Kyungpook National University, Daegu, Korea
[email protected]

Abstract: Several shortest-path search algorithms exist, such as A-star, D-star and Dijkstra, and they are widely used in automotive and mobile navigation systems. As the number of nodes grows considerably, software implementations of these algorithms incur heavy computational overhead. In this paper, to avoid this overhead, we propose a hardware model of the A-star algorithm for the shortest-path search engine. In particular, we propose an efficient hardware model based on a shift register and show simulation results in comparison with previous works.

Keywords: shortest-path search algorithm; A-star algorithm; shift register; priority queue; sorting

I. INTRODUCTION

Transportation infrastructure is a highly complex network that carries heavy traffic. For this reason, telematics has prospered in the automotive industry. More recently, it has been applied to automotive navigation systems, in which Global Positioning System (GPS) technology is integrated with computers and mobile communications technology.

One of the most important parts of an automotive navigation system is the shortest-path search algorithm. In general, as the number of nodes increases, shortest-path search algorithms implemented in software incur heavy computational overhead, so the overall performance of the navigation system can decrease. This degradation ultimately affects the users of the navigation system through longer search times and disturbance of the telematics operation.

If the shortest-path algorithm is instead implemented in suitable hardware, the navigation system can avoid this overhead, and its overall performance can increase.

The most important algorithm for the shortest-path problem is the A* algorithm, which finds a single-pair shortest path and uses a heuristic function to speed up the search. For this reason, we aim to design a hardware model of the A* algorithm.

Hardware implementations of the A* algorithm have been studied in previous research. Z. K. Baker and M. Gokhale investigate a small bubble sort core that provides the extract-min function [2], but its time complexity is O(N). A pipelined heap can also be adopted [5], but its time complexity is O(log N).

In contrast to these works, the proposed architecture is based on a sorted shift register that acts as a priority queue, and its time complexity is O(1). In Section II, we explain the A* algorithm and the point of improvement. The details of the proposed architecture are described in Section III, and Section IV shows simulation results. Finally, we conclude our work in Section V.

II. RELATED WORK

There are many shortest-path search algorithms, such as Dijkstra, Bellman-Ford and A* [1, 6]. Among them, the A* algorithm is commonly used in the transportation industry because its flexibility allows it to reduce the computation time. Fig. 1 shows the pseudo code of the A* algorithm.

    A* (start, goal)
     1. Closed set = the empty set
     2. Open set = includes start node
     3. G[start] = 0, H[start] = H_calc[start, goal]
     4. F[start] = H[start]
     5. While Open set ≠ ∅
     6.   do CurNode ← EXTRACT-MIN-F(Open set)
     7.     if ( CurNode == goal ), then return BestPath
     8.     For each Neighbor Node N of CurNode
     9.       If ( N is in Closed set ), then Nothing
    10.       else if ( N is in Open set ),
    11.         calculate N's G, H, F
    12.         If ( G[N on the Open set] > calculated G[N] )
    13.           RELAX(N, Neighbor in Open set, w)
    14.           N's parent = CurNode & add N to Open set
    15.       else, then calculate N's G, H, F
    16.         N's parent = CurNode & add N to Open

Figure 1. A* Algorithm pseudo code

The initial conditions are given in lines 1 through 4. The open set is defined as the set of already known nodes.
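As a software illustration only, the pseudo code of Fig. 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the hardware model of this paper: it assumes a 4-connected grid map with unit edge cost and a Manhattan-distance heuristic, and it uses Python's `heapq` as a stand-in for EXTRACT-MIN-F. The names `a_star`, `grid_neighbors` and `manhattan` are hypothetical.

```python
import heapq


def manhattan(a, b):
    """Heuristic H: Manhattan distance between two grid cells (assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def grid_neighbors(node, width, height, blocked):
    """Yield the up-to-four free neighbors of a cell on a width x height grid."""
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in blocked:
            yield (nx, ny)


def a_star(start, goal, width, height, blocked=frozenset()):
    """Return the shortest path from start to goal as a list, or None."""
    closed = set()                                   # line 1
    g = {start: 0}                                   # line 3
    parent = {start: None}
    open_heap = [(manhattan(start, goal), start)]    # lines 2 and 4: F = H
    while open_heap:                                 # line 5
        _, cur = heapq.heappop(open_heap)            # line 6: EXTRACT-MIN-F
        if cur in closed:
            continue  # stale entry left behind by an earlier relaxation
        if cur == goal:                              # line 7
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        closed.add(cur)
        for n in grid_neighbors(cur, width, height, blocked):  # line 8
            if n in closed:                          # line 9
                continue
            new_g = g[cur] + 1                       # unit edge cost (assumption)
            if n not in g or new_g < g[n]:           # lines 10-16
                g[n] = new_g
                parent[n] = cur
                heapq.heappush(open_heap, (new_g + manhattan(n, goal), n))
    return None
```

In this sketch a relaxed node is pushed again instead of being updated in place, and the outdated entry is skipped when it is popped. That is a common software shortcut and differs from the RELAX step of Fig. 1, which updates the node already in the open set.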

978-0-7695-3769-6/09 $26.00 © 2009 IEEE
DOI 10.1109/NCM.2009.371
Nodes in the open set have been neighbor nodes of some current node, and they can become current nodes themselves. The closed set is defined as the set of nodes that have already been current nodes.

To find the shortest path, the values G, H and F are used. G is the actual shortest distance from the initial node to the current node. H is the estimated distance from the current node to the goal node, and F is the sum of G and H.

As shown in line 5, the A* algorithm runs while the open set is not empty. In each loop, the EXTRACT-MIN-F function is executed. It selects as the current node the node with the minimum value of F in the open set. This current node is then deleted from the open set and added to the closed set.

The next step checks whether the current node is the goal node. If it is, the shortest path has been found and the solution is returned. Otherwise, each neighbor node N falls into one of three cases (lines 9, 10 and 15).

If the neighbor node N is in the closed set, nothing is done (line 9). The two other cases both calculate the values of G, H and F. They differ in whether a relaxation is needed. If the neighbor node is already in the open set, the relaxation is needed. The relaxation checks whether the value of G can be improved: if the newly calculated G is lower than the previous G, the value of G is changed to the lower value (lines 12~13).

The key point of the hardware design is the EXTRACT-MIN-F function. It involves searching the open set and finding the node with the minimum value of F. Because the open set is controlled independently, searching the open set is not a problem. Only finding the minimum value is significant. Finding a minimum value requires many comparisons, so it takes a relatively long time.

EXTRACT-MIN-F time ∝ N_open × T_access    (1)

The computation time satisfies (1). N_open is the number of nodes in the open set and T_access is the memory access time. T_access depends on the kind of memory used, which is part of the system specification, so we do not consider that term further. Therefore, the performance is determined by the number of nodes in the open set, N_open.

Reference [2] is a case in which all nodes are compared. It proposes a bubble sort, so its time complexity is O(N_open). A heap can also be implemented [5]; its time complexity to sort is nearly O(log N_open). Our proposed architecture shows better performance, O(1).

III. PROPOSED ARCHITECTURE

A shift register is a group of flip-flops arranged in a linear fashion with their inputs and outputs connected together. If the shift register can be kept sorted, the node with the minimum value of F can be extracted directly. A priority queue is one example of such a structure.

Fig. 2 shows the sorted shift register. Every register has a corresponding comparator. Each comparator transmits control signals to the controller, and according to these signals the controller drives the shift registers so that their contents stay sorted. Because all comparisons take place in parallel, the time complexity of the sorted shift register is O(1).

Figure 2. Sorted Shift Register

Now consider that the shift register is finite. As the loop is repeated, the number of elements in the open set increases. If it exceeds the number of registers, the proposed architecture is no longer reliable. Reference [2] describes how the number of registers was determined: the authors experimented on a specific map, the Los Angeles street graph, and as a result implemented queues with a size of 16 elements. In that case, the size of 16 elements is determined by the specific map. The queue size would have to change for any map other than the Los Angeles street graph, which means that an additional experiment is needed whenever another map is adopted. That is not our goal. Our proposal is a universally applicable architecture.

We now describe our main idea for the sorted shift register. In Fig. 3, the sorted shift register holds two nodes.

[Figure 3 (image not reproduced): four panels (a)-(d), each showing the Shift Register, the Removed Node entries, the CurNode and the NeiNodeSearch & Calculation module.]

Figure 3. Sorted shift register function on A* algorithm
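The behavior of a fixed-size sorted shift register can be modeled in software as follows. This is a minimal sketch, assuming that each slot stands for one register with its own comparator as in Fig. 2; the class name `SortedShiftRegister` and its methods are hypothetical. In hardware all comparators would decide in the same clock cycle, whereas the Python loop below only emulates that parallel decision sequentially, so the sketch illustrates the data movement and not the timing.

```python
class SortedShiftRegister:
    """Software model of a fixed-size sorted shift register (cf. Fig. 2)."""

    def __init__(self, size):
        self.size = size
        self.slots = []  # kept sorted by F, smallest at index 0

    def insert(self, f, node):
        """Insert (f, node); return the entry pushed out, or None."""
        # Every comparator answers "is my stored F greater than the new F?".
        # The first slot answering yes is where the new entry is loaded,
        # and all entries from there on shift by one position.
        pos = len(self.slots)
        for i, (stored_f, _) in enumerate(self.slots):
            if stored_f > f:
                pos = i
                break
        self.slots.insert(pos, (f, node))
        if len(self.slots) > self.size:
            return self.slots.pop()  # largest F falls off the far end
        return None

    def extract_min(self):
        """Remove and return the head entry; no search is needed."""
        return self.slots.pop(0) if self.slots else None
```

With a size of two, inserting four neighbors with F values 7, 3, 9 and 5 leaves the entries with F = 3 and F = 5 in the register, while the entries with F = 9 and F = 7 are pushed out, which mirrors panel (b) of Fig. 3. The pushed-out entries are returned so that a caller could keep them elsewhere.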

At first, the start node is stored in the shift register, as in (a) of Fig. 3. The start node is the only element of the open set, so it is extracted as the CurNode. Because the CurNode has changed, the NeiNodeSearch & Calculation module computes the information of each neighbor node of the CurNode. Neighbor nodes are labeled Nei X-Y, where X denotes the Xth loop and Y is numbered in ascending order of F. In (b) of Fig. 3, four neighbor nodes are calculated; the two with the smallest F, Nei 1-1 and Nei 1-2, are stored, and the others, Nei 1-3 and Nei 1-4, are removed.

In the next loop, EXTRACT-MIN-F can be executed because the shift register is guaranteed to hold the element with the minimum value of F. Nei 1-1 is extracted, and the NeiNodeSearch & Calculation module outputs the second-loop set of neighbor nodes, Nei 2-Y. Panels (c) and (d) of Fig. 3 show possible arrangements of the shift register. Nei 1-2 and the Nei 2-Y nodes are compared, and the one with the minimum value of F becomes the candidate for the next current node. This node is guaranteed to be the current node in the next loop, because the next loop is not affected by the nodes removed in (b) of Fig. 3.

In other words, the two nodes in the shift register in (b) of Fig. 3 are candidates for the CurNode during the next two loops, and the two nodes removed from the shift register can become the CurNode only three loops later.

From this, we can infer that the number of registers determines the number of valid loops. The valid loops start from the moment the shift register is full of node information. Therefore, the structure in Fig. 2 is valid for up to 10 loops.

After 10 loops, this architecture is no longer reliable, so an additional module is needed. This additional module has to sort the open set and update the sorted shift register. It is nearly the same as Fig. 2, but it has more registers and can access memory.

The number of registers ≈ 10 × 4 − 10 = 30    (2)

Equation (2) is an approximate calculation. The first 10 is the number of loops. The 4 is the maximum number of neighbor nodes (we assume that an intersection has four edges), and the second 10 is the number of nodes moved from the open set to the closed set. Therefore, the additional module needs 30 registers.

The number of nodes differs greatly from map to map. For that reason, the architecture includes a memory of sufficient size. Not only the format of the memory but also the reduction of memory accesses is key to the implementation. The OpenListSort & Update module uses a linked list after the valid loops of the shift register. A linked list minimizes memory accesses, so it is suitable for our architecture.

Fig. 4 shows the top-level block diagram. A central controller controls each module, and the Bestpath module returns the shortest path. The SRAM stores the node information and the open set. The two shaded modules in Fig. 4 manage the open set.

Figure 4. The top-level block diagram of proposed A* algorithm

IV. SIMULATION RESULTS

For verification, we made maps with 16, 64, 256 and 1024 nodes. They are quite reliable, because each node is located in rectangular coordinates and the cost (or distance) of each edge is calculated in that coordinate system. We designed the A* algorithm model in the C language.

Fig. 5 compares the approaches for different numbers of nodes.

The clock cycles to sort = T_sort + T_access    (3)

The clock cycles needed to sort are calculated as in (3). T_sort is the time to sort and T_access is the time to access memory. The sort is executed after 10 loops. To simulate the proposed model, we randomly chose 15 pairs of start and goal nodes.

Figure 5. Average clock cycles as a function of the number of nodes and sorting algorithm

The time complexity of the bubble sort algorithm is O(N^2). Our architecture's time complexity is only O(1), because every register has its own comparator. For that reason, the average clock cycles can be reduced.

The y-axis of Fig. 5 is on a log scale. As mentioned above, the average clock cycles include the memory fetch time and the time to sort each list. Bubble sort needs approximately 10 times more clock cycles than the proposed sort. As the number of nodes increases, the increment of the average clock cycles to sort is reduced, which means that the proposed architecture becomes more efficient as the number of nodes increases.
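The difference between the O(N^2) bubble sort and a comparator-per-register insertion can be illustrated by counting steps in software. This is a rough sketch, assuming that one parallel comparison across all registers counts as a single step; the function names are hypothetical, and the counts are comparisons or steps, not the clock cycles reported in Fig. 5.

```python
def bubble_sort_comparisons(values):
    """Sort a copy with plain bubble sort and count the comparisons."""
    data = list(values)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


def parallel_insert_steps(sorted_values, new_value):
    """Model one insertion into a sorted register array.

    All comparators are assumed to fire at once, so the step count is 1
    regardless of how many registers are present.
    """
    flags = [stored > new_value for stored in sorted_values]  # one step
    pos = flags.index(True) if True in flags else len(sorted_values)
    return sorted_values[:pos] + [new_value] + sorted_values[pos:], 1
```

For N values, the bubble sort above always performs N(N − 1)/2 comparisons, for example 120 for N = 16 and 2016 for N = 64, whereas the modeled insertion reports a single step for any register count.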

Fig. 6 shows the memory fetch clock cycles for specific start and goal nodes. This simulation was carried out on a map with 256 nodes. The start node and goal node were randomly selected.

Figure 6. The comparison of clock cycles of memory fetch between heap and proposed architecture with specific node

Sorting with the heap architecture requires many comparisons between a parent node and its two child nodes, which causes more memory accesses. In contrast, the proposed architecture needs only a short time to sort, O(1). Memory access occurs after 10 loops, and it is minimized by the linked list architecture. The proposed architecture needs nearly half of the memory access time. More memory accesses cause more power consumption. Accordingly, the proposed architecture is suitable for hardware implementation.

Fig. 7 shows simulation results for different numbers of shift registers. As the number of shift registers increases, the average clock cycles are reduced, because the number of registers determines the number of valid loops. The number of valid loops is the period between sorts of the memory. The longer the period, the fewer times the memory has to be sorted. Increasing the number of registers can therefore be expected to reduce the sorting time and improve performance. Keep in mind, however, that there is a trade-off between the number of shift registers and cost: the more registers are implemented, the more area is needed.

Figure 7. The comparison of the average clock cycles with variable number of shift registers

V. CONCLUSION AND FUTURE WORK

This paper presents a novel idea for designing a shortest-path search engine. The essential points of our work are reducing memory accesses and managing the open set efficiently. Frequent memory accesses cause large power consumption in the system. We therefore propose a sorted shift register, because it is well suited to hardware implementation.

The EXTRACT-MIN-F operation of the sorted shift register has constant time complexity, O(1). Sorting is executed once per period of valid loops, not in every loop. These features make it possible to implement a pipelined architecture.

The linked list data structure is also efficient for our hardware model. It minimizes memory accesses, so our architecture has the feature of low power consumption.

Our future work is to design actual hardware using the proposed architecture, rather than a model. If it is implemented as hardware, there are some ways to reduce cost. For example, the two modules that manage the open set can be merged, because both of them are formed from an array of shift registers. Implementing a pipeline is another option; a pipelined architecture improves the system significantly. In addition, [2], [3] and [4] all report that a parallel architecture improves performance efficiently. Our proposed architecture can also be implemented in parallel.

REFERENCES

[1] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, 2nd ed., The MIT Press, 2001, pp. 580-619.
[2] Z. K. Baker and M. Gokhale, "On the Acceleration of Shortest Path Calculations in Transportation Networks", Proc. The Symposium on Field-Programmable Custom Computing Machines (FCCM'07), April 2007, pp. 23-34, doi: 10.1109/FCCM.2007.46
[3] I. Fernandez, J. Castillo, C. Pedraza, C. Sanchez and J. I. Martinez, "Parallel Implementation of The Shortest Path Algorithm on FPGA", Proc. The Southern Conference on Programmable Logic, March 2008, pp. 245-248, doi: 10.1109/SPL.2008.4547768
[4] M. Tommiska and J. Skyttä, "Dijkstra's Shortest Path Routing Algorithm in Reconfigurable Hardware", in Lecture Notes in Computer Science (LNCS), vol. 2147/2001, Springer Berlin / Heidelberg, 2001, pp. 653-657
[5] A. Ioannou and M. Katevenis, "Pipelined Heap (Priority Queue) Management for Advanced Scheduling in High Speed Networks", IEEE/ACM Transactions on Networking, vol. 15, issue 2, April 2007, pp. 450-461, doi: 10.1109/TNET.2007.892882
[6] A. Patel, Amit's A* Pages, http://theory.stanford.edu/~amitp/GameProgramming/
