FPGA-based Custom Microprocessor Architectures
Jonathan Phillips
Electrical and Computer Engineering
Utah State University
4120 Old Main Hill, Logan, UT 84322; (435) 757-8341
[email protected]
ABSTRACT
Autonomous dynamic event scheduling using Iterative Repair techniques is an essential component of successful
space missions, as it enables spacecraft to adaptively schedule tasks in a dynamic, real-time environment. Event
rescheduling is a compute-intensive process. Typical applications involve scheduling hundreds of events that share
tens or hundreds of resources. We are developing a set of tools for automating the derivation of application-specific
processors (ASIPs) from ANSI C source code that perform this scheduling in an efficient manner. The tools will
produce VHDL code targeted for a Xilinx Virtex 4 FPGA (Field Programmable Gate Array). Features of FPGAs,
including large processing bandwidth and embedded ASICs and block RAMs, are exploited to optimize the design.
Iterative Repair problems are generally solved using Simulated Annealing, which works by gradually improving an
initial solution over thousands of iterations. We propose an FPGA-based architectural framework derived from
ANSI C function-level blocks for accelerating these computations by optimizing the process of (1) generating a new
solution, (2) evaluating the solution, and (3) determining whether the new solution should be accepted. Each step is
implemented in VHDL through data- and control-flow analysis of the source C code. We discuss an architecture
template for automated processor design.
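For orientation, the three-step process above can be sketched in ANSI C as follows. The type, function names, and parameters are illustrative assumptions rather than the tool's actual input, but they show the structure that the generated pipeline accelerates.

#include <stdlib.h>
#include <math.h>

#define MAX_EVENTS 512                          /* illustrative capacity */

typedef struct { int start[MAX_EVENTS]; int num_events; } schedule_t;

extern void alter(schedule_t *s);               /* step 1: perturb one event (Alter Processor)     */
extern int  evaluate(const schedule_t *s);      /* step 2: score the schedule (Evaluate Processor) */

/* Illustrative Simulated Annealing loop for Iterative Repair: copy the current
 * schedule, alter it, evaluate it, then accept or reject it under the
 * Metropolis criterion while the temperature cools. */
void iterative_repair(schedule_t *current, int iterations, double temp, double cooling)
{
    int current_score = evaluate(current);
    for (int i = 0; i < iterations; i++) {
        schedule_t candidate = *current;                /* the Copy Processor's role */
        alter(&candidate);                              /* step 1 */
        int candidate_score = evaluate(&candidate);     /* step 2 */
        int delta = candidate_score - current_score;
        /* step 3: always accept improvements; accept worse solutions with
         * probability exp(-delta / temp) */
        if (delta <= 0 || exp(-delta / temp) > (double)rand() / RAND_MAX) {
            *current = candidate;
            current_score = candidate_score;
        }
        temp *= cooling;                                /* gradual cooling */
    }
}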
Figure 3: Top-level architecture depiction for a pipelined Iterative Repair processor. Black lines represent data buses and red lines signify control signals.

Alter Processor
The second stage in the Iterative Repair pipeline is the Alter Processor. One event is selected at random from the solution string, and the start time of this event is changed to a random value smaller than the maximum latency. This stage, shown in fig. 5, could be accelerated by introducing an additional random number generator and an additional divider, allowing for maximum concurrency. This is not necessary, however, as a 15-cycle integer divider allows the stage to terminate in 21 clock cycles regardless of the size of the solution string. As solutions generally consist of hundreds of events, even the simple Copy Processor will have a greater latency than the Alter Processor. The alter controller is based on a counter that starts when the "step" signal is received from the Main Controller, control logic that enables register writes to the "address" and "data" registers on the proper clock cycles, and a "done" signal.
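In C terms, the alter step amounts to two bounded random draws. The sketch below is illustrative only (the function name and argument layout are assumptions, not the tool's actual input), but it shows why the hardware version needs a random number generator and an integer divider for the modulo operations.

#include <stdlib.h>

/* Illustrative sketch of the alter step: pick one event at random and move its
 * start time to a random value below the maximum latency.  The two modulo
 * operations are where the integer divider is consumed; with a single shared
 * divider the stage's latency is fixed regardless of the solution size. */
void alter_event(int start_times[], int num_events, int max_latency)
{
    int event = rand() % num_events;            /* which event to perturb */
    start_times[event] = rand() % max_latency;  /* new random start time  */
}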
Figure 5: The Alter Processor. A random number generator is used to modify the incoming solution.

Evaluate Processor
The Evaluate Processor is by far the most complex of all the pipeline stages in the Iterative Repair architecture. This processor's job is to compute a numerical score for a potential solution. The score of a solution to the Iterative Repair problem consists of three components. A penalty is incurred for the total clock cycles consumed by the schedule. A second penalty is assessed for double-booking a resource on a given clock cycle. Third, a penalty is assigned for dependency violations, which occur when event "b" depends upon the results of event "a" but event "b" is scheduled before event "a".
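A minimal C sketch of the first two score components is given below; the names, weights, and data layout are assumptions made for illustration, and the dependency-violation component is sketched separately after the DGVP description.

#define RESOURCE_PENALTY 100    /* illustrative weight for one double-booking */

/* Illustrative sketch of the first two score components: total schedule length
 * and resource double-booking.  The third component (dependency violations) is
 * sketched below alongside the DGVP. */
int evaluate_partial(const int start[], const int duration[],
                     const int resource[], int num_events)
{
    int score = 0;

    /* (1) penalty for the total clock cycles consumed by the schedule */
    int makespan = 0;
    for (int i = 0; i < num_events; i++)
        if (start[i] + duration[i] > makespan)
            makespan = start[i] + duration[i];
    score += makespan;

    /* (2) penalty for double-booking a resource: two events that use the same
     * resource and whose time windows overlap on some clock cycle */
    for (int i = 0; i < num_events; i++)
        for (int j = i + 1; j < num_events; j++)
            if (resource[i] == resource[j] &&
                start[i] < start[j] + duration[j] &&
                start[j] < start[i] + duration[i])
                score += RESOURCE_PENALTY;

    return score;
}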
Fig. 6 shows an intermediate output of the tool as it works on the Evaluate Processor: a control-data flow graph depicting basic blocks, data dependencies, control dependencies, and data operations for the evaluate function described above. Each of the evaluation components described above is implemented as an individual pipelined processor. Because the three components of the score can be computed independently, all three processors can run in parallel, saving substantial clock cycles. The first sub-processor, termed the Dependency Graph Violation Processor, or DGVP, is shown in fig. 7. The processor is a four-stage pipeline. In the first and second stages, an adjacency matrix is used to index the solution memory and determine when parent/child pairs of events are scheduled. The third and fourth stages determine the magnitude of the penalty, if any, to be incurred because the child event is scheduled before the parent event terminates. The penalty has a graded magnitude in order to encourage offending parent/child pairs to gradually move toward each other, thus decreasing the penalty over several iterations and causing the schedule to become more optimized.
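The computation performed by the DGVP can be sketched in C as follows; the adjacency-matrix layout and names are assumptions for illustration. The key point is that the penalty grows with the amount by which a child starts before its parent finishes.

/* Illustrative sketch of the dependency-violation component computed by the
 * DGVP.  adjacency[p * num_events + c] is nonzero when child event c depends on
 * parent event p.  The penalty grows with how far the child starts before the
 * parent finishes, so offending pairs are pulled together gradually over many
 * iterations rather than being penalized by a fixed amount. */
int dependency_violations(const unsigned char adjacency[], const int start[],
                          const int duration[], int num_events)
{
    int penalty = 0;
    for (int p = 0; p < num_events; p++) {         /* parent event */
        for (int c = 0; c < num_events; c++) {     /* child event  */
            if (!adjacency[p * num_events + c])
                continue;                          /* no dependency edge       */
            int parent_end = start[p] + duration[p];
            if (start[c] < parent_end)             /* child scheduled too soon */
                penalty += parent_end - start[c];  /* graded penalty magnitude */
        }
    }
    return penalty;
}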
Figure 6: Control-Data Flow Graph of the Evaluate function. Information contained in this graph can be used to create an optimal application-specific processor.

Figure 7: Dependency Graph Violation Processor architecture. This four-stage pipelined processor computes all dependency graph violations for a given schedule.
Finally, the tool will be designed to accept additional optimization constraints, such as using Triple Modular Redundancy (TMR) or other techniques for implementing fault tolerance, along with power optimization strategies.