Nios II Custom Instruction User Guide
Preliminary Information
Contents
Generate the SOPC Builder System & Compile in Quartus II Software
Accessing the Custom Instruction from Software
About this User Guide
Date Description
December 2004 Updates for the Nios II version 1.1 release.
September 2004 Updates for the Nios II version 1.01 release.
May 2004 First release of custom instruction user guide for the Nios II
processor.
How to Find Information
■ The Adobe Acrobat Find feature allows you to search the contents of
a PDF file. Click the binoculars toolbar icon to open the Find dialog
box.
■ Bookmarks serve as an additional table of contents.
■ Thumbnail icons, which provide miniature previews of each page,
provide a link to the pages.
■ Numerous links, shown in green text, allow you to jump to related
information.
How to Contact Altera
For the most up-to-date information about Altera products, go to the
Altera world-wide web site at www.altera.com. For technical support on
this product, go to www.altera.com/mysupport. For additional
information about Altera products, consult the sources shown below.
Note to table:
(1) You can also contact your local Altera sales office or sales representative.
Conventions
Variable names are enclosed in angle brackets (< >) and shown in italic type.
Example: <file name>, <project name>.pof file.
Initial Capital Letters Keyboard keys and menu names are shown with initial capital letters. Examples:
Delete key, the Options menu.
“Subheading Title” References to sections within a document and titles of on-line help topics are
shown in quotation marks. Example: “Typographic Conventions.”
Courier type Signal and port names are shown in lowercase Courier type. Examples: data1,
tdi, input. Active-low signals are denoted by suffix n, e.g., resetn.
Anything that must be typed exactly as it appears is shown in Courier type. For
example: c:\qdesigns\tutorial\chiptrip.gdf. Also, sections of an
actual file, such as a Report File, references to parts of files (e.g., the AHDL
keyword SUBDESIGN), as well as logic function names (e.g., TRI) are shown in
Courier.
1., 2., 3., and Numbered steps are used in a list of items when the sequence of the items is
a., b., c., etc. important, such as the steps listed in a procedure.
■ ● • Bullets are used in a list of items when the sequence of the items is not important.
The checkmark indicates a procedure that consists of one step only.
The hand points to information that requires special attention.
The caution indicates required information that needs special consideration and
understanding and should be read prior to starting or continuing with the
procedure or process.
The warning indicates information that should be read prior to starting or
continuing the procedure or processes.
The angled arrow indicates you should press the Enter key.
The feet direct you to more information on a particular topic.
Introduction
With the Altera® Nios® II embedded processor, system designers can
accelerate time-critical software algorithms by adding custom
instructions to the Nios II instruction set. With custom instructions, system
designers can reduce a complex sequence of standard instructions to a
single instruction implemented in hardware. System designers can use
this feature for a variety of applications, for example, to optimize software
inner loops for digital signal processing (DSP), packet header processing, and
computation-intensive applications. The Nios II CPU configuration
wizard, which is accessed via the Quartus® II software’s SOPC Builder,
provides a graphical user interface (GUI) used to add up to 256 custom
instructions to the Nios II processor.
[Figure: custom logic block alongside the Nios II ALU (operations +, -, <<, >>, &); inputs A and B, output Result]
This chapter:
Custom Instruction Overview
With Nios II processor custom instructions, system designers are able to
take full advantage of the flexibility of FPGAs to meet system
performance requirements. Custom instructions allow system designers
to add custom functionality to the Nios II processor ALU.
[Figure 1–2: custom logic block ports, grouped by type. Combinatorial:
dataa[31..0], datab[31..0], result[31..0]. Multi-cycle: clk, clk_en, reset,
start, done. Extended: n[7..0]. Internal register file: a[4..0], readra,
b[4..0], readrb, c[4..0], writerc.]
Figure 1–2 also shows an optional interface to external logic. The interface
to external logic allows designers to include a custom interface to system
resources outside of the Nios II processor data path.
Custom Instruction Architectural Types
There are different custom instruction architectures available to suit the
application’s requirements. The architectures range from a simple,
single-cycle combinatorial architecture to an extended variable-length,
multi-cycle custom instruction architecture. The chosen architecture
determines what the hardware interface looks like.
Table 1–1 shows custom instruction architectural types, application, and
the associated hardware interface.
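Table 1–1. Custom Instruction Architectural Types, Application & Hardware Interface
[Table body not recoverable. Surviving hardware interface ports:
Combinatorial: dataa[31..0], datab[31..0], result[31..0]. A second interface
lists dataa[31..0], datab[31..0], result[31..0], clk, clk_en, reset, start,
done.]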
■ Fixed length: You specify the required number of clock cycles during
system generation.
■ Variable length: The start and done signals are used in a
handshaking scheme to determine when the custom instruction
execution is complete.
As indicated in Table 1–3, the clk, clk_en, and reset signals are
required for multi-cycle custom instructions. However, the start, done,
dataa[31..0], datab[31..0], and result[31..0] signals are
optional, and should only be used if required for the specific application.
■ The CPU asserts the active high start port on the first clock cycle of
execution when the custom instruction issues through the ALU. At
this time, the dataa[31..0] and datab[31..0] signals have
valid values and remain valid throughout the duration of the custom
instruction execution.
■ Fixed or variable length custom instruction port operation:
● Fixed length: The CPU asserts start, waits a specified number
of clock cycles, and then reads result[31..0]. For an n-cycle
operation, the custom logic block must present valid data on the
(n-1)th rising edge after the start signal is asserted.
● Variable length: The CPU waits until the active high done signal
is asserted. The CPU reads the result[31..0] port on the
clock edge on which done is asserted. The custom logic block should
present data on the result[31..0] port on the same clock edge
on which the done signal is asserted.
■ The Nios II system clock feeds the custom logic block’s clk signal,
and the Nios II master reset feeds the active high reset signal. The
reset signal is asserted only when the whole Nios II system is reset.
■ The custom logic block should use the active high clk_en signal as
a conventional clock qualifier signal and should ignore all clock
rising edges while clk_en is deasserted.
■ Any port in the custom logic block that is not recognized as a custom
instruction signal is considered to be an external interface signal.
■ Multi-cycle custom instructions can be further optimized utilizing
the extended, internal register file, and external interface custom
instructions. Refer to “Extended Custom Instruction Architecture”
on page 1–9, “Internal Register File Custom Instruction
Architecture” on page 1–10, or “External Interface Custom
Instruction” on page 1–12.
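The variable-length handshake described in the list above can be sketched as a small software model. This is only an illustration, not Altera's implementation: the ci_model structure, the ci_reset, ci_clock, and ci_execute functions, and the bit-counting operation itself are all hypothetical names and choices made for this sketch. The model ignores clock edges while clk_en is deasserted, latches dataa on the edge where start is asserted, and presents result on the same edge on which done is asserted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a variable-length multi-cycle custom
 * logic block. The operation (counting the set bits of dataa, one bit per
 * enabled clock edge) is an assumption chosen only for illustration. */
typedef struct {
    uint32_t shift;   /* operand bits still to be examined */
    uint32_t count;   /* running count of set bits */
    bool     busy;    /* an instruction is in flight */
    bool     done;    /* active high; set on the edge the result is valid */
    uint32_t result;  /* valid on the edge where done is set */
} ci_model;

/* Models the active high reset signal. */
static void ci_reset(ci_model *m)
{
    *m = (ci_model){0};
}

/* Models one rising clock edge. Edges are ignored while clk_en is low. */
static void ci_clock(ci_model *m, bool clk_en, bool start, uint32_t dataa)
{
    if (!clk_en)
        return;

    m->done = false;
    if (start) {
        /* First cycle of execution: latch the operand. */
        m->shift = dataa;
        m->count = 0;
        m->busy = true;
    } else if (m->busy) {
        m->count += m->shift & 1u;
        m->shift >>= 1;
    }

    if (m->busy && m->shift == 0) {
        /* Present result and done on the same edge. */
        m->result = m->count;
        m->done = true;
        m->busy = false;
    }
}

/* Models the CPU side: assert start for one cycle, then wait for done. */
static uint32_t ci_execute(ci_model *m, uint32_t dataa, unsigned *cycles)
{
    unsigned n = 0;
    bool start = true;

    do {
        ci_clock(m, true, start, dataa);
        start = false;
        n++;
    } while (!m->done);

    if (cycles)
        *cycles = n;
    return m->result;
}
```

In this model the number of cycles depends on the operand, which is the case the start and done handshake is meant for. A fixed-length block would instead be clocked a constant number of times before the CPU reads result.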
[Figure 1–7: dataa[31..0] feeds bit-swap, byte-swap, and half-word-swap
operations; n[7..0] selects which operation drives result[31..0]]
The Figure 1–7 swap operations are performed on data coming in via the
dataa[31..0] port. The n[7..0] port is used as a select signal on an
output multiplexer to select which operation is presented to the
result[31..0] port.
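The multiplexer behavior described above can be modeled in software as follows. This is a sketch under assumptions: the n encodings (0, 1, 2), the function names, and the value returned for unused n values are not given in this guide and are chosen here only for illustration.

```c
#include <stdint.h>

/* Assumed n[7..0] encodings; the guide does not state the actual values. */
enum {
    CI_BIT_SWAP       = 0,
    CI_BYTE_SWAP      = 1,
    CI_HALF_WORD_SWAP = 2
};

/* Reverse the order of all 32 bits. */
static uint32_t bit_swap(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Reverse the order of the four bytes. */
static uint32_t byte_swap(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Exchange the upper and lower 16-bit halves. */
static uint32_t half_word_swap(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* Model of the output multiplexer: n selects which operation's output is
 * presented on result. Unused n values return 0 in this model. */
static uint32_t swap_ci_model(uint8_t n, uint32_t dataa)
{
    switch (n) {
    case CI_BIT_SWAP:       return bit_swap(dataa);
    case CI_BYTE_SWAP:      return byte_swap(dataa);
    case CI_HALF_WORD_SWAP: return half_word_swap(dataa);
    default:                return 0;
    }
}
```

For the input 0x12345678, this model returns 0x1E6A2C48 for the bit swap, 0x78563412 for the byte swap, and 0x56781234 for the half-word swap.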
Table 1–4 lists the internal register file custom instructions signals. The
signals are optional and should only be used if required by the
application.
Introduction
The Nios II processor custom instruction details are abstracted from the
application code. During the build process the Nios II integrated
development environment (IDE) automatically generates macros that
allow easy access from application code to custom instructions.
1.  #include "system.h"
2.
3.
4.  int main (void)
5.  {
6.      int a = 0x12345678;
7.      int a_swap = 0;
8.
9.      a_swap = ALT_CI_BSWAP(a);
10.     return 0;
11. }
In this example, the system.h file is included on line 1 to locate the custom
instruction macro definitions. Two integers are declared; one on line 6 and
one on line 7. Integer a is passed as input to the bit swap custom
instruction with the results loaded into a_swap on line 9.
Example C:
To support non-integer input types, you should define macros that map
to the specific built-in function required for the application.
Example D:
readra
readrb
writerc
Instruction Fields: A = Register index of operand A
B = Register index of operand B
C = Register index of operand C
N = 8-bit number that selects instruction
readra = 1 if instruction uses rA, 0 otherwise
readrb = 1 if instruction uses rB, 0 otherwise
writerc = 1 if instruction provides result for rC, 0 otherwise
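The fields listed above can be packed into a 32-bit instruction word. The sketch below is an assumption-laden illustration: the bit positions (A in bits 31..27, B in 26..22, C in 21..17, readra in 16, readrb in 15, writerc in 14, N in 13..6) and the opcode value 0x32 in bits 5..0 are not stated in this section and should be checked against the Nios II processor reference. The ci_fields type and the ci_encode and ci_decode_n functions are hypothetical names.

```c
#include <stdint.h>

/* Assumed opcode for the custom instruction class; verify against the
 * processor reference before relying on it. */
#define CI_OPCODE 0x32u

/* Hypothetical container for the instruction fields. */
typedef struct {
    uint8_t a;        /* register index of operand A (5 bits) */
    uint8_t b;        /* register index of operand B (5 bits) */
    uint8_t c;        /* register index of operand C (5 bits) */
    uint8_t n;        /* 8-bit number that selects the instruction */
    uint8_t readra;   /* 1 if the instruction uses rA */
    uint8_t readrb;   /* 1 if the instruction uses rB */
    uint8_t writerc;  /* 1 if the instruction provides a result for rC */
} ci_fields;

/* Pack the fields into one word using the assumed bit layout. */
static uint32_t ci_encode(ci_fields f)
{
    return ((uint32_t)(f.a & 0x1Fu) << 27) |
           ((uint32_t)(f.b & 0x1Fu) << 22) |
           ((uint32_t)(f.c & 0x1Fu) << 17) |
           ((uint32_t)(f.readra  & 1u) << 16) |
           ((uint32_t)(f.readrb  & 1u) << 15) |
           ((uint32_t)(f.writerc & 1u) << 14) |
           ((uint32_t)f.n << 6) |
           CI_OPCODE;
}

/* Extract the N field from an encoded word. */
static uint8_t ci_decode_n(uint32_t word)
{
    return (uint8_t)((word >> 6) & 0xFFu);
}
```

Under these assumptions, an instruction with N = 0 that reads r2 and r3 and writes r4 encodes to 0x10C9C032.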
The following shows the syntax for two examples of custom instruction
assembler calls:
Introduction
This chapter walks you through the process of implementing a Nios® II
processor custom instruction, and illustrates the enormous time-savings
that are possible with Nios II custom instructions.
Hardware & Software Requirements
The instructions in this chapter require the following hardware and
software:
■ Quartus® II software version 4.1, SP1 or later
■ Nios II development kit
■ Nios development board, Stratix® II, Stratix, Stratix Professional, or
Cyclone™ Edition
Tutorial Files
The tutorial design files are installed with the Nios II development kit.
The hardware design files are stored in the tutorials directory: <Nios II kit
path>\tutorials\Nios2_Custom_Instruction\<board version>\
Each development board has its own tutorial design file directory (see
Table 3–1). The Quartus II project files are contained in the
quartus_project directory and the hardware for the custom instruction is
contained in the rtl directory.
Running the Software Algorithm in Nios II IDE
This section guides you through the steps required to run the
leading-zeros-detector software algorithm, and lets you observe the
design’s functionality and the software algorithm’s performance.
This section includes:
■ Creating a new Nios II IDE project
■ Building and downloading the software application
2. Choose New > C/C++ Application… (File menu). The first page of
the New Project wizard appears. See Figure 3–1 on page 3–3.
4. Leave the default selection for the project’s name and ensure that
Use Default Location is checked.
8. Click Open to return to the New Project wizard. The SOPC Builder
System field from the Select Target Hardware window is now
specified with the custom instruction project SOPC Builder system.
(See Figure 3–1.) In addition, the CPU field now contains the name
of the CPU in the system.
9. Click Finish.
At this point, the new Nios II project file creation is complete. Figure 3–2
shows that upon successful project creation, the C/C++ Projects window
contains the following:
■ The contents of main should include three sets of test data, i.e., best
case, worst case, and random data sets that have the leading zeros
counted and placed into another array.
■ There are conditional compile statements based on the existence of
the ALT_CI_LEADING_ZERO_DETECTOR symbol. This is the name
of the macro that is defined when the leading-zeros custom
instruction is added to the system later in the tutorial.
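The conditional compilation described above can be sketched as follows. This is not the tutorial's source file: leading_zeros_sw, LEADING_ZEROS, and count_all are hypothetical names, and the shift-and-test loop is only one plausible software algorithm. The sketch selects the custom instruction macro when ALT_CI_LEADING_ZERO_DETECTOR is defined and falls back to software otherwise.

```c
#include <stdint.h>

/* Software fallback: shift left until the most significant bit is set.
 * An input of 0x1 takes the most iterations; 0x80000000 takes none. */
static unsigned leading_zeros_sw(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Use the custom instruction macro when the build provides it (for
 * example through system.h); otherwise use the software version. */
#ifdef ALT_CI_LEADING_ZERO_DETECTOR
#define LEADING_ZEROS(x) ((unsigned)ALT_CI_LEADING_ZERO_DETECTOR(x))
#else
#define LEADING_ZEROS(x) leading_zeros_sw(x)
#endif

/* Count the leading zeros of each sample and place them in another array. */
static void count_all(const uint32_t *in, unsigned *out, unsigned len)
{
    for (unsigned i = 0; i < len; i++)
        out[i] = LEADING_ZEROS(in[i]);
}
```

Because the macro name is only defined after the custom instruction has been added to the system, the same source builds both before and after the hardware change.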
When the Nios II IDE detects that the SOPC Builder system for the
current project differs from the SOPC Builder system on the board, you
must download an appropriate configuration file for the FPGA.
8. In the Nios II IDE, choose Run As > Nios II Hardware (Run menu).
This will start the build process and download the software image
to the development board.
After the image is downloaded, the terminal displays the results of
running 500 samples through the leading zeros detector in software. The
worst case occurs when every sample has the value 0x1. The best case
occurs when every sample has the value 0x80000000. The random case uses
random samples. The following is an example of the output for the three
sets of test data:
Now measuring the time to find leading zeros for 500 samples
**********************************************
Worst Case
[Software] Number of clocks 138410
The number mills-seconds: 2.7681999207
Random Case
[Software] Number of clocks 21926
The number mills-seconds: 0.4385199845
Best Case
[Software] Number of clocks 12434
The number mills-seconds: 0.2486800104
**********************************************
Program Complete.
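The millisecond figures in the output above follow from the clock counts. The sketch below shows the conversion, assuming a 50 MHz system clock. That frequency is not stated in this section; it is inferred from the numbers themselves (138410 clocks reported as roughly 2.768 ms). The names ASSUMED_CLOCK_HZ and clocks_to_ms are hypothetical.

```c
#include <stdint.h>

/* Assumed system clock frequency, inferred from the sample output. */
#define ASSUMED_CLOCK_HZ 50000000.0

/* Convert a clock-cycle count to milliseconds at the given frequency. */
static double clocks_to_ms(uint32_t clocks, double clock_hz)
{
    return (double)clocks * 1000.0 / clock_hz;
}
```

With this assumption, clocks_to_ms(21926, ASSUMED_CLOCK_HZ) gives about 0.4385 ms, in line with the random case reported above.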
Implementing Custom Instruction Hardware in SOPC Builder
This section walks you through the process of implementing Nios II
custom instructions in hardware, and also provides custom instruction
tool-flow explanations.
To implement the Nios II custom instruction for the leading-zeros design,
you must:
1. Open the custom instruction tutorial hardware design.
8. Click the Read port-list from files button. This will read the port
information from the HDL files. The Figure 3–5 example uses
dataa[31..0] and result[31..0] ports.
Figure 3–6 shows that once the custom instruction is imported, the top
level module name is listed in the Name field.
The Clock Cycles field shows that the instruction is a combinatorial logic
custom instruction. If the tutorial custom instruction were a fixed length,
multi-cycle custom instruction instead, you could edit this field to specify
the number of clock cycles. In the case of a variable length multi-cycle
custom instruction, the Clock Cycles field displays Variable.
The N port field displays a "-", indicating that the leading_zero_detector
design is not an extended custom instruction. In the case of an extended
custom instruction, this field shows the width of the N port. The op-code
extension displays 00000000 0, which indicates the encoding of the N
field in the instruction word.
10. Click Finish to add the leading zeros detector custom instruction to
the system and return to the SOPC Builder window.
Accessing the Custom Instruction from Software
Now that you have added the custom logic block to hardware, you are
ready to access it from software. Because the SOPC Builder system
contents have changed, the Nios II IDE project needs to be rebuilt to
accommodate the changes. In particular, the system.h header file is
updated with the macros for the custom instruction.
Now measuring the time to find leading zeros for 500 samples
**********************************************
Worst Case
[Software] Number of clocks 139632
The number mills-seconds: 2.7926399708
[Hardware] Number of clocks 8443
The number mills-seconds: 0.1688600034
Random Case
[Software] Number of clocks 22419
The number mills-seconds: 0.4483799934
[Hardware] Number of clocks 8056
The number mills-seconds: 0.1611199975
Best Case
[Software] Number of clocks 12402
The number mills-seconds: 0.2480400205
[Hardware] Number of clocks 8032
The number mills-seconds: 0.1606400013
**********************************************
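The benefit of the custom instruction can be read off the output above as a ratio of software clocks to hardware clocks. The helper below is a minimal sketch with a hypothetical name (speedup), and it only restates arithmetic on the figures already shown.

```c
/* Ratio of software clock count to hardware clock count. */
static double speedup(unsigned long sw_clocks, unsigned long hw_clocks)
{
    return (double)sw_clocks / (double)hw_clocks;
}
```

Applied to the sample output, this gives roughly 16.5 for the worst case (139632 / 8443), 2.8 for the random case (22419 / 8056), and 1.5 for the best case (12402 / 8032).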
VHDL & Verilog HDL Templates
This section provides VHDL and Verilog HDL custom instruction
templates that you can reference when writing custom instructions in
VHDL and Verilog HDL. You can download the template files from the
Altera® world-wide website at www.altera.com/nios.
VHDL Template
Sample VHDL template file:
LIBRARY __library_name;
USE __library_name.__package_name.ALL;

ENTITY __entity_name IS
PORT(
  signal clk     : IN STD_LOGIC;  -- CPU's master-input clk <required for multi-cycle>
  signal reset   : IN STD_LOGIC;  -- CPU's master asynchronous reset <required for multi-cycle>
  signal clk_en  : IN STD_LOGIC;  -- Clock-qualifier <required for multi-cycle>
  signal start   : IN STD_LOGIC;  -- True when this instr. issues <required for multi-cycle>
  signal done    : OUT STD_LOGIC; -- True when instr. completes <required for variable multi-cycle>
  signal dataa   : IN STD_LOGIC_VECTOR (31 DOWNTO 0); -- operand A <always required>
  signal datab   : IN STD_LOGIC_VECTOR (31 DOWNTO 0); -- operand B <optional>
  signal n       : IN STD_LOGIC_VECTOR (7 DOWNTO 0);  -- N-field selector <required for extended>
  signal a       : IN STD_LOGIC_VECTOR (4 DOWNTO 0);  -- operand A selector <used for Internal register file access>
  signal b       : IN STD_LOGIC_VECTOR (4 DOWNTO 0);  -- operand B selector <used for Internal register file access>
  signal c       : IN STD_LOGIC_VECTOR (4 DOWNTO 0);  -- result destination selector <used for Internal register file access>
  signal readra  : IN STD_LOGIC;  -- register file index <used for Internal register file access>
  signal readrb  : IN STD_LOGIC;  -- register file index <used for Internal register file access>
  signal writerc : IN STD_LOGIC;  -- register file index <used for Internal register file access>
  signal result  : OUT STD_LOGIC_VECTOR (31 DOWNTO 0) -- result <always required>
);
END __entity_name;
ARCHITECTURE a OF __entity_name IS
  -- The entity ports (clk, reset, clk_en, start, readra, readrb, writerc,
  -- n, a, b, c, dataa, datab) are already visible here and must not be
  -- redeclared. Declare only local signals in this region.
BEGIN
  -- custom instruction logic goes here
END a;
Verilog HDL Template
Sample Verilog HDL template file:
module __module_name(
  clk,     // CPU's master-input clk <required for multi-cycle>
  reset,   // CPU's master asynchronous reset <required for multi-cycle>
  clk_en,  // Clock-qualifier <required for multi-cycle>
  start,   // True when this instr. issues <required for multi-cycle>
  done,    // True when instr. completes <required for variable multi-cycle>
  dataa,   // operand A <always required>
  datab,   // operand B <optional>
  n,       // N-field selector <required for extended>
  a,       // operand A selector <used for Internal register file access>
  b,       // operand B selector <used for Internal register file access>
  c,       // result destination selector <used for Internal register file access>
  readra,  // register file index <used for Internal register file access>
  readrb,  // register file index <used for Internal register file access>
  writerc, // register file index <used for Internal register file access>
  result   // result <always required>
);
input clk;
input reset;
input clk_en;
input start;
input readra;
input readrb;
input writerc;
input [7:0] n;
input [4:0] a;
input [4:0] b;
input [4:0] c;
input [31:0] dataa;
input [31:0] datab;
output [31:0] result;
output done;
// Port Declaration
// Wire Declaration
// Integer Declaration
// Concurrent Assignment
// Always Construct
endmodule
Built-In This section lists the following custom instruction built-in functions:
Hardware & Software Porting Considerations
Most first-generation Nios custom instructions will port over to a Nios II
system with minimal changes. This section clarifies hardware and
software considerations when porting first-generation Nios custom
instructions to your Nios II system.
Any other use of the prefix may be accomplished with one of the Nios II
custom instruction architecture types. Refer to “Custom Instruction
Architectural Types” on page 1–4.