Ug Nios2 Custom Instruction-683242-666927
683242 | 2020.04.27
The custom instruction logic connects directly to the Nios II arithmetic logic unit (ALU)
as shown in the following figure.
[Figure: Custom logic connected alongside the Nios II ALU operations (+, -, <<, >>, &), sharing inputs A and B and the Result output]
Related Information
• Custom Instruction Software Interface on page 16
• Building the CRC Example Hardware on page 23
Intel Corporation. All rights reserved. Intel, the Intel logo, and other Intel marks are trademarks of Intel
Corporation or its subsidiaries. Intel warrants performance of its FPGA and semiconductor products to current
specifications in accordance with Intel's standard warranty, but reserves the right to make changes to any ISO
products and services at any time without notice. Intel assumes no responsibility or liability arising out of the 9001:2015
application or use of any information, product, or service described herein except as expressly agreed to in Registered
writing by Intel. Intel customers are advised to obtain the latest version of device specifications before relying
on any published information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
1. Nios II Custom Instruction Overview
[Figure: Custom instruction logic block and its ports, grouped by instruction type. Combinational: dataa[31..0], datab[31..0], result[31..0]. Multi-cycle: clk, clk_en, reset, start, done. Extended: n[7..0]. Internal register file: a[4..0], readra, b[4..0], readrb, c[4..0], writerc.]
A Nios II custom instruction logic block interfaces with the Nios II processor through
three ports: dataa, datab, and result.
The custom instruction logic provides a result based on the inputs provided by the
Nios II processor. The Nios II custom instruction logic receives input on its dataa
port, or on its dataa and datab ports, and drives the result to its result port.
The Nios II processor supports several types of custom instructions. The figure above
shows all the ports required to accommodate all custom instruction types. Any
particular custom instruction implementation requires only the ports specific to its
custom instruction type.
The figure above also shows a conduit interface to external logic. The interface to
external logic allows you to include a custom interface to system resources outside of
the Nios II processor datapath.
For each custom instruction, the Nios II Embedded Design Suite (EDS) generates a
macro in the system header file, system.h. You can use the macro directly in your C
or C++ application code, and you do not need to program assembly code to access
custom instructions. Software can also invoke custom instructions in Nios II processor
assembly language.
Related Information
Custom Instruction Software Interface on page 16
(1) The clk_en input signal must be connected to the clk_en signals of all the registers in the
custom instruction, in case the Nios II processor needs to stall the custom instruction during
execution.
2. Custom Instruction Hardware Interface
Internal register file: Custom logic blocks that access internal register files for input or output or both. Ports:
• dataa[31:0]
• datab[31:0]
• result[31:0]
• clk
• clk_en(1)
• start
• reset
• done
• n[7:0]
• a[4:0]
• readra
• b[4:0]
• readrb
• c[4:0]
• writerc
External interface: Custom logic blocks that interface to logic outside of the Nios II processor’s datapath. Ports: standard custom instruction ports, plus user-defined interface to external logic.
A basic combinational custom instruction block, with the required ports shown in
"Custom Instruction Types", implements a single custom operation. This operation has
a selection index determined when the instruction is instantiated in the system using
Platform Designer.
Related Information
• Extended Custom Instructions on page 11
• Custom Instruction Types on page 7
List of standard custom instruction hardware ports, to be used as signal types
[Figure: Combinational custom instruction block with inputs dataa[31..0] and datab[31..0] and output result[31..0]]
In the figure above, the dataa and datab ports are inputs to the logic block, which
drives the results on the result port. Because the logic function completes in a single
clock cycle, a combinational custom instruction does not require control ports.
The only required port for combinational custom instructions is the result port. The
dataa and datab ports are optional. Include them only if the custom instruction
requires input operands. If the custom instruction requires only a single input port,
use dataa.
The processor presents the input data on the dataa and datab ports on the rising
edge of the processor clock. The processor reads the result port on the rising edge
of the following processor clock cycle.
Related Information
Combinational Custom Instruction Ports on page 9
Block diagram showing the dataa, datab, and result ports
[Figure: Multicycle custom instruction block with inputs dataa[31..0], datab[31..0], clk, clk_en, reset, and start, and outputs result[31..0] and done]
A basic multicycle custom instruction block, with the required ports shown in "Custom
Instruction Types", implements a single custom operation. This operation has a
selection index determined when the instruction is instantiated in the system using
Platform Designer.
You can further optimize multicycle custom instructions by implementing the extended
internal register file, or by creating external interface custom instructions.
Related Information
• Extended Custom Instructions on page 11
• Internal Register File Custom Instructions on page 13
• External Interface Custom Instructions on page 15
• Custom Instruction Types on page 7
List of standard custom instruction hardware ports, to be used as signal types
done Output No Custom instruction logic indicates to the processor that execution is
complete
The clk, clk_en, and reset ports are required for multicycle custom instructions.
The start, done, dataa, datab, and result ports are optional. Implement them
only if the custom instruction requires them.
The Nios II system clock feeds the custom logic block’s clk port, and the Nios II
system’s master reset feeds the active high reset port. The reset port is asserted
only when the whole Nios II system is reset.
The custom logic block must treat the active high clk_en port as a conventional clock
qualifier signal, ignoring clk while clk_en is deasserted.
The processor asserts the active high start port on the first clock cycle of the custom
instruction execution. At this time, the dataa and datab ports have valid values and
remain valid throughout the duration of the custom instruction execution. The start
signal is asserted for a single clock cycle.
For a fixed length multicycle custom instruction, after the instruction starts, the
processor waits the specified number of clock cycles, and then reads the value on the
result signal. For an n-cycle operation, the custom logic block must present valid
data on the nth rising edge after the custom instruction begins execution.
For a variable length multicycle custom instruction, the processor waits until the active
high done signal is asserted. The processor reads the result port on the same clock
edge on which done is asserted. The custom logic block must present data on the
result port on the same clock cycle on which it asserts the done signal.
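The handshake described above can be sketched as a small behavioral model in C. This is only an illustration under stated assumptions, not the guide's hardware: the shift-and-add multiplier and the names ci_mul_t, ci_mul_reset, ci_mul_clock, and ci_mul_run are hypothetical. It shows start asserted for one cycle, clk_en acting as a clock qualifier, and done asserted together with a valid result after a data-dependent number of cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioral model of a variable-length multicycle block. */
typedef struct {
    uint32_t acc;     /* running product register */
    uint32_t mcand;   /* multiplicand, shifted left each cycle */
    uint32_t mplier;  /* multiplier bits still to process */
    bool     busy;    /* an operation is in flight */
    bool     done;    /* output: high for the cycle in which result is valid */
    uint32_t result;  /* output: valid only while done is high */
} ci_mul_t;

/* Active-high reset: clear every register. */
static void ci_mul_reset(ci_mul_t *m)
{
    m->acc = m->mcand = m->mplier = 0;
    m->busy = m->done = false;
    m->result = 0;
}

/* Model one rising clock edge. */
static void ci_mul_clock(ci_mul_t *m, bool clk_en, bool start,
                         uint32_t dataa, uint32_t datab)
{
    if (!clk_en)
        return;                 /* stalled: every register holds its value */

    m->done = false;
    if (start) {                /* first cycle: capture the operands */
        m->acc = 0;
        m->mcand = dataa;
        m->mplier = datab;
        m->busy = true;
    } else if (m->busy) {       /* later cycles: one multiplier bit each */
        if (m->mplier & 1u)
            m->acc += m->mcand;
        m->mcand <<= 1;
        m->mplier >>= 1;
        if (m->mplier == 0) {   /* finished: drive result and done together */
            m->busy = false;
            m->done = true;
            m->result = m->acc;
        }
    }
}

/* Processor-side view: pulse start for one cycle, then wait for done. */
static uint32_t ci_mul_run(ci_mul_t *m, uint32_t a, uint32_t b,
                           unsigned *cycles)
{
    unsigned n = 1;

    ci_mul_clock(m, true, true, a, b);
    while (!m->done) {
        ci_mul_clock(m, true, false, a, b);
        n++;
    }
    if (cycles)
        *cycles = n;
    return m->result;
}
```

In this model the cycle count depends on the multiplier value, which is why the caller waits on done rather than on a fixed count. A fixed-length variant would drop done and have the caller clock the block a set number of times.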
Extended custom instruction components occupy multiple select indices. The selection
indices are determined when the custom instruction hardware block is instantiated in
the system using Platform Designer.
Extended custom instructions use an extension index to specify which operation the
logic block performs. The extension index can be up to eight bits wide, allowing a
single custom logic block to implement as many as 256 different operations.
The following block diagram shows an extended custom instruction with bit-swap,
byte-swap, and half-word swap operations.
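[Figure: Extended custom instruction. dataa[31..0] feeds bit-swap, byte-swap, and half-word-swap operations, which drive multiplexer inputs 0, 1, and 2 respectively; n[1..0] selects the multiplexer output presented on result[31..0].]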
The custom instruction in the preceding figure performs swap operations on data
received at the dataa port. The instruction hardware uses the two bit wide n port to
select the output from a multiplexer, determining which result is presented to the
result port.
Note: This logic is just a simple example, using a multiplexer on the output. You can
implement function selection based on an extension index in any way that is
appropriate for your application.
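As a software reference for the swap example above, the following C sketch models the three operations and the output multiplexer. The function names are hypothetical, and the value returned for the unused index 3 is an assumption, since the figure does not define it.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits. */
static uint32_t bit_swap(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Reverse the order of the four bytes. */
static uint32_t byte_swap(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Exchange the upper and lower 16-bit half-words. */
static uint32_t half_word_swap(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* Output multiplexer: the 2-bit extension index n picks one result. */
static uint32_t swap_ci(unsigned n, uint32_t dataa)
{
    switch (n & 0x3u) {
    case 0:  return bit_swap(dataa);
    case 1:  return byte_swap(dataa);
    case 2:  return half_word_swap(dataa);
    default: return 0;  /* n = 3 is unused in the figure; 0 is an assumption */
    }
}
```

For example, swap_ci(1, 0x12345678) yields 0x78563412 and swap_ci(2, 0x12345678) yields 0x56781234. A model like this can serve as a golden reference when simulating the hardware block.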
Therefore, if n is <m> bits wide, the extended custom instruction component occupies
2^<m> select indices.
For example, the custom instruction illustrated above occupies four indices, because n
is two bits wide. Therefore, when this instruction is implemented in a Nios II system,
256 - 4 = 252 available indices remain.
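The index arithmetic above can be written as two small helpers. This is a minimal sketch; the names ci_indices_occupied and ci_indices_remaining are hypothetical, and m is assumed to be in the range 0 to 8.

```c
#define CI_TOTAL_INDICES 256u  /* the selection index is an 8-bit value */

/* Select indices occupied by a component whose n port is m bits wide. */
static unsigned ci_indices_occupied(unsigned m)
{
    return 1u << m;  /* 2 to the power m */
}

/* Indices left for other custom instructions after adding that component. */
static unsigned ci_indices_remaining(unsigned m)
{
    return CI_TOTAL_INDICES - ci_indices_occupied(m);
}
```

With m = 2, as in the swap example, ci_indices_occupied returns 4 and ci_indices_remaining returns 252.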
Related Information
Custom Instruction Assembly Language Interface on page 20
Information about the custom instruction index
The n port timing is the same as that of the dataa port. For example, for an extended
variable multicycle custom instruction, the processor presents the extension index to
the n port on the same rising edge of the clock at which start is asserted, and the n
port remains stable during execution of the custom instruction.
Internal register file access gives you the flexibility to specify whether the custom
instruction reads its operands from the Nios II processor’s register file or from the
custom instruction’s own internal register file. In addition, a custom instruction can
write its results to the local register file rather than to the Nios II processor’s register
file.
Custom instructions containing internal register files use readra, readrb, and
writerc signals to determine if the custom instruction should use the internal
register file or the dataa, datab, and result signals. Ports a, b, and c specify the
internal registers from which to read or to which to write. For example, if readra is
deasserted (specifying a read operation from the internal register), the a signal value
provides the register number in the internal register file. Ports a, b, and c are five bits
each, allowing you to address as many as 32 registers.
Related Information
Instruction Set Reference
Further details about Nios II custom instruction implementation in the Nios II
Processor Reference Guide
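[Figure: Multiply-accumulate custom instruction with inputs dataa[31..0] and datab[31..0], a multiplier, an adder, an accumulator register (D/Q) cleared by writerc, and output result[31..0]]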
This example shows how a custom instruction can access the Nios II internal register
file.
When writerc is deasserted, the Nios II processor ignores the value driven on the
result port. The accumulated value is stored in an internal register. Alternatively, the
processor can read the value on the result port by asserting writerc. At the same
time, the internal register is cleared so that it is ready for a new round of multiply and
accumulate operations.
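The multiply-accumulate behavior can be sketched in C as follows. The names mac_ci_t and mac_ci_exec are hypothetical, and the model assumes that the value on the result port is the stored sum plus the current product, so the final call both adds its own product and returns the total.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t acc;  /* internal accumulator register */
} mac_ci_t;

/* One execution of the instruction; the return value models the result port. */
static uint32_t mac_ci_exec(mac_ci_t *m, bool writerc,
                            uint32_t dataa, uint32_t datab)
{
    uint32_t sum = m->acc + dataa * datab;  /* multiplier feeding the adder */

    if (writerc)
        m->acc = 0;    /* processor takes the result; register is cleared */
    else
        m->acc = sum;  /* processor ignores result; keep accumulating */

    return sum;
}
```

A caller would issue the instruction with writerc deasserted for every term except the last, then assert writerc on the final term to collect the sum and reset the accumulator.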
The following table lists the internal register file custom instruction-specific optional
ports. Use the optional ports only if the custom instruction requires them.
readra Input No If readra is high, Nios II processor register a supplies dataa. If readra
is low, custom instruction logic reads internal register a.
readrb Input No If readrb is high, Nios II processor register b supplies datab. If readrb
is low, custom instruction logic reads internal register b.
writerc Input No If writerc is high, the Nios II processor writes the value on the result
port to register c. If writerc is low, custom instruction logic writes to
internal register c.
a[4:0] Input No Custom instruction internal register number for data source A.
b[4:0] Input No Custom instruction internal register number for data source B.
c[4:0] Input No Custom instruction internal register number for data destination.
The readra, readrb, writerc, a, b, and c ports behave similarly to dataa. When
the custom instruction begins, the processor presents the new values of the readra,
readrb, writerc, a, b, and c ports on the rising edge of the processor clock. All six
of these ports remain stable during execution of the custom instructions.
To determine how to handle the register file, custom instruction logic reads the active
high readra, readrb, and writerc ports. The logic uses the a, b, and c ports as
register numbers. When readra or readrb is asserted, the custom instruction logic
ignores the corresponding a or b port, and receives data from the dataa or datab
port. When writerc is asserted, the custom instruction logic ignores the c port and
writes to the result port.
All other custom instruction port operations behave the same as for combinational and
multicycle custom instructions.
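The operand and destination selection rules can be sketched as a C model. This is an illustration only: the add operation and the names ci_regfile_t, ci_ports_t, and ci_add_exec are hypothetical and do not come from the guide.

```c
#include <stdbool.h>
#include <stdint.h>

#define CI_NUM_REGS 32u  /* a, b, and c are five bits wide */

typedef struct {
    uint32_t regs[CI_NUM_REGS];  /* internal register file */
} ci_regfile_t;

typedef struct {
    bool     readra, readrb, writerc;  /* active-high selectors */
    unsigned a, b, c;                  /* internal register numbers */
    uint32_t dataa, datab;             /* values from the processor registers */
} ci_ports_t;

/* Hypothetical add operation; returns the value driven on the result port. */
static uint32_t ci_add_exec(ci_regfile_t *rf, const ci_ports_t *p)
{
    /* readra/readrb high: use dataa/datab and ignore a/b. */
    uint32_t op_a = p->readra ? p->dataa : rf->regs[p->a & 31u];
    uint32_t op_b = p->readrb ? p->datab : rf->regs[p->b & 31u];
    uint32_t sum  = op_a + op_b;

    /* writerc low: the destination is internal register c. */
    if (!p->writerc)
        rf->regs[p->c & 31u] = sum;

    return sum;  /* the processor uses this only when writerc is high */
}
```

In this sketch the internal register file changes only when writerc is low, which mirrors the rule that an asserted writerc sends the result to the processor instead.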
At system generation, conduits propagate out to the top level of the Platform Designer
system, where external logic can access the signals. By enabling custom instruction
logic to access memory external to the processor, external interface custom
instructions extend the capabilities of the custom instruction logic.
[Figure: External interface custom instruction block with ports dataa[31..0], datab[31..0], clk, clk_en, reset, start, result[31..0], and done]
Custom instruction logic can perform various tasks such as storing intermediate
results or reading memory to control the custom instruction operation. The conduit
interface also provides a dedicated path for data to flow into or out of the processor.
For example, custom instruction logic with an external interface can feed data directly
from the processor’s register file to an external first-in first-out (FIFO) memory buffer.
During the build process the Nios II software build tools generate macros that allow
easy access from application code to custom instructions.
The following example shows a portion of the system.h header file that defines a
macro for a bit-swap custom instruction. This bit-swap example accepts one 32 bit
input and performs only one function.
#define ALT_CI_BITSWAP_N 0x00
#define ALT_CI_BITSWAP(A) __builtin_custom_ini(ALT_CI_BITSWAP_N,(A))
The next example illustrates application code that uses the bit-swap custom
instruction.
#include "system.h"
int main (void)
{
    int a = 0x12345678;  /* illustrative input value */
    int a_swap = 0;
    a_swap = ALT_CI_BITSWAP(a);
    return 0;
}
The code in this example includes the system.h file to enable the application software
to use the custom instruction macro definition. The example code declares two
integers, a and a_swap. Integer a is passed as input to the bit swap custom
instruction and the results are loaded in a_swap.
The example above illustrates how most applications use custom instructions. The
macros defined by the Nios II software build tools use C integer types only.
Occasionally, applications require input types other than integers. In those cases, you
can use a custom instruction macro to process non-integer return values.
3. Custom Instruction Software Interface
Note: You can define custom macros for Nios II custom instructions that allow other 32 bit
input types to interface with custom instructions.
Related Information
Built-in Functions and User-defined Macros on page 17
More information about the GCC built-in functions
By default, the integer type custom instruction is defined in a system.h file. However,
by using built-in functions, software can use 32 bit non-integer types with custom
instructions. Fifty-two built-in functions are available to accommodate the different
combinations of supported types.
<return type> and <parameter types> represent the input and output types, encoded
as follows:
• i—int
• f—float
• p—void *
• (empty)—void
The following example shows the prototype definitions for two built-in functions.
void __builtin_custom_nf (int n, float dataa);
float __builtin_custom_fnp (int n, void * dataa);
To support non-integer input types, define macros with mnemonic names that map to
the specific built-in function required for the application.
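The naming scheme and the count of fifty-two can be checked with a short C sketch that enumerates every combination. It assumes the name pattern __builtin_custom_<return type>n<parameter types> implied by the two prototypes above, with zero, one, or two operands. The helper names ci_builtin_name and ci_list_builtins are hypothetical.

```c
#include <stdio.h>

/* Build "__builtin_custom_<ret>n<params>" into buf and return buf. */
static char *ci_builtin_name(char *buf, size_t len,
                             const char *ret, const char *params)
{
    snprintf(buf, len, "__builtin_custom_%sn%s", ret, params);
    return buf;
}

/* Enumerate every combination; print each name if out is non-NULL. */
static unsigned ci_list_builtins(FILE *out)
{
    static const char *const rets[] = { "", "i", "f", "p" };  /* "" = void */
    static const char codes[] = { 'i', 'f', 'p' };
    char name[32];
    char ops[3];
    unsigned count = 0;

    for (size_t r = 0; r < 4; r++) {
        for (int a = -1; a < 3; a++) {          /* -1: no dataa operand */
            for (int b = -1; b < 3; b++) {      /* -1: no datab operand */
                if (a < 0 && b >= 0)
                    continue;                   /* datab requires dataa */
                ops[0] = (a >= 0) ? codes[a] : '\0';
                ops[1] = (b >= 0) ? codes[b] : '\0';
                ops[2] = '\0';
                ci_builtin_name(name, sizeof name, rets[r], ops);
                if (out)
                    fprintf(out, "%s\n", name);
                count++;
            }
        }
    }
    return count;
}
```

Four return types times thirteen operand combinations (none, three single-operand forms, and nine two-operand forms) gives the fifty-two functions mentioned above. Calling ci_list_builtins(stdout) prints the full list.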
16. return 0;
17. }
On lines 2 through 6, the user-defined macros are declared and mapped to the
appropriate built-in functions. The macro UDEF_MACRO1() accepts a float as an
input parameter and does not return anything. The macro UDEF_MACRO2() accepts a
pointer as an input parameter and returns a float. Lines 14 and 15 show code that
uses the two user-defined macros.
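The following is a hedged sketch of such user-defined macros, not a reproduction of the listing described above, so its line numbers do not match the ones cited. The selection indices 0x00 and 0x01, the host-side stub functions, and udef_demo are assumptions added for illustration. The sketch also assumes that the Nios II GCC toolchain defines __nios2__; on any other compiler the stubs stand in for the built-in functions so the calling code can still be compiled.

```c
#define UDEF_MACRO1_N 0x00  /* hypothetical selection index */
#define UDEF_MACRO2_N 0x01  /* hypothetical selection index */

#if defined(__nios2__)
/* Target build: map the macros onto the GCC built-in functions. */
#define UDEF_MACRO1(A) __builtin_custom_nf(UDEF_MACRO1_N, (A))
#define UDEF_MACRO2(B) __builtin_custom_fnp(UDEF_MACRO2_N, (B))
#else
/* Host build: stand-in stubs with the same argument and return types. */
static float udef_last_input;  /* records what UDEF_MACRO1 received */

static void host_custom_nf(int n, float dataa)
{
    (void)n;
    udef_last_input = dataa;
}

static float host_custom_fnp(int n, void *dataa)
{
    (void)n;
    return *(float *)dataa;  /* arbitrary stub behavior */
}

#define UDEF_MACRO1(A) host_custom_nf(UDEF_MACRO1_N, (A))
#define UDEF_MACRO2(B) host_custom_fnp(UDEF_MACRO2_N, (B))
#endif

/* Example caller: pass a float by value, then get a float back via a pointer. */
static float udef_demo(float a)
{
    float *pt_a = &a;

    UDEF_MACRO1(a);
    return UDEF_MACRO2((void *)pt_a);
}
```

The macro bodies carry no trailing semicolon, so each macro can be used as an ordinary expression or statement.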
Related Information
• GCC, the GNU Compiler Collection
More information about GCC built-in functions
• GCC Floating-point Custom Instruction Support Overview
• GCC Single-precision Floating-point Custom Instruction Command Line
You designate registers in one of two formats, depending on whether you want the
custom instruction to use a Nios II register or an internal register:
• r<i>—Nios II register <i>
• c<i>—Custom register <i> (internal to the custom instruction component)
The use of r or c controls the readra, readrb, and writerc fields in the custom
instruction word.
Custom registers are only available with internal register file custom instructions.
Related Information
Custom Instruction Word Format on page 21
Detailed information about instruction fields and register file selection
custom 0, r6, r7, r8
The example above shows a call to a custom instruction with selection index 0. The
input to the instruction is the current contents of the Nios II processor registers r7
and r8, and the results are stored in the Nios II processor register r6.
custom 3, c1, r2, c4
The example above shows a call to a custom instruction with selection index 3. The
input to the instruction is the current contents of the Nios II processor register r2 and
the custom register c4, and the results are stored in custom register c1.
custom 4, r6, c9, r2
The example above shows a call to a custom instruction with selection index 4. The
input to the instruction is the current contents of the custom register c9 and the Nios
II processor register r2, and the results are stored in Nios II processor register r6.
Related Information
custom
More information about the binary format of custom instructions in the Nios II
Processor Reference Guide
The instruction word specifies the 8-bit custom instruction selection index and register
usage.
Bits 31..27: A
Bits 26..22: B
Bits 21..17: C
Bit 16: readra
Bit 15: readrb
Bit 14: writerc
Bits 13..6: N
Bits 5..0: OP = 0x32
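To make the word layout concrete, the sketch below packs an assembly-style custom instruction into a 32-bit word. It assumes the R-type field positions documented in the Nios II Processor Reference Guide (A in bits 31..27, B in 26..22, C in 21..17, readra, readrb, and writerc in bits 16, 15, and 14, N in 13..6, OP in 5..0). The names ci_reg_t and ci_encode are hypothetical.

```c
#include <stdint.h>

#define CI_OPCODE 0x32u  /* OP field value for the custom instruction */

/* A register operand as written in assembly: r<i> or c<i>. */
typedef struct {
    char     kind;   /* 'r' = Nios II register, 'c' = custom register */
    unsigned index;  /* 0..31 */
} ci_reg_t;

/* Encode "custom N, xC, xA, xB" into a 32-bit instruction word. */
static uint32_t ci_encode(unsigned n, ci_reg_t rc, ci_reg_t ra, ci_reg_t rb)
{
    uint32_t readra  = (ra.kind == 'r');  /* 1: operand A from a Nios II register */
    uint32_t readrb  = (rb.kind == 'r');  /* 1: operand B from a Nios II register */
    uint32_t writerc = (rc.kind == 'r');  /* 1: result to a Nios II register */

    return ((uint32_t)(ra.index & 0x1Fu) << 27) |
           ((uint32_t)(rb.index & 0x1Fu) << 22) |
           ((uint32_t)(rc.index & 0x1Fu) << 17) |
           (readra  << 16) |
           (readrb  << 15) |
           (writerc << 14) |
           ((uint32_t)(n & 0xFFu) << 6) |
           CI_OPCODE;
}
```

Under these assumptions, custom 0, r6, r7, r8 encodes to 0x3A0DC032 and custom 3, c1, r2, c4 encodes to 0x110300F2. In practice the assembler performs this encoding; the sketch is only a way to see how the r and c prefixes set the selector bits.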
The register file selectors determine whether the custom instruction component
accesses Nios II processor registers or custom registers, as follows:
Related Information
• R-Type
Information about R-type instructions in the Nios II Processor Reference Guide
• custom
More information about the binary format of custom instructions in the Nios II
Processor Reference Guide
The Nios II processor supports up to 256 distinct custom instructions through the
custom opcode. A custom instruction component can implement a single instruction,
or multiple instructions.
In the case of a simple (non-extended) custom instruction, the select index is a simple
8-bit value, assigned to the custom instruction block when it is instantiated in Platform
Designer.
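[Figure: The 8-bit selection index field N (bits 7..0), divided at bit wn into an upper part (bits 7..wn) and a lower part (bits wn-1..0), where wn = width of n]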
Note: Do not confuse N, the selection index field of the custom instruction, with n, the
extension index port. Although n can be 8 bits wide, it generally corresponds to the
low-order bits of N.
Related Information
Extended Custom Instructions on page 11
The CRC algorithm detects the corruption of data during transmission. It detects a
higher percentage of errors than a simple checksum. The CRC calculation consists of
an iterative algorithm involving XOR and shift operations. These operations are carried
out concurrently in hardware and iteratively in software. Because the operations are
carried out concurrently, the execution is much faster in hardware.
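For reference, the iterative XOR-and-shift structure looks like this in software. The sketch uses the common CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) as an assumption; the design example's own CRC parameters may differ, and crc32_bitwise is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a byte buffer, one shift/XOR step per input bit. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;               /* initial remainder */

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                       /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;                 /* final XOR */
}
```

The inner loop runs eight times per byte in software, whereas hardware can evaluate the same XOR network for many bits in parallel. For the ASCII string "123456789", this function returns the standard CRC-32 check value 0xCBF43926.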
The CRC design files demonstrate the steps to implement an extended multicycle Nios
II custom instruction.
Related Information
• Nios II Custom Instruction Design Example
Downloadable design files
• Design Example: Use of custom instruction for the NIOS II processor in Intel®
Cyclone® 10 LP devices.
4. Design Example: Cyclic Redundancy Check
Related Information
• Creating Platform Designer Components
Detailed information about the Platform Designer component editor in the
Quartus Prime Pro Edition Handbook Volume 1: Design and Synthesis
• Creating Platform Designer Components
Detailed information about the Platform Designer component editor in the
Quartus Prime Standard Edition Handbook Volume 1: Design and Synthesis.
Related Information
Nios II Custom Instruction Design Example
Downloadable design files
Before performing this task, you must perform the steps in “Setting up the
Environment for the CRC Example Design”. After performing these steps, you have an
Intel Quartus Prime project located in the <project_dir> directory and open in the
Intel Quartus Prime software.
Related Information
Setting up the Environment for the CRC Example Design on page 24
Instructions for setting up the design environment
First, make sure that the component editor displays the Component Type tab.
To specify the initial details in the custom instruction parameter editor, follow these
steps:
1. For Name and for Display Name, type CRC.
2. For Version, type 1.0.
3. Leave the Group field blank.
4. Optionally, set the Description, Created by, and Icon fields as you prefer.
Note: The Intel Quartus Prime Analysis and Synthesis program checks the design
for errors when you add the files. Confirm that no error message appears.
5. Open the File Attributes dialog box by double-clicking the Attributes column in
the CRC_Custom_Instruction.v line.
6. In the File Attributes dialog box, turn on the Top-level File attribute, as shown
in the figure above. This attribute indicates that CRC_Custom_Instruction.v is
the top-level HDL file for this custom instruction.
7. Click OK.
Note: The Intel Quartus Prime Analysis and Synthesis program checks the design
for errors when you select a top-level file. Confirm that no error message
appears.
8. Click Analyze Synthesis Files to synthesize the top-level file.
9. To simulate the system with the ModelSim* - Intel FPGA Edition simulator, you can
add your simulation files under Verilog Simulation Files or VHDL Simulation
Files in the Files tab.
The Editable checkbox next to each parameter indicates whether the parameter
will appear in the custom component's parameter editor. By default, all
parameters are editable.
2. To remove a parameter from the custom instruction parameter editor, you can
turn off Editable next to the parameter. For the CRC example, you can leave all
parameters editable.
When Editable is off, the user cannot see or control the parameter, and it is set to
the value in the Default Value column. When Editable is on, the user can control
the parameter value, and it defaults to the value in the Default Value column.
3. To see a preview of the custom component's parameter editor, you can click
Preview the GUI.
You can specify additional interfaces if your custom instruction logic requires special
interfaces, either to the Avalon®-Memory Mapped fabric or outside the Platform
Designer system. The design example does not require additional interfaces.
Note: Most custom instructions use some combination of standard custom instruction ports,
such as dataa, datab, and result, and do not require additional interfaces.
The following instructions provide the information you need if a custom instruction in
your own design requires additional interfaces. You do not need these steps if you are
implementing the design example.
The parameters for Clock Cycle Type automatically change to "Variable" because
the design example builds a variable multicycle type custom instruction. For other
designs, you enter the correct clock cycle type for your custom instruction design:
• "Variable" for a variable multicycle type custom instruction
• "Multicycle" for a fixed multicycle type custom instruction
• "Combinatorial" for a combinational type custom instruction.
If the interface does not include a clk signal, the component editor automatically
infers that the interface is a combinational type interface. If the interface includes
a clk signal, the component editor automatically infers that the interface is a
multicycle interface. If the interface does not include a done signal, the
component editor infers that the interface is a fixed multicycle type interface. If
the interface includes a done signal, the component editor infers that the interface
is a variable multicycle type interface.
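The inference rules above amount to a two-input decision, sketched here in C. The enum and function names are hypothetical. Treating an interface that has done but no clk as combinational follows from applying the clk rule first, which is an assumption about rule ordering.

```c
#include <stdbool.h>

typedef enum {
    CI_COMBINATORIAL,  /* no clk signal */
    CI_MULTICYCLE,     /* clk but no done: fixed length */
    CI_VARIABLE        /* clk and done: variable length */
} ci_cycle_type_t;

/* Infer the clock cycle type from the signals present on the interface. */
static ci_cycle_type_t ci_infer_cycle_type(bool has_clk, bool has_done)
{
    if (!has_clk)
        return CI_COMBINATORIAL;
    return has_done ? CI_VARIABLE : CI_MULTICYCLE;
}
```

The CRC design example has both clk and done, so this function returns CI_VARIABLE for it, matching the "Variable" setting described above.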
Related Information
Custom Instruction Types on page 7
List of standard custom instruction hardware ports, to be used as signal types
4. Click Add to add the new instruction to the Platform Designer system.
Platform Designer automatically assigns an unused selection index to the new
custom instruction. You can see this index in the System Contents tab, in the
Base column, in the form "Opcode <N>". <N> is represented as a decimal
number. The selection index is exported to system.h when you generate the
system.
5. In the Connections panel, connect the new CRC_0 component’s
nios_custom_instruction_slave interface to the cpu component’s
custom_instruction_master interface.
6. Optional: You can change the custom instruction's selection index in the System
Contents tab. In the Base column across from the custom instruction slave, click
on "Opcode <N>", and type the desired selection index in decimal.
Related Information
• Creating a System with Platform Designer
Detailed information about the Platform Designer Pro component editor in the
Quartus Prime Pro Edition Handbook Volume 1: Design and Synthesis.
• Creating a System with Platform Designer (Standard Edition)
For detailed information about the Platform Designer (Standard) component
editor, refer to "Creating a System with Platform Designer" in the Quartus
Prime Standard Edition Handbook Volume 1: Design and Synthesis.
The downloadable design files include the software source files. The following table
lists the CRC application software source files and their corresponding descriptions.
crc_main.c Main program that populates random test data, executes the CRC both in software and with the
custom instruction, validates the output, and reports the processing time.
To run the application software, you must create an Executable and Linking Format
File (.elf) first. To create the .elf file, follow the instructions in the "Nios II Software
Build Flow" section in the readme_qsys.txt file in the extracted design files.
The application program runs three implementations of the CRC algorithm on the
same pseudo-random input data: an unoptimized software implementation, an
optimized software implementation, and the custom instruction CRC. The program
calculates the processing time and throughput for each of the versions, to
demonstrate the improved efficiency of a custom instruction compared to a software
implementation.
The output shows that the custom instruction CRC is 68 times faster than the
unoptimized CRC calculated purely in software and is 39 times faster than the
optimized version of the software CRC. The results you see using a different target
device and board may vary depending on the memory characteristics of the board and
the clock speed of the device, but these ratios are representative.
4.2.1.1. Output of the CRC Design Example Software Run on a Cyclone V E FPGA
Development Kit using the Intel Quartus Prime Software v15.1.
******************************************************************************
Comparison between software and custom instruction CRC32
******************************************************************************
System specification
--------------------
System clock speed = 50 MHz
Number of buffer locations = 32
Size of each buffer = 256 bytes
Speedup ratio
-------------
Custom instruction CRC vs software CRC = 68
Custom instruction CRC vs optimized software CRC = 39
Optimized software CRC vs software CRC = 1
The following example shows the macro that is defined in the ci_crc.c file.
#define CRC_CI_MACRO(n, A) \
__builtin_custom_ini(ALT_CI_CRC_CUSTOM_COMPONENT_0_N + (n & 0x7), (A))
This macro accepts a single int type input operand and returns an int type value.
The CRC custom instruction has extended type; the n value in the macro
CRC_CI_MACRO() indicates the operation to be performed by the custom instruction.
To initialize the custom instruction, for example, you can add the initialization code in
the following example to your application software.
/* Initialize the custom instruction CRC to the initial remainder value: */
CRC_CI_MACRO (0,0);
For details of each operation of the CRC custom instruction and the corresponding
value of n, refer to the comments in the ci_crc.c file.
The examples above demonstrate that you can define the macro in your application to
accommodate your requirements. For example, you can determine the number and
type of input operands, decide whether to assign a return value, and vary the
extension index value, n. However, the macro definition and usage must be consistent
with the port declarations of the custom instruction. For example, if you define the
macro to return an int value, the custom instruction must have a result port.
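As a sketch of how such a macro can be adapted, the variant below has the same shape as CRC_CI_MACRO() but parenthesizes n, so that an expression argument is masked as a whole. The names CI_INI, CRC_CI, mock_ini, and last_selector are hypothetical. The host-side branch is only a stand-in so that the index arithmetic can be checked off-target, and its placeholder value for ALT_CI_CRC_CUSTOM_COMPONENT_0_N is an assumption; on the target, system.h supplies the real value.

```c
#include <assert.h>

#if defined(__nios2__)
/* On the target, system.h supplies ALT_CI_CRC_CUSTOM_COMPONENT_0_N. */
#include "system.h"
#define CI_INI(n, a) __builtin_custom_ini((n), (a))
#else
/* Host-only stand-in (hypothetical): records the selector it was given. */
#define ALT_CI_CRC_CUSTOM_COMPONENT_0_N 0    /* placeholder base index */
static int last_selector = -1;
static int mock_ini(int n, int a) { last_selector = n; return a; }
#define CI_INI(n, a) mock_ini((n), (a))
#endif

/* Same shape as CRC_CI_MACRO(), with n parenthesized before masking.
 * Returns int, so the custom instruction must have a result port. */
#define CRC_CI(n, A) \
    CI_INI(ALT_CI_CRC_CUSTOM_COMPONENT_0_N + ((n) & 0x7), (A))
```

With the parentheses, CRC_CI(8 | 1, x) selects extension index 1. Without them, the & operator would bind to the 1 alone and the index would be 9.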
Related Information
Custom Instruction Software Interface on page 16
These floating point instructions are implemented as custom instructions. The table
below describes in detail how each implementation conforms to the IEEE standards.
Table 8. Hardware Conformance with IEEE 754-1985 and IEEE 754-2008 Floating
Point Standard
Feature | Floating Point Hardware Implementation with IEEE 754-1985 | Floating Point Hardware 2 Implementation with IEEE 754-2008
Exception conditions: Invalid operation | Result is Not a Number (NaN) | Result is Not a Number (NaN)
Exception conditions: Division by zero | Result is ±infinity | Result is ±infinity
Status flags | Not implemented. IEEE 754-1985 exception conditions are detected and handled as described elsewhere in this table. | Not implemented. IEEE 754-2008 exception conditions are detected and handled as described elsewhere in this table.(3)
Note: The FPH2 component also supports faithful rounding, which is not an IEEE 754-defined
rounding mode. Faithful rounding rounds results to either the upper or lower nearest
single-precision numbers. Therefore, the result produced is one of two possible values
and the choice between the two is not defined. The maximum error of faithful
rounding is 1 unit in the last place (ulp). Errors may not be evenly distributed.
FRAC (Fraction): Specifies the fractional portion (right of the binary point) of the
mantissa. The integer value (left of the binary point) is always assumed to be 1 for
normal values so it is omitted. This omitted value is called the hidden bit. The
mantissa ranges from ≥1.0 to <2.0.
EXP (Biased Exponent): Contains the exponent biased by the value 127. The biased
exponent value 0x0 is reserved for zero and subnormal values. The biased exponent
value 0xff is reserved for infinity and NaN. The biased exponent ranges from 1 to 0xfe
for normal numbers (-126 to 127 when the bias is subtracted out).
S (Sign): Specifies the sign. 1 = negative, 0 = positive. Normal values, zero, infinity,
and subnormals are all signed. NaN has no sign, so the S field is ignored.
Note: Zero, subnormal, and infinity are signed as specified by the S field. NaN has no sign so
the S field is ignored.
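The field layout described above can be sketched in C as follows. This is an illustrative sketch only: the struct sp_fields and the function sp_decode are hypothetical names, and the code assumes that float is a 32-bit IEEE 754 single-precision type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields described above. */
struct sp_fields {
    uint32_t s;     /* sign: 1 = negative, 0 = positive */
    uint32_t exp;   /* biased exponent, 8 bits */
    uint32_t frac;  /* fraction, 23 bits (the hidden bit is not stored) */
};

struct sp_fields sp_decode(float value)
{
    uint32_t bits;
    struct sp_fields f;

    assert(sizeof bits == sizeof value);   /* assumes a 32-bit float */
    memcpy(&bits, &value, sizeof bits);    /* copy the raw bit pattern */
    f.s    = bits >> 31;
    f.exp  = (bits >> 23) & 0xFFu;
    f.frac = bits & 0x7FFFFFu;
    return f;
}
```

For example, 1.0f decodes to S = 0, EXP = 127, FRAC = 0, because its unbiased exponent is 0 and its mantissa is exactly 1.0.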
The table below shows how single-precision values are encoded across the 32-bit
range from 0x0000_0000 to 0xffff_ffff. Single-precision floating point numbers
have the following characteristics:
• Precision (ρ) = 24 bits (23 bits in FRAC plus one hidden bit)
• Radix (β) = 2
• emin = -126
• emax = 127
The most-significant bit of FRAC is 0 for signaling NaNs (sNaN) and 1 for quiet NaNs
(qNaN).
0x0000_0001 | min pos subnormal | S = 0 | EXP = 0x00 | FRAC = 0x00_0001 | 1.40129846e-45 (β^(emin-ρ+1) = 2^(-126-24+1) = 2^(-149))
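The reserved exponent values and the sNaN/qNaN distinction described above can be expressed as a small classifier over the raw 32-bit pattern. This is a sketch for illustration; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum sp_class {
    SP_ZERO, SP_SUBNORMAL, SP_NORMAL, SP_INFINITY, SP_SNAN, SP_QNAN
};

/* Classify a single-precision bit pattern by its EXP and FRAC fields. */
enum sp_class sp_classify(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0x00u)                       /* reserved: zero, subnormal */
        return frac == 0 ? SP_ZERO : SP_SUBNORMAL;
    if (exp == 0xFFu) {                     /* reserved: infinity, NaN */
        if (frac == 0)
            return SP_INFINITY;
        /* Most-significant FRAC bit: 1 = quiet NaN, 0 = signaling NaN. */
        return (frac & 0x400000u) ? SP_QNAN : SP_SNAN;
    }
    return SP_NORMAL;
}
```

The sign bit does not affect the class, so 0x8000_0000 classifies as zero and 0xff80_0000 as infinity.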
The IEEE 754-2008 standard defines the default rounding mode to be “Round-to-
Nearest RoundTiesToEven”. In the IEEE 754-1985 standard, this is called “Round-to-
Nearest-Even”. Both standards also define additional rounding modes called “Round-
to-Zero”, “Round-to-Negative-Infinity”, and “Round-to-Positive-Infinity”. The IEEE
754-2008 standard introduced a new optional rounding mode called “Round-to-
Nearest RoundTiesAway”.
Nearest Rounding has a maximum error of one-half ULP. Errors are not evenly
distributed, because nearest rounding chooses the upper number more often than the
lower number when results are randomly distributed.
Truncation Rounding has a maximum error of one ULP. Errors are not evenly
distributed.
Faithful Rounding has a maximum error of one ULP. Errors are not guaranteed to be
evenly distributed.
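To make the error bounds concrete, the following sketch drops low-order bits from an unsigned integer mantissa in two ways. It is an illustration, not the FPH2 implementation. The function names are hypothetical, and the nearest-rounding variant assumes that ties round upward, which is consistent with the statement above that nearest rounding chooses the upper number more often.

```c
#include <assert.h>
#include <stdint.h>

/* Truncation: discard the low `drop` bits. The result is never above the
 * exact value and is less than 1 ULP below it. */
uint32_t round_truncate(uint32_t mant, unsigned drop)
{
    assert(drop > 0 && drop < 32);
    return mant >> drop;
}

/* Nearest rounding with ties rounded up (an assumption): add half an ULP
 * before discarding, so the error magnitude is at most 1/2 ULP. */
uint32_t round_nearest_up(uint32_t mant, unsigned drop)
{
    assert(drop > 0 && drop < 32);
    uint64_t half = (uint64_t)1 << (drop - 1);
    return (uint32_t)(((uint64_t)mant + half) >> drop);
}
```

With two bits dropped, the input 0xA represents 2.5 in units of the result. Truncation gives 2 and nearest rounding gives 3.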
The table below lists the results of some IEEE 754 special cases. The x represents a
normal value. The FPH2 is compliant in all of these cases.
Results are assumed to be correctly signed so signs are omitted when they are not
important. When the sign is relevant, signs are shown with extra parentheses around
the value such as (+∞). The value x in the table represents any non-NaN value.
Comparisons ignore the sign of zero for equality. For example, (-0) == (+0) and (+0)
≤ (-0). Comparisons that don’t include equality, like > and <, don’t consider -0 to be
less than +0. Comparisons return false if either or both inputs are NaN. The min and
max operations return the non-NaN input if one of their inputs is NaN and the other is
non-NaN. Other operations that produce floating point results return NaN if any or all
of their inputs are NaN.
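The comparison and minimum rules above can be checked with ordinary C on any host with IEEE 754 arithmetic. The sketch below is illustrative only: sp_from_bits and sp_min are hypothetical helpers, not FPH2 or newlib functions, and the code assumes a 32-bit float and a compiler that is not run with options such as -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its bit pattern (assumes a 32-bit IEEE 754 float). */
float sp_from_bits(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Minimum that returns the non-NaN input when exactly one input is NaN. */
float sp_min(float a, float b)
{
    if (a != a) return b;       /* a is NaN: only NaN compares unequal to itself */
    if (b != b) return a;       /* b is NaN */
    return (a < b) ? a : b;
}
```

In this sketch, -0 compares equal to +0, every ordered comparison with a NaN operand is false, and sp_min() returns the numeric argument when the other one is NaN.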
The FPH2 component provides low cycle count implementations of add, sub, multiply,
and divide operations, and custom instruction implementations of additional floating
point operations.
The FPH2 component is the preferred floating point implementation for the Nios II
processor. Intel recommends FPH2 rather than the legacy FPH1 because it provides
better performance and a smaller device footprint.
You should compile newlib from source code with individual -mcustom-<operation>
options, selected to match your hardware configuration. This allows newlib to
incorporate the benefits of all FPH2 operations that can be inferred by GCC. If you use
the Nios II software build tools, the BSP generator takes care of this for you.
• Supports FPH1 operations (add, sub, multiply, divide) and adds support for square
root, comparisons, integer conversions, minimum, maximum, negate, and
absolute
• Single-precision floating point values are stored in the Nios II general purpose
registers
• VHDL only
• Platform Designer support only
• Single-precision only
• Optimized for FPGAs with 4-input LEs and 18-bit multipliers
• GCC and Nios II SBT (Software Build Tools) software support
• IEEE 754-2008 compliant except for:
— Simplified rounding
— Simplified NaN handling
— No exceptions
— No status flags
— Subnormal supported on a subset of operations
• Binary-compatibility with FPH1
— FPH1 implements Round-To-Nearest rounding. Because FPH2 implements
different rounding, results might be subtly different between the two
generations
• Resource consumption in a typical system:
— Approximately 2500 4-input LEs
— Nine 9-bit multipliers
— Three M9K memories or larger
(4) These names match the names of the corresponding GCC command-line options except for
round, which GCC does not support.
(5) Specifies the 8 bit fixed custom instruction for the operation.
The cycles column specifies the number of cycles required to execute the instruction.
A combinatorial custom instruction takes 1 cycle. A multi-cycle custom instruction
requires at least 2 cycles. An N-cycle multi-cycle custom instruction has N - 2 register
stages inside the custom instruction because the Nios II processor registers the result
from the custom instruction and allows another cycle for wire delays in the source
operand bypass multiplexers. The number of cycles does not include the extra cycles
(maximum of 2) for which the Nios II/f stalls an instruction that follows the multi-cycle
custom instruction and uses its result within 2 cycles. These extra cycles occur because
multi-cycle instructions are late result instructions.
The Nios II Software Build Tools (SBT) include software support for the FPH2
component. When the FPH2 component is present in hardware, the Nios II compiler
compiles your code to use the custom instructions for floating point operations.
(6) Nios II GCC version 4.7.3 is not able to reliably replace calls to newlib floating point functions
with the equivalent custom instruction even though it has -mcustom-<operation>
command-line options and pragma support for these operations. Instead, the
custom instruction must be invoked directly using the GCC __builtin_custom_* facility.
The FPH2 component includes a C header file that provides the required #define macros to
invoke the custom instruction directly.
FPH2 operations are compliant with the IEEE 754-2008 standard, except for the
following:
• No traps/exceptions.
• No status flags.
• Remainder and conversions between binary and decimal operations are not
supported. These are provided by the software emulation library.
• No support for round-to-nearest-even mode. Nearest Rounding, Truncation
Rounding, or Faithful Rounding is used, depending on the operator.
• Subnormals are not supported by the add, subtract, multiply, divide, and square
root operations. Subnormal inputs are treated as signed zero and subnormal
outputs are never created (result is signed zero instead). This treatment of
subnormal values is called flush-to-zero.(7)
• Subnormals cannot be created by the integer2float conversion operation. This
behavior is IEEE 754 compliant.
• No distinction between signaling and quiet NaNs as input operands. Any result that
produces a NaN may produce either a signaling or quiet NaN.
• A NaN result with one or more NaN input operands is not guaranteed to return any
of the input NaN values; the NaN result can be a different NaN than the input
NaNs.
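The flush-to-zero treatment listed above can be modeled on the raw bit pattern as follows. This is a sketch of the described behavior, not the hardware implementation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of flush-to-zero: any pattern whose biased exponent is 0
 * (zero or subnormal) becomes a zero with the same sign. */
uint32_t sp_flush_to_zero(uint32_t bits)
{
    if (((bits >> 23) & 0xFFu) == 0)
        return bits & 0x80000000u;   /* keep only the sign bit */
    return bits;                     /* normal, infinity, NaN: unchanged */
}
```

For example, the smallest positive subnormal 0x0000_0001 becomes +0, and the negative subnormal 0x807f_ffff becomes -0.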
The FPH2 component does not support exceptions. Instead, it creates a specific result.
The following table shows the FPH2 results created for operations that would trigger
an IEEE 754 exception.
Invalid NaN
(7) Subnormals are supported by comparison, minimum, maximum, float-to-integer, negate, and
absolute operations, so these operations are IEEE 754-2008 compliant.
floatis | 250 | 4 | int_to_float(a) | Does not apply | Does not apply | Casting
Related Information
• Nios II Processor Reference Guide
• Rounding Schemes on page 41
• -mcustom-<operation> on page 53
• C Macros for round(), fmins(), and fmaxs() on page 58
• GCC Command Line Options
• Newlib Documentation page
• GCC Floating-point Custom Instruction Support Overview
• GCC Single-precision Floating-point Custom Instruction Command Line
The FPH2 component editor, shown in the figure below, allows you to selectively
enable any of several groups of floating point custom instructions. By default, all
instructions are enabled.
In most cases, you should leave all floating point custom instructions enabled.
However, for the MAX 10 device family in certain configurations, you might need to
disable the Roots group.
MAX 10 devices cannot support the FPH2 square root instruction in the following
configurations:
• Dual configuration mode
• Compressed configuration mode
• External RAM initialization disabled
The square root instruction uses a lookup table, requiring initialization that the MAX
10 cannot support in these configurations. Turn off the Roots option if you are
targeting a MAX 10 device in one of these configurations.
When you disable one of the floating point instruction groups, software must
implement the functions in that group (in this case, square root) if they are required.
The BSP generator automatically creates this support. Refer to "Building the FPH2
Example Software" for details.
The figure below shows Platform Designer with Nios II connected to the FPH2. The
FPH2 has two slaves (s1 and s2). One slave is for the combinatorial custom instruction
and the other slave is for the multi-cycle custom instruction. Connect both slaves to
the Nios II custom_instruction_master by clicking the dot in the connections patch
panel. The following figure shows how the connection should look.
The example in the figure above targets a MAX 10 device. Note the warning message,
reminding you that there could be an issue with RAM initialization for the square root
function.
After connecting the FPH2 to the Nios II, generate your system in Platform Designer
as you normally would. Then use the Intel Quartus Prime software to compile the
generated RTL, or use an RTL simulator, like ModelSim - Intel FPGA Edition, to perform
simulations.
Note: If you use the Nios II software build tools (SBT) to create your software projects, the
BSP generator creates a custom newlib library for your floating point hardware. If you
modify your floating point hardware configuration, you must regenerate and rebuild
your BSP to ensure that newlib is built correctly. For details, refer to "Building the
FPH2 Example Software".
Related Information
• Building the FPH2 Example Software on page 51
• Quartus Prime Standard Edition Handbook Volume 1: Design and Synthesis
• Nios II FPH2 and the Newlib Library on page 57
The Software Build Tools (SBT) are used to create Intel HAL-based Board Support
Packages (BSP) and application and library makefiles for embedded software running
on a Nios II. These tools come in command-line and Eclipse GUI-based forms.
When these tools are used to generate a BSP for a Nios II with the FPH2 component
connected to that Nios II, the sw.tcl file in the component causes the BSP and any
applications or libraries that use that BSP to be aware of the presence of the FPH2. In
particular, sw.tcl performs the following functions:
• Examines the system you created in Platform Designer, and determines the
correct GCC flags for your floating point hardware.
• Creates makefile rules to pass the -mcustom-<operation> options to GCC, so it
knows to use the available FPH2 operations instead of the software emulation code
to implement the specified floating point operations.
• Creates makefile rules to pass the -fno-math-errno option to GCC, to eliminate
the overhead of detecting NaN results and setting the errno variable for calls to
sqrtf().
• Adds #define macro declarations to system.h for the newlib math library
routines that GCC does not reliably replace with custom instructions. For more
information, refer to "C Macros for round(), fmins(), and fmaxs()".
• Creates makefile rules to generate a correct version of newlib. Uses the GCC flags
determined from your hardware system.
Note: If you modify your floating point hardware configuration, you must regenerate and
rebuild your BSP to ensure that newlib is built correctly.
Related Information
• C Macros for round(), fmins(), and fmaxs() on page 58
• Floating Point Hardware 2 Operations on page 47
• Nios II Software Developer's Handbook
• GCC Floating-point Custom Instruction Support Overview
Note: GCC does not infer newlib math functions. These functions can be replaced with their
equivalent custom instruction using the __builtin_custom_* facility of GCC.
The system.h header file provides a C #define macro declaration that redefines the
required newlib math functions to use the corresponding custom instruction instead.
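A redefinition of this kind might look like the sketch below. The macro names FPH2_FMINS_N, FPH2_FMAXS_N, fph2_fminf, and fph2_fmaxf are hypothetical and are not the names that system.h actually generates. The N values 232 and 233 are the ones shown for -mcustom-fmins and -mcustom-fmaxs in this chapter. The host branch is only a stand-in so that the sketch can be compiled and checked off-target.

```c
#include <assert.h>

#define FPH2_FMINS_N 232    /* N value of the fmins operation */
#define FPH2_FMAXS_N 233    /* N value of the fmaxs operation */

#if defined(__nios2__)
/* Invoke the custom instructions directly: float result, two float operands. */
#define fph2_fminf(a, b) __builtin_custom_fnff(FPH2_FMINS_N, (a), (b))
#define fph2_fmaxf(a, b) __builtin_custom_fnff(FPH2_FMAXS_N, (a), (b))
#else
/* Host-only stand-ins (hypothetical): a single NaN input is treated as
 * missing data, so the numeric input is returned. */
static float host_fminf(float a, float b) { return (b != b || a < b) ? a : b; }
static float host_fmaxf(float a, float b) { return (b != b || a > b) ? a : b; }
#define fph2_fminf(a, b) host_fminf((a), (b))
#define fph2_fmaxf(a, b) host_fmaxf((a), (b))
#endif
```

Application code can then call fph2_fminf(x, y) where it would otherwise call the newlib function.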
Related Information
• Floating Point Hardware 2 Operations on page 47
• Newlib Documentation page
The FPH2 component does not provide functions for conversion between unsigned
integer types and floating point. When converting between unsigned integer types and
float types, the compiler implements software emulation. Therefore conversion to and
from unsigned integers is much slower than conversion to and from signed integers.
If you do not need the extra range of positive values obtained when converting a float
to an unsigned integer directly, you can use the FPH2 and avoid using the software
emulation if you modify your C code to first cast the float type to an int type or long
type and then cast to the desired unsigned integer type.
For example, instead of casting the float directly to unsigned int, which uses software
emulation, use:
float f;
unsigned int s = (unsigned int)(int)f; // FPH2
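The double cast can be wrapped in a helper that also documents its range restriction. This is a minimal sketch; the function name is hypothetical and the code assumes a 32-bit int.

```c
#include <assert.h>

/* Convert through int so that the signed conversion can be used.
 * Valid only for inputs from 0 up to, but not including, 2^31.
 * Larger values need the direct cast to unsigned int. */
unsigned int float_to_uint_via_int(float f)
{
    assert(f >= 0.0f && f < 2147483648.0f);   /* 2^31, assumes 32-bit int */
    return (unsigned int)(int)f;
}
```

The assertion makes the trade-off explicit: the upper half of the unsigned range is given up in exchange for the faster conversion.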
The FPH2 provides two operations for converting single-precision floating point values
to signed integer values:
• fixsi
• round
The fixsi operation performs truncation when converting a float to a signed integer.
For example, fixsi converts 4.8 to 4 and -1.5 to -1. GCC follows the C standard and
invokes the fixsi operation whenever source code uses a cast or any time that C
automatically converts a float to a signed integer.
6.6.3.1. -mcustom-<operation>
N is the custom instruction value, an unsigned decimal. For a complete list of the operations
and their N values, refer to the table in "Floating Point Hardware 2 Operations".
By default, the compiler implements all floating point operations in software. You can
also specify software emulation for an individual instruction with the -mno-custom-
<operation> command-line option.
Note: The command line can specify multiple -mcustom- switches. If there is a conflict, the
last switch on the command line takes effect.
The following command-line options should be passed to GCC to instruct it to use all
operations provided by the FPH2 that can be inferred by GCC. For more information,
refer to "FPH2 and Nios II GCC".
For users of the Nios II SBT, these command-line arguments are automatically added
to the invocation of GCC by the generated makefiles. For more information, refer to
"Building the FPH2 Example Software".
-mcustom-fabss=224
-mcustom-fnegs=225
-mcustom-fcmpnes=226
-mcustom-fcmpeqs=227
-mcustom-fcmpges=228
-mcustom-fcmpgts=229
-mcustom-fcmples=230
-mcustom-fcmplts=231
-mcustom-fmins=232
-mcustom-fmaxs=233
-mcustom-round=248
-mcustom-fixsi=249
-mcustom-floatis=250
-mcustom-fmuls=252
-mcustom-fadds=253
-mcustom-fsubs=254
-mcustom-fdivs=255
Related Information
• FPH2 and Nios II GCC on page 52
• Floating Point Hardware 2 Operations on page 47
• Building the FPH2 Example Software on page 51
GCC supports pragmas located in source code files to override the -mcustom
command-line options. The pragmas affect the entire source file.
The following pragma tells GCC to call custom instruction N (where N is a decimal
integer from 0 to 255) to implement the specified floating point operation:
#pragma GCC target("custom-<operation>=N")
The following pragma tells GCC to use the software emulation instead of the custom
instruction to implement the specified floating point operation:
#pragma GCC target("no-custom-<operation>")
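As an illustration, a source file that enables two operations through pragmas might look like the sketch below. The N values 253 and 252 are the ones shown for -mcustom-fadds and -mcustom-fmuls in this chapter, and the function name is hypothetical. The preprocessor guard is an addition for this sketch so that the same file also compiles with a host compiler, which does not accept these target strings.

```c
#include <assert.h>

#if defined(__nios2__)
/* Enable the add and multiply custom instructions for this file. */
#pragma GCC target("custom-fadds=253")
#pragma GCC target("custom-fmuls=252")
#endif

/* On Nios II with the pragmas above, GCC can implement the multiply and
 * the add with custom instructions. Elsewhere this is ordinary float math. */
float scale_and_offset(float x, float gain, float offset)
{
    return x * gain + offset;
}
```

No change to the function body is needed. Only the way the compiler implements the floating point operators changes.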
6.6.3.3. -mcustom-fpu-cfg
If you specify the -mcustom-fpu-cfg option on the GCC linker command line, it
chooses a precompiled newlib library with floating point support. The precompiled
libraries only use operations (add, subtract, multiply, and divide) supported by FPH1.
Note: With FPH2, Intel does not recommend using the -mcustom-fpu-cfg option.
Related Information
• Newlib Documentation page
• Nios II FPH2 and the Newlib Library on page 57
• "Nios II Options" in GCC Command Options (gcc.gnu.org)
6.7.1. -fno-math-errno
From the GCC documentation:
“Do not set ERRNO after calling math functions that are executed with a single
instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error
handling may want to use this flag for speed while maintaining IEEE arithmetic
compatibility.”
If you specify -fno-math-errno on the GCC command line, the compiler maps calls
to sqrtf() directly to the fsqrts custom instruction. Otherwise, by default GCC
adds several instructions after the fsqrts custom instruction to check for a NaN
result, indicating an attempt to take the square root of a negative number. If fsqrts
returns NaN, the code calls the newlib sqrtf() function to set the C errno variable.
Typically, this overhead is undesirable. Intel recommends that you enable -fno-
math-errno to eliminate the overhead of calling sqrtf().
If you use the Nios II SBT, the generated makefiles set -fno-math-errno by default.
You can override this behavior by setting -fmath-errno in the CPPFLAGS make
variable.
The -ffinite-math-only option also eliminates the overhead of checking for NaN
result for square root. However, this option also has other effects. Refer to "-ffinite-
math-only" for details about this option.
Related Information
• Building the FPH2 Example Software on page 51
• -ffinite-math-only on page 56
• Newlib Documentation page
6.7.2. -fsingle-precision-constant
For FPH2, the Nios II SBT omits -fsingle-precision-constant from the makefile
GCC command line by default. This behavior contrasts with SBT support for FPH1,
which sets this option with -mcustom-fpu-cfg. The SBT does not use -fsingle-
precision-constant for FPH2 because it can cause problems for double-precision
code.
You can enable -fsingle-precision-constant if you are sure it will not cause
problems for your code. In general, it is better to cast floating point constants to the
float type, or use the 'f' suffix (for example 3.14f), because these approaches are
localized and independent of compiler options.
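The difference between an unsuffixed constant and an 'f'-suffixed constant can be seen in the type of the surrounding expression. The sketch below uses hypothetical function names and assumes that it is compiled without -fsingle-precision-constant.

```c
#include <assert.h>

/* The unsuffixed constant has type double, so r is promoted and the
 * multiplications are done in double precision before the final cast. */
float area_double_math(float r)
{
    return (float)(3.14159 * r * r);
}

/* With the 'f' suffix the whole expression stays in single precision,
 * which floating point custom instructions can implement. */
float area_float_math(float r)
{
    return 3.14159f * r * r;
}
```

Because the suffix is part of the source code, the second function keeps its single-precision behavior no matter which compiler options are used.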
Related Information
Building the FPH2 Example Software on page 51
6.7.3. -funsafe-math-optimizations
From the GCC documentation:
“Allow optimizations for floating-point arithmetic that (a) assume that arguments
and results are valid and (b) may violate IEEE or ANSI standards. When used at
link-time, it may include libraries or startup files that change the default FPU
control word or other similar optimizations.”
This option would be required if the floating point hardware implemented the
transcendental functions. GCC requires this option to ensure that application code
does not inadvertently use hardware accelerators that might be problematic.
6.7.4. -ffinite-math-only
From the GCC documentation:
“Allow optimizations for floating-point arithmetic that assume that arguments and
results are not NaNs or +-Infs.”
The -ffinite-math-only option also eliminates the GCC overhead created on calls
to sqrtf(), as -fno-math-errno does.
Related Information
-fno-math-errno on page 55
6.7.5. -fno-trapping-math
6.7.6. -frounding-math
From the GCC documentation:
You should compile newlib from source code with individual -mcustom-<operation>
options, selected to match your hardware configuration. This allows newlib to
incorporate the benefits of all FPH2 operations that can be inferred by GCC. If you use
the Nios II software build tools, the BSP generator takes care of this for you.
The newlib fmaxf() and fminf() functions return the maximum or minimum
numeric value of their arguments. NaN arguments are treated as missing data: if one
argument is a NaN and the other numeric, then the functions return the numeric
value. The FPH2 fmaxs() and fmins() operations match this behavior.
Note: If you modify your floating point hardware configuration, you must regenerate and
rebuild your BSP to ensure that newlib is built correctly. For details, refer to "Building
the FPH2 Example Software".
Related Information
• Building the FPH2 Example Software on page 51
• -mcustom-<operation> on page 53
• Nios II FPH2 Pragmas on page 54
• Newlib Documentation page
Related Information
• GCC Command Line Options
• Newlib Documentation page
• Built-in Functions and User-defined Macros on page 17
Note: The FPH1 component is obsolete starting with Intel Quartus Prime software version 18.1.
When the FPH1 custom instructions are present in your target hardware, the Nios II
Software Build Tools (SBT) for Eclipse compile your code to use the custom
instructions for floating point operations, including the four primitive arithmetic
operations (addition, subtraction, multiplication and division) and the newlib math
library.
Note: For optimum performance and device footprint, Intel recommends using FPH2 rather
than FPH1.
The FPH1 parameter editor allows you to omit the floating point division hardware for
cases in which code running on your hardware design does not make heavy use of
floating point division. When you omit the floating point divide instruction, the Nios II
compiler implements floating point division in software.
Related Information
• Nios II Hardware Development Tutorial
How to define, generate, and compile Nios II systems
• Getting Started with the Graphical User Interface
Learn about Nios II software projects in the Nios II Software Developer's
Handbook
• Nios II Floating Point Hardware 2 Component on page 44
Intel provides several working Nios II reference designs which you can use as a
starting point for your own designs. After installing the Nios II EDS, refer to the
<Nios II EDS install path>/examples/verilog or the
<Nios II EDS install path>/examples/vhdl directory. Demonstration applications are
also available in newer development kit installations.
Related Information
Instantiating the Nios II Processor chapter of the Nios II Processor Reference Guide
Related Information
• Nios II FPH1 and the Newlib Library on page 63
• Adding FPH1 to the Design and Configuring the Device on page 60
3. Analyze the results report for each test. In each report, the FP CI <instruction>
entry lists the performance of the custom instruction, and the FP SW
<instruction> entry lists the performance of the software implementation. The
Time (sec) and Time (clock) columns represent the aggregate time spent
executing the floating point operations, in seconds and in Nios II clock cycles.
Total Time represents the duration of the test, expressed both in seconds and
in Nios II clock cycles. The % column represents the time spent executing the
floating point operation, as a percentage of the test total.
Note: You might have different speed results, depending on your target hardware
and on the actual values of the random operands.
The software uses the Nios II performance counter component to collect timing
information on the floating point operations. For more information, refer to the
Performance Counter Core chapter in volume 5 of the Intel Quartus Prime
Handbook.
Related Information
Embedded Peripherals IP User Guide
Refer to the Performance Counter Core chapter of the Embedded Peripherals IP
User Guide
The following pragmas direct the Nios II compiler to ignore the FPH1 custom
instructions and generate software implementations:
• #pragma no_custom_fadds—forces software implementation of floating point
add
• #pragma no_custom_fsubs—forces software implementation of floating point
subtract
• #pragma no_custom_fmuls—forces software implementation of floating point
multiply
• #pragma no_custom_fdivs—forces software implementation of floating point
divide
You can compile newlib from source code with options selected to match a specific
hardware configuration.
Intel recommends using FPH2, which provides better performance and a lower
footprint than FPH1.
Before using the FPH1 custom instructions, consider the following questions:
• Have you identified your performance bottlenecks? Make sure your performance
issues are caused by floating point arithmetic before you try to fix them with
floating point acceleration.
• Can you use integer arithmetic? While the FPH1 custom instructions are faster
than software-implemented floating point, they are slower than integer arithmetic.
A common integer technique is to represent numerical values with an implicit
scaling factor. As a simple example, if you are calculating milliamperes, you might
represent your values internally as microamperes.
• Are you taking full advantage of compiler optimization? You can increase the
Nios II compiler optimization level through the Properties dialog box of your
Nios II application and BSP projects.
• Have you hand-optimized your mathematical operations? Numerical analysis
textbooks offer simple, effective techniques for performing accurate calculations
with the minimum number of floating point operations.
If you have followed these suggestions, and you need further acceleration, the floating
point custom instructions are an appropriate solution.
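The implicit scaling technique suggested above can be sketched in C as follows. This is a minimal illustration, not code from the design example: the type current_ua_t, the macro UA_PER_MA, and the three helper functions are hypothetical names chosen for this sketch. Currents are stored as a whole number of microamperes, so all arithmetic stays in integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a current stored as a whole number of microamperes.
   The scaling factor (1 mA = 1000 uA) is implicit in the representation. */
typedef int32_t current_ua_t;

#define UA_PER_MA 1000

/* Convert whole milliamperes to the internal microampere representation. */
static current_ua_t current_from_ma(int32_t ma)
{
    return ma * UA_PER_MA;
}

/* Average two currents using integer arithmetic only. The sum is formed
   in 64 bits so that it cannot overflow. */
static current_ua_t current_average(current_ua_t a, current_ua_t b)
{
    return (current_ua_t)(((int64_t)a + (int64_t)b) / 2);
}

/* Split a value into whole milliamperes and the remaining microamperes,
   for example when formatting it for display. Assumes ua >= 0. */
static void current_split(current_ua_t ua, int32_t *ma, int32_t *frac_ua)
{
    assert(ua >= 0);
    *ma = ua / UA_PER_MA;
    *frac_ua = ua % UA_PER_MA;
}
```

In this sketch, a reading of 12345 microamperes splits into 12 mA and 345 uA without any floating point operation. The range and resolution you need determine whether a 32-bit scaled integer is sufficient for your application.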
Related Information
• Nios II Floating Point Hardware 2 Component on page 44
• AN391: Profiling Nios II Systems
Detailed information about Nios II performance profiling
• Reducing Code Footprint in Embedded Systems
Information about using compiler optimization in the Nios II Software
Developer’s Handbook.
In some cases, you can rewrite your code to minimize or even eliminate divide
operations. For example, if your algorithm requires division by a constant value, you
can precalculate its inverse and use a multiply operation in the speed-critical section
of your code.
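As a rough sketch of the precalculated-inverse technique described above, the following C fragment replaces a divide inside a loop with a multiply. The names SCALE, INV_SCALE, and scale_samples are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant divisor used by the algorithm. */
#define SCALE 8.0f

/* The reciprocal is a constant expression, so it is evaluated at compile
   time and no divide is executed at run time. */
static const float INV_SCALE = 1.0f / SCALE;

/* Divide every sample by SCALE, using a multiply in the inner loop. */
static void scale_samples(float *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        samples[i] *= INV_SCALE;
    }
}
```

When the divisor is not a power of two, its reciprocal is generally not exactly representable, so the product can differ from the true quotient in the least significant bit. Check that this rounding difference is acceptable before applying the substitution.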
The table below indicates which math library functions use floating point, and of those,
which use floating point division. If a function uses floating point, it runs faster with
floating point hardware. If a function uses floating point division, it runs even faster
with floating point division hardware.
Function Uses Floating Point Uses Floating Point Division
cos() Yes No
sin() Yes No
frexp() Yes No
ldexp() Yes No
modf() Yes No
ceil() Yes No
fabs() No No
floor() Yes No
When you omit the FPH1 divide instruction, the Nios II SBT for Eclipse implements
floating point division in software.
2020.04.27 Added:
• Link to design example for Intel Cyclone® 10 LP devices.
• Note about the FPH1 component being obsolete.
• Clarification about using the FPH2 component with newlib.
May 2007 1.4 Add title and core version number to page footers.