The RoCC Doc V2: An Introduction to the Rocket Custom Coprocessor Interface
Goal
The RoCC interface enables the integration of custom coprocessors or accelerators with RISC-V
cores. This document explains the RoCC interface signals from the perspective of designing a
RISC-V compatible accelerator in Verilog.
The default RoCC interface signals may be classified into the following groups:
1. Core control (CC): for coordination between an accelerator and the Rocket core
2. Register mode (Core): for exchange of data between an accelerator and the Rocket core
3. Memory mode (Mem): for communication between an accelerator and the L1-D cache
The extended RoCC interface can provide the following additional groups:
1. Uncached TileLink (UTL): for communication between an accelerator and the L2 memory
2. Floating Point Unit (FPU): for an accelerator to send data to and receive data from an FPU
3. Control Status Register (CSR): used by Linux on the core to recognize the accelerator
4. Page Table Walker (PTW): for address translation from an accelerator
The system-level diagram in Figure 1 shows the RoCC interface for an accelerator. Throughout
the rest of the document, the acronyms assigned above are used as prefixes in the Verilog
signal names of each group.
RoCC Interface Description
The signal names for both the default and the extended RoCC interfaces are taken from the
Verilog generated from an example Chisel-based accelerator. HLS tools can use these Verilog
interface names to automatically generate the Verilog code for accelerators; optionally, a
Verilog interface that employs structs could be used as well.
Please note that signal directions and descriptions are from the accelerator’s perspective.
The Register mode signals are composed of the RoCC Command and Response subgroups. All
the signal names are prefixed with “core_” to denote their source.
The RoCC Command signals are used by the core to send instructions to the accelerator and
are driven directly by the RoCC instruction of the RISC-V ISA. Figure 2 shows the RoCC
instruction, with bit widths and positions marked above and below the respective fields.
Table 2 describes the signals in the RoCC Command subgroup.
A RoCC response to the command (if one is expected) is sent by the accelerator using the
response interface. The response signals are described in Table 3.
[Figure 2: RoCC instruction format. Field boundaries (bits): 31-25 | 24-20 | 19-15 | 14 | 13 | 12 | 11-7 | 6-0]
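To make the field boundaries of the RoCC instruction concrete, the following is a minimal sketch of an encoder and decoder in Python. The field names (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode) follow the usual RoCC instruction layout and are an assumption here, since the labels of Figure 2 did not survive in this text; the function names are hypothetical.

```python
from typing import NamedTuple


class RoccInstruction(NamedTuple):
    """Fields of a 32-bit RoCC instruction (names assumed, see lead-in)."""
    funct7: int  # bits 31-25: accelerator-defined function code
    rs2: int     # bits 24-20: second source register number
    rs1: int     # bits 19-15: first source register number
    xd: int      # bit 14: set if the core expects a response written to rd
    xs1: int     # bit 13: set if rs1 is read from the core register file
    xs2: int     # bit 12: set if rs2 is read from the core register file
    rd: int      # bits 11-7: destination register number
    opcode: int  # bits 6-0: one of the custom opcodes


def _bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) from word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_rocc(word: int) -> RoccInstruction:
    """Split a 32-bit instruction word into its RoCC fields."""
    return RoccInstruction(
        funct7=_bits(word, 31, 25),
        rs2=_bits(word, 24, 20),
        rs1=_bits(word, 19, 15),
        xd=_bits(word, 14, 14),
        xs1=_bits(word, 13, 13),
        xs2=_bits(word, 12, 12),
        rd=_bits(word, 11, 7),
        opcode=_bits(word, 6, 0),
    )


def encode_rocc(instr: RoccInstruction) -> int:
    """Pack RoCC fields back into a 32-bit instruction word."""
    return (
        (instr.funct7 & 0x7F) << 25
        | (instr.rs2 & 0x1F) << 20
        | (instr.rs1 & 0x1F) << 15
        | (instr.xd & 0x1) << 14
        | (instr.xs1 & 0x1) << 13
        | (instr.xs2 & 0x1) << 12
        | (instr.rd & 0x1F) << 7
        | (instr.opcode & 0x7F)
    )
```

For example, decode_rocc(encode_rocc(i)) returns i for any instruction i whose fields fit their bit widths.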
Table 2 (surviving rows), control lines:
output core_cmd_ready_o 0
input core_cmd_valid_i 0
Table 3 (surviving rows), control lines:
input core_resp_ready_i 0
output core_resp_valid_o 0
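The ready and valid control lines on the command and response interfaces can be illustrated with a small cycle-by-cycle model. The document does not spell out the handshake rule, so this sketch assumes the usual decoupled convention: a transfer happens in any cycle where valid and ready are both high. The function name and the trace format are hypothetical.

```python
def handshake_transfers(valid, ready, payload):
    """Return (cycle, payload) pairs for cycles where a transfer occurs.

    valid, ready and payload are equal-length sequences sampled once per
    clock cycle. Assumption: a transfer happens only when valid and ready
    are both high in the same cycle.
    """
    transfers = []
    for cycle, (v, r, p) in enumerate(zip(valid, ready, payload)):
        if v and r:
            transfers.append((cycle, p))
    return transfers


# Example trace for the command channel: the core raises core_cmd_valid_i
# in cycle 0, but the accelerator only raises core_cmd_ready_o in cycle 1,
# so the first command is accepted in cycle 1.
cmd_valid = [1, 1, 0, 1]
cmd_ready = [0, 1, 1, 1]
cmd_bits = [10, 10, 0, 20]
accepted = handshake_transfers(cmd_valid, cmd_ready, cmd_bits)
```

The same model applies to the response channel with the roles reversed: the accelerator drives core_resp_valid_o and the core drives core_resp_ready_i.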
The memory interface is composed of Memory request and Memory response subgroups. All
the signal names are prefixed with “mem_” to denote the group they belong to.
Memory requests from the accelerator are sent out using the signals described in Table 4.
Memory responses to the accelerator are passed through the interface described in Table 5.
There is a response for both load and store requests. Usually an accelerator uses just the tag
and data fields from the response. Please note that there is no ready signal from the
accelerator to acknowledge the acceptance of memory responses, which means that the
accelerator is expected to be ready to accept data in every cycle.
Table 4 (surviving rows), control lines:
input mem_req_ready_i 0
output mem_req_valid_o 0
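Because memory responses cannot be back-pressured, one way to stay safe is to limit the number of outstanding requests and match each response to its request by tag. The following is a minimal behavioural sketch of that bookkeeping, not the accelerator's actual logic; the class name, the label argument and the num_tags parameter are hypothetical.

```python
class MemResponseTracker:
    """Matches memory responses to outstanding requests by tag."""

    def __init__(self, num_tags):
        self.free_tags = list(range(num_tags))  # tags available for new requests
        self.pending = {}    # tag -> caller-chosen label of the request
        self.completed = {}  # label -> data returned by the memory system

    def issue(self, label):
        """Reserve a tag for a new request, or return None if none is free.

        Returning None models holding mem_req_valid_o low until an earlier
        request has completed.
        """
        if not self.free_tags:
            return None
        tag = self.free_tags.pop(0)
        self.pending[tag] = label
        return tag

    def on_response(self, tag, data):
        """Accept a response unconditionally, as no ready signal exists."""
        label = self.pending.pop(tag)
        self.completed[label] = data
        self.free_tags.append(tag)
```

Since responses are looked up by tag, the sketch does not rely on them returning in request order.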
The UTL group is composed of two subgroups: aUTL, the arbitrated uncached TileLink signals,
and the UTL signals.
aUTL is arbitrated with the I-cache link to the L2 memory system. The aUTL signals are
composed of the acquire and grant subgroups, listed in Table 6 and Table 7 respectively, and
are prefixed with “autl_” to denote the group they belong to.
The UTL signals are a vector of signals of the same type as aUTL. The size of the UTL vector is
equal to the number of independent memory channels, which is configurable in the Rocket core.
These signals are yet to be added to the document.
Table 6 (surviving rows), aUTL acquire signals:
input autl_acquire_ready_i
output autl_acquire_valid_o
output[25:0] autl_acquire_bits_addr_block_o
output[2:0] autl_acquire_bits_client_xact_id_o
output[1:0] autl_acquire_bits_addr_beat_o
output autl_acquire_bits_is_builtin_type_o
output[2:0] autl_acquire_bits_a_type_o
output[16:0] autl_acquire_bits_union_o
output[127:0] autl_acquire_bits_data_o
Table 7 (surviving rows), aUTL grant signals:
output autl_grant_ready_o
input autl_grant_valid_i
input autl_grant_bits_is_builtin_type_i
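The widths of the acquire signals suggest how a byte address could map onto them. The sketch below assumes a 32-bit physical byte address split into a 26-bit block address (autl_acquire_bits_addr_block_o), a 2-bit beat index (autl_acquire_bits_addr_beat_o) and a 4-bit offset within the 128-bit (16-byte) data beat. The widths add up to 32 bits, but the mapping itself is an assumption and is not stated in this document; the function names are hypothetical.

```python
DATA_BITS = 128                                # width of autl_acquire_bits_data_o
OFFSET_BITS = (DATA_BITS // 8).bit_length() - 1  # 4 bits: byte offset in a beat
BEAT_BITS = 2                                  # width of autl_acquire_bits_addr_beat_o
BLOCK_BITS = 26                                # width of autl_acquire_bits_addr_block_o


def split_address(addr):
    """Split a byte address into (block, beat, offset) under the assumed mapping."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    beat = (addr >> OFFSET_BITS) & ((1 << BEAT_BITS) - 1)
    block = (addr >> (OFFSET_BITS + BEAT_BITS)) & ((1 << BLOCK_BITS) - 1)
    return block, beat, offset


def join_address(block, beat, offset):
    """Rebuild the byte address from (block, beat, offset)."""
    return (block << (OFFSET_BITS + BEAT_BITS)) | (beat << OFFSET_BITS) | offset
```

Under this assumption a block spans four beats of 16 bytes each, i.e. 64 bytes.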
The FPU interface may be used by the accelerator if it has a floating point unit attached to it.
All the signal names are prefixed with “fpu_” to denote the group they belong to. The interface
is composed of the FP request and FP response subgroups, listed in Table 8 and Table 9
respectively.
Table 8 (surviving rows), FP request signals:
input fpu_req_ready_i
output fpu_req_valid_o
output[4:0] fpu_req_bits_cmd_o
output fpu_req_bits_ldst_o
output fpu_req_bits_wen_o
output fpu_req_bits_ren1_o
output fpu_req_bits_ren2_o
output fpu_req_bits_ren3_o
output fpu_req_bits_swap12_o
output fpu_req_bits_swap23_o
output fpu_req_bits_single_o
output fpu_req_bits_fromint_o
output fpu_req_bits_toint_o
output fpu_req_bits_fastpipe_o
output fpu_req_bits_fma_o
output fpu_req_bits_div_o
output fpu_req_bits_sqrt_o
output fpu_req_bits_round_o
output fpu_req_bits_wflags_o
output[2:0] fpu_req_bits_rm_o
output[1:0] fpu_req_bits_typ_o
output[64:0] fpu_req_bits_in1_o
output[64:0] fpu_req_bits_in2_o
output[64:0] fpu_req_bits_in3_o
Table 9 (surviving rows), FP response signals:
output fpu_resp_ready_o
input fpu_resp_valid_i
Control status registers may optionally be used within the accelerator. If they exist, the
interface signals in Table 10 can be used by the operating system on the core to memory-map
the accelerator. The mechanism is not yet documented and will be discussed in later versions
of the document. All the signal names are prefixed with “csr_” to denote the group they belong to.
Table 10 (surviving rows), CSR signals:
input csr_wen_i
The page table walker signals are used for address translation from the accelerator (if the core
isn’t already handling it). These signals will be added as a part of future work. All the signal
names will be prefixed with “ptw_” to denote the group they belong to. Please stay tuned for
updates.
Resources
For more details about any information in this document, please see the following Berkeley
RoCC tutorials, which were consulted while preparing it:
● http://www-inst.eecs.berkeley.edu/~cs250/sp16/disc/Disc02.pdf
● http://www-inst.eecs.berkeley.edu/~cs250/sp16/disc/Disc05.pdf
● http://www-inst.eecs.berkeley.edu/~cs250/sp16/assignments/lab4.pdf
● http://www-inst.eecs.berkeley.edu/~cs250/fa13/handouts/lab3-sumaccel.pdf
Moreover, there is an example that integrates a Secure Hash Algorithm (SHA) accelerator
with the Rocket core and is tested against sample programs. The SHA repository (with fixes
by UCSD) and the Rocket implementation are listed below. Follow the instructions in the readme
to run simulations. We will discuss this in greater detail in future versions of the document.
● https://github.com/ucb-bar/rocc-template
● https://github.com/ucb-bar/rocket-chip
Conclusion
This document is intended to help in designing an accelerator that is compatible with the RoCC
interface. In upcoming versions of the document, we will provide a tutorial on integrating such
an accelerator with the core and running test programs that use it through custom
instructions.