OCP Accelerator Module Design Specification - v1p1
Specification v1.1
Author:
Usage of this Specification is governed by the terms and conditions set forth in the Open Web Foundation Final Specification
Agreement (“OWFa 1.0”).
Note: The following clarifications, which distinguish technology licensed in the Contribution License and/or Specification
License from those technologies merely referenced (but not licensed), were accepted by the Incubation Committee of the OCP:
None.
NOTWITHSTANDING THE FOREGOING LICENSES, THIS SPECIFICATION IS PROVIDED BY OCP "AS IS" AND OCP EXPRESSLY
DISCLAIMS ANY WARRANTIES (EXPRESS, IMPLIED, OR OTHERWISE), INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY,
NON-INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE, OR TITLE, RELATED TO THE SPECIFICATION. NOTICE IS HEREBY
GIVEN, THAT OTHER RIGHTS NOT GRANTED AS SET FORTH ABOVE, INCLUDING WITHOUT LIMITATION, RIGHTS OF THIRD
PARTIES WHO DID NOT EXECUTE THE ABOVE LICENSES, MAY BE IMPLICATED BY THE IMPLEMENTATION OF OR COMPLIANCE
WITH THIS SPECIFICATION. OCP IS NOT RESPONSIBLE FOR IDENTIFYING RIGHTS FOR WHICH A LICENSE MAY BE REQUIRED IN
ORDER TO IMPLEMENT THIS SPECIFICATION. THE ENTIRE RISK AS TO IMPLEMENTING OR OTHERWISE USING THE
SPECIFICATION IS ASSUMED BY YOU. IN NO EVENT WILL OCP BE LIABLE TO YOU FOR ANY MONETARY DAMAGES WITH RESPECT
TO ANY CLAIMS RELATED TO, OR ARISING OUT OF YOUR USE OF THIS SPECIFICATION, INCLUDING BUT NOT LIMITED TO ANY
LIABILITY FOR LOST PROFITS OR ANY CONSEQUENTIAL, INCIDENTAL, INDIRECT, SPECIAL OR PUNITIVE DAMAGES OF ANY
CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS SPECIFICATION, WHETHER BASED ON BREACH
OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND EVEN IF OCP HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
A special acknowledgement to Molex LLC for their cross-functional support as it pertains to the Mirror
Mezz connector and its implementation into this module specification.
We also want to acknowledge the community’s great support for this specification. After we first presented
the common form factor accelerator module during the OCP Server group and HPC group monthly calls
in November 2018, we received a lot of great feedback from the community, which helped us enhance the
specification so that it can be adopted by a wider community.
To take advantage of the available industry-standard form factors to reduce the required time and effort
in producing suitable solutions, various implementations have selected PCIe CEM form factor as a quick
market entry.
Such solutions are not optimized for the upcoming AI workloads which require ever-growing bandwidth
and interconnect flexibility for data/model parallelism.
State-of-the-art applications require multiple cards in a system, with multiple inter-card links running
at high-speed interconnect bandwidth.
Using the PCIe CEM form factor to meet such interconnect requirements poses several challenges, such as
excessive signal insertion loss from the ASIC to the PCIe connectors and on the baseboard; inter-card cabling
complexity, which reduces robustness and serviceability; and limits on the supported inter-ASIC topologies.
To enable flexible high-speed interconnect topologies for multi-ASIC solutions, this base specification
outlines an interoperable, modular hierarchy based on OAM form factor (OCP Accelerator Module), an
interconnect Baseboard, a Tray, and a Chassis.
Based on this base specification, various design and product implementations may maintain
interoperability while offering enhancements in each hierarchy level.
The OAM form factor facilitates scalability across accelerators by simplifying the system solution when
interconnecting communication links among modules in comparison with a PCIe Add-in card form
factor.
4.2 Acronyms
Acronym Definition
ASIC Application Specific Integrated Circuit
OAM OCP Accelerator Module
BGA Ball Grid Array
BMC Baseboard Management Controller
TDP Thermal Design Power
EDP Excursion Design Power
GPU Graphics Processing Unit
MPN Manufacturing Part Number
DXF Drawing eXchange Format
PCBA Printed Circuit Board Assembly
Host Interface: One or two x16 host links, e.g. PCIe Gen3/4/5 x16, or alternate protocols.
Module to Module Interconnect Links: Up to 7 links per module; each link has up to x16-x20 lanes.
Each link may be configurable into sub-links.
Please refer to 2D DXF and 3D files for further details. 2D DXF and 3D files are included with the
contribution package as well as relevant reference drawings to mechanical components. Please note
that some features on the OAM are called out as required, but others are included merely for reference.
[Figure callouts: example die, top stiffener]
The mate and unmate forces provided in the product specification are conservative. The specific
209311-1115 connector that the OAM uses has mate/unmate forces more in line with those found in
Table 1 Mate/Unmate Averaged Data for Molex Mirror Mezz 209311-1115 and in Figure 7 Measured
Mate Force per Pin for Molex Mirror Mezz 209311-1115. Note that the mate force per pin trends
upwards for the initial 5 cycles before settling back towards the average of 0.21N/pin.
However, an equivalent spring shall have a spring constant of at least 70N/mm, and a compression of at
least 2.5mm. Inner diameter shall be 4.2mm and outer diameter shall be 7.8mm. These springs fit into
8mm diameter counterbores of 4mm depth in the bottom stiffener. The springs are installed with glue
(3M DP810 or equivalent), applied at a maximum thickness of 0.1mm.
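As a rough illustration of the minimum spring requirement above, Hooke's law gives the force that one equivalent spring produces at the minimum compression. This is only a sketch assuming an ideal linear spring; the function name is hypothetical and the resulting force is not a figure stated in this specification.

```python
def spring_force_n(spring_constant_n_per_mm: float, compression_mm: float) -> float:
    """Force of an ideal linear spring (Hooke's law), in newtons."""
    if spring_constant_n_per_mm < 0 or compression_mm < 0:
        raise ValueError("spring constant and compression must be non-negative")
    return spring_constant_n_per_mm * compression_mm


# Minimum values stated above: 70 N/mm spring constant, 2.5 mm compression.
MIN_SPRING_CONSTANT_N_PER_MM = 70.0
MIN_COMPRESSION_MM = 2.5

force_at_min = spring_force_n(MIN_SPRING_CONSTANT_N_PER_MM, MIN_COMPRESSION_MM)
print(f"Force per spring at minimum compression: {force_at_min:.0f} N")
```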
Table 2 Spring constant and free length of die springs, shown compared to cycle count
Stage 1: Notch in top of heatsink providing visual guidance and orientation reference. Reference design
is shown with 1mm clearances (plastic top is 103mm with a 0.5mm bumper on each side of the module).
Stage 2: Alignment pins, two 3mm pins from the OAM into two 3.6mm SMT nuts on baseboard.
Figure 17 Side view (exploded) showing alignment pins being received by 1mm tall SMT nuts
Stage 3: Connector housing built-in engagement (Molex Mirror Mezz gatherability: 0.76mm).
Figure 18 Side view (exploded) showing mezzanine connectors doing final alignment
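The hand-off from Stage 2 to Stage 3 can be sanity-checked with simple arithmetic. The sketch below assumes that the 3.6mm figure is the opening diameter of the SMT nut, that the gatherability figure can be compared directly with the radial play of the pin, and that positional tolerances of the pins and nuts are ignored; none of these assumptions is stated in the text above.

```python
PIN_DIAMETER_MM = 3.0     # Stage 2 alignment pin on the OAM
NUT_OPENING_MM = 3.6      # Stage 2 SMT nut on the baseboard (assumed opening diameter)
GATHERABILITY_MM = 0.76   # Stage 3 Mirror Mezz gatherability


def radial_play_mm(pin_diameter_mm: float, opening_diameter_mm: float) -> float:
    """Worst-case radial offset of a pin sitting inside a round opening."""
    if opening_diameter_mm < pin_diameter_mm:
        raise ValueError("pin does not fit in the opening")
    return (opening_diameter_mm - pin_diameter_mm) / 2.0


stage2_play = radial_play_mm(PIN_DIAMETER_MM, NUT_OPENING_MM)
# Stage 3 can take over if the offset left by Stage 2 is within its capture range.
stage3_captures = stage2_play <= GATHERABILITY_MM
print(f"Residual radial play after Stage 2: {stage2_play:.2f} mm")
print(f"Within connector gatherability: {stage3_captures}")
```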
The figure below shows the reference model of the heatsink with the OAM assembly.
Figure callouts: notch, top handle, reference heatsink, mounting screws, baseboard PCB.
In addition, the module should remain unaffected over the non-operational storage temperature
range of -20°C to 85°C.
Lower temperature limit, non-critical temperature limit, and critical temperature limit should be defined
for those temperature sensors to support throttling or shutdown features.
Before ASIC or Memory temperature readings reach throttling thresholds, they will be
maintained below the temperature limits.
When any ASIC or Memory temperature reading reaches a throttling threshold but not the
hardware shutdown limit, these components will remain functional to support reduced
functionality of the module.
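The throttling and shutdown behavior described above can be sketched as a simple classification of a temperature reading against two limits. The class name, function name, and threshold values below are hypothetical; actual limits are defined per product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorLimits:
    throttle_c: float  # hypothetical throttling threshold
    shutdown_c: float  # hypothetical hardware shutdown limit


def thermal_state(reading_c: float, limits: SensorLimits) -> str:
    """Classify a temperature reading as 'normal', 'throttle', or 'shutdown'."""
    if reading_c >= limits.shutdown_c:
        return "shutdown"
    if reading_c >= limits.throttle_c:
        # Component stays functional, but the module runs with reduced functionality.
        return "throttle"
    return "normal"


asic_limits = SensorLimits(throttle_c=95.0, shutdown_c=105.0)  # illustrative values only
for reading in (80.0, 97.0, 106.0):
    print(reading, thermal_state(reading, asic_limits))
```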
Only one replaceable heatsink assembly (primary heatsink) is needed for the module, which can
be swapped in field.
The other heatsink parts (i.e. secondary heatsinks) and thermal interface materials will come
with the module, and do not need replacement over the module lifetime.
Reliability test reports will be provided to validate lifetime of the thermal interface materials. Shock
and Vibration test reports will be provided to validate robustness of the module assembly.
For operation at altitude, the same air temperature difference of 22°F is recommended.
For a single OAM that is shadowed by other components, the airflow/power ratio is calculated
with the airflow through its heatsink and the module power.
For an OAM shadowing other components or multiple OAMs in serial, this calculation uses the
airflow through the flow channel, and the sum of the power of OAM modules and upstream
components.
For OAM modules with power lower than 300W, an airflow/power ratio of 0.1 CFM/W or lower
is usually achievable and recommended.
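The airflow/power ratio described above is a single division; the only difference between the two cases is which airflow and which powers are used. The sketch below uses a hypothetical helper and illustrative airflow and power values that are not taken from this specification.

```python
from typing import Iterable


def airflow_power_ratio(airflow_cfm: float, powers_w: Iterable[float]) -> float:
    """Airflow (CFM) divided by the summed power (W) of all listed heat sources."""
    total_power_w = sum(powers_w)
    if total_power_w <= 0:
        raise ValueError("total power must be positive")
    return airflow_cfm / total_power_w


# Single OAM: airflow through its own heatsink, and the module power only.
single = airflow_power_ratio(airflow_cfm=25.0, powers_w=[250.0])

# OAMs in serial: airflow through the flow channel, and the sum of the
# power of the OAMs and any upstream components in that channel.
serial = airflow_power_ratio(airflow_cfm=60.0, powers_w=[300.0, 300.0])

print(f"single: {single:.3f} CFM/W, serial: {serial:.3f} CFM/W")
```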
Performance of the reference heatsink is provided in Figure 23, the thermal resistance of which is
calculated based on:
𝑅 = ,
Die size and power density play an important role in the thermal performance of the OAM. As
general guidance, this chart provides curves for three different die (heater) sizes. For each product, a
preliminary estimate can be made by referring to the curve with the closest die size.
If applicable, significant improvement can be achieved by implementing a vapor chamber to assist heat
spreading in the base. The performance of the Reference heatsink design V2 with a vapor chamber base is
provided as follows:
Package size also has a significant impact on the cooling capability of OAMs. Figure 26 provides
the airflow needs of a single OAM at a given approach temperature, case temperature target,
thermal interface material, and die power. Beyond 120CFM, more airflow towards the OAM brings
diminishing returns, which limits the maximum OAM power supported. This can also be used to estimate
the cooling capability of the system design and fan trays.
For a reference OAM in a typical platform with 8x OAMs, shadowing layout, it is observed that the
maximum module power that air cooling can support is approximately 450W. Beyond this power limit,
advanced cooling solutions are recommended to support its operation at the hotter part of the
operational boundary condition range. These advanced cooling solutions would also be recommended
for extended environment boundary conditions. Note that this limit may vary for different products,
depending on die size, power distribution, and junction temperature limits.
Open loop liquid cooling is one of the feasible cooling solutions to support modules of higher power. To
support typical open loop liquid cooling modules designed for a 1RU (height = 44.45mm) system, it is
recommended that OAM vendors limit the maximum distance from the lower surface of bottom
stiffener to the top surface of the die (ASIC/HBM) to within 13mm.
Figure 28 An Example of Open Loop Liquid Cooling setup concept for OAM
A typical open loop liquid cooling setup (cold plate) for the OAM may include the following parts:
With a proper coolant supply, open loop liquid cooling has the potential of delivering surface-to-coolant
thermal resistance lower than 0.05°C/W. However, it would require liquid supply and control systems to
be established as part of the data center infrastructure.
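To put the thermal resistance figure above in context, the temperature rise from coolant to surface is the product of thermal resistance and dissipated power. The sketch below assumes steady state and a uniform heat source; the 600W module power is purely illustrative and not a value from this specification.

```python
def temperature_rise_c(thermal_resistance_c_per_w: float, power_w: float) -> float:
    """Steady-state temperature rise across a thermal resistance."""
    if thermal_resistance_c_per_w < 0 or power_w < 0:
        raise ValueError("inputs must be non-negative")
    return thermal_resistance_c_per_w * power_w


SURFACE_TO_COOLANT_R = 0.05  # degC/W, the upper bound quoted above
EXAMPLE_POWER_W = 600.0      # hypothetical module power

rise = temperature_rise_c(SURFACE_TO_COOLANT_R, EXAMPLE_POWER_W)
print(f"Surface-to-coolant rise at {EXAMPLE_POWER_W:.0f} W: {rise:.1f} degC")
```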
We recommend the mounting pressure range to be 30 ~ 60 psi for OAM with bare die packages. For
engineering samples without enough assembly yield rate learnings, we recommend starting with an
initial mounting pressure of 15~30 psi. For lid-covered OAM packages, the mounting pressure is yet to
be explored.
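A mounting pressure translates into a clamping force once a contact area is chosen. The sketch below converts the recommended pressure range into newtons for a hypothetical bare-die contact area of 800 mm²; the area and the helper name are assumptions for illustration only.

```python
PA_PER_PSI = 6894.757  # pascals per psi (standard conversion factor)


def mounting_force_n(pressure_psi: float, contact_area_mm2: float) -> float:
    """Force in newtons produced by a uniform pressure over a contact area."""
    area_m2 = contact_area_mm2 * 1e-6
    return pressure_psi * PA_PER_PSI * area_m2


EXAMPLE_DIE_AREA_MM2 = 800.0  # hypothetical contact area

low_force = mounting_force_n(30.0, EXAMPLE_DIE_AREA_MM2)
high_force = mounting_force_n(60.0, EXAMPLE_DIE_AREA_MM2)
print(f"30 psi -> {low_force:.1f} N, 60 psi -> {high_force:.1f} N")
```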
Maximum warpage of the package should not exceed 0.2mm. This could potentially lead to an average
bond line thickness of 0.1mm for the TIM. Varying for different die sizes, TIM could easily contribute
0.01°C/W ~ 0.08°C/W, up to 50% of total thermal resistance:
[Figure: OAM interconnect diagram. Labels: Host x16; Power; SERDES 1 through SERDES 7; SERDES B; SERDES C; link widths marked x16]
Note:
1) When a catastrophic thermal event (THERMTRIP#) occurs, the OAM should shut down all its
onboard power rails.
2) Baseboard should do the following:
a. Turn off all input clocks
b. Tristate all OAM input GPIO
c. Turn off all input power rails to OAM
[Figure labels: ASIC; Mezzanine Module; CONNECTOR #2; CONNECTOR #1]
The OAM baseboard supplies power to the module through the Mirror Mezz Connector0 power pins.
There are 3 power rails defined in this document to accommodate both 12V and 48V (or 54V) modules.
The current capability and power status are listed in the table below. Power is available in state S0 only.
Only five P12V power pins are mandatory when the supply power is 48V (16 pins), and the rest of the
P12V pins can be NC. When the baseboard supply power is 12V, P48V can be NC. The baseboard can
supply all 3 power rails and supports both 12V and 48V modules.
EDP         Duration
2x TDP      <= 20µs
1.6x TDP    <= 2ms
1.5x TDP    <= 5ms
1.2x TDP    <= 10ms
1.1x TDP    <= 20ms
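The EDP table above can be read as a lookup from excursion magnitude to maximum allowed duration. The sketch below assumes one conservative interpretation that the table itself does not spell out: an excursion between two rows is held to the duration limit of the next higher multiplier, and anything above 2x TDP is outside the table. The function name is hypothetical.

```python
# (multiple of TDP, maximum duration in seconds), taken from the EDP table.
EDP_LIMITS = [
    (2.0, 20e-6),
    (1.6, 2e-3),
    (1.5, 5e-3),
    (1.2, 10e-3),
    (1.1, 20e-3),
]


def excursion_allowed(power_w: float, duration_s: float, tdp_w: float) -> bool:
    """Check a power excursion against the EDP table (assumed interpretation)."""
    ratio = power_w / tdp_w
    if ratio <= 1.0:
        return True  # at or below TDP, so not an excursion
    # Use the smallest table multiplier that still covers this ratio.
    for multiplier, max_duration_s in sorted(EDP_LIMITS):
        if ratio <= multiplier:
            return duration_s <= max_duration_s
    return False  # above 2x TDP is not covered by the table


TDP_W = 400.0  # illustrative TDP
print(excursion_allowed(800.0, 20e-6, TDP_W))  # 2x TDP for 20 us
print(excursion_allowed(800.0, 1e-3, TDP_W))   # 2x TDP for 1 ms
print(excursion_allowed(520.0, 4e-3, TDP_W))   # 1.3x TDP for 4 ms
```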
[Figure: power-up sequence timing diagram. Signals: P12V, P3V3, HOST_PWRGD, PVREF, MODULE_PWRGD, Reference Clocks, PERST#, WARMRST# (Optional)]
Notes:
1) If the OAMs and the baseboard are disaggregated from the host system,
HOST_PWRGD is the baseboard power good indication signal.
2) All voltages on the baseboard that OAM plugs into must be within specification before
HOST_PWRGD is asserted.
3) HOST_PWRGD is the enable signal to the voltage regulators on the OAM.
4) As the voltage planes on the module ramp up, the reference clocks from the baseboard will
begin to run.
5) After all the voltages on the module are within specification, the module asserts
MODULE_PWRGD to the baseboard.
6) Baseboard should tristate all OAM single-ended input signals prior to MODULE_PWRGD being
asserted. Note that input signals required for the power sequence should be driven accordingly.
7) At least 100ms after MODULE_PWRGD assertion, the baseboard will de-assert the PCIe reset
signal (PERST#) to the module.
8) The optional WARMRST# signal de-asserts at the same time as, or later than, the PERST# signal
de-asserts.
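The ordering rules in the notes above can be checked against captured event timestamps. The sketch below is a minimal illustration; the event names and the dictionary layout are hypothetical, and only the rules from notes 2, 5, 7, and 8 are covered.

```python
from typing import Dict, List

MIN_PERST_DELAY_MS = 100.0  # note 7: PERST# de-asserts at least 100 ms after MODULE_PWRGD


def check_power_up(t_ms: Dict[str, float]) -> List[str]:
    """Return a list of sequencing violations; an empty list means the order is valid."""
    violations = []
    if t_ms["host_pwrgd_assert"] < t_ms["baseboard_rails_valid"]:
        violations.append("HOST_PWRGD asserted before baseboard voltages were in specification")
    if t_ms["module_pwrgd_assert"] < t_ms["host_pwrgd_assert"]:
        violations.append("MODULE_PWRGD asserted before HOST_PWRGD")
    if t_ms["perst_deassert"] - t_ms["module_pwrgd_assert"] < MIN_PERST_DELAY_MS:
        violations.append("PERST# de-asserted less than 100 ms after MODULE_PWRGD")
    warmrst = t_ms.get("warmrst_deassert")  # optional signal
    if warmrst is not None and warmrst < t_ms["perst_deassert"]:
        violations.append("WARMRST# de-asserted before PERST#")
    return violations


good_trace = {
    "baseboard_rails_valid": 0.0,
    "host_pwrgd_assert": 5.0,
    "module_pwrgd_assert": 20.0,
    "perst_deassert": 130.0,
    "warmrst_deassert": 130.0,
}
print(check_power_up(good_trace))
```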
9.1 Module ID
The following figure shows the MODULE_ID[4:0] strapping for physical orientation of modules when 8
interconnected OAMs are used.
Detailed port-to-port assignment is based on system placement and routing length. The module-to-module
interconnect may decrease to 4 ports if the module only supports 4. A module-to-module interconnect link
may utilize only 8 lanes if the module defines 8 lanes per link.
The following figure shows the required MODULE_ID[4:0] assignments when only 4 modules are
connected as two rows of two.
Figure 42 Topology Example for Modules with 3/4/6 ports – Hybrid Cube Mesh
o 6 x16 links Chordal Ring (Almost Fully Connected) using links: 1, 2, 3, 4, 5, and 6
The routing suggestion uses 4 layers in total: two layers for TX and two layers for RX. Ports 4 and 6 are
connected through cables.
Here is the routing suggestion for the 7 x16 links fully connected topology:
In Figure 50, links 1, 4, 6, and 7 each have 16 lanes in total, and links 2, 3, and 5 each have 8 lanes in total:
The figure below shows the detailed port mapping and routing guide:
Ports 4 and 6 (both 4L/6L and 4H/6H), port 5L, and port 7H are used for the 6 x8 HCM. This is how it is
embedded in this combined topology:
9.2.6 4D Hypercube
As one single PCB cannot fit all 16 modules, this topology requires cables or a backplane to
interconnect the PCBs. Depending on whether a single PCB holds 4 modules or 8 modules, the
interconnect path may be different. The system integrator may consult the module supplier for
details. The module defines two QSFP-DD ports as the potential scale-up solution.
Figure 53 4D Hypercube
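In a 4D hypercube, each of the 16 nodes links to the 4 nodes whose index differs from its own in exactly one bit. The sketch below enumerates those links; the node indices are abstract labels chosen for illustration and are not the MODULE_ID assignments or port numbers of this specification.

```python
from typing import List, Tuple


def hypercube_neighbors(node: int, dimensions: int = 4) -> List[int]:
    """Nodes that differ from `node` in exactly one bit position."""
    return [node ^ (1 << d) for d in range(dimensions)]


def hypercube_links(dimensions: int = 4) -> List[Tuple[int, int]]:
    """All undirected links of the hypercube, each listed once as (low, high)."""
    node_count = 1 << dimensions
    return [
        (a, b)
        for a in range(node_count)
        for b in hypercube_neighbors(a, dimensions)
        if a < b
    ]


links = hypercube_links()
print(f"{len(links)} links among 16 nodes")
print("neighbors of node 5:", hypercube_neighbors(5))
```

Because every node needs only 4 links, the remaining links of a module stay free for other uses, such as the scale-up ports mentioned above.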
9.3 LINK_CONFIG[4:0]
The 5 link configuration strapping bits are pulled up on modules that use them. These bits are strapped
to ground on the baseboard to select logic 0, or left floating on the baseboard to select logic 1. Some
OAMs use these LINK_CONFIG[4:0] strapping bits to determine the interconnect topology for the links
between modules and to determine the protocol of the “P” Link.
LINK_CONFIG[4:0]            Definition
00000                       Reserved for OAM. Test use by OAM Vendor.
xxxx0 (except for 00000)    Indicates the “P” link is PCIe
01000                       6 link HCM, 4 link HCM, and two 3 link fully connected quads as connected in Figure 41.
00110                       7 x16 fully connected
01010                       6 x16 link Chordal Ring (Almost Fully Connected) as connected in Figure 44.
01011                       6 x16 link Chordal Ring (Almost Fully Connected) using alternate host interface protocol as connected in Figure 44.
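The strapping scheme and the table above can be sketched as a small decoder. Only the encodings listed in the table are included; the function names and the returned dictionary layout are hypothetical, and any value not listed is reported as such instead of being guessed.

```python
from typing import Dict, Iterable

LINK_CONFIG_TABLE = {
    0b01000: "6 link HCM, 4 link HCM, and two 3 link fully connected quads",
    0b00110: "7 x16 fully connected",
    0b01010: "6 x16 link Chordal Ring (Almost Fully Connected)",
    0b01011: "6 x16 link Chordal Ring (Almost Fully Connected), alternate host interface protocol",
}


def strap_value(grounded_bits: Iterable[int]) -> int:
    """Bits are pulled up on the module; grounding a bit on the baseboard selects 0."""
    value = 0b11111
    for bit in grounded_bits:
        if not 0 <= bit <= 4:
            raise ValueError("LINK_CONFIG has bits 0 through 4 only")
        value &= ~(1 << bit)
    return value & 0b11111


def decode_link_config(value: int) -> Dict[str, object]:
    if not 0 <= value <= 0b11111:
        raise ValueError("LINK_CONFIG is a 5-bit value")
    if value == 0:
        return {"topology": "Reserved for OAM (vendor test use)", "p_link_is_pcie": False}
    return {
        "topology": LINK_CONFIG_TABLE.get(value, "not listed in the table above"),
        # xxxx0 (except 00000) marks the "P" link as PCIe; False only means
        # that the table does not mark this value as PCIe.
        "p_link_is_pcie": (value & 1) == 0,
    }


value = strap_value(grounded_bits=[0, 2, 4])  # leaves bits 1 and 3 floating
print(format(value, "05b"), decode_link_config(value))
```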
*When MCTP over SMBus is used, the BMC shall support both master and slave modes.
**The master I2C/SMBus physical interface is required for scale-out and baseboard FRU access.
The OAM communicates with the Baseboard Management Controller (BMC) by using:
SMBus:
o SMBus supporting 1MHz mode is preferred
o Standard Intelligent Platform Management Bus (IPMB) and Intelligent Platform
Management Interface (IPMI) commands.
o SMBus ARP protocol
o Management Component Transport Protocol (MCTP) over SMBus binding
(DMTF DSP0237)
UART (optional)
o Shall support 115200 Baud Rate
o For serial console access
JTAG
o For register dump, memory dump, debug access
PCIe (optional)
o To support MCTP over PCIe binding
summarizes the sensors reporting list. The report from these sensors improves the system
monitoring/management and allows the baseboard management device to access key components on
the module. It is recommended that the voltage/current/power sensor reporting accuracy is within ±2%
and the temperature reporting accuracy is within ±3°C.
The OAM shall support sensor discovery via IPMI or Platform Level Data Model (PLDM) for Platform
Monitoring and Control (DMTF DSP0248). The management controller (MC) will follow the mechanisms
specified in DSP0248 to discover the sensors supported by the OAM and their thresholds.
For error reporting, the OAM shall support Platform Level Data Model (PLDM) for Platform Monitoring and
Control (DMTF DSP0248), which enables the module to define state sensors and events for health
monitoring and reporting.
The OAM supplier shall provide an in-band FW update utility to perform firmware updates.
For out-of-band firmware update, OAM shall support Platform Level Data Model (PLDM) for firmware
update over MCTP as specified in DMTF DSP0267.
Update failure and failure types shall be communicated to BMC as specified in DSP0267.
FRU data shall be accessed through Platform Level Data Model (PLDM) for FRU Data, following DMTF DSP0257.
11.7 IO Calibration
System shall be able to get DDR/PCIe/interconnect training status and margin information. For in-band
access, specific tools and/or API shall be provided by OAM supplier.
For further details on implementation and the detailed set of requirements, please refer to the OCP
Hardware Secure Boot document.
11.8.2 Recovery
The OAM must support a recovery mechanism to restore the mutable firmware code to a state of integrity in
the event that any such firmware code or critical data is detected to have been corrupted, or when
forced to recover through an authorized mechanism.
The OAM should also support two mutable firmware (active and recovery) regions, where the
recovery firmware is the previously known good image.
In case of failure of the redundant copies, the OAM should support recovery over a sideband
interface.
o OAM shall support Platform Level Data Model (PLDM) firmware update over MCTP
as specified in DMTF DSP0267.
o Update failure and failure types shall be communicated to BMC as specified in
DSP0267
o Firmware updated out-of-band should still follow the boot-time verification process
of secure boot
For further details on implementation and the detailed set of requirements, please refer to the OCP
Recovery document.
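The fallback order described in this section (active region, then recovery region, then sideband recovery) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the SHA-256 digest comparison is a stand-in for the cryptographic signature verification that a real secure boot flow would perform.

```python
import hashlib
from typing import Callable, Optional


def choose_firmware(
    active: bytes, recovery: bytes, is_valid: Callable[[bytes], bool]
) -> Optional[str]:
    """Return 'active' or 'recovery', or None when sideband recovery is needed."""
    if is_valid(active):
        return "active"
    if is_valid(recovery):
        return "recovery"
    return None  # both copies failed: recover over the sideband interface


# Stand-in integrity check: compare against a known-good SHA-256 digest.
good_image = b"firmware image, known good"
known_good_digest = hashlib.sha256(good_image).hexdigest()


def digest_matches(image: bytes) -> bool:
    return hashlib.sha256(image).hexdigest() == known_good_digest


print(choose_firmware(b"corrupted image", good_image, digest_matches))  # prints: recovery
```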
11.8.4 Attestation
It is critical to dynamically verify the firmware running on the device and the device itself
cryptographically. This helps establish trust with the device. It is recommended to support device
attestation for OAM.
Here is a short list of generic requirements to support attestation for OAM:
Keys, seeds, and device identifiers
Provisioning Facility (Initial Provisioning Environment Operations and Equipment)
Device Ownership Provisioning
Authentication, Attestation, and Enrollment protocol
Measurement collection and storage
For further details on implementation and the detailed set of requirements, please refer to the OCP
Attestation for System Components v1.0.
12.2 Regulation
The vendor needs to provide CB reports for the OAM. These documents are needed for rack-level CE.
The OAM should be compliant with RoHS and WEEE. The PCB should have a UL 94V-0 certificate.
Whitney Zhao        Add power profiles, power sequence requirement                            Internal release    2/4/2019
Siamak Tavallaei    Update license information, overview
Ben Wei, Jubin Mehta, Yuval Itkin, Whitney Zhao
                    Modify chapter 11 and update OAM management and security requirements.    1.1                 6/30/2020